Archive for the ‘Reference Dimension’ Category

Reference Materialized Dimension = Wrong Results


Reference dimensions let you create a relationship between a measure group and a dimension using an intermediate dimension to act as a bridge between them. In the Adventure Works demo, for example, they are used to link reseller sales and orders to the geography dimension.


Now, something that is less evident is how this specific kind of relationship is handled by SSAS. Let us look further in the definition of the relationship:


If the relationship is not materialized (something SSAS will dislike) then the relationship will be resolved during query time, with slow performances. Nevertheless, everything will work fine.

If, on the other hand, we follow SSAS suggestion and decide to materialize it, we will need to understand what is going on under the cover, to avoid a very subtle misfeature of SSAS.

When the relationship is materialized, SSAS will resolve it during processing time issuing a more complex query, during the partition processing (not dimension, I have written the right word: partition!) issuing a JOIN between the fact table and the intermediate dimension, to gather the value of the key for the final dimension from the intermediate one. This will lead to poor processing performances, but it is something that we already know. Anyway, what is important to point out is that the relationship is resolved during partition, not dimension processing.

The real problem comes when we need to update the relationship. We might think (at least this is what I thought) that issuing a ProcessUpdate on the intermediate dimension would refresh the relationship and perform all the necessary updates. This is not true. Since the materialized relationship is resolved during partition processing, any dimension processing will not refresh it. You can try to ProcessUpdate the intermediate or the final dimension: no way, the relationship still uses the data that was there when the partition was processed first time.If we want to refresh this relationship, the only way is to reprocess the partition, at that point we will have the new data correctly available.

This might be a very big problem if, for very large cubes, we decide to reprocess only the newest partitions, reprocessing only the needed aggregations for the older ones. If older partition will not be processed they will use the older version of the relationship. We will end up with an inconsistent situation where some partitions use a relationship and other use a different one, depending on when they have been last processed.

Clearly, if we rely on non-materialized relationships, everything will work fine since the relationship will rely only on dimensional data and does not need to be resolved during partition processing. In fact, issuing a ProcessUpdate on the intermediate dimension will imediately refresh the non materialized relationship, as it is supposed to do.

I think BOL should be much clearer on this, materializing a relationship is not just a matter of query speed, it has subtle consequences that, at their extreme, might lead to incorrect results from the cube.

Categorias:Reference Dimension