UDM, New Dimension Types and Analysis Services 2005
One of the key goals of Analysis Services 2005 is to “bring together the best of both relational and OLAP reporting into a single Unified Dimensional Model”. This is a very ambitious goal and most would agree that it would be difficult to fully achieve. However, Analysis Services 2005 certainly does take huge strides in this direction.
An example is that the UDM schema assumes many characteristics that were traditionally reserved solely for the relational world.
The most powerful of these has to be the attribute-based model where there is roughly a one-to-one relationship between columns in the relational source and dimension attributes. As each attribute typically has its own little hierarchy, straight off the bat we can get 20 or 30 hierarchies per dimension. This is very different to the behaviour in Analysis Services 2000 where multiple hierarchies per dimension are no more than a naming convention. This really does support much richer reporting requirements.
Furthermore, UDM schemas are far more flexible. In Analysis Services 2000, we were constrained to very rigid hierarchies defining the metadata of our cubes. Whereas in 2005, we can model complex relationships between our database entities such as many-to-many relationships. Detail-level reporting was also primarily in the realm of relational reporting. It too is addressed by the UDM as discussed below.
The new dimension types in Analysis Services 2005 contribute greatly to these objectives. Let’s look at each one.
Fact Dimensions are otherwise known as degenerate dimensions (as coined by Ralph Kimball). Let’s say our fact table contains information that would be useful to query by or even aggregate by; for example, courier tracking number. In Analysis Services 2000 we would have to create a TrackingNumber dimension table and populate it as part of the ETL. But this causes unnecessary duplication of data, and adds extra complexity to the design. Analysis Services 2005 overcomes this limitation because the fact table can refer to itself doubling up as a dimension table. With fact dimensions, we can drill down to the very lowest level of detail (where there is a member for each fact table row) natively in the UDM. We use the fact table primary key as the key attribute of the fact dimension.
In Analysis Services 2000, we required techniques such as drill through to achieve this. But drill through is an added layer of complexity and often doesn’t work as well as direct relational reporting for detail-level reports. Another option is to revert to relational reporting altogether, but this introduces inconsistency because we no longer have a single meta-data model for data access; we have to use TSQL for some queries, MDX for others, potentially different reporting tools, some databases are optimised for relational reporting while others are geared to OLAP reporting …
Note that Autoexists does apply when querying across a fact dimension with another dimension. For those who have not heard of Autoexists, it is a concept that has been introduced with Analysis Services 2005. The attribute-based model used by the UDM is vastly richer than the dimensional model used by Analysis Services 2000. However, this means the number of attribute permutations can become overwhelming. Hence Autoexists resolves the problem by not returning non-existent cells when performing cross-joins. For example, the query “Select (Date.[Calendar Year]. * Date.[Calendar Quarter].Members) …” will return only the quarters of 2005 cross-joined with 2005. The query will not return nulls for the 2003 quarters cross-joined with 2005, 2004 quarters cross-joined with 2005, and so on. Note that Autoexists only affects data pertaining to dimensions; it has nothing to do with fact table data. Furthermore, Autoexists only applies to the cross-join of attribute hierarchies within a single dimension; with the exception of fact dimensions.
Another feature of Analysis Services 2005 that was not in 2000 is that we can have a MOLAP measure group related to a ROLAP dimension. So we can browse MOLAP data as usual and then, once we decide to use a dimension that is stored as ROLAP, the query will switch to ROLAP mode. The reason this is relevant to fact dimensions is that they are often good candidates for ROLAP storage. We typically wouldn’t want to store millions of OrderDetail records in a MOLAP store because there would be no aggregation and it would not utilise the strengths of MOLAP. When I tested the performance of ROLAP vs. MOLAP queries in this scenario, there was hardly any difference at all. However, the cube-processing window can be reduced with ROLAP fact dimensions.
The following screenshot shows the dimension designer for a fact dimension. The things to note are as follows.
- The dimension table is a fact table (FactInternetSales).
- The KeyColumns property of the key attribute (Internet Sales Order) is the primary key of the fact table (SalesOrderNumber and SalesOrderLineNumber).
The following screenshot shows how a fact dimension is set up in the Dimension Usage tab. The dimension and measure group tables are one and the same.
A single dimension can be related to a set of facts through multiple roles. For example, FactOrderDetails can have order date, a ship date, and a receipt date. In this case our time dimension fulfils 3 roles with respect to dimensioning orders.
The way this is represented in a Data Source View is with a foreign key relationship for each role. As shown by the following screenshot, there are 3 foreign-key relationships between FactInternetSales and DimTime. One is selected and the columns forming the relationship are displayed. In this case the source column is ShipDateKey, but the other foreign keys use OrderDateKey and DueDateKey respectively.
The only way to do this in Analysis Services 2000 was to implement a separate dimension for each role, which results in data replication and management overhead such as extra processing time etc. In fact in Analysis Services 2005, each role actually appears as a separate dimension to the end user. But in terms of storage, maintenance and design, there is significantly less overhead.
With a reference dimension, the dimension table is indirectly related to the fact table through another dimension table. This is similar to virtual dimensions in Analysis Services 2000.
This is shown by the following example. Customer is a regular dimension and Geography is a reference dimension.
One option would be to have a CustomerGeography dimension that covers both of the dimension tables in the snowflake schema. CustomerGeography would contain both customer and geography attributes.
The case where we would want Geography as a separate reference dimension is if we want to reuse it against another dimension (or fact) such as Warehouse. This would reduce storage and processing.
Note that we do not get Autoexists behaviour across the two dimensions and we cannot have user hierarchies that span both the reference and the referenced dimensions.
Reference dimensions in Analysis Services 2005 can be materialized whereas virtual dimensions in 2000 were calculated at runtime.
In Analysis Services 2005, we can have a single fact related to multiple dimension members. This is represented by a many-to-many dimension. Some use cases for many-to-many dimensions include the following.
- A bank customer can have multiple accounts while an account can have joint owners. When querying balance by customer, this has to be taken into consideration. As a side note, this example would require that the balance measure be semi-additive.
- A demographic can contain many customers and a customer can be in many demographics.
Both the above cases require a many-to-many mapping table between 2 dimension tables. This is the “factless fact table”; a term coined by Ralph Kimball. The example in the AdventureWorks sample database is a bit simpler. Goods are bought over the Internet and customers can select multiple reasons that prompted the purchase (e.g. advertising, promotion, word of mouth, etc.). In this case, we only require a many-to-many mapping table between the sales fact table and the dimension table (instead of between 2 dim tables).
This is illustrated by the following screenshot. FactInternetSalesReason is considered a fact table by Analysis Services 2005, so it implies a measure group with no measures. In relational database terminology, it is nothing more than a regular many-to-many mapping table.
The following screenshot shows how the dimension is set up in the Dimension Usage tab and where the intermediate measure group is specified.
An important thing to bear in mind is that many-to-many dimensions do not necessarily calculate totals using simple summations. As shown by the following screenshot, the sum of each Sales Reason Type does not equal the Grand Total. This is because facts can be included in the aggregation none, one or many times.
Note that many-to-many dimensions are not pre-aggregated across dimensions (analogous to a materialised reference dimensions) and my understanding is that Microsoft currently has no plans to create a “materialised many-to-many dimension”.
I must say that I am very impressed with Analysis Services 2005.
The IDE using the Visual Studio shell is really a first-class development environment. The disconnected model where nodes in the Solution Explorer are in fact XML files (conforming to XMLA) is very well thought through. Aside from enabling integration with SourceSafe and version management, this results in much better multi-tasking capabilities for BI developers and the IDE utilises good multi-threaded functionality. How many times have you tried to do something in Analysis Manager 2000 only to discover that someone else is processing the cubes and you cannot do anything? The disconnected model overcomes this. When I demonstrated Analysis Services 2005 to one of our customers recently, I started deploying and processing the project and then casually moved onto other tasks in the IDE in parallel. I found it amusing when a member of the audience exclaimed “wow, Analysis Services 2000 doesn’t even allow single users to multi-task, let alone multiple users!”.
I have really only touched the tip of the iceberg in terms of the work that has gone into Analysis Services 2005. Other areas include massive MDX enhancements. The “procedural” model with debugging/browsing capabilities and not worrying about pass, solve order and infinite recursion are great achievements. I can see sophisticated MDX calculations that were pretty-much unworkable being straightforward with 2005.
We have already done much work with Analysis Services 2005 on the Ascend program and it appears that one of our big customers will participate in the Early Adopter Program in the Beta 3 timeframe, which is great news.