Dynamic Cube Partitioning in SSAS 2008
Dynamic Cube Partitioning in SSAS 2008
- Partition benefits
- Implementing Dynamic Partitions
- Better Query Performance: Cube partition is a powerful mechanism for improving query performance. Queries that summarize data over 10 years could take considerably longer than those that only search through the current year data. If we have proper partitions then SSAS only has to scan a small subset of data to return query results hence dramatic performance improvements compared to queries running against a cube with a single partition.
- Minimize downtime: Cube partitioning supports reducing downtime associated with cube processing. In almost all the cases, a portion of data warehouse is volatile and needs to be processed often. However other portions are relatively static. For example, in a sales cube, we need to change the current year’s data nightly, but sales from previous years might change only occasionally – in case if account for merchandise returns and exchanges. If your warehouse tracks last 10 years salesthen processing only the current partition may be 10 times quicker than processing the entire cube.
- Aggregations benefits: The partition queried frequently could benefit from additional aggregations, which in turn could improve performance. Partition(s) that are used less can be processed less frequently with considerably fewer aggregations.
- Customized storage and processing settings: Frequently accessed partitions might benefit from proactive caching and ROLAP storage. On the other hand, other forms of storage and processing might be better for less frequently queried partitions.
- Distributed query and processing load: SSAS allows you to create remote partitions – a remote partition resides on a server different from its parent cube. This way the queries that affect the remote partition are processed on a server separate from its parent cube, allowing you to take advantage of additional processing power.
- Parallel Partitions Processing: SSAS allows processing multiple partitions in parallel on a server that has multiple processors. This can further reduce the total cube processing time.
In this section I will explain step by step operations to create Dynamic Cube Partitions.
Preapare test data for Sales cube. I will use three dimensions (DimDate, DimProduct, and DimCustomer) and one fact table for Sales cube. Clickhere to download CubePartitionData.sql file which contains T-SQL code to generate these tables along with sample data.
Create new Analysis Services Project using BIDS and save it as Sales.
Create Sales cube along with required dimensions. Set all required properties of dimensions and create any hierarchies if required. My Solution looks like this:
Double click on the cube and navigate to Partition tab. You will see a default partition as shown below:
Delete this default partition and Click on new partition… to click new partition. In Partition Wizard, Select FactSales as available table in Specify Source Information page and click next. Select Specify a query to restict row in Restrict Rows page and write WHERE condition to restrict partition rows.
Finally Process the cube. Once cube is processed successfully, you can see a new database in Analysis Services:
We are done with one manual partition. Now its time to create the partitions dynamically and apply a logic to process the delta data every time.
First off all, right click on the Partition Sales_20100101-20100131 (highlighted above) and then select Script Partition as ==> Create To ==> New Query Editor Window. It will create XMLA scripts for this partition. Save this file in the project withCreatePartitionSales_20100101-20100131.xmla name.
In similar fashion, right click on the Partition Sales_20100101-20100131 and then selectProcess. This will open Process Partition wizard, click Script to generate XMLA script and save this file in the project with CreatePartitionSales_20100101-20100131.xmla name.
Click on File => Add => New Project in menu bar in the same solution. Select Integration Services Project and enter the name of the SSIS Project.
Rename the SSIS package1.dtsx by DynamicSalesPartition.dtsx.
Add two connection Managers – one OLE DB connection manager for database and another Analysis Services connection manager for SSAS database.
Add following Package variables:
Directory – To store package path
FromDateKey – To Store current partition start date key e.g. 20101201.
ToDateKey – To Store current partition end date e.g. 20101231
PartitionName – Current partition name e.g. SalesPartition_20101201-20101231
IsPartitionExists – To check whether partition alreday exists or not
CreatePartitionXMLA – To store XMLA script to create current partition
ProcessPartitionXMLA– To store XMLA script to process current partition
@FromDateKey varchar(8), @ToDateKey varchar(8),
@CalendarYear int, @CalendarMonth int
@FromDateKey = MAX(FromDateKey),
@ToDateKey = MAX(ToDateKey)
FROM PartitionLog (NOLOCK)
@CalendarYear = MIN(CalendarYear),
@CalendarMonth = MIN(CalendarMonth)
FROM DimDate WHERE DateKey > @ToDateKey + 1
–SET FromDateKey and ToDateKey for current Partition
IF NOT CONVERT(varchar(8),GETDATE(),112) BETWEEN @FromDateKey AND @ToDateKey
SELECT @FromDateKey = MIN(DateKey), @ToDateKey = MAX(DateKey)
FROM DimDate (NOLOCK)
WHERE CalendarYear = @CalendarYear AND CalendarMonth = @CalendarMonth
SELECT @FromDateKey AS FromDateKey, @ToDateKey AS ToDateKey,
‘Sales_’ + @FromDateKey + ‘-‘ + @ToDateKey AS PartitionName
ProcessPartitionSales_20100101-20100131.xmla and store in the package variables CreatePartitionXMLA & ProcessPartitionXMLA.
I am using User::Directory as ReadOnlyVariable and User::CreatePartitionXMLA,User::ProcessPartitionXMLA as ReadWriteVariables.
Click here to download the code for Script Task – Read XMLA Files and store in variables.
To know about reading file contents using Script Task, click Script Task to Read File.
Third Task is a sequence container which contains three task – Script Task, Analysis Services Execute DDL Task, and Execute SQL Task.
Script Task is used for two reasons –
1. To check whether current partition exists or not.
2. To update XMLA content for current Partition.
This is the most tricky and interseting part here. We need to replace Partition ID, Partition Name, and Source QueryDefinition for current Partition which are highlighted below:
In Script Task, select User::FromDateKey,User::PartitionName,User::ToDateKey as ReadOnlyVariables andUser::CreatePartitionXMLA,User::ProcessPartitionXMLA,User::IsPartitionExists as ReadWriteVariables.
Click here to download the code for Script Task – Generate XMLA to create and Process new Partition.
Click SSIS Script Task to Find and Replace File contents to know more about find and replace mechanism using script task.
Second task in the sequence Container is Analysis Services Execute DDL Task. This task is used to create new Partition. In Analysis Services Execute DDL Task Editor, selectSSAS.localhost as Connection, Variable as SourceType, and User::CreatePartitionXMLAas Source.
Also don’t forget to set precedence constraint. You should execute this task only when partition does not exists. Open Precedence Constraint Editor and Select Expression and Constraint in Evolution operation, Success in Value, and @IsPartitionExists == 0 in Expression as shown below:
Third Task in the Sequence Container is Execute SQL Task – this is used to store newly created Partition information in the PartitionLog table. In Execute SQL Task, select ResultSetNone, ConnectionType OLE DB, Connection (local).Test, SQLSourceType Direct input and SQLStatement as:
INSERT INTO dbo.PartitionLog
Click on Parameter Mapping and map the parameters as shown below:
Finally we need to process the current partition. This is independent of previous sequence container where we are creating new partition if it is not exists.
However, we need to process the current partition every time we execute the package. In most of the cases, new partition will be created on the first day of month but partition should be processed everyday to include delta data.
This sequence container also contains three tasks.
First task is Analysis Services Processing Task – to process the dimensions data. SelectSSAS.localhost as connection manager and all the dimensions in Object Name – which are not static in data. Select Process Update as Process Option.
Second Task is againg Analysis Serives Processing Task – to process Current Partition. Select SSAS.localhost as connection manager and Sales_20100101-20100131 in Object Name – which is default partition (or first partition). Select Process Full as Process Option.
Now click on Expression and select @[User::ProcessPartitionXMLA] in ProcessingCommands.
Finally store the Last Processed Date and Processed Count (number of time this partition has processed) in the log table using Execute SQL task. Select ResultSet None, ConnectionType OLE DB, Connection (local).Test, SQLSourceType Direct input and SQLStatement as:
SET LastProcessDate = GETDATE()
,ProcessedCount = ProcessedCount + 1
WHERE PartitionName = ?
Click on Parameter Mapping and map the parameters and map the parameter 0 withUser::PartitionName.
We are done with package development. Now its time to execute the package and check the dynamic behaviour of Partition creation and processing. Before execution the package, there is only partition in the SSAS Sale database.
Now Right Click on the package and execute it. Once package is execute successfully, you will see one more partition in the Sales database and one more entry in PartitionLog table.