
Archive for the ‘Cube Processing’ Category

How to stop or cancel your SSAS cube processing

It’s quite easy to do this operation.

First you need to find the following IDs: SPID, ConnectionID, and SessionID. You can get these by using the following DMVs:

SELECT * FROM $SYSTEM.DISCOVER_CONNECTIONS

This will return all active connections to a particular SSAS instance, along with useful information such as the ConnectionID as well as user, time, and data-based statistics.

SELECT * FROM $SYSTEM.DISCOVER_SESSIONS

This DMV will return all active sessions on a particular SSAS database. It can be linked to the connections DMV using the SESSION_CONNECTION_ID column.

SELECT * FROM $SYSTEM.DISCOVER_COMMANDS

This shows the currently executing commands; it can be linked to the sessions DMV using the SESSION_SPID column.
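
For example, a query along the lines of the sketch below (using the standard DISCOVER_SESSIONS columns) pulls the three IDs and the last command for every session, so you can spot the processing session you want to kill:

-- list every session with the IDs needed for the Cancel command
SELECT SESSION_SPID, SESSION_CONNECTION_ID, SESSION_ID,
       SESSION_USER_NAME, SESSION_LAST_COMMAND
FROM $SYSTEM.DISCOVER_SESSIONS

Note that the SSAS DMV dialect does not support JOINs, so if you need extra details from DISCOVER_CONNECTIONS or DISCOVER_COMMANDS you have to cross-reference the IDs manually (or query the DMVs via OPENQUERY from a linked server).
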
After finding the XMLA Process command, you can take the IDs and run the following Cancel statement:

<Cancel xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<ConnectionID>insert the connection id</ConnectionID>
<SessionID>insert the session id</SessionID>
<SPID>insert the spid</SPID>
<CancelAssociated>true</CancelAssociated>
</Cancel>

Thanks to http://thinknook.com/kill-a-session-spid-or-connection-to-analysis-service-cube-2012-09-21/

Categories: Cube Processing

Dynamic Cube Partitioning in SSAS 2008


In this article I am going to explain the implementation of dynamic cube partitioning in SSAS 2008. I will split this article into two parts, as mentioned below:

  1. Partition benefits
  2. Implementing Dynamic Partitions

Partition Benefits:

  • Better Query Performance: Cube partitioning is a powerful mechanism for improving query performance. Queries that summarize data over 10 years can take considerably longer than those that only search through the current year's data. With proper partitions, SSAS only has to scan a small subset of the data to return query results, giving dramatic performance improvements compared to queries running against a cube with a single partition.
  • Minimize downtime: Cube partitioning helps reduce the downtime associated with cube processing. In almost all cases, a portion of the data warehouse is volatile and needs to be processed often, while other portions are relatively static. For example, in a sales cube we need to change the current year's data nightly, but sales from previous years might change only occasionally, for instance to account for merchandise returns and exchanges. If your warehouse tracks the last 10 years of sales, then processing only the current partition may be 10 times quicker than processing the entire cube.
  • Aggregations benefits: The partitions queried most frequently can benefit from additional aggregations, which in turn improves performance. Partitions that are used less can be processed less frequently, with considerably fewer aggregations.
  • Customized storage and processing settings: Frequently accessed partitions might benefit from proactive caching and ROLAP storage. On the other hand, other forms of storage and processing might be better for less frequently queried partitions.
  • Distributed query and processing load: SSAS allows you to create remote partitions – a remote partition resides on a server different from its parent cube. This way the queries that affect the remote partition are processed on a server separate from its parent cube, allowing you to take advantage of additional processing power.
  • Parallel Partitions Processing: SSAS allows processing multiple partitions in parallel on a server that has multiple processors. This can further reduce the total cube processing time.
 Implementing Dynamic Partitions

In this section I will explain step by step operations to create Dynamic Cube Partitions.

STEP1:
Prepare test data for the Sales cube. I will use three dimensions (DimDate, DimProduct, and DimCustomer) and one fact table for the Sales cube. Click here to download the CubePartitionData.sql file, which contains T-SQL code to generate these tables along with sample data.
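
If you just want to follow along without the download, here is a minimal sketch of the kind of schema that script produces. The table and column names below are assumptions based on this article (DateKey, CalendarYear, and CalendarMonth are referenced later by the partition logic); the real CubePartitionData.sql may differ:

-- minimal sketch of the sample schema (assumed names)
CREATE TABLE dbo.DimDate (
    DateKey int PRIMARY KEY,      -- e.g. 20100101
    CalendarYear int,
    CalendarMonth int
);
CREATE TABLE dbo.DimProduct  (ProductKey  int PRIMARY KEY, ProductName  varchar(50));
CREATE TABLE dbo.DimCustomer (CustomerKey int PRIMARY KEY, CustomerName varchar(50));
CREATE TABLE dbo.FactSales (
    DateKey     int REFERENCES dbo.DimDate(DateKey),
    ProductKey  int REFERENCES dbo.DimProduct(ProductKey),
    CustomerKey int REFERENCES dbo.DimCustomer(CustomerKey),
    SalesAmount money
);
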
STEP2:
Create new Analysis Services Project using BIDS and save it as Sales.
STEP3:
Create a Data Source, and then a Data Source View using the above dimension and fact tables.

If you are a newbie, click Creating First Cube in SSAS 2008 for more information.

STEP4:
Create the Sales cube along with the required dimensions. Set all required properties of the dimensions and create any hierarchies if required. My solution looks like this:

STEP5:
Double click on the cube and navigate to the Partitions tab. You will see a default partition as shown below:

Delete this default partition and click New Partition… to create a new partition. In the Partition Wizard, select FactSales as the available table on the Specify Source Information page and click Next. Select Specify a query to restrict rows on the Restrict Rows page and write a WHERE condition to restrict the partition rows, as in the sketch below.
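
The restricting query might look something like this (assuming the fact table carries the DateKey column from the sample schema; adjust the column name to whatever your fact table actually uses):

SELECT * FROM dbo.FactSales
WHERE DateKey BETWEEN 20100101 AND 20100131   -- one calendar month per partition
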

Click Next twice, enter the partition name Sales_20100101-20100131 on the Completing the Wizard page, and click Finish.
Finally, process the cube. Once the cube is processed successfully, you can see a new database in Analysis Services:

You can browse the data for this partition:

STEP6:
We are done with one manual partition. Now it's time to create the partitions dynamically and apply logic to process the delta data every time.

First of all, right-click on the partition Sales_20100101-20100131 (highlighted above) and then select Script Partition as ==> Create To ==> New Query Editor Window. This will create the XMLA script for this partition. Save this file in the project with the name CreatePartitionSales_20100101-20100131.xmla.

In a similar fashion, right-click on the partition Sales_20100101-20100131 and then select Process. This will open the Process Partition wizard; click Script to generate the XMLA script and save this file in the project with the name ProcessPartitionSales_20100101-20100131.xmla.

STEP7:

We are done with SSAS development! Now we have to create an SSIS package to implement dynamic creation of cube partition and processing of that partition.

Click File => Add => New Project in the menu bar in the same solution. Select Integration Services Project and enter the name of the SSIS project.
Rename Package1.dtsx to DynamicSalesPartition.dtsx.

STEP8:
Add two connection Managers – one OLE DB connection manager for database and another Analysis Services connection manager for SSAS database.

STEP9:
Add the following package variables:

Here is the brief description of each variable:

Directory – to store the package path
FromDateKey – to store the current partition start date key, e.g. 20101201
ToDateKey – to store the current partition end date key, e.g. 20101231
PartitionName – the current partition name, e.g. Sales_20101201-20101231
IsPartitionExists – to check whether the partition already exists or not
CreatePartitionXMLA – to store the XMLA script that creates the current partition
ProcessPartitionXMLA – to store the XMLA script that processes the current partition

Note: You can add more variables as per need and business requirement.
STEP10:
Now you have to develop the SSIS package as shown below:
The first Execute SQL Task is used to initialize FromDateKey, ToDateKey, and PartitionName for the current partition. Use the query below in this task:
DECLARE
@FromDateKey varchar(8), @ToDateKey varchar(8),
@CalendarYear int, @CalendarMonth int

SELECT
@FromDateKey = MAX(FromDateKey),
@ToDateKey = MAX(ToDateKey)
FROM PartitionLog (NOLOCK)

SELECT
@CalendarYear = MIN(CalendarYear),
@CalendarMonth = MIN(CalendarMonth)
FROM DimDate WHERE DateKey > @ToDateKey + 1

--SET FromDateKey and ToDateKey for current Partition
IF NOT CONVERT(varchar(8),GETDATE(),112) BETWEEN @FromDateKey AND @ToDateKey
SELECT @FromDateKey = MIN(DateKey), @ToDateKey = MAX(DateKey)
FROM DimDate (NOLOCK)
WHERE CalendarYear = @CalendarYear AND CalendarMonth = @CalendarMonth

SELECT @FromDateKey AS FromDateKey, @ToDateKey AS ToDateKey,
'Sales_' + @FromDateKey + '-' + @ToDateKey AS PartitionName
GO
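
The query above assumes a PartitionLog table that records every partition the package creates. Here is a minimal sketch of such a table (column names inferred from the INSERT and UPDATE statements used later in this package; adjust types and keys as needed), seeded with the manually created first partition so that the MAX() lookups return a value on the first run:

CREATE TABLE dbo.PartitionLog (
    PartitionName   varchar(50),
    FromDateKey     varchar(8),
    ToDateKey       varchar(8),
    CreatedDate     datetime,
    LastProcessDate datetime,
    ProcessedCount  int
);

-- seed with the first (manual) partition
INSERT INTO dbo.PartitionLog (PartitionName, FromDateKey, ToDateKey, CreatedDate, ProcessedCount)
VALUES ('Sales_20100101-20100131', '20100101', '20100131', GETDATE(), 0);
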

The second task in the package is a Script Task. Here we need to read the contents of the files
CreatePartitionSales_20100101-20100131.xmla and
ProcessPartitionSales_20100101-20100131.xmla and store them in the package variables CreatePartitionXMLA and ProcessPartitionXMLA.

I am using User::Directory as the ReadOnlyVariable and User::CreatePartitionXMLA, User::ProcessPartitionXMLA as the ReadWriteVariables.

Click here to download the code for Script Task – Read XMLA Files and store in variables.

To know about reading file contents using Script Task, click Script Task to Read File.

The third task is a Sequence Container which contains three tasks – a Script Task, an Analysis Services Execute DDL Task, and an Execute SQL Task.

The Script Task is used for two reasons:
1. To check whether the current partition exists or not.
2. To update the XMLA content for the current partition.

This is the trickiest and most interesting part. We need to replace the Partition ID, Partition Name, and Source QueryDefinition for the current partition, which are highlighted below:

In the Script Task, select User::FromDateKey, User::PartitionName, User::ToDateKey as ReadOnlyVariables and User::CreatePartitionXMLA, User::ProcessPartitionXMLA, User::IsPartitionExists as ReadWriteVariables.

Click here to download the code for Script Task – Generate XMLA to create and Process new Partition.

Click SSIS Script Task to Find and Replace File contents to know more about find and replace mechanism using script task.

The second task in the Sequence Container is an Analysis Services Execute DDL Task. This task is used to create the new partition. In the Analysis Services Execute DDL Task Editor, select SSAS.localhost as the Connection, Variable as the SourceType, and User::CreatePartitionXMLA as the Source.

Also, don't forget to set the precedence constraint. You should execute this task only when the partition does not exist. Open the Precedence Constraint Editor and select Expression and Constraint as the Evaluation operation, Success as the Value, and @IsPartitionExists == 0 as the Expression, as shown below:

The third task in the Sequence Container is an Execute SQL Task – this is used to store the newly created partition's information in the PartitionLog table. In the Execute SQL Task, select ResultSet None, ConnectionType OLE DB, Connection (local).Test, SQLSourceType Direct input, and SQLStatement as:

INSERT INTO dbo.PartitionLog
([PartitionName],[FromDateKey],[ToDateKey],[CreatedDate])
VALUES (?,?,?,GETDATE())

Click on Parameter Mapping and map the parameters as shown below (i.e. parameters 0, 1, and 2 to User::PartitionName, User::FromDateKey, and User::ToDateKey respectively, matching the order of the three placeholders):

STEP11:
Finally we need to process the current partition. This is independent of the previous Sequence Container, where we create the new partition if it does not already exist.
However, we need to process the current partition every time we execute the package. In most cases a new partition will be created on the first day of the month, but the partition should be processed every day to include the delta data.

This sequence container also contains three tasks.

The first task is an Analysis Services Processing Task – to process the dimension data. Select SSAS.localhost as the connection manager and, in Object Name, all the dimensions whose data is not static. Select Process Update as the Process Option.

The second task is again an Analysis Services Processing Task – to process the current partition. Select SSAS.localhost as the connection manager and Sales_20100101-20100131 in Object Name – which is the default (or first) partition. Select Process Full as the Process Option.

Now click on Expressions and select @[User::ProcessPartitionXMLA] for ProcessingCommands.

Finally, store the Last Processed Date and Processed Count (the number of times this partition has been processed) in the log table using an Execute SQL Task. Select ResultSet None, ConnectionType OLE DB, Connection (local).Test, SQLSourceType Direct input, and SQLStatement as:

UPDATE PartitionLog
SET LastProcessDate = GETDATE()
,ProcessedCount = ProcessedCount + 1
WHERE PartitionName = ?

Click on Parameter Mapping and map parameter 0 to User::PartitionName.

STEP12:
We are done with package development. Now it's time to execute the package and check the dynamic behaviour of partition creation and processing. Before executing the package, there is only one partition in the SSAS Sales database.
Now right-click on the package and execute it. Once the package executes successfully, you will see one more partition in the Sales database and one more entry in the PartitionLog table.

Now execute the package or schedule it to run automatically. After 10 more executions you will reach the current month's partition, Sales_20101201-20101231, and you can see 12 partitions – one partition for each month, as shown below:
Since you are in the current month's partition, executing the package every day during the current month will not create any new partition until the beginning of the next month. It will only process the current partition to include the new data in the cube, as shown below:
Now you can browse the data for all the partitions:
We are done with dynamic partition creation. I hope you liked this article, and if your answer is yes then don't forget to click Cool 🙂
Cheers!!!
Categories: Partitions

Performance – Case Study

Link:http://www.msbiconcepts.com/

We discussed incremental cube processing in an earlier post. I am placing the same flow diagram here again because it will be referenced in this post. For one of my projects, we reduced processing time drastically just by understanding the dependencies among SSAS objects and grouping them properly for parallel processing.
Fig.1 Flow diagram for cube processing
Incremental processing can be done in the three steps mentioned in the flow diagram above. Most SSAS developers try to implement the same approach by putting SSAS objects in serial order, which increases cube processing time. We can reduce processing time drastically by understanding the dependencies among SSAS objects and dividing them into smaller groups which can be run in parallel.
Let's understand the scenario. Initially, cube processing was taking approximately 12 hours to finish the job when run in a serial manner.
Fig.2 SSAS Objects in Serial Manner
We know that a cube's partitions can be processed only after dimension processing, but we can process the dimensions simultaneously as they are not dependent on each other. In the same way, cubes are independent of each other. We can create two groups, one for dimension processing and another for cube processing. By doing this, processing time is reduced to approximately 6 hours (50% of the total processing time).

 

Fig3. SSAS Objects in Parallel Manner

Now each cube will be available with new data after 6 hours. If you check fig. 3 closely, you will find that the customer dimension takes exceptionally long to process. The customer dimension has more than 100M rows, so processing it takes around 3 hours, but this dimension is used in only one of the cubes (cube 1), so you can reduce the processing time for cube 2 and cube 21. You will need to regroup the SSAS objects. Here is the flowchart after re-arrangement:

Fig4. SSAS Objects in Parallel Manner(Special scenario)

Cube 1 will still be available after 6 hours, but cube 2 and cube 21 will be available earlier, in approximately 3 hours.

Cube 21's up time (processing time) has been reduced to 25% and cube 2's up time (processing time) to 33%, which is a significant improvement. If there are 2-3 dimensions similar to DimCustomer, then running them in parallel will improve the time for cube 1 as well.

This dependency mapping exercise helped me a lot in reducing overall processing time. There are further methods to reduce processing time for SSAS objects, which will be covered separately in future posts. Please share your views/suggestions by leaving a comment or sending an email.

Boost Cube Processing Time – Unknown Members

Link:http://cwebbbi.wordpress.com/category/analysis-services/

I’ve spent the last week working with a fairly large cube – several fact tables have hundreds of millions of rows in – and, as always on an SSAS project, there are occasions when you have no choice but to reprocess the whole cube. Sitting round waiting for processing to complete can be pretty tedious and, while I’m sure we’ve all worked out how to plan our work so that processing can take place overnight, during lunch or a meeting, it’s always good to know some tricks to make processing faster.

There’s a good set of tips on how to optimise processing in the SSAS Operations Guide, but this post is about something not mentioned in that white paper because it’s a scenario that shouldn’t happen in production – although it’s very common in development. In my case I’ve been building a cube on top of a half-finished data warehouse where the data is still quite dirty. That means that during processing there are lots of key errors, and while they get fixed I’m handling them in my processing by selecting the ‘Convert to Unknown’ option. What I’ve half-known for a long time and only properly investigated this week, though, is the impact that reporting these key errors can have on processing times.

Here’s an extreme example using Adventure Works. I’ve created a new cube based on the FactInternetSales table and added the Customer dimension to it, but deliberately joined the Customer dimension on to the Order Date column in FactInternetSales to simulate a lot of key errors.
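
If you want to know in advance how many rows are going to end up under the Unknown member, a simple count of the orphaned fact rows gives you the number of key errors to expect. A rough sketch against the AdventureWorksDW tables used here (and the deliberately wrong join described above):

-- count fact rows whose (mis)mapped key has no match in the dimension
SELECT COUNT(*) AS OrphanRows
FROM dbo.FactInternetSales f
LEFT JOIN dbo.DimCustomer c
    ON f.OrderDateKey = c.CustomerKey   -- the deliberately incorrect join
WHERE c.CustomerKey IS NULL
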

[screenshot: the cube design showing the Customer dimension joined on the Order Date column]

If I then do a ProcessFull with the following error configuration:

[screenshot: the error configuration dialog]

ie with Key Error Action set to Convert To Unknown, Ignore Errors Count selected, and Key Not Found and Null Key Not Allowed both set to Report And Continue, when I do a full process all the errors are logged in the UI and it takes around 23 seconds to finish:

[screenshot: processing output showing the logged key errors and a duration of around 23 seconds]

If, on the other hand, you set Key Not Found and Null Key Not Allowed to Ignore Error, as follows:

[screenshot: error configuration with Key Not Found and Null Key Not Allowed set to Ignore Error]

…a full process takes under a second:

[screenshot: processing output showing the full process completing in under a second]

Now this is an extreme example, of course, and in a production system you should not have any key errors anyway, and if you do you definitely do not want to ignore these key errors. But if you’re developing a cube and you know that key errors exist but you don’t care about them (for the moment), you can save a massive amount of time by not reporting these errors. In my case cube processing went down from around 3 hours to 45 minutes. Which is a good thing if you want to get on with your work faster, but I suppose is a bad thing if it interrupts your web browsing/coffee drinking/chatting or whatever you do to pass the time while processing’s taking place.

Categories: Cube Processing