ETL | Road to BI

Arquivo

Archive for the ‘ETL’ Category

Using a Script Component Source to Consume a WebService

2013/07/14 jpedroalmeida Deixe um comentário

Thanks to Dennis and Jim for this original post at: http://dennysjymbo.blogspot.pt/2013/05/using-script-component-source-to.html

When attempting to use XML as a data source in SSIS you really only have 2 solution (and this depends on whether or not this XML is coming from a web service). If your XML feed is coming from a web service, you can use the Web Service Task and set the output to either a file or a variable. If you set your file to a variable, or file, you can use the XML Task to perform various xpath queries, reformat using an XSLT, etc. The result of which can be set to other variables or a file. If set to a file, it can then be sourced using an XML Source and brought into your data flow. But what if you don’t want to use an XML file as your source? SSIS has no way of reading an XML feed from a web service directly into a data flow without custom coding. In this entry I’ll show an example of how this can be done using a Script Component source in a data flow.

This solution is very similar to the JSON solution I wrote about, with a few changes to apply to XML. The XML source we’re using is going to apply to metrics used in a call center. These attributes include employee login, date, calls offered, calls answered and calls missed. Our XML source will look something like this:

<?xml version=”1.0″ encoding=”utf-8″?>

</DailyStat>

</DailyStat>

</DailyStat>

</DailyStat>

</DailyStat>

</DisplayStats>

This comes from an ASP.NET web service I built using SOAP:

You can see from this screenshot that the web service takes one parameter, offerDate, with a date format. We're going to want to make this a variable that we can alter at run time every time this package is executed. This can be done through values returned from an Execute Sql Task from a database, set via a function, etc. Regardless of how its implemented, in this example were going to hard code the variable to 03/28/2013:

Now that we have a variable for offerDate, let’s set up our script component. Drag a script component onto the design surface of a data flow. When prompted, select source as the script component type:

On the script screen, we want to make sure to add our SSIS variable in ReadOnlyVariables:

Next, let’s configure the output buffer for this script component. On the input and output screen, click on the default output and add some columns. These will represent the XML elements were going to get back from our web service:

This screen helps us to define what our output is going to look like. The columns that will represent our XML elements are as follows: WEBLOGIN DT_STR 50, DATEDT_DBTIMESTAMP, CALLSOFFERED DT_I4, CALLSANSWERED DT_I4 and CALLSMISSEDDT_I4. With this all set up, we can go back to the script screen, click on the Edit Script button and begin coding. In the main.cs file, paste the following code:

using System;

using System.Data;

using Microsoft.SqlServer.Dts.Pipeline.Wrapper;

using Microsoft.SqlServer.Dts.Runtime.Wrapper;

using System.Net;

using Microsoft.SqlServer.Dts.Runtime;

using System.Windows.Forms;

using System.IO;

using System.Xml.Serialization;

[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]

public class ScriptMain : UserComponent

{

public override void CreateNewOutputRows()

{

//Get SSIS Variables

string offerDate = Variables.OFFERDATE;

//Set Webservice URL

string wUrl = “http://YOUR WEB SERVICE URI/Respiratory/ResService.asmx/getAgentMetricsXml?offerDate=”+offerDate;

try

{

//Call getWebServiceResult to return our DailyStat array

DailyStats outPutStats = getWebServiceResult(wUrl);

//For each group of metrics output records

foreach (var stats in outPutStats.MDailyStat)

{

Output0Buffer.AddRow();

Output0Buffer.WEBLOGIN = stats.WebLogin;

Output0Buffer.DATE =stats.Date;

Output0Buffer.CALLSANSWERED =stats.CallsAnswered;

Output0Buffer.CALLSOFFERED = stats.CallsOffered;

Output0Buffer.CALLSMISSED = stats.CallsMissed;

}

catch (Exception e)

{

failComponent(e.ToString());

}

//Method to return our DailyStat array

private DailyStats getWebServiceResult(string wUrl)

{

HttpWebRequest httpWReq = (HttpWebRequest)WebRequest.Create(wUrl);

HttpWebResponse httpWResp = (HttpWebResponse)httpWReq.GetResponse();

DailyStats xmlResponse = null;

try

{

//Test the connection

if (httpWResp.StatusCode == HttpStatusCode.OK)

{

Stream responseStream = httpWResp.GetResponseStream();

//Set xmlString using a stream reader

using (StreamReader reader = new StreamReader(responseStream))

{

//Deserialize our XML

XmlSerializer sr = new XmlSerializer(typeof(DailyStats));

xmlResponse = (DailyStats)sr.Deserialize(reader);

reader.Close();

}

//Output connection error message

else

{

failComponent(httpWResp.StatusCode.ToString());

}

//Output xml parsing error

catch (Exception e)

{

failComponent(e.ToString());

}

return xmlResponse;

}

//Outputs error message

private void failComponent(string errorMsg)

{

bool fail = false;

IDTSComponentMetaData100 compMetadata = this.ComponentMetaData;

compMetadata.FireError(1, “Error Getting Data From Webservice!”, errorMsg, “”, 0, out fail);

}

//Class to hold elements within <DailyStat>

public class DailyStat

{

public string WebLogin { get; set; }

public DateTime Date { get; set; }

public int CallsOffered { get; set; }

public int CallsAnswered { get; set; }

public int CallsMissed { get; set; }

}

//Class to hold our array of <DailyStat>

[XmlRootAttribute(“DisplayStats”)]

public class DailyStats

{

[XmlElement(“DailyStat”)]

public DailyStat[] MDailyStat { get; set; }

}

After calling our web service and getting our stream of XML, we deserialize into a DisplayStats object:

XmlSerializer sr = new XmlSerializer(typeof(DisplayStats));

xmlResponse = (DisplayStats)sr.Deserialize(reader);

This object is returned to the calling method. We then loop through the DailyStat[]array, within the DisplayStats object, and output the attributes to the output buffer:

//Call getWebServiceResult to return our DailyStat array

DisplayStats outPutStats = getWebServiceResult(wUrl);

//For each group of metrics output records

foreach (var stats in outPutStats.MDailyStat)

{

Output0Buffer.AddRow();

Output0Buffer.WEBLOGIN = stats.WebLogin;

Output0Buffer.DATE =stats.Date;

Output0Buffer.CALLSANSWERED =stats.CallsAnswered;

Output0Buffer.CALLSOFFERED = stats.CallsOffered;

Output0Buffer.CALLSMISSED = stats.CallsMissed;

}

Build, save and lets return to the data flow. We can create a union all that the script component can output to so we can add a data viewer to see the records. In a normal implementation this would go to a data destination:

Let’s now run the package and view the results:

The XML data is now in our data flow and can be sent to a data destination. No physical XML file was needed, which means we didn’t have to rely on an XML Source to get this.

Categorias:WebServices

Using HASHBYTES() to compare columns

2011/10/20 jpedroalmeida Deixe um comentário

Recently, while at a client engagement, I was building some SSIS packages an issue came up where they didn?t want to use the CHECKSUM() function in TSQL to do column comparisons because the results can be inaccurate on some rare occasions. I personally have never come across this but others here at Pragmatic Works have. So we have two options freely available to work around this issue. The first is the third party component plugin that you can get free at codeplex called Multiple Hash. The client wasn?t comfortable with having to install this component on multiple servers throughout the environment so that option wasn?t available to me. Instead I had to use the HASHBYTES() function in TSQL.

HASHBYTES() is much more reliable than checksum when it comes to producing accurate results, but it comes at a slight cost.

The first thing to note is how to construct the HASHBYTES() function. In the first part you tell the function which algorithm you are going to use. I?m using SHA1, but be aware that they single tics ? ? are required followed by a comma. Then you must concatenate the columns you wish to use together as seen below.

There you can see its already a bit more arduous than using CHECKSUM(), but not that big of a deal to concatenate a bunch of columns. Lets look at the results.

Uh-Oh here is our first problem. HASHBYTES() doesn?t work with NULL values inside any columns. So we?ve got to handle the Nulls in our query using the ISNULL() function.

Now the results look like this:

Then next thing that you have to look at is how HASHBYTES() handles(or more accurately doesn?t handle) Data types. Here the ID column is an INT data type, but the same holds true for any non-string data type.

We get an error saying that the data type is wrong for the HASHBYTES() function

So now we have to CAST every column that is a non-string data type.

Now after this fix our results look better.

So as you can see already there will be a good deal more T-SQL coding involved with using HASHBYTES then with CHECKSUM(). But this isn?t all. The last little gotcha isn?t quite as obvious as the first two. Lets go back to our Null handling query.

I?ve gone and edited the data some for this example. Please also note that HASHBYTES() is case sensitive meaning that if you have the same spelling but different casing at an individual character level the hash value returned will be different.

The rows are different from one another but when concatenated together for the HASBYTES() function they produce the same exact value. So to handle this we have to update our code again. We are going to add a rarely used character to the concatenation so ensure that they results will return correctly.

Here I chose to use a pipe to basically delimit the columns thereby making them different from each other. The result is much better.

If we take a quick look at results of the two concatenations we can see why we get the different results

The delimited column is obviously different from one row to the next, and the Non-Delimited column is exactly the same for each row.

While HASHBYTES() is definitely more accurate and reliable than using CHECKSUM(), there are some hoops to jump through to insure you get the results you need. I hope this helps you guys out.

Categorias:Incremental Update

Road to BI

Arquivo

Using a Script Component Source to Consume a WebService

Using HASHBYTES() to compare columns

Categories

Subscrever Blog via email