SQL Server Data Transformation Services (DTS) Best Practices

Contents: some very good best practices on using DTS in SQL Server 2000 and 7.0 (in SQL Server 2005, DTS was replaced by SSIS).

DTS Package Naming Conventions:

<Module>: <Package Function>: <Optional Part Number>
For example:
Coupon Allocation: Process Coupons
Coupon Allocation: Allocate Coupons
Coupon Allocation: Publish Coupons: Part 1
Coupon Allocation: Publish Coupons: Part 2

This makes it a lot easier to locate DTS packages in SQL Server (remember, these are stored centrally for all databases).

Troubleshooting DTS Packages:

  • To configure logging for your package, right-click the package in DTS Designer, select ‘Package Properties’ and go to the ‘Logging’ tab. In this tab, specify the error file, as well as the SQL Server to which you want to log the package execution. When logging to SQL Server, the information will be logged into the sysdtspackagelog and sysdtssteplog tables in the msdb database. 

  • Logging to tables is especially useful if you want to generate statistics such as ‘on average, how much time does each step of my critical DTS package take to complete?’. 
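
For example, a query along these lines averages step durations from the log tables. The package name is made up, and the column names (starttime, finishtime, lineagefull) are from memory, so verify them against your own msdb before relying on this:

```sql
-- Sketch: average time per step for one package, from the msdb log tables.
SELECT   s.stepname,
         COUNT(*)                                     AS executions,
         AVG(DATEDIFF(ss, s.starttime, s.finishtime)) AS avg_seconds
FROM     msdb.dbo.sysdtssteplog s
JOIN     msdb.dbo.sysdtspackagelog p ON p.lineagefull = s.lineagefull
WHERE    p.name = 'Coupon Allocation: Process Coupons'
GROUP BY s.stepname
ORDER BY avg_seconds DESC
```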

DTS is a client side application!

DTS is a client-side application. When a DTS package is scheduled as a SQL Server Agent job, it runs on the server. When you run the package interactively, it runs on the local machine. That is, if you connect to a remote SQL Server using Enterprise Manager on your workstation, the DTS packages you run will run on your workstation. This results in slower package execution, as the data flows over the network between your local machine and the server.

Performance of DTS Packages:

  • For importing text files into SQL Server tables, use the built-in ‘Bulk Insert Task’ instead of the usual ‘Transform Data Task’. Note, however, that you cannot perform data transformations with the Bulk Insert task.
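
Under the covers, the Bulk Insert task issues a T-SQL BULK INSERT along these lines (table name, file path and delimiters below are examples):

```sql
-- Load a delimited file straight into a table; no per-row transformations.
BULK INSERT dbo.CouponFeed
FROM '\\fileserver\feeds\coupons.txt'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,     -- skip the header row
    TABLOCK                  -- table lock for the fastest load
)
```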

  • Parallelize your tasks to improve performance of your DTS Package. That is, if you have two tasks that are independent of each other, then run them in parallel, instead of running them serially (Make sure all the steps involved in the parallel processing are configured to use their own connections).

  • Similarly, if you have a module that makes use of multiple DTS packages, identify those packages, that can run in parallel, independent of each other. Using the ‘Execute Package’ task, you can even pass data from one package to another using inner package global variables and outer package global variables.

  • When using the ‘Transform Data Task’ to transfer tables from source to destination, try to use a single transformation for all columns, instead of an individual transformation for each column. I personally tested this and found the single transformation to be much faster:

(Screenshots: Transform Data Task with multiple transformations vs. Transform Data Task with a single transformation.)

  • To achieve high speed data loads, consider removing indexes, constraints, triggers on the target table. You could temporarily remove/disable these objects and re-enable them after the data load.
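
A sketch of the disable/re-enable pattern, with hypothetical object names:

```sql
-- Before the load: stop constraint checks and triggers on the target.
ALTER TABLE dbo.Coupons NOCHECK CONSTRAINT ALL
ALTER TABLE dbo.Coupons DISABLE TRIGGER ALL
-- (drop or disable nonclustered indexes here, then run the load)

-- After the load: re-enable, and recheck the loaded rows (WITH CHECK).
ALTER TABLE dbo.Coupons ENABLE TRIGGER ALL
ALTER TABLE dbo.Coupons WITH CHECK CHECK CONSTRAINT ALL
```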

  • I’ve seen developers processing a text file line by line using ActiveX scripting. This is a very bad thing to do. A better approach is to create a table in SQL Server and load the complete file into it quickly, using the ‘Bulk Insert’ task. This table then serves as a staging table. Once the data is in the table, just use simple SELECT queries to extract different rows into different tables. If you really must process the file with code, programs written in languages like C perform file I/O operations much faster than ActiveX scripting.
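
The staging-table approach can be sketched like this (table and column names are made up for illustration):

```sql
-- 1. Load the whole file into a staging table in one fast bulk operation.
BULK INSERT dbo.ProductFeedStaging
FROM 'C:\feeds\products.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK)

-- 2. Split the staged rows into the real tables with set-based queries.
INSERT INTO dbo.Products (Sku, ProductName)
SELECT Sku, ProductName FROM dbo.ProductFeedStaging WHERE RecordType = 'P'

INSERT INTO dbo.Prices (Sku, Price)
SELECT Sku, Price FROM dbo.ProductFeedStaging WHERE RecordType = 'R'
```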

  • The ‘Data Driven Query Task’ (DDQ) is not always the right solution for processing or applying changes between two tables. Stay away from DDQs if you can, as they tend to be very slow, because they handle one row at a time.

Log File Size problems during DTS

  • It is common to end up with huge transaction log growth (and possibly ‘out of log space’ errors) when performing huge update or delete operations from an ‘Execute SQL Task’. To overcome this problem, consider batched updates or deletes. That is, instead of updating all the rows at once, update chunks of rows and clear the transaction log after processing each chunk. Batched operations also help minimize locking.
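
On SQL Server 2000, a batched delete is typically written with SET ROWCOUNT. The database, table, cutoff date and batch size below are examples:

```sql
SET ROWCOUNT 10000              -- limit each DELETE to 10,000 rows
WHILE 1 = 1
BEGIN
    DELETE FROM dbo.AuditLog WHERE LogDate < '20030101'
    IF @@ROWCOUNT = 0 BREAK     -- nothing left to delete

    -- Keep the log small between batches. Under the Simple recovery
    -- model a CHECKPOINT suffices; otherwise back up (or truncate) the log.
    BACKUP LOG MyDb WITH TRUNCATE_ONLY
END
SET ROWCOUNT 0                  -- always reset!
```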

Side-note – always use ISNULL when comparing values in nullable columns!

If you have NULLable columns, you must use the COALESCE or ISNULL function to convert the NULL to some comparable value before you can compare a NULL and a non-NULL value.
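
Because NULL = NULL evaluates to UNKNOWN, rows where both sides are NULL silently fall out of a plain comparison. Convert NULLs to a sentinel value that cannot occur in real data (the table and column names here are examples):

```sql
-- Finds rows where MiddleName changed, treating NULL/NULL as equal.
SELECT s.CustomerId
FROM   dbo.CustomerStaging s
JOIN   dbo.Customers c ON c.CustomerId = s.CustomerId
WHERE  ISNULL(s.MiddleName, '~') <> ISNULL(c.MiddleName, '~')
```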

Portability of DTS Packages:

  • One of the most common methods is to store the DTS package as a structured storage file (.DTS file), move the file to the target server, open it from Enterprise Manager and save it to that SQL Server. Other methods include copying the contents of msdb..sysdtspackages from one server to another. You could also use the DTS object model to load a package and save it to a different SQL Server.

  • The built-in ‘Dynamic Properties’ task lets you store all the variable parameters (like file names, SQL Server names, FTP locations etc.) separately, and assigns them to your package tasks at runtime. When you move the package to a new environment, you only have to update the ‘Dynamic Properties Task’ to suit the new environment, without having to touch any of the other tasks.

  • UDLs (Universal Data Link) are similar to DSNs, in that they store the information required to connect to a data source, except that UDLs are used to connect to OLEDB data sources. Creating a UDL is easy. Just create a blank file with .UDL extension and then double click it to set the connection properties. UDLs can be used by many of the built-in DTS tasks for their connection settings. A major advantage of having UDLs is that, when you want a set of DTS tasks to point to a new server, you simply edit the UDL file.
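
A .UDL file is just a small text file. Once you set the properties through the dialog, its contents look something like this (the server and database names are examples):

```
[oledb]
; Everything after this line is an OLE DB initstring
Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=Sales;Data Source=PRODSERVER
```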

  • Similarly, Global Variables can also be used to maintain and share a piece of information between different steps of a package and between different packages. When using Global Variables, check the ‘Explicit Global Variables’ check box in the ‘Global Variables’ tab of the ‘Package Properties’ dialog box. This makes sure that all the global variables are well defined, avoiding bugs and problems with your code.

Maintenance and Optimization

Every time you save a DTS package to SQL Server, a new version is created. So you will see multiple rows for the same DTS package in the msdb..sysdtspackages table, one row for each version. An old DTS package that went through considerable development will have several versions in sysdtspackages. If you don’t really need all those older versions of the package, delete them, as this saves space in the msdb database. From within Enterprise Manager, you can right-click a package and select ‘Versions’ to see all versions and delete them selectively.
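
You can also see the saved versions directly in msdb, one row per version (the package name is an example):

```sql
SELECT name, id, versionid, createdate, owner
FROM   msdb.dbo.sysdtspackages
WHERE  name = 'Coupon Allocation: Process Coupons'
ORDER BY createdate DESC
```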

Turn on DTS caching to prevent the DTS Designer and DTS wizard from enumerating the OLE-DB providers, ActiveX scripting engines, DTS tasks and DTS transforms from the registry every time. This improves package load times. DTS caching is turned off by default. To turn it on: in Enterprise Manager, right-click the ‘Data Transformation Services’ node and select ‘Properties’ to bring up the ‘Package Properties’ dialog box. Check the ‘Turn on cache’ box to enable caching. You can periodically refresh the cache by clicking the ‘Refresh Cache’ button.

Special attention needs to be paid, when it comes to handling files from DTS packages:

  • Do not fail on missing data files: Packages expect some files to be present at a particular path, and some of those files happen to come from different sources (including third parties). If it is acceptable in your environment, do not let the DTS package fail on missing files. It’s not nice to page a DBA at midnight when you can live with it, is it? Simply check for the file’s existence from your package using an ‘ActiveX Script Task’ and continue as usual if the file exists; otherwise, simply disable the rest of the steps. Code samples for this can be found at sqldts.com.
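
A minimal ‘ActiveX Script Task’ sketch of this check; the file path and step name are assumptions you would replace with your own:

```vbscript
' Succeed quietly when the feed file is missing: disable the load step
' instead of failing the package.
Function Main()
    Dim fso, oPkg
    Set fso = CreateObject("Scripting.FileSystemObject")
    If Not fso.FileExists("C:\feeds\coupons.txt") Then
        Set oPkg = DTSGlobalVariables.Parent     ' the running package
        oPkg.Steps("DTSStep_DTSBulkInsertTask_1").DisableStep = True
    End If
    Main = DTSTaskExecResult_Success
End Function
```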

  • Keep a backup of processed files: Imagine a scenario where you get a product catalog from a third-party product supplier. You pull the file from an FTP location, process it into your tables, and delete the downloaded files. Then you realize that some of the products are showing up with a price of 0, and customers are happily purchasing those products. You think you received incorrect price information from the supplier, but the supplier thinks there is a bug in your catalog processing code. You can’t prove your point, because your DTS package deleted the files at the end of the process. To avoid such problems, design a mechanism where processed files get moved to a specific location and are deleted from that location once they are a week (or a month) old.
  • FTP is not reliable: Yes, FTP is not reliable. You may not get any errors, but the file you downloaded could be incomplete. If reliable data is critical for your environment, consider implementing some kind of checksum mechanism that verifies the complete file was downloaded. Some people also add a trailer record at the end of the file.
