Skip to content

SSIS: performance counter

November 8, 2011

Integration Services installs a set of performance counters that you can use to monitor the performance of the data flow engine. For example, you can watch the “Buffers spooled” counter to determine whether data buffers are being written to disk temporarily while a package is running. This swapping reduces performance and indicates that the computer has insufficient memory.

Note Note
If you install Integration Services on a computer that is running Windows Server 2003, and then upgrade that computer to Windows Server 2008, the upgrade process removes the Integration Services performance counters from the computer. To restore the Integration Services performance counters on the computer, run SQL Server Setup in repair mode.

The following table describes the performance counters.

 
Performance counter Description
BLOB bytes read The number of bytes of binary large object (BLOB) data that the data flow engine has read from all sources.
BLOB bytes written The number of bytes of BLOB data that the data flow engine has written to all destinations.
BLOB files in use The number of BLOB files that the data flow engine currently is using for spooling.
Buffer memory The amount of memory that is in use. This may include both physical and virtual memory. When this number is larger than the amount of physical memory, the Buffers Spooled count rises as an indication that memory swapping is increasing. Increased memory swapping slows performance of the data flow engine.
Buffers in use The number of buffer objects, of all types, that all data flow components and the data flow engine is currently using.
Buffers spooled The number of buffers currently written to the disk. If the data flow engine runs low on physical memory, buffers not currently used are written to disk and then reloaded when needed.
Flat buffer memory The total amount of memory, in bytes, that all flat buffers use. Flat buffers are blocks of memory that a component uses to store data. A flat buffer is a large block of bytes that is accessed byte by byte.
Flat buffers in use The number of flat buffers that the Data flow engine uses. All flat buffers are private buffers.
Private buffer memory The total amount of memory in use by all private buffers. A buffer is not private if the data flow engine creates it to support data flow. A private buffer is a buffer that a transformation uses for temporary work only. For example, the Aggregation transformation uses private buffers to do its work.
Private buffers in use The number of buffers that transformations use.
Rows read The number of rows that a source produces. The number does not include rows read from reference tables by the Lookup transformation.
Rows written The number of rows offered to a destination. The number does not reflect rows written to the destination data store.

You use the Performance Microsoft Management Console (MMC) snap-in to create a log that captures performance counters.

To add a new log that includes all performance counters for the data flow engine

  1. In Control Panel, click Administrative Tools, and then click Performance.
  2. In the Performance dialog box, expand Performance Logs and Alerts, right-click Counter Logs, and then click New Log Settings. Type the name of the log. For example, type MyLog.
  3. In the MyLog dialog box, click Add Counters.
  4. In the Add Counters dialog box, select SQL Server:SSISPipeline in the Performance object box.
  5. Select the All Counters check box, select Add, and then close the dialog boxes.
    Note Note
    You must start the Performance Logs and Alerts service using a local account or a domain account that is a member of the Administrators group.

SQL Server Integration Services provide a set of performance counters. Among them the following few are helpful when you tune or debug your package:

  • Buffers in use
  • Flat buffers in use
  • Private buffers in use
  • Buffers spooled
  • Rows read
  • Rows written

“Buffers in use”, “Flat buffers in use” and “Private buffers in use” are useful to discover leaks. During package execution time, you will see these counters fluctuating. But once the package finishes execution, their values should return to the same value as what they were before the execution. Otherwise, buffers are leaked. In occasions like that, please contact Microsoft PSS.

“Buffers spooled” has an initial value of 0. When it goes above 0, it indicates that the engine has started memory swapping. In a case like that, please follow my previous blog (“Set BLOBTempStoragePath and BufferTempStoragePath to Fast Drives”) to set Data Flow Task properties BLOBTempStoragePath and BufferTempStoragePath appropriately for maximal I/O bandwidth.

“Rows read” and “Rows written” show how many rows the entire Data Flow has processed. They give you an overall idea about the execution progress.

System Monitor

System Monitor is a tool used to monitor the performance of the server. It gives information about the resources that are under pressure. The values of various counters in System Monitor indicate which resource is under pressure. Performance deterioration can be diagnosed by setting performance alerts. These alerts show the increase or decrease in a counter value with respect to the pre-defined value. Normally the counters are monitored for a period of 24-hours. If an error occurs, a message regarding the error can either be sent to the administrator or written to the Application log. Log files can be saved in various formats such as text file, binary file, or SQL database file. The counters that are to be measured in order to resolve performance issues are as follows:

  • Memory: Pages/sec
  • Memory: Available Bytes
  • SQL Server: Buffer Manager: Buffer Cache Hit Ratio
  • Physical Disk: Disk Reads/sec
  • Physical Disk: Disk Writes/sec
  • Physical Disk: %Disk Time
  • Physical Disk: Avg: Disk Queue Length
  • Physical Disk: % Free Space
  • Logical Disk: %Free Space
  • Processor: %Processor Time
  • System: Processor Queue Length
  • Network Interface: Bytes Received/sec
  • Network Interface: Bytes Sent/sec
  • Network Interface: Bytes/sec
  • Network Interface: Output Queue Length
  • SQL Server: General: User Connection

Performance counters related to memory usage

The memory used by SQL Server can cause performance issues if bottlenecks occur due to high memory consumption or high I/O operations. The counters that are to be monitored to detect and diagnose memory bottlenecks are as follows:

  • Memory: Pages/sec
    The Memory: Pages/sec counter is used to measure the number of pages that are paged out from Random Access Memory (RAM) to the virtual memory on the hard disk. The more the paging is, the more the I/O overhead will be on the server. The average counter value should be between zero and twenty. If the value is above twenty, the server will experience a memory bottleneck.
  • Memory: Available Bytes
    The Memory: Available Bytes counter measures the amount of free physical memory on the server. The average value of the counter should be greater than 5MB, otherwise the server will experience a performance hit due to memory pressure.
  • SQL Server: Buffer Manager: Buffer Cache Hit Ratio
    The SQL Server: Buffer Manager: Buffer Cache Hit Ratio counter value depends on the applications running on the server. These applications can be OLTP (Online Transaction Processing) or OLAP (Online Analytical Processing). If the server is running OLTP applications, the counter value should be between 90% and 99%. If the counter value is less than 90%, it indicates pressure on RAM. In order to resolve the issue, RAM should be increased.
  • Physical Disk: Disk Reads/sec and Physical Disk: Disk Writes/sec
    The values of the Physical Disk: Disk Reads/sec and the Physical Disk: Disk Writes/sec counters are difficult to interpret because the explicit transfer rate limits depend on the hardware installed on the server. These counters should be measured in conjunction with the values of the Memory: Available Bytes and the SQL Server: Buffer Manager: Buffer Cache Hit Ratio counters to detect memory pressure.

Performance counters related to disk usage

The performance counters related to disk usage measure the values of physical and logical disk counters. These counters detect and indicate disk bottlenecks. The counters used to detect disk pressure are as follows:

  • Physical Disk: %Disk Time:
    The Physical Disk: %Disk Time counter is used to measure I/O bottlenecks on a physical disk array. It is not used to monitor the logical partition or the individual disks in the array. If the value of this counter exceeds 55% in a 24-hour monitoring period, it indicates that the server might be experiencing a disk I/O bottleneck.
  • Physical Disk: Avg: Disk Queue Length:
    The Physical Disk Avg: Disk Queue Length counter is used to measure the pressure on a physical disk array. This counter is used to monitor individual disks in an array. If the value of this counter is eight in an array of four disks, the server might be experiencing a disk I/O bottleneck.
  • Physical Disk: %Free Space and Logical Disk: %Free Space:
    These counters are used to measure the free space on the physical disk and logical disk partitions. If the value is less than 15%, an alert is triggered for disk errors.

Performance related to processor usage

The values of the performance counters related to the processor indicate the pressure on the processor resources. The counters used to monitor processor usage are as follows:

  • Processor: %Processor Time:
    This counter is used to measure the utilization of a processor. If the value of the counter exceeds 80%, it indicates a processor bottleneck. It can be resolved either by reducing the load on the processor or by checking the database and its query performance.
  • System: Processor Queue Length:
    This counter is monitored along with the Processor: %Processor Time counter to measure the utilization of a processor. If its value is more than two, it indicates a processor bottleneck.

Performance counters related to network usage

The values of Network Interface counters measure the number of bytes sent or received over a TCP/IP connection. No pre-defined values have been set for these counters. A sudden increase in the network traffic indicates an external attack. The counters used to measure the network traffic are as follows:

  • Network Interface: Bytes Received/sec:
    This counter measures the number of bytes received over network adapters. A sudden increase in the value of this counter indicates an external attack.
  • Network Interface: Bytes Sent/sec:
    This counter measures the number of bytes sent over network adapters. A sudden increase in the value of this counter indicates an external attack.
  • Network Interface: Bytes/sec:
    This counter measures the number of bytes sent or received over network adapters. A sudden increase in the value of this counter indicates an external attack.
  • Network Interface: Output Queue Length:
    This counter shows that data packets are in queue. If the value of this counter is more than two, it indicates a network bottleneck.

Performance counter related to user connections

User connections can cause performance issues in SQL Server 2005. The counter used to monitor the user connections is as follows: SQL Server: General: User Connection: This counter shows the value of user connections. If the value of this counter is more than 255, it indicates a bottleneck.

SSIS Best practices

http://sqlcat.com/sqlcat/b/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx

How many of you have heard the myth that Microsoft® SQL Server® Integration Services (SSIS) does not scale? The first question we would ask in return is: “Does your system need to scale beyond 4.5 million sales transaction rows per second?” SQL Server Integration Services is a high performance Extract-Transform-Load (ETL) platform that scales to the most extreme environments. And as documented in SSIS ETL world record performance, SQL Server Integration Services can process at the scale of 4.5 million sales transaction rows per second.

1

SSIS is an in-memory pipeline, so ensure that all transformations occur in memory.

The purpose of having Integration Services within SQL Server features is to provide a flexible, robust pipeline that can efficiently perform row-by-row calculations and parse data all in memory.

While the extract and load phases of the pipeline will touch disk (read and write respectively), the transformation itself should process in memory. If transformations spill to disk (for example with large sort operations), you will see a big performance degradation. Construct your packages to partition and filter data so that all transformations fit in memory.

A great way to check if your packages are staying within memory is to review the SSIS performance counter Buffers spooled, which has an initial value of 0; above 0 is an indication that the engine has started swapping to disk. For more information, please refer to Something about SSIS Performance Counters.

 

2

Plan for capacity by understanding resource utilization.

SQL Server Integration Services is designed to process large amounts of data row by row in memory with high speed. Because of this, it is important to understand resource utilization, i.e., the CPU, memory, I/O, and network utilization of your packages.

CPU Bound
Seek to understand how much CPU is being used by Integration Services and how much CPU is being used overall by SQL Server while Integration Services is running. This latter point is especially important if you have SQL Server and SSIS on the same box, because if there is a resource contention between these two, it is SQL Server that will typically win – resulting in disk spilling from Integration Services, which slows transformation speed.

The perfmon counter that is of primary interest to you is Process / % Processor Time (Total). Measure this counter for both sqlservr.exe and dtexec.exe. If SSIS is not able to drive close to 100% CPU load, this may be indicative of:

  • Application contention: For example, SQL Server is taking on more processor resources, making them unavailable to SSIS.
  • Hardware contention: A common scenario is that you have suboptimal disk I/O or not enough memory to handle the amount of data being processed.
  • Design limitation: The design of your SSIS package is not making use of parallelism, and/or the package uses too many single-threaded tasks.

Network Bound
SSIS moves data as fast as your network is able to handle it. Because of this, it is important to understand your network topology and ensure that the path between your source and target have both low latency and high throughput.

The following Network perfmon counters can help you tune your topology:

  • Network Interface / Current Bandwidth: This counter provides an estimate of current bandwidth.
  • Network Interface / Bytes Total / sec: The rate at which bytes are sent and received over each network adapter.
  • Network Interface / Transfers/sec: Tells how many network transfers per second are occurring. If it is approaching 40,000 IOPs, then get another NIC card and use teaming between the NIC cards.

These counters enable you to analyze how close you are to the maximum bandwidth of the system. Understanding this will allow you to plan capacity appropriately whether by using gigabit network adapters, increasing the number of NIC cards per server, or creating separate network addresses specifically for ETL traffic.

I/O Bound
If you ensure that Integration Services is minimally writing to disk, SSIS will only hit the disk when it reads from the source and writes to the target. But if your I/O is slow, reading and especially writing can create a bottleneck.

Because tuning I/O is outside the scope of this technical note, please refer to Predeployment I/O Best Practices.  Remember that an I/O system is not only specified by its size ( “I need 10 TB”) – but also by its sustainable speed (“I want 20,000 IOPs”).

Memory bound
A very important question that you need to answer when using Integration Services is: “How much memory does my package use?”

The key counters for Integration Services and SQL Server are:

  • Process / Private Bytes (DTEXEC.exe) – The amount of memory currently in use by Integration Services. This memory cannot be shared with other processes.
  • Process / Working Set (DTEXEC.exe) – The total amount of allocated memory by Integration Services.
  • SQL Server: Memory Manager / Total Server Memory: The total amount of memory allocated by SQL Server. Because SQL Server has another way to allocate memory using the AWE API, this counter is the best indicator of total memory used by SQL Server. To understand SQL Server memory allocations better, refer to Slava Ok’s Weblog.
  • Memory / Page Reads / sec – Represents to total memory pressure on the system. If this consistently goes above 20, the system is under memory pressure.

 

3

Baseline source system extract speed.

Understand your source system and how fast you extract from it. After all, Integration Services cannot be tuned beyond the speed of your source – i.e., you cannot transform data faster than you can read it.
Measure the speed of the source system by creating a very simple package reading data from your source with the a destination of “Row Count”:

Execute the package from the command line (DTEXEC) and measure the time it took for it to complete its task. Use the Integration Services log output to get an accurate calculation of the time. You want to calculate rows per second:

Rows / sec = Row Count / TimeData Flow

Based on this value, you now know the maximum number of rows per second you can read from the source – this is also the roof on how fast you can transform your data. To increase this Rows / sec calculation, you can do the following:

  • Improve drivers and driver configurations: Make sure you are using the most up-to-date driver configurations for your network, data source, and disk I/O. Often the default network drivers on your server are not configured optimally for the network stack, which results in performance degradations when there are a high number of throughput requests. Note that for 64-bit systems, at design time you may be loading 32-bit drivers; ensure that at run time you are using 64-bit drivers.
  • Start multiple connections: To overcome limitations of drivers, you can try to start multiple connections to your data source. As long as the source can handle many concurrent connections, you may see an increase in throughput if you start several extracts at once. If concurrency is causing locking or blocking issues, consider partitioning the source and having your packages read from different partitions to more evenly distribute the load.
  • Use multiple NIC cards: If the network is your bottleneck and you’ve already ensured that you’re using gigabit network cards and routers, then a potential solution is to use multiple NIC cards per server. Note that you will have to be careful when you configure multiple NIC environments; otherwise you will have network conflicts.

 

4

Optimize the SQL data source, lookup transformations, and destination.

When you execute SQL statements within Integration Services (as noted in the above Data access mode dialog box), whether to read a source, to perform a look transformation, or to change tables, some standard optimizations significantly help performance:

  •  Use the NOLOCK or TABLOCK hints to remove locking overhead.
  • To optimize memory usage, SELECT only the columns you actually need. If you SELECT all columns from a table (e.g., SELECT * FROM) you will needlessly use memory and bandwidth to store and retrieve columns that do not get used. .
  • If possible, perform your datetime conversions at your source or target databases, as it is more expensive to perform within Integration Services..
  • In SQL Server 2008 Integration Services, there is a new feature of the shared lookup cache. When using parallel pipelines (see points #8 and #10 below), it provides a high-speed, shared cache. .
  • If Integration Services and SQL Server run on the same server, use the SQL Server destination instead of the OLE DB destination to improve performance..
  • Commit size 0 is fastest on heap bulk targets, because only one transaction is committed. If you cannot use 0, use the highest possible value of commit size to reduce the overhead of multiple-batch writing.  Commit size = 0 is a bad idea if inserting into a Btree – because all incoming rows must be sorted at once into the target Btree—and if your memory is limited, you are likely to spill.  Batchsize = 0 is ideal for inserting into a heap. For an indexed destination, I recommend testing between 100,000 and 1,000,000 as batch size.
  • Use a commit size of <5000 to avoid lock escalation when inserting; note that in SQL Server 2008 you can now enable/disable lock escalation at the object level, but use this wisely.
  • Heap inserts are typically faster than using a clustered index. This means that you may want to drop indexes and rebuild if you are changing a large part of the destination table; you will want to test your inserts both by keeping indexes in place and by dropping all indexes and rebuilding to validate..
  • Use partitions and partition SWITCH command; i.e., load a work table that contains a single partition and SWITCH it in to the main table after you build the indexes and put the constraints on..
  • Another great reference from the SQL Performance team is Getting Optimal Performance with Integration Services Lookups.

 

5

Tune your network.

A key network property is the packet size of your connection. By default this value is set to 4,096 bytes. This means a new network package must be assemble for every 4 KB of data. As noted in SqlConnection.PacketSize Property in the .NET Framework Class Library, increasing the packet size will improve performance because fewer network read and write operations are required to transfer a large data set.
If your system is transactional in nature, with many small data size read/writes, lowering the value will improve performance.

Since Integration Services is all about moving large amounts of data, you want to minimize the network overhead. This means that the value 32K (32767) is the fastest option. While it is possible to configure the network packet size on a server level using sp_configure, you should not do this. The database administrator may have reasons to use a different server setting than 32K. Instead, override the server settings in the connection manager as illustrated below.

Another network tuning technique is to use network affinity at the operating system level. At high throughputs, you can sometimes improve performance this way.

For the network itself, you may want to work with your network specialists to enable jumbo frames to increase the default payload of 1,500 bytes to 9,000 bytes. By enabling jumbo frames, you will further decrease the amount of network operation required to move large data sets.

6

Use data types – yes, back to data types! –wisely.

Of all the points on this top 10 list, this is perhaps the most obvious. Yet, it is such an important point that it needs to be made separately. Follow these guidelines:

  •  Make data types as narrow as possible so you will allocate less memory for your transformation.
  • Do not perform excessive casting of data types – it will only degrade performance. Match your data types to the source or destination and explicitly specify the necessary data type casting..
  • Watch precision issues when using the money, float, and decimal types. Also, be aware the money is faster than decimal, and money has fewer precision considerations than float.

7

Change the design.

There are some things that Integration Services does well – and other tasks where using another tool is more efficient. Your tool choice should be based on what is most efficient and on a true understanding of the problem. To help with that choice, consider the following points:

  •  Do not sort within Integration Services unless it is absolutely necessary. In order to perform a sort, Integration Services allocates the memory space of the entire data set that needs to be transformed. If possible, presort the data before it goes into the pipeline. If you must sort data, try your best to sort only small data sets in the pipeline. Instead of using Integration Services for sorting, use an SQL statement with ORDER BY to sort large data sets in the database – mark the output as sorted by changing the Integration Services pipeline metadata on the data source.
  • There are times where using Transact-SQL will be faster than processing the data in SSIS. As a general rule, any and all set-based operations will perform faster in Transact-SQL because the problem can be transformed into a relational (domain and tuple) algebra formulation that SQL Server is optimized to resolve. Also, the SQL Server optimizer will automatically apply high parallelism and memory management to the set-based operation – an operation you may have to perform yourself if you are using Integration Services. Typical set-based operations include:
    • Set-based UPDATE statements – which are far more efficient than row-by-row OLE DB calls.
    • Aggregation calculations such as GROUP BY and SUM. These are typically also calculated faster using Transact-SQL instead of in-memory calculations by a pipeline.
  • Delta detection is the technique where you change existing rows in the target table instead of reloading the table. To perform delta detection, you can use a change detection mechanism such as the new SQL Server 2008 Change Data Capture (CDC) functionality. If such functionality is not available, you need to do the delta detection by comparing the source input with the target table. This can be a very costly operation requiring the maintenance of special indexes and checksums just for this purpose. Often, it is fastest to just reload the target table. A rule of thumb is that if the target table has changed by >10%, it is often faster to simply reload than to perform the logic of delta detection.

8

Partition the problem.

One of the main tenets of scalable computing is to partition problems into smaller, more manageable chunks. This allows you to more easily handle the size of the problem and make use of running parallel processes in order to solve the problem faster.

For ETL designs, you will want to partition your source data into smaller chunks of equal size. This latter point is important because if you have chunks of different sizes, you will end up waiting for one process to complete its task. For example, looking at the graph below, you will notice that for the four processes executed on partitions of equal size, the four processes will finish processing January 2008 at the same time and then together continue to process February 2008. But for the partitions of different sizes, the first three processes will finish processing but wait for the fourth process, which is taking a much longer time. The total run time will be dominated by the largest chunk.

To create ranges of equal-sized partitions, use time period and/or dimensions (such as geography) as your mechanism to partition. If your primary key is an incremental value such as an IDENTITY or another increasing value, you can use a modulo function. If you do not have any good partition columns, create a hash of the value of the rows and partition based on the hash value. For more information on hashing and partitioning, refer to the Analysis Services Distinct Count Optimization white paper; while the paper is about distinct count within Analysis Services, the technique of hash partitioning is treated in depth too.

Some other partitioning tips:

  • Use partitioning on your target table. This way you will be able to run multiple versions of the same package, in parallel, that insert data into different partitions of the same table. When using partitioning, the SWITCH statement is your friend. It not only increases parallel load speeds, but also allows you to efficiently transfer data. Please refer to the SQL Server Books Online article Transferring Data Efficiently by Using Partition Switching for more information.
  • As implied above, you should design your package to take a parameter specifying which partition it should work on. This way, you can have multiple executions of the same package, all with different parameter and partition values, so you can take advantage of parallelism to complete the task faster.
  • From the command line, you can run multiple executions by using the “START” command. A quick code example of running multiple robocopy statements in parallel can be found within the Sample Robocopy Script to custom synchronize Analysis Services databases technical note.

9

Minimize logged operations.

When you insert data into your target SQL Server database, use minimally logged operations if possible. When data is inserted into the database in fully logged mode, the log will grow quickly because each row entering the table also goes into the log.

Therefore, when designing Integration Services packages, consider the following:

  • Try to perform your data flows in bulk mode instead of row by row. By doing this in bulk mode, you will minimize the number of entries that are added to the log file. This reduction will improve the underlying disk I/O for other inserts and will minimize the bottleneck created by writing to the log.
  • If you need to perform delete operations, organize your data in a way so that you can TRUNCATE the table instead of running a DELETE. The latter will place an entry for each row deleted into the log. But the former will simply remove all of the data in the table with a small log entry representing the fact that the TRUNCATE occurred. In contrast with popular belief, a TRUNCATE statement can participate in a transaction.
  • Use the SWITCH statement and partitioning. If partitions need to be moved around, you can use the SWITCH statement (to switch in a new partition or switch out the oldest partition), which is a minimally logged statement.
  • Be careful when using DML statements; if you mix in DML statements within your INSERT statements, minimum logging is suppressed.

10

Schedule and distribute it correctly.

After your problem has been chunked into manageable sizes, you must consider where and when these chunks should be executed. The goal is to avoid one long running task dominating the total time of the ETL flow.

A good way to handle execution is to create a priority queue for your package and then execute multiple instances of the same package (with different partition parameter values). The queue can simply be a SQL Server table. Each package should include a simple loop in the control flow:

  1.  Pick a relevant chunk from the queue:
    1. “Relevant” means that is has not already been processed and that all chunks it depends on have already run.
    2. If no item is returned from the queue, exit the package.
  2. Perform the work required on the chunk.
  3. Mark the chunk as “done” in the queue.
  4. Return to the start of loop.

Picking an item from the queue and marking it as “done” (step 1 and 3 above) can be implemented as stored procedure, for example.

The queue acts as a central control and coordination mechanism, determining the order of execution and ensuring that no two packages work on the same chunk of data. Once you have the queue in place, you can simply start multiple copies of DTEXEC to increase parallelism.

Advertisements

From → ETL

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: