Skip to content

Query Tuning For Data warehouse

August 25, 2011

If a query in data warehouse has a performance problem, before turning the quey, indexes should be appropriate and statistics are up to date

There are two main classes of query plans for star join queries typical in data warehouses: scan plans and seek plans.

  • Scan plans scan a range of the fact table and join the resulting rows with one or more dimensions. Scan plans typically are the most efficient when a relatively large fraction of the rows qualify.
  • Seek plans find rows from dimensions that qualify, and then seek into the fact table to find joining rows, via clustered or nonclustered indexes on the dimension key columns of the fact table. Seek plans are typically the most efficient for highly selective queries where a tiny percentage of rows qualify.

If a query gets poor performance though everything else checks out, the main categories of problems to consider are these:

  • Plan choice:The query plan is not efficient enough and you need a better plan (independent of parallelism)
  • Parallel performance: The query plan is a parallel plan, but the full parallel power of the computer is not being utilized effectively.

1. You can tell that parallel performance is a problem if you observe the following:

1.    You are getting a serial plan but runtime is too long. You can tell the plan is serial if the plan does not have any parallelism (exchange) operators [BOL08d].

2.    You run the query with degree of parallelism K and during a significant part of execution, fewer than K cores are shown to be busy in Windows Task Manager under the Performance tab.

3.    In the STATISTICS XML plan or runtime .sqlplan file generated from graphical showplan, the RunTimeCountersPerThread information shows that the ActualRows count for each thread is skewed so that a small number of threads are doing most of the work.

For an example that illustrates both symptoms 2 and 3 above, if you run a query that does a nested loop join between a Date dimension and a partitioned fact table, and only three different Date rows qualify, you may see RunTimeCountersPerThread information that look similar to this:


  <RunTimeCountersPerThread Thread=”7” ActualRows=”0” ActualEndOfScans=”1” ActualExecutions=”1” />

  <RunTimeCountersPerThread Thread=”5” ActualRows=”0” ActualEndOfScans=”1” ActualExecutions=”1” />

  <RunTimeCountersPerThread Thread=”6” ActualRows=”0” ActualEndOfScans=”1” ActualExecutions=”1” />

  <RunTimeCountersPerThread Thread=”4” ActualRows=”0” ActualEndOfScans=”1” ActualExecutions=”1” />

  <RunTimeCountersPerThread Thread=”3” ActualRows=”0” ActualEndOfScans=”1” ActualExecutions=”1” />

  <RunTimeCountersPerThread Thread=”2” ActualRows=”49153977” ActualEndOfScans=”1” ActualExecutions=”1” />

  <RunTimeCountersPerThread Thread=”1” ActualRows=”51723098” ActualEndOfScans=”1” ActualExecutions=”1” />

  <RunTimeCountersPerThread Thread=”8” ActualRows=”56013568” ActualEndOfScans=”1” ActualExecutions=”1” />

  <RunTimeCountersPerThread Thread=”0” ActualRows=”0” ActualEndOfScans=”0” ActualExecutions=”0” />


 It is normal for thread 0 to have zero actual rows. The other threads ideally will have roughly the same number of rows. In this case, five threads have zero ActualRows counts. This pattern usually arises from a nested loop join that has a small number of outer rows qualifying (equal to the number of threads doing work). You can work around the problem by modifying the query to force the use of a hash join.

For example consider a query like the following (based on the Project REAL schema [Wyatt05]):

 select COUNT(*)

 from Tbl_Fact_Store_Sales f, Tbl_Dim_Date d

 where f.SK_Date_ID = d.SK_Date_ID

 and d.Date_Value in (N’December 1, 2004′, N’December 2, 2004′, N’December 3, 2004′)

 With a large fact table, this query typically gets a parallel nested loop join query plan. That plan will use at most three threads because there are only three qualifying outer rows. To improve parallelism, you can rewrite the query to the following equivalent form:

select Date_Value, COUNT(*)

from Tbl_Fact_Store_Sales f, Tbl_Dim_Date d

where f.SK_Date_ID = d.SK_Date_ID

and f.SK_Date_ID between 20041201 and 20041203

group by Date_Value

option (hash join)

 This query gets the same answer as the previous one but it eliminates the nested loop join and can be fully parallelized—it can use all the processor cores on the computer. The important changes from the original query are that the range predicate is now expressed directly on the fact table and not on the dimension table, and that the query hint option(hash join) has been added. The BETWEEN predicate on the fact table allows a parallel range seek to be done against the fact table to find the rows for the specified days. The use of a hash join avoids the limit of one thread per outer row faced with a loop join.

If the number of days of data that qualify is greater than or equal to the maximum degree of parallelism at which the query executes, filtering the fact table by using a join with the date dimension typically will perform well, and this type of query tuning is not needed.

2. Forcing Seek Plans When Scan Is Chosen and Vice Versa

As mentioned earlier, in a data warehouse star join query two main types of plans are often possible: scan plans and seeks plans. Depending on the selectivity of the filters on dimensions, there is a crossover point where a scan plan’s execution time equals that of an equivalent seek plan, as illustrated below:


Figure 2. Scan-seek cost crossover as a function of join selectivity (fraction of rows from fact table that qualify after joining with filtered dimension)

To the left of where the curves cross, a seek plan is best. To the right, a scan plan is best. The query optimizer’s internal cost estimates for plans are quite good, but they are not perfect. If the cost characteristics of a query place it near the crossover point for actual scan versus seek execution time, the plan chosen by the optimizer could be a scan when a seek would execute faster or vice versa.

The following example based on the AdventureWorksDW database illustrates this:

 use AdventureWorksDW;

select d.ModelName, SUM(f.SalesAmount)

from FactInternetSales f, DimProduct d

where d.ProductKey = f.ProductKey and d.ModelName like N’B%’

group by d.ModelName;

 The plan for this query obtained by default by the optimizer looked like this in one of our tests (it is a scan plan because it scans the fact table rather than using the nonclustered index on ProductKey on the fact table):


Figure 3. Scan plan for sample query (scans fact table and performs hash join with dimension(s))

The following query has been modified to include hints to force a plan that finds the dimension table rows that qualify, then seeks into the nonclustered index on the key of that dimension on the fact table, and finally looks up the qualifying rows of the fact table:

 select d.ModelName, SUM(f.SalesAmount)

from FactInternetSales f with(forceseek), DimProduct d

where d.ProductKey = f.ProductKey and d.ModelName like N’B%’

group by d.ModelName;

 Following is a fragment of this plan (the aggregation part is excluded):


Figure 4. Seek plan for sample query

This seek plan actually executes several times faster than the equivalent scan plan above. If you think you may need a seek plan but you see a scan plan, you can force a seek plan by using a forceseek hint similar to that shown above. Optionally, you may also wish to use an index hint to force seeking into a specific index.

Similarly, if you are getting a seek-style plan (joining a dimension to a nonclustered fact table index, then looking up the associated fact table rows), you can convert it into a plan that scans all or a range of the fact table with hints. For example, consider this query:

select d.ModelName, SUM(Fact.SalesAmount)

from Fact /*with(index=1)*/, #tempProd d

where Fact.ProductKey = d.ProductKey and OrderDateKey between 10 and 200

group by d.ModelName;

 In this example, Fact is a large fact table with a clustered index on OrderDateKey and a nonclustered index on ProductKey. #tempProd is a small table with two Product dimension rows in it. The default plan for this, without hints, is a seek plan that seeks the ProductKey index on Fact. Adding the index=1 hint (commented in the example) changes the plan so a range seek is performed on the Fact table’s clustered OrderDateKey index and the results are hash-joined with #tempProd. In general, consider using index=1 hint on the fact table, optionally with an option(hash join) query hint, to force a scan plan if that is beneficial.

One important consideration regarding using scan versus seek plans on a large SMP is related to the costing of seek operations. Because it assumes that a seek requires a random I/O (the worst case), the query optimizer tends to over-cost seeks, both

a.    When the I/O system is very fast, and

b.    When all relevant data fits in the main memory buffer pool.

In other words, scan plans may be chosen instead of seeks in these situations, even when seeks would be better. If you are seeing scan plans and you are not getting the performance you need, and a) or b) apply to you, consider forcing a seek plan with hints.

Reference to Book online of Microsoft

Detect performance Issues

1. Database design issue if

  • Excessive sorting operations are performed. If you continually perform the same sorting operations over and over, you can avoid these with appropriate indexing.
  • Excessive RID lookups are performed on heap tables. RID lookups mean extra IOs are required to retrieve columns that are not in the index used. This can be avoided with covered nonclustered indexes.
  • Key lookups against the clustering keys look like joins however they are marked as “lookups” only in the XML showplan. These can be avoided with covered nonclustered indexes.
  • A potentially beneficial index is missing on join columns resulting in HASH joins. Indexes on join columns may avoid the hash.

2. CPU gotchas if

  • if signal waits > 25% of total waits, there is a CPU bottleneck. See sys.dm_os_wait_stats for Signal waits and Total waits. Signal waits measure the time spent in the runnable queue waiting for CPU. High signal waits indicate a CPU bottleneck.
  • Avoid inappropriate plan re-use. If the query is identical, then plan re-use is a good thing. However, query parameterization that allows plan re-use is only appropriate when the result set (and intermediate work tables) are of similar size to the original plan. If result set sizes vary significantly due to differing parameter values which are common in data warehouse scenarios, plan re-use can be detrimental. Bad plans can also lead to longer running queries and IO or memory pressure. Therefore, the cost of plan generation in such cases is preferable to plan re-use. Unlike OLTP, data warehouse queries are not always identical in terms of result sets or optimal query plans.

3. Memory bottleneck if

  • Consistently low average page life expectancy. See Average Page Life   Expectancy Counter which is in the   Perfmon object SQL Server Buffer   Manager (this represents is the   average number of seconds a page stays   in cache). For OLTP, an average page   life expectancy of 300 is 5 minutes.   Anything less could indicate memory   pressure, missing indexes, or a cache   flush.

Page Life Expectancy counter, which is commonly known as PLE as well.

You can find the value of the PLE by running following query.

SELECT [object_name],
FROM sys.dm_os_performance_counters
WHERE [object_name] LIKE '%Manager%'
AND [counter_name] = 'Page life expectancy'

The recommended value of the PLE counter is (update: minimum of) 5minutes. I have seen on busy system this value to be as low as even 45 seconds and on unused system as high as xxx seconds. Page Life Expectancy is number of seconds a page will stay in the buffer pool without references. In simple words, if your page stays longer in the buffer pool (area of the memory cache) your PLE is higher, leading to higher performance as every time request comes there are chances it may find its data in the cache itself instead of going to hard drive to read the data.

  • Sudden big drop in page life expectancy. DW applications (e.g. big transactions) could experience big drops in page life expectancy. This is due to a cache flush from a big read. See Perfmon object SQL Server Buffer Manager.
  • Pending memory grants. See counter Memory Grants Pending, in the Perfmon object SQL Server Memory Manager. Large memory grants can be common in Data Warehouse applications. More memory may help, otherwise the user cannot execute until memory grant occurs.
  • Sudden drops or consistenty low SQL Cache hit ratio. OLTP applications (eg small transactions) should have a high cache hit ratio. Since OLTP transactions are small, there should not be (1) big drops in SQL Cache hit rates or (2) consistently low cache hit rates < 90%. Drops or low cache hit may indicate memory pressure or missing indexes.

4. IO bottleneck if

  • The best metric for write performance is disk seconds per read and disk seconds per write. When the IO system is NOT under significant load, there will be no disk queuing and thus disk seconds per read or write should be as good as it gets Normally it takes 4-8 milliseconds to complete a read when there is no IO pressure. Factors for IO throughput are the number of spindles, and drive throughput such as sequential and random IOs per second (according to the vendor). As the IO requests increase, you may notice disk queuing. The effects of queuing are reflected in high disk seconds per read or write. Periodic higher values for disk seconds/read may be acceptable for many applications. For high performance OLTP applications, sophisticated SAN subsystems provide greater IO scalability and resiliency in handling spikes of IO activity. Sustained high values for disk seconds/read (>15ms) does indicate a disk bottleneck.
  • High average disk seconds per write. See Perfmon Logical or Physical disk. Data Warehouse loads can be either logged with inserts, updates or deletes, or non-logged using bulk copy. Logged operations require transaction log writes. A transaction log write can be as fast as 1ms (or less) for high performance SAN environments. For many applications, a periodic spike in average disk seconds per write is acceptable considering the high cost of sophisticated SAN subsystems. However, sustained high values for average disk seconds/write is a reliable indicator of a disk bottleneck.
  • Big IOs such as table and range scans may be due to missing indexes.

5. Blocking bottleneck if

  • Index contention. Look for high lock and latch waits in sys.dm_db_index_operational_stats. Compare with lock and latch requests.
  • High average row lock or latch waits. The average row lock or latch waits are computed by dividing lock and latch wait milliseconds (ms) by lock and latch waits. The average lock wait ms computed from sys.dm_db_index_operational_stats represents the average time for each block.
  • Block process report shows long blocks. See sp_configure “blocked process threshold” and Profiler “Blocked process Report” under the Errors and Warnings event.
  • High number of deadlocks. See Profiler “Graphical Deadlock” under Locks event to identify the statements involved in the deadlock.

6. Network bottleneck if

  • High network latency coupled with an application that incurs many round trips to the database.
  • Network bandwidth is used up. See counters packets/sec and current bandwidth counters in the network interface object of Performance Monitor. For TCP/IP frames actual bandwidth is computed as packets/sec * 1500 * 8 /1000000 Mbps.

7. Wait statistics gotchas if

  • Since DataWarehouse and Reporting workloads are largely reads which are compatible with other reads, incompatible exclusive lock waits would ordinarily only come into play during batch loads or periodic feeds. If the top wait statistics are LCK_x. or PAGELATCH_EX, see “SQL Server 2005 Performance Tuning using Waits & Queues” for an explanation of sys.dm_os_wait_stats.
  • There is an IO bottleneck if top wait statistics in sys.dm_os_wait_stats are related to IO such as ASYNCH_IO_COMPLETION, IO_COMPLETION, LOGMGR, WRITELOG, or PAGEIOLATCH_x.

8. Indexing gotchas if

  • Large data warehouse can benefit from more indexes. Indexes can be used to cover queries and avoid sorting. For a data warehouse application, the cost of index overhead is only paid when data is loaded.
  • Check for missing indexes in sys.dm_db_missing_index_group_stats, sys.dm_db_missing_index_groups and sys.dm_db_missing_index_details

9. Watch out for gragmentation

The Dynamic Management table valued function sys.dm_db_index_physical_stats returns the fragmentation percentage in the column avg_fragmentation_in_percent. Fragmentation should not exceed 25%. Reducing index fragmentation can benefit big range scans, common in data warehouse and Reporting scenarios

10.  Consider Table partitioning for fast loads

For the large tables common in Data Warehouses, table partitioning offers important performance and manageability advantages. For example, the fastest type of load is a non-logged bulk copy. The requirements for non-logged bulk copies are that indexes must be dropped. This is not feasible on a huge billion row table UNLESS you use table partitioning. This allows one to create a staging table identical to the large table (minus indexes). A fast non-logged bulk copy is used to load data. Thereafter, indexes are added to the staging table followed by constraints. Then, a meta-data only SWITCH IN operation switches pointer locations for the populated staging table and the empty target partition of the partitioned table resulting in an fully populated partition and empty staging table. Besides a fast bulk copy, the staging table allows us to eliminate blocking in the large partitioned table during the load. For more information refer to “Loading Bulk Data into Partitioned Tables”. In addition to fast loads, partitioned tables allow fast deletes (or archiving purposes or sliding window deletes) where large logged deletes are replaced with meta-data only partition SWITCH OUT operations that switches pointer locations for the full partition (to be ‘deleted’) and an empty monolithic table. The SWITCH OUT results in an empty partition and a fully populated monolithic staging table. Thereafter the monolithic table can either be dropped or added to a partitioned archive table using SWITCH IN. Partitions also provide manageability improvements when combined with specific filegroup placement, allowing for customized backup and restore strategies.

From Lindsey.allen

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: