Skip to content

Hardware Selection for Large Scale Data warehouse

August 26, 2011

Best practices for hardware selection include considerations for assessing processor and memory resources.

Processors

The number of processors needed is usually the most difficult hardware characteristic to choose. Several things generally determine how many processors are needed:

1.    The number of queries executed in a specific period of time

2.    Query complexity (simple, medium, complex or very complex)

3.    Desired response time for a single query. This can determine the amount of parallelism you want to achieve per query.

Memory

Having data already in memory can make query run time more predictable. Having a fast disk subsystem is essential for quickly getting data into memory. It is also best if the data in memory can be used by other queries.

Empirical evidence has shown that 4 GB per core is sufficient for most data warehouse applications. This estimate can be influenced by the number of simultaneous queries or transactions as well as the number of large table/index scans needed.

Maximum limitations of RAM is 2 terabytes and of CPU is 64 cores for MS SQL Server 2008 Enterprise Edition 64 bits

 

Intel Itanium processors typically offer significantly improved reliability compared to scale-out systems that use x86 components, especially the HP Integrity Superdome offer added RAS (Reliability, Availability, and Serviceability) capabilities.

  • The HP Integrity product line support the ability to logically remove a single failed processor or memory module (DIMM) from the system automatically, until it can be replaced by a technician.
  • The system can operate even with the failure of up to half of the processors.
  • Automatic error detection, error correction, error isolation, and recovery from memory failures are supported.
  • The memory systems of computers such as the HP Integrity Superdome offer sophisticated error correction, making them tolerant of transient memory faults and hard faults in individual memory components.
  • High-end scale-up systems also support hot-add of memory. Added memory becomes immediately visible to all the processors of the scale-up system—it is not necessary to restart the server or SQL Server.

Scale-up based systems are often used together with Storage Area Networks (SANS), such as the HP XP 240000. SANs also simplify management compared to the direct-attached storage used in scale-out approaches

For further reasons of using high-end hardware are the following:

  • Scale-up servers can be partitioned into multiple logical computers. This allows for a more efficient use of system resources.
  • During many SQL query execution operations (such as UNION, DISTINCT, TOP, and some sorts), data must be pulled into a single server simply because some operations cannot be parallelized efficiently across multiple separate computers. These types of operations on large quantities of data in a business intelligence (BI) environment are performed more efficiently in a scale-up environment.
  • Those who use Microsoft SQL Server Analysis Services OLAP cubes also stand to benefit from a scale-up solution because OLAP queries perform most efficiently when the cube’s cells are memory resident. Scale-up servers may be configured with up to 2 terabytes of memory.
Advertisements
Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: