Microsoft SQL Server 2008 R2 : Physical server design - Memory configuration

7/26/2012 5:48:20 PM

Insufficient RAM is a common cause of poor performance in SQL Server systems. Fortunately, RAM is both reasonably inexpensive and relatively easy to upgrade.

There are some important considerations when selecting and configuring server RAM, such as the module capacity and fault tolerance, and latency issues on large, multi-CPU systems. In this section, we'll look at configuring a server's RAM slots and take a brief look at the NUMA architecture, which is used to derive maximum performance in large, multi-CPU systems.

1. Design for future RAM upgrades

When selecting and configuring RAM for SQL Server, you must consider the amount, type, and capacity of the chosen RAM. If the server will be used for future system consolidation, or the exact RAM requirements can't be accurately predicted, then short of loading the system with the maximum possible memory, it's important to allow for future memory upgrades.

Virtualization addresses this issue nicely by making it easy to grant and revoke CPU and memory resources as the server's needs increase or decrease. On dedicated, nonvirtualized systems, this issue is typically addressed by using fewer, higher-capacity memory modules, leaving a number of free slots for future upgrades. This avoids the common problem of having to remove and replace lower-capacity modules when the server's RAM slots are full and more memory is required. Although initially more expensive, this approach provides flexibility for future requirements.

Finally, in order to provide a system with a degree of resilience against memory errors, error-correcting code (ECC) RAM should be installed. Used by all the major system vendors, ECC forms an important part of configuring a fault-tolerant SQL Server system.

Table 1 shows the maximum memory supported by SQL Server 2008 running on Windows Server 2008.

Table 1. Maximum memory for SQL Server 2008 editions
Maximum supported memory, by edition and platform

SQL Server 2008 edition	32-bit	64-bit
Enterprise	OS max	OS max
Standard	OS max	OS max
Web	OS max	OS max
Workgroup	OS max	4GB
Express	1GB	1GB

Windows Server 2008 edition	32-bit	64-bit
Datacenter	64GB	2TB
Enterprise	64GB	2TB
Standard	4GB	32GB
Web Server	4GB	32GB
Itanium	N/A	2TB
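
To see where a given server sits relative to these limits, the instance's edition and the physical memory visible to it can be queried. The following is a minimal sketch, assuming SQL Server 2008 or 2008 R2, where sys.dm_os_sys_info exposes a physical_memory_in_bytes column. Later versions rename this column (physical_memory_kb from SQL Server 2012 onward), so the second query would need adjusting there.

```sql
-- Edition and version of the current instance (compare with Table 1)
SELECT SERVERPROPERTY('Edition')        AS edition,
       SERVERPROPERTY('ProductVersion') AS product_version;

-- Physical memory visible to the operating system, in MB
-- (assumes the SQL Server 2008/2008 R2 column name)
SELECT physical_memory_in_bytes / 1048576 AS physical_memory_mb
FROM sys.dm_os_sys_info;
```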

Despite memory being significantly faster than disk, a large multi-CPU system may bottleneck on access to the memory, a situation addressed by the NUMA architecture.

2. NUMA

As we mentioned earlier, advances in CPU clock speed have given way to a trend toward multiple cores per CPU die. That's not to say clock speeds won't increase in the future—they most certainly will—but we're at the point now where it's become increasingly difficult to fully utilize CPU clock speed due to the latency involved in accessing system RAM. On large multiprocessor systems where all CPUs share a common bus to the RAM, the latency of RAM access becomes more and more of an issue, effectively throttling CPU speed and limiting system scalability. A simplified example of this is shown in figure 1.

As we covered earlier, higher amounts of CPU cache will reduce the frequency of trips out to system RAM, but there are obviously limits on the size of the CPU cache, so this only partially addresses the RAM latency issue.

The non-uniform memory access (NUMA) architecture, fully supported by SQL Server, addresses this issue by grouping CPUs together into NUMA nodes. Each node accesses its own RAM and, depending on the NUMA implementation, does so over its own I/O channel.

Figure 1. CPUs in a symmetric multiprocessing (SMP) system share access to system RAM via a single system bus, thus limiting scalability.

Figure 2. The NUMA architecture increases scalability by grouping CPUs and RAM into nodes.

In contrast, the symmetric multiprocessor architecture has no CPU/RAM segregation, with all CPUs accessing the same RAM over the same shared memory bus. As the number of CPUs and clock speeds increase, the symmetric multiprocessor architecture reaches scalability limits, limits that are overcome by the NUMA architecture; a simplified example appears in figure 2.

While the NUMA architecture localizes RAM to groups of CPUs (NUMA nodes) over their own I/O channels, RAM from other nodes is still accessible. Such memory is referred to as remote memory. In the NUMA architecture, accessing remote memory is more costly (slower) than local memory, and applications that aren't NUMA aware often perform poorly on NUMA systems. Fortunately, SQL Server is fully NUMA aware.[1]

[1] SQL Server 2000 Service Pack 3 and earlier isn't NUMA aware and often performs poorly on NUMA systems.

On large multi-CPU systems running multiple SQL Server instances, each instance can be bound to a group of CPUs and configured with a maximum memory value. In this way, SQL Server instances can be tailored for a particular NUMA node, increasing overall system performance by preventing remote memory access while benefiting from high-speed local memory access.
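
As a rough illustration of the memory half of that configuration, the sketch below caps an instance's memory using the 'max server memory (MB)' option of sp_configure. The 16384 MB figure is purely hypothetical; in practice the value would be sized to fit within the local RAM of the NUMA node the instance is bound to.

```sql
-- Cap this instance's memory so it fits within one NUMA node's local RAM.
-- 16384 MB is a hypothetical value chosen for illustration only.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
EXEC sp_configure 'max server memory (MB)', 16384;
RECONFIGURE;
GO
```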

Hardware NUMA

The NUMA architecture just described is known as hardware NUMA, also referred to as hard NUMA. As the name suggests, servers using hardware NUMA are configured by the manufacturer with multiple system buses, each of which is dedicated to a group of CPUs that use the bus to access their own RAM allocation.

Some hardware vendors supply NUMA servers in interleaved NUMA mode, in which case the system will appear to Windows and SQL Server as an SMP box. Interleaved NUMA is suitable for applications that aren't NUMA optimized. For SQL Server systems, pure NUMA mode should be considered to take advantage of NUMA optimizations if appropriate. The sys.dm_os_memory_clerks Dynamic Management View (DMV) can be used to determine the NUMA mode:

-- T-SQL to return the memory nodes used by the active memory clerks
SELECT DISTINCT memory_node_id
FROM sys.dm_os_memory_clerks;

If node 0 is the only memory node returned from this query, the server may be configured in interleaved NUMA mode (or isn't NUMA hardware). Servers not configured for hardware NUMA (SMP servers) that contain lots of CPUs may benefit from software-based NUMA, or soft NUMA, which we'll look at next.
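
A complementary view of the same information is how the instance's schedulers are spread across nodes. The following sketch, which assumes the sys.dm_os_schedulers DMV is available (SQL Server 2005 and later), counts the online schedulers under each parent node. A single row suggests the instance sees only one node.

```sql
-- Count online schedulers (one per CPU visible to the instance) per NUMA node
SELECT parent_node_id,
       COUNT(*) AS online_schedulers
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE'
GROUP BY parent_node_id
ORDER BY parent_node_id;
```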

Soft NUMA

Unlike hardware NUMA, soft NUMA isn't able to isolate, or affinitize, RAM to groups of CPUs over dedicated buses. However, in some cases system performance may increase by enabling soft NUMA.

On SMP systems without soft NUMA, each SQL Server instance has a single I/O thread and a single LazyWriter thread. Instances experiencing bottlenecks on these resources may benefit from configuring multiple NUMA nodes using soft NUMA, in which case each node will receive its own I/O and LazyWriter threads.
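
One way to observe this is to list the LazyWriter threads and the node each one runs on. This is a sketch only, assuming the LazyWriter appears in sys.dm_exec_requests with a command value of 'LAZY WRITER'. On an instance without NUMA nodes, it would be expected to return a single row.

```sql
-- List LazyWriter background tasks and the NUMA node of their scheduler
-- (assumes the command value 'LAZY WRITER')
SELECT r.session_id,
       r.command,
       s.parent_node_id
FROM sys.dm_exec_requests AS r
JOIN sys.dm_os_schedulers AS s
    ON s.scheduler_id = r.scheduler_id
WHERE r.command = 'LAZY WRITER';
```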

Soft NUMA in SQL Server

Configuring a SQL Server instance for soft NUMA is a two-step process. First, the instance is configured with CPU affinity, as in this example, which configures an instance to use CPUs 0–3:

-- Configure an instance to use CPUs 0-3 (affinity mask 15 = binary 1111)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
EXEC sp_configure 'affinity mask', 15;
RECONFIGURE;
GO
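
The value 15 comes from setting one bit per CPU, with CPU n contributing 2 to the power n. The short sketch below shows that arithmetic and then checks the setting through the sys.configurations catalog view. It is an illustrative check, not part of the configuration itself.

```sql
-- Affinity mask for CPUs 0-3: add 2^n for each CPU number n
SELECT POWER(2, 0) + POWER(2, 1) + POWER(2, 2) + POWER(2, 3) AS affinity_mask;  -- 1 + 2 + 4 + 8 = 15

-- Compare the configured value with the value currently in use
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'affinity mask';
```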

The next step is to configure the NUMA nodes, which is done at a server level—enabling all the defined NUMA nodes to be visible to all SQL instances on the server. A NUMA node is defined in the registry with its corresponding CPUs by adding node keys to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration.

Suppose we want to create two NUMA nodes for a given SQL Server instance. In our previous example, we used the affinity mask option to affinitize CPUs 0, 1, 2, and 3. To create two NUMA nodes on those four CPUs, we'd add the registry keys, as shown in table 2.

Table 2. Registry entries used to define NUMA nodes
Key	Type	Name	Value
Node0	DWORD	CPUMask	0x03
Node1	DWORD	CPUMask	0x0c

In this case, CPUs 0 and 1 would be used by the first NUMA node (Node 0) and CPUs 2 and 3 would be used by NUMA Node 1. The hexadecimal equivalents of the binary bit masks are stored in the registry: 0x03 (binary 00000011, decimal 3) for CPUs 0 and 1, and 0x0c (binary 00001100, decimal 12) for CPUs 2 and 3. In this example, the combination of CPU affinity and the registry modifications has provided a SQL Server instance with two soft NUMA nodes.
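
The node masks can be sanity-checked with simple bitwise arithmetic before they are written to the registry. The sketch below uses hypothetical variable names (@node0 and @node1) and confirms that the two masks together cover the affinity mask of 15 and don't overlap.

```sql
-- Hypothetical variables holding the CPUMask values from table 2
DECLARE @node0 int = CAST(0x03 AS int);   -- CPUs 0 and 1
DECLARE @node1 int = CAST(0x0c AS int);   -- CPUs 2 and 3

SELECT @node0          AS node0_decimal,   -- 3
       @node1          AS node1_decimal,   -- 12
       @node0 | @node1 AS combined_mask,   -- 15, matching the affinity mask
       @node0 & @node1 AS overlap;         -- 0, so no CPU belongs to both nodes
```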
