In looking at some of the reasons for virtualization’s popularity, the preceding sections identified the concept of contention: the capability to make better use of previously underutilized physical resources in a server in order to reduce the total number of physical servers deployed. For the purposes of this
discussion, we can split the idea of contention into two parts: good
contention and bad contention.
Good Contention
Good contention is straightforward: It
enables you to see positive benefits from virtualizing your servers,
ultimately resulting in less time and money spent on deploying and
maintaining your physical server estate.
For example, if the average CPU utilization of 6 single-CPU physical servers was 10% and none of them had concurrent peak CPU usage periods, then I would feel comfortable virtualizing those 6 servers and running them on a single host server with a single CPU, the logic being 6 × 10% = 60%, which is less than the capacity of a single server with a single CPU. I’d want to make sure there was
sufficient physical memory and storage system performance available for
all 6 virtual servers, but ultimately the benefit would be the ability
to retire 5 physical servers.
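To make the arithmetic behind that decision explicit, a minimal Python sketch of the consolidation check might look like the following; the utilization figures and the single-CPU host capacity are taken from the example above, and a real sizing exercise would also need to consider peak overlap, memory, and storage.

# Illustrative consolidation check based on the example above:
# six single-CPU servers, each averaging 10% CPU utilization.
avg_cpu_utilization = [0.10] * 6     # one entry per physical server
host_cpu_capacity = 1.00             # a single-CPU host, expressed as 100%

total_demand = sum(avg_cpu_utilization)
headroom = host_cpu_capacity - total_demand

print(f"Combined average CPU demand: {total_demand:.0%}")
print(f"Headroom left on the host:   {headroom:.0%}")

if total_demand < host_cpu_capacity:
    print("Consolidation looks feasible on average CPU utilization alone,")
    print("provided the servers do not share concurrent peak periods.")
else:
    print("The host cannot absorb the combined average demand.")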
That’s a very simple example but one that most
businesses can readily understand. CPU utilization is an absolute number
that is usually a good reflection of how busy the server is.
Conversely, sizing the server’s memory is something to which you can’t apply such an easy consolidation methodology. Instead, you usually
need to determine the total memory requirement of all the virtual
servers you want to run on a host server and then ensure you have more
than that amount of physical memory in the host. However, VMware’s hypervisor complicates that by offering a memory de-duplication feature that allows duplicate memory pages to be replaced with a link to a single memory page shared by several virtual servers. Over-estimating the benefit this technology could deliver can result in exactly the performance issues you tried to avoid. For SQL Server
environments that are dependent on access to large amounts of physical
memory, trusting these hypervisor memory consolidation technologies
still requires testing, so their use in sizing exercises should be
minimized.
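As a rough illustration of that additive approach, the following Python sketch sums made-up per-server memory requirements and compares them with a host’s physical memory; the assumed page-sharing saving is a placeholder that, in line with the caution above, is best left at or near zero rather than relied upon.

# Illustrative memory sizing: sum each virtual server's requirement and
# compare it with the host's physical memory. All figures are assumptions.
vm_memory_gb = [8, 8, 16, 4, 4, 12]      # per-virtual-server requirements
host_memory_gb = 64                       # physical memory in the host

# Hypothetical page-sharing saving. Treat any de-duplication benefit as a
# bonus, not as capacity you depend on, so keep this conservative (or zero).
assumed_page_sharing_saving = 0.0

effective_demand = sum(vm_memory_gb) * (1 - assumed_page_sharing_saving)

print(f"Total memory requested by virtual servers: {sum(vm_memory_gb)} GB")
print(f"Effective demand after assumed sharing:    {effective_demand:.1f} GB")
print(f"Host physical memory:                      {host_memory_gb} GB")
print("Fits" if effective_demand <= host_memory_gb else "Does not fit")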
Bad Contention
Not all contention is good. In fact,
unless you plan well, you’re more likely to have bad contention than good
contention. To understand bad contention, consider the CPU utilization
example from the preceding section: 6 servers with average CPU
utilization values of 10% being consolidated onto a single CPU host
server. This resulted in an average CPU utilization for the host server
of around 60%. Now imagine if the average CPU utilization for two of the
virtual servers jumps from 10% to 40%. As a consequence, the total CPU requirement
has increased from 60% to 120%. Obviously, the total CPU utilization
cannot really be 120%, so you have a problem. Fortunately, resolving this scenario is one of the core functions of hypervisor software: how can it make CPU utilization look like 120% when only 100% is actually available?
Where does the missing resource come from?
Hypervisors use techniques such as resource sharing, scheduling, and time-slicing to make each virtual server appear to have full access to its allocated physical resources all of the time. Under the hood, however, the hypervisor is busy managing resource request queues: “pausing” virtual servers until they get the CPU time they need, for example, or pre-empting requests already running on physical cores while it waits for another resource they need to become available.
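One way to picture where the “missing” 20% goes is as elapsed time rather than extra capacity. The following sketch is a deliberately simplified fair-share model, not how any real hypervisor scheduler works: it spreads 100% of physical CPU across virtual servers demanding 120%, so each virtual server’s work simply takes longer to complete.

# Simplified model of CPU time-slicing under contention. Six virtual
# servers demand a combined 120% of a host that can deliver only 100%,
# so each second of demanded CPU time takes 1.2 seconds to be served.
cpu_demand = {"vm1": 0.40, "vm2": 0.40, "vm3": 0.10,
              "vm4": 0.10, "vm5": 0.10, "vm6": 0.10}
host_capacity = 1.00

total_demand = sum(cpu_demand.values())
stretch = max(total_demand / host_capacity, 1.0)

print(f"Total demand: {total_demand:.0%}, capacity: {host_capacity:.0%}")
for vm, demand in cpu_demand.items():
    # Fair-share assumption: every virtual server is slowed by the same factor.
    delivered = demand / stretch
    print(f"{vm}: wants {demand:.0%}, receives {delivered:.1%}; "
          f"work takes {stretch:.1f}x longer than on an uncontended host")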
How much this contention affects the performance
of virtual servers depends on how the hypervisor you’re using works. In a
worst-case scenario using VMware, a virtual server with a large number
of virtual CPUs can be significantly affected if running alongside a
number of virtual servers with small numbers of virtual CPUs; this is
due to VMware’s use of their co-scheduling algorithm to handle CPU
scheduling. Seeing multi-second pauses of the larger virtual server
while it waits for sufficient physical CPU resources is possible in the
worst-case scenarios, indicating not only the level of attention that
should be paid to deploying virtual servers, but also the type of
knowledge you should have if you’re going to be using heavily utilized
virtual environments.
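A toy model makes the co-scheduling problem easier to see. The sketch below is not VMware’s actual algorithm; it simply assumes the strict rule that a virtual server needs one free physical core per virtual CPU at the same instant before it can run, which is enough to show why a wide virtual server can sit waiting behind several narrow ones.

# Toy illustration of strict co-scheduling: a virtual server can only be
# dispatched when a free physical core exists for *every* one of its vCPUs.
# This is a simplification for illustration, not VMware's real scheduler.
physical_cores = 4
busy_cores = 3        # three single-vCPU virtual servers are running

def can_dispatch(vcpus: int) -> bool:
    """Return True if enough physical cores are simultaneously free."""
    return (physical_cores - busy_cores) >= vcpus

print("1-vCPU virtual server can run now:", can_dispatch(1))   # True
print("4-vCPU virtual server can run now:", can_dispatch(4))   # False
# The wide virtual server must wait until all four cores are free at the
# same moment, which is where the multi-second pauses can come from.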
Although that example of how VMware can affect performance is an extreme one, it does show how bad contention
introduces unpredictable latency. Previously, on a host server with
uncontended resources, you could effectively assume that any virtual
server’s request for a resource could be fulfilled immediately as the
required amounts of resource were always available. However, when the
hypervisor has to manage contention, it introduces a time penalty for getting access to the resource. In effect, “direct” access to the
physical resource by the virtual server can no longer be assumed.
“Direct” is in quotes because virtual servers never directly allocate the physical resources they use to themselves; in an uncontended situation the hypervisor has no difficulty finding the CPU time and memory resources they request, so the DBA can know that any performance penalty caused by virtualization is likely to be small and, most important, consistent. In a contended environment, however, the resource requirements of one virtual server can affect the performance of the other virtual servers, and that effect is unpredictable.
Demand-Based Memory Allocation
I mentioned earlier that some
hypervisors offer features that aim to reduce the amount of physical
memory needed in a virtual environment’s host servers. Memory is still
one of the most expensive components of a physical server, not so much
because of the cost per GB but because of the number of GBs that modern
software requires in servers. It’s not surprising, therefore, that virtualization technologies have tried to ease the cost of servers by making whatever memory is installed in the server go farther. However, there is no such thing as free memory: any method used to make memory go farther will affect performance somewhere. The goal is to understand where that performance impact can occur so that it has the least noticeable effect.
Demand-based memory allocation works on the
assumption that not all the virtual servers running on a host server
will need all their assigned memory all the time. For example, my laptop
has 4GB of memory but 2.9GB of it is currently free. Therefore, if it
were a virtual server, the hypervisor could get away with granting me
only 1.1GB, with the potential for up to 4GB when I need it. Scale that
out across a host server running 20 virtual servers and the amount of allocated but unneeded memory that could be found is potentially huge.
The preceding scenario is the basis of
demand-based memory allocation features in modern hypervisors. While
VMware and Hyper-V have different approaches, their ultimate aim is the
same: to provide virtual servers with as much memory as they need but no
more than they need. That way, unused memory can be allocated to extra
virtual servers that wouldn’t otherwise be able to run at all because of
memory constraints.
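The laptop example boils down to a simple rule: grant each virtual server roughly what it is actively using, keep its configured maximum as an upper bound, and treat the difference as memory the host can lend elsewhere. The sketch below illustrates that idea with made-up figures and an assumed headroom buffer; it is not how VMware or Hyper-V actually implement their features.

# Illustrative demand-based allocation: grant roughly what each virtual
# server is actively using (plus a small buffer), capped at its configured
# maximum. All figures are made up for illustration.
vms = {
    # name: (configured maximum GB, memory actively in use GB)
    "laptop-like-vm": (4.0, 1.1),
    "web-vm":         (8.0, 3.0),
    "sql-vm":         (32.0, 28.0),
}
buffer_gb = 0.5   # hypothetical headroom granted above current usage

granted = {name: min(cfg_max, in_use + buffer_gb)
           for name, (cfg_max, in_use) in vms.items()}

configured_total = sum(cfg_max for cfg_max, _ in vms.values())
granted_total = sum(granted.values())

for name, gb in granted.items():
    print(f"{name}: granted {gb:.1f} GB of {vms[name][0]:.1f} GB configured")
print(f"Memory freed for other virtual servers: "
      f"{configured_total - granted_total:.1f} GB")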
In an ideal situation, if several virtual servers
all request additional memory at the same time, the host server would
have enough free physical memory to give them each all they need. If
there’s not enough, however, the hypervisor can step in to reclaim and redistribute memory between virtual servers. Some virtual servers, for example, may have been configured with a higher priority for memory than others in times of shortage; this is called weighting and is described in the next section. The rules about how much memory you can over-provision vary by hypervisor, but reclaiming and redistributing memory is certainly something both VMware’s software and Microsoft’s Hyper-V may have to do.
Reclaiming and redistributing memory ultimately means taking it away from one virtual server to give to another, and taking it from a virtual server that was operating as though all the memory allocated to it was its own and may well have been using it for applications. When this reclamation has to happen, a SQL Server DBA’s worst nightmare occurs, and the balloon driver we mentioned earlier has to inflate.
To summarize its purpose, when more memory is required than is
available in the host server, the hypervisor will have to re-allocate
physical memory between virtual servers. It could do this to ensure that any virtual servers that are about to be started have the configured minimum amount of memory allocated to them, or to maintain any resource allocation weightings between virtual servers, for example, when a virtual server with a high weighting needs more memory.
Resource weightings are described in the next section.
Different hypervisors employ slightly different
methods of using a balloon driver, but the key point for DBAs here is
that SQL Server always responds to a low Available Megabytes value,
which the inflating of a balloon driver can cause. SQL Server’s response to this low-memory condition is to begin reducing the size of its buffer pool and releasing memory back to Windows, which after a while will have a noticeable effect on database server performance.
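For a DBA who wants to see this happening inside the guest, one option is to watch the Memory\Available MBytes counter the text refers to. The sketch below, intended to run inside a Windows virtual server, shells out to the built-in typeperf tool; the 500MB threshold is an arbitrary example rather than a recommendation, and the counter path assumes an English-language installation.

# Sketch of a simple watch on the Windows "Available MBytes" counter,
# which balloon inflation can drive down. Uses the built-in typeperf tool;
# the threshold is an arbitrary example, not a sizing recommendation.
import subprocess

THRESHOLD_MB = 500  # hypothetical alerting threshold

# typeperf prints a CSV header line followed by one sample line, e.g.
# "04/01/2024 10:00:00.000","2950.000000"
output = subprocess.run(
    ["typeperf", r"\Memory\Available MBytes", "-sc", "1"],
    capture_output=True, text=True, check=True
).stdout

sample_line = [l for l in output.splitlines()
               if l.startswith('"') and "," in l][-1]
available_mb = float(sample_line.split(",")[1].strip('"'))

print(f"Available MBytes: {available_mb:.0f}")
if available_mb < THRESHOLD_MB:
    print("Low available memory: SQL Server may start shrinking its buffer pool.")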
The advice from the virtualization vendors about
how to configure their demand-based memory allocation technology for SQL
Server varies. Hyper-V is designed to be cautious with memory
allocations and will not allow the minimum amount of memory a virtual
server needs to become unavailable, while VMware allows the memory in a
host server to be over-committed. Because of the potential performance
issues this can cause, VMware does not recommend running SQL Server on a
host that’s had its memory over-committed.
Weighting
Finally, when there is resource
contention within a host server, the virtualization administrator can
influence the order in which physical resources are protected, reserved,
or allocated. This is determined by a weighting value, and it is used
in various places throughout a virtualization environment — especially
one designed to operate with contention. For example, an environment
might host virtual servers for production, development, and occasionally
testing. The priority may be for production to always have the
resources it needs at the expense of the development servers if need be.
However, the test servers, while only occasionally used, might have a higher priority than the development servers, and would therefore be given a weighting lower than the production servers’ but higher than the development servers’.
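Weightings generally translate into relative shares of a contended resource rather than absolute guarantees. The sketch below illustrates the idea with a simple proportional split of scarce memory by weight; the weight values and the division rule are assumptions for illustration, not the formula used by any particular hypervisor.

# Illustrative weighted split of a scarce resource. Production, test, and
# development virtual servers contend for 40 GB of memory; each receives a
# share proportional to its weighting. All values are assumptions.
weights = {"production": 100, "test": 50, "development": 10}
available_gb = 40

total_weight = sum(weights.values())
for name, weight in weights.items():
    share_gb = available_gb * weight / total_weight
    print(f"{name}: weight {weight} -> {share_gb:.1f} GB of the contended memory")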