1. Operations
Operations
include many factors, such as the patching and monitoring of the OS and
application, daily maintenance, and troubleshooting. A popular
misconception is that your operating costs will decrease when you start
to virtualize. Nothing could be further from the truth. Chances are that
your costs will actually increase. The potential increase comes from
virtualization itself, which brings an additional skill set
requirement to the organization. While
virtualization is not rocket science, it is pretty close. Being able to
balance the needs of the virtual guests against the resources of the
host against the requirements of the users can be a daunting task. The
size of your IT organization and the number and location of servers will
affect the cost of operations. If you have enough staff to learn the
virtualization technology, there may not be a huge impact to the bottom
line. If you don't have that staff, you will most likely be looking for
additional personnel to support your virtualization efforts. The benefit
of hiring someone who already knows virtualization is the hands-on
experience that they bring to the table. On top of learning the ins and
outs of the virtualization platform, you now have an additional OS to
patch and maintain. You may need to increase your IT staff to account
for the virtualization component. Your staff should not treat
the virtual hosts as just other servers. You must be cautious when you
perform routine maintenance on your virtual host. It is easy to perform
what is considered routine maintenance and then end up harming the
virtual guest.
When you virtualize your
Exchange servers, you still have to take care of an OS and the
application. The virtualized guests still have 100 percent of the daily
operations that their physical counterparts have. The only savings
result from the operational cost of the physical hardware.
You still have to test and
patch your systems. You still have systems that will experience issues,
and you need to spend time troubleshooting. On top of that, you now have
added the hypervisor layer. This layer may or may not be familiar to
your support and engineering staff. You can't just reboot a virtual host
because you feel like it is the best solution for a situation. You now
have to expand your thought process to include the Exchange servers that
are virtualized on that host and take these factors into consideration:
What Exchange services will be affected by shutting down this host?
Which Exchange virtual guests are on the virtual host, and how will the users be affected when they are shut down?
Do the affected services have a redundant nature?
Are the redundant services located on the same virtual host or on a different host? (If they are on the same virtual host, are they really redundant?)
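The questions above can be answered from a simple inventory of guests, roles, and hosts. The following is a minimal sketch, assuming you keep such an inventory yourself; the guest names, host names, and both helper functions are hypothetical and are not part of Exchange or any hypervisor.

```python
from collections import defaultdict

def guests_on_host(guests, host_name):
    """List the virtual guests that go down if this virtual host is shut down."""
    return sorted(guest for guest, _role, host in guests if host == host_name)

def roles_without_host_redundancy(guests):
    """Return roles whose every instance lives on one virtual host.

    `guests` is a list of (guest_name, role, host_name) tuples.
    """
    hosts_by_role = defaultdict(set)
    for _guest, role, host in guests:
        hosts_by_role[role].add(host)
    return sorted(role for role, hosts in hosts_by_role.items() if len(hosts) == 1)

# Hypothetical inventory: both Client Access guests share HOST-A.
inventory = [
    ("EX-MBX1", "Mailbox", "HOST-A"),
    ("EX-MBX2", "Mailbox", "HOST-B"),
    ("EX-CAS1", "Client Access", "HOST-A"),
    ("EX-CAS2", "Client Access", "HOST-A"),
]

print(guests_on_host(inventory, "HOST-A"))
print(roles_without_host_redundancy(inventory))
```

In this made-up inventory, rebooting HOST-A takes down both Client Access guests, so that role is redundant on paper only.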
2. Deciding What to Virtualize
No matter how many Exchange
servers you plan to virtualize, you must do your research as you are
planning the architecture for your environment. Plan your virtual guests
just as though they are physical servers. Then include the additional
overhead for the virtual host. Make sure that you are thinking about the
end product that you will deliver to your users. Consider the possible
differences between the physical and virtualized environment. Will your
user base be as happy with a virtualized environment if it means a
decrease in performance? If you set the expectations, size the
environment appropriately, and test appropriately, there should not be a
noticeable difference for your end users.
As
with any architecture, things that you do can make positive or negative
impacts. With Exchange 2003 and earlier versions, Exchange and the OS
were limited to only 4 GB of physical RAM. By using the /3GB
switch, you could have Exchange take advantage of 3 GB of RAM. Starting
with Exchange 2007, Microsoft changed the Extensible Storage Engine
(ESE). The part of the change that we are concerned about for this
discussion was brought on by the 64-bit nature of Exchange 2007. This
allowed Exchange 2007 to utilize over 32 GB of RAM if needed. This
change allowed the ESE to grab as much RAM as needed to make the
Exchange server more efficient. Since RAM is much faster to access than
regular disk drives, ESE places as much information as possible in
memory. Keep this in mind as you design your virtual guests. Some
applications will use RAM and then release it back to the operating
system after processes are done. Exchange memory allocation is dynamic.
ESE will use all available memory in the system and then return the
memory as other processes need it. If you have a server that has 16 GB
of memory, you can expect that ESE will consume roughly 14 GB of it
until other processes need the resources. The Dynamic Buffer Allocation
(DBA) will manage the cache by comparing the amount of I/O generated by
the databases to the system I/O in terms of hard page faults. If
Exchange is producing more I/O than the system is producing in hard page
faults, it will increase the cache. If Exchange is producing less I/O than
the hard page faults, it will decrease the cache. The primary goal is
to keep ESE cache in balance with the disk cache to reduce the amount
of paging.
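The feedback rule described above can be pictured with a toy model. This is only an illustration of the grow-or-shrink decision and is not ESE's actual algorithm; the function name, the step size, and the minimum and maximum cache sizes are all assumptions chosen for the example.

```python
def adjust_cache(cache_mb, db_io_per_sec, hard_faults_per_sec,
                 step_mb=64, min_mb=256, max_mb=14336):
    """Toy model of the cache feedback rule (not the real ESE logic).

    Grow the cache when database I/O outweighs hard page faults,
    shrink it when hard page faults outweigh database I/O.
    All sizes and the step are assumed values for illustration.
    """
    if db_io_per_sec > hard_faults_per_sec:
        cache_mb += step_mb      # databases are I/O bound: cache more
    elif db_io_per_sec < hard_faults_per_sec:
        cache_mb -= step_mb      # system is paging: give memory back
    # Keep the result inside the assumed bounds.
    return max(min_mb, min(max_mb, cache_mb))

print(adjust_cache(1024, db_io_per_sec=500, hard_faults_per_sec=100))
print(adjust_cache(1024, db_io_per_sec=100, hard_faults_per_sec=500))
```

The point for your design is the direction of the adjustment: a guest that is starved of RAM pushes the cache down and the disk I/O up.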
Implementing virtualization
involves bringing in new technologies to make the applications operate
smoothly. One popular technique is the ability to overallocate the
physical resources of the virtual host. For example, say your virtual
host has 64 GB of physical RAM. You are going to put six virtual guests
on the server, each with 16 GB of RAM. This is allowed in some
hypervisors, but Microsoft does not recommend it. By doing this, you
have told the virtual guests that they can potentially have a total of
96 GB of RAM in use. Obviously, this cannot happen since you don't have
that much RAM available. With many applications this may not impact you,
because all six virtual guests will not utilize that much RAM under normal workloads.
Exchange is different: the store.exe process will use any RAM that has been advertised to the guest OS, so in this scenario you will run out of RAM.
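The arithmetic in this example is easy to check before you build anything. Here is a minimal sketch using the numbers from the scenario above; the function name and return values are hypothetical and not tied to any hypervisor tool.

```python
def memory_overcommit(host_ram_gb, guest_ram_gb):
    """Return (allocated GB, allocation ratio, shortfall GB) for a host.

    `guest_ram_gb` is a list with the RAM configured for each guest.
    A ratio above 1.0 means the host is overallocated.
    """
    allocated = sum(guest_ram_gb)
    ratio = allocated / host_ram_gb
    shortfall = max(0, allocated - host_ram_gb)
    return allocated, ratio, shortfall

# Scenario from the text: a 64 GB host with six 16 GB guests.
allocated, ratio, shortfall = memory_overcommit(64, [16] * 6)
print(allocated, ratio, shortfall)  # 96 1.5 32
```

A ratio of 1.5 means the guests have been promised 32 GB more than the host physically has, before counting anything the host itself needs.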
Overallocation happens most
often with the amount of RAM and/or the number of processors that are
configured for the virtual guest. Be extremely careful when you are
planning your systems and avoid enabling overallocation of any physical
resources on the virtual host. Be sure to leave your virtual host
enough resources to handle the processing of its own OS and the hypervisor.
While you are planning
your virtualization environment, remember that just because a solution
would be supported by Microsoft does not necessarily mean that it would
be a recommended solution. There are plenty of situations in which it
would make sense to virtualize Exchange, and there are just as many
cases in which virtualization would not make sense. Don't get pushed
into virtualization because it is the "new kid on the block." There is
always a new technology that is the best thing since sliced bread, but
that does not mean that it is the right solution for your environment.
At some point you will have to
decide on your high availability solution. Exchange High Availability
is the automatic switchover of the application services and does not
compromise the integrity of the data. Exchange will automatically detect
the best location for the target of the switchover. If Exchange
determines that there is a healthy copy of the database and a quorum,
Exchange will make the switch to one of the other database copies.
Microsoft will not support a
mixture of high availability solutions. What this means is that you must
choose to either use the replication that is built into Exchange or use
the hypervisor's high availability capabilities. With the introduction
of DAGs, you have a good story for both high availability and site
resilience in the application. Since DAGs are application aware, your
servers are always in control of any Exchange data. Since the Exchange
servers are in constant "discussion" about the status of a database in
the DAG, there should be minimal impact if a server or database goes
down for any reason.
You also have the
opportunity to utilize the virtual host replication. If you decide to
implement your hypervisor's replication, you cannot leverage DAGs on
those virtual hosts. The downside to using virtual host replication is
that Exchange has no clue what is happening with the data and resources.
Although it is technically possible to install Exchange and configure
the DAG on a clustered virtual host, there is a huge potential for data
corruption at the Exchange level. This is why the Exchange team at
Microsoft has chosen not to support this type of deployment. Table 1
compares the major differences between the two clustering technologies.
Table 1. Virtual Host Clustering and Failover vs. Exchange High Availability
Feature | Virtual Host Clustering High Availability | Exchange Mailbox High Availability |
---|---|---|
OS heartbeat | Yes | Yes |
Exchange heartbeat | No | Yes |
Copies of the Exchange data | 1 | Minimum of 2 |
Shared storage requirement | Yes | No |
Machine or role failover granularity | No | Yes, down to the database level |
Support hardware VSS | No | Yes |
Support backup from passive copy | No | Yes |
Third-party applications
can have a direct impact on your virtualization design. When planning
your environment, be sure that you have included any applications that
will be coming into the Exchange organization and what impact they will
have. Some applications will hit the CAS or Hub Transport roles, while
others may stress the storage subsystem.
2.1. Exchange Roles
As stated earlier, you can
virtualize any Exchange 2010 role except the Unified
Messaging role. You also have the ability to combine Exchange roles. You
may find a need to virtualize the CAS and Hub Transport roles on the
same virtual guest. Microsoft supports this configuration. Your
environment may benefit from having the CAS, Hub Transport, and Mailbox
roles virtualized.
We have already said this, but
make sure you do your due diligence when creating the architecture for
your virtualized environment. Getting the architecture right is one of
the key factors in virtualizing Exchange
successfully. At the end of the day, you will be measured by the
happiness of your users. It does not matter if you felt the Exchange
deployment was a success if your users are not 100 percent satisfied.
2.2. Performance Counters
To put yourself in
the best possible position for success, gather some information about
your current environment. If you are currently using Exchange, you can
get information from counters like the ones shown in Table 2 for Exchange 2007. If you have Exchange 2003 in your environment, then check out Table 3.
The expected values are not hard-and-fast rules but guidance on what to
look for. If you see that your systems are running much higher than the
recommendations, test your system thoroughly with a simulated user load
before you put your production users on the virtualized systems.
Table 2. Exchange 2007 Counters
Category | Object\Counter | Expected Value |
---|---|---|
Common Performance Counters (All Exchange Servers) | Processor\% Total | Should be less than 40% average. |
| System\Processor Queue Length (All Instances) | Should be less than 5 (per processor). |
| Network Interface(*)\Bytes Total/Sec | For a 1000-Mbps network adapter, should be below 30–35 Mbps. |
Mailbox Server-Specific Performance Counters | MSExchangeIS Client (*)\RPC Average Latency | Should be less than 30 milliseconds (ms) on average. |
| Process(Microsoft.Exchange.Search.ExSearch)\% Processor time | Should be less than 1% of overall CPU typically and not sustained above 3% |
| MSExchange Store Interface(_Total)\RPC Latency average (msec) | Should be less than 100 ms at all times. |
| MSExchange Store Interface(_Total)\RPC Requests Outstanding | Should be 0 at all times. |
CCR, LCR and SCR Mailbox Server-Specific Performance Counters | MSExchange Replication(*)\CopyQueueLength | Should be less than 10 at all times for CCR and SCR. Should be less than 1 at all times for local continuous replication (LCR). |
CAS Server - Availability | MSExchange Availability Service\Average Time To Process A Free Busy Request | Should always be less than 5. |
CAS Server - OWA | MSExchange OWA\Average Response Time | Should be less than 100 ms at all times. |
Hub Transport - Disk | Logical/Physical Disk(*)\Avg. Disk Sec/Read; Logical/Physical Disk(*)\Avg. Disk Sec/Write | Should be less than 20 ms on average. |
Hub Transport - Transport Database | MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Version Buckets Allocated | Should be less than 200 at all times. |
Hub Transport - Transport Database | MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Log Record Stalls/Sec | Should be less than 10 per second on average. |
Hub Transport - Transport Database | MSExchange Database ==> Instances(edgetransport/Transport Mail Database)\Log Threads Waiting | Should be less than 10 threads waiting on average. |
Table 3. Exchange 2003 Counters
Category | Object\Counter | Expected Value |
---|---|---|
Mailbox Server - Memory | Free System Page Table Entries/All | Always greater than 8000 pages. Never less than 3500 pages available. |
Mailbox Server - Processor | % Processor Time/All | Average less than 75%. |
Mailbox Server - Log Record Stalls/Sec | Information Store/<Storage Group> | Should be less than 10 per second. |
Mailbox Server - Log Threads Waiting | Information Store/<Storage Group> | Average should be less than 10. |
Front-End Server - Outlook Mobile Access | Last response time | Should be less than 60 seconds. |
Routing Group Connector Bridgehead - SMTP Virtual Server | Categorizer Queue Length | Average should be less than 3, spikes less than 10. |
Routing Group Connector Bridgehead - SMTP Virtual Server | Remote Queue Length | Should be less than 1000. |
These counter values are not absolute
limits. Use them as a guide to help you figure out whether
your organization could benefit from virtualization. You need to make
sure that any servers you are planning to virtualize are underutilized.
If they are overutilized, you are only going to see a negative impact
once you virtualize.
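Once you have collected averages for your current servers, comparing them with the guidance is mechanical. The sketch below uses a few of the expected values from Table 2; the shortened counter labels, the sample readings, and the function itself are assumptions for illustration, and how you export the counter data is up to you.

```python
# Upper limits taken from Table 2 (Exchange 2007). The keys are
# shortened labels for this example, not the full counter paths.
LIMITS = {
    "Processor % Total (avg)": 40,
    "Processor Queue Length (per processor)": 5,
    "RPC Average Latency (ms, avg)": 30,
    "OWA Average Response Time (ms)": 100,
    "Avg. Disk Sec/Read (ms, avg)": 20,
}

def over_limit(samples, limits=LIMITS):
    """Return the readings that are not below their expected value.

    Readings with no matching limit are ignored rather than guessed at.
    """
    return {
        name: value
        for name, value in samples.items()
        if name in limits and value >= limits[name]
    }

# Hypothetical averages from one existing server.
readings = {
    "Processor % Total (avg)": 62,
    "Processor Queue Length (per processor)": 2,
    "RPC Average Latency (ms, avg)": 18,
    "Avg. Disk Sec/Read (ms, avg)": 27,
}

print(over_limit(readings))
```

A server that shows up in this list is a candidate for closer load testing before you virtualize it, not automatically a server to exclude.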
You must also keep your
user profiles in mind during your planning. You can use profile size
information to help answer the questions about the estimated load on
your virtual guests. This approach is less precise, but it should put you
in the ballpark.
2.3. Testing
As with any
engineering effort, you need to make sure that you have a testing plan
for the virtualized guests and host. Part of your plan needs to include
testing all your virtual guests at the same time. One of the worst
things you can do is to test only a single server at a time. If you test
only one server at a time, you will probably have very good numbers.
Think about what happens when you turn on all your virtual guests; the
performance will probably head south in a hurry. The bottom line is to
test the entire solution and not pieces of the solution. The solution
should include any third-party applications that are in the environment
as well. Anything that you leave out of the testing cycle could come
back to haunt you when you move to production.
You should use both
Exchange Server Jetstress and Exchange Load Generator to validate your
configuration. Jetstress is used to test the performance of the disk
subsystem. The information that Jetstress gives you should line up with
your performance requirements that were gathered early in the project.
Load Generator will simulate the different client connections that will
be in your environment. You will be able to define the number of each
client connection and how much email traffic they will send and receive.
When using the testing tools, try to emulate the user base that is
currently in the environment. If none of your users use OWA, then don't
put OWA in the test cases. If your organization includes heavy users of
Windows Mobile, make sure that you have included the correct information
to heavily test for Windows Mobile.
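One way to keep the test cases in line with your real user base is to scale the client mix you observe today down to the number of simulated users. This is a minimal sketch under that assumption; the client labels, the counts, and the function are hypothetical and do not represent Load Generator's own configuration format.

```python
def scale_client_mix(observed, test_users):
    """Scale observed client counts to a simulated user population.

    Client types with no users today are dropped from the test mix.
    Counts are rounded, so the total can differ slightly from
    `test_users` when the proportions do not divide evenly.
    """
    active = {client: count for client, count in observed.items() if count > 0}
    total = sum(active.values())
    if total == 0:
        raise ValueError("no active clients observed")
    return {
        client: round(test_users * count / total)
        for client, count in active.items()
    }

# Hypothetical environment: nobody uses OWA, mobile use is heavy.
observed = {"Outlook": 700, "OWA": 0, "Windows Mobile": 300}

print(scale_client_mix(observed, 200))
```

Here OWA drops out of the test entirely, and Windows Mobile keeps its 30 percent share of the simulated load.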
Remember: in the virtualized
environment, you should do everything you would normally do in a
physical environment. Don't fall into the trap of thinking that because
this is a virtualized environment, it is a different solution. You are
the only one who will know that these servers are virtualized. The end
users and the first line of the help desk will think these are physical
servers.
Also, please keep in mind
that with the change from Exchange 2007 to Exchange 2010 you will need
to focus your testing of client access services on the Client Access
servers only. You can refer to the Exchange 2010 TechCenter, http://technet.microsoft.com/en-us/library/bb124558(EXCHG.140).aspx, for up-to-date information on server processor and RAM recommendations.
You
don't want to be caught off guard when you get to production. If you
start your pilot and you have not tested the solution, you will find out
pretty quickly. Part of the project will be to set the expectations of
the virtualized environment. Obviously you will be monitoring your
environment closely as you deploy. As you deploy more users to the
virtualized systems, you may find that there is more of a hit on the
systems than testing showed. If this is the case, you may need to add
more virtual guests and virtual hosts to your environment. There is
nothing wrong with this, but make sure that you have informed management
of the possibility before you get in this situation.