OpsMgr’s simple installation and
relative ease of use often belie the potential complexity of its
underlying components. This complexity can be managed with a working
knowledge of the advanced concepts of OpsMgr design and implementation.
Understanding OpsMgr Deployment Scenarios
As previously mentioned, OpsMgr components
can be divided across multiple servers to distribute load and ensure
balanced functionality. This separation allows OpsMgr servers to come in
four potential “flavors,” depending on the OpsMgr components held by
those servers. The four OpsMgr server types are as follows:
Operations database server— An operations database
server is simply a member server with SQL Server 2005 installed to host
the OpsMgr operations database. No other OpsMgr components are installed on
this server. The SQL Server 2005 component can be installed with default
options and with the system account used for authentication. Data in
this database is kept for 7 days by default.
Reporting database server— A reporting database server is simply a member
server with SQL Server 2005 and SQL Server Reporting Services installed.
This database stores data collected through the monitoring rules for a
much longer period than the operations database and is used for
reporting and trend analysis. It requires significantly more
drive space than the operations database. Data in this database
is kept for 400 days (roughly 13 months) by default.
Management server— A management server is the communication point for
both management consoles and agents. A management server
does not host a database itself and is often used in large OpsMgr
implementations that have a dedicated database server. In these
configurations, multiple management servers are often used in a single
management group to provide scalability and to handle a large number of
managed nodes.
All-in-one server—
An all-in-one server is effectively an OpsMgr server that holds all
OpsMgr roles, including the databases. Consequently,
single-server OpsMgr configurations use one server for all OpsMgr
operations.
Multiple Management Groups
As previously defined, an
OpsMgr management group is a logical grouping of monitored servers that
is managed by a single OpsMgr SQL database, one or more management
servers, and a unique management group name. Each management group
operates completely independently of other management groups,
although management groups can be configured in a hierarchical structure, with a
top-level management group able to see “connected” lower-level
management groups.
The concept
of connected management groups allows OpsMgr to scale beyond the
boundaries of a single management group and also gives a great deal of
flexibility when combining OpsMgr environments. However, certain caveats
must be taken into account. Because each management group is an island in
itself, each must consequently be configured manually with its own settings. In
environments with a large number of customized rules, for example, this manual configuration
creates a great deal of redundant work in the creation, administration,
and troubleshooting of multiple management groups.
Deploying Geographic-Based Management Groups
Based on the
factors outlined in the preceding section, it is preferable to deploy
OpsMgr in a single management group. However, in some situations an
organization needs to divide its OpsMgr environment into multiple
management groups. The most common reason for division of OpsMgr
management groups is division along geographic lines. In situations in
which wide area network (WAN) links are saturated or unreliable, it
might be wise to separate large “islands” of WAN connectivity into
separate management groups.
Being separated across slow WAN links is not by itself enough reason to
warrant a separate management group, however. For example, small sites
with few servers would not justify a separate OpsMgr management group,
with its associated hardware, software, and administrative costs.
However, if many servers exist in a distributed but generally
well-connected geographical area, that might justify creating a separate
management group. For example, an organization with sites spread across
the United States might divide its OpsMgr environment into East Coast
and West Coast management groups to roughly approximate its WAN
infrastructure.
Smaller sites that are not
well connected but are not large enough to warrant their own management
group should have their event monitoring throttled so that monitoring
traffic is not sent across the WAN during peak usage times. The downside
to this approach, however, is that reaction time to critical events
increases.
Deploying Political or Security-Based Management Groups
The less common method
of dividing OpsMgr management groups is along political or security lines.
For example, it might become necessary to place financial servers in
a separate management group to maintain the security of the finance
environment and to allow for a separate set of administrators.
Politically, if
administration is not centralized within an organization, management
groups can be established to divide OpsMgr management into separate
spheres of control, keeping each OpsMgr management zone under its own
security model.
As previously
mentioned, a single management group is the most efficient OpsMgr
environment and provides for the least amount of redundant setup,
administration, and troubleshooting work. Consequently, artificial
OpsMgr division along political or security lines should be avoided, if
possible.
Sizing the OpsMgr Database
The OpsMgr database grows or shrinks depending on several factors, such
as the type of data collected, the length of time that collected data is
kept, and the amount of database grooming that is scheduled. It is
important to monitor the size of the database to ensure that it does not
grow beyond an acceptable size. OpsMgr can be configured to monitor
itself, supplying advance notice of database problems and capacity
thresholds. This strategy is highly recommended because OpsMgr can
easily collect event information faster than it can groom it out.
The size of the
operations database can be estimated through the following formula:
Number of agents x 5MB x retention days + 1,024MB overhead = estimated database size (in MB)
For example, an OpsMgr
environment monitoring 1,000 servers with the default 7-day retention
period will have an estimated 35GB operations database:
(1000 * 5 * 7) + 1024 = 36024 MB
The size of the
reporting database can be estimated through the following formula:
Number of agents x 3MB x retention days + 1,024MB overhead = estimated database size (in MB)
The same environment
monitoring 1,000 servers with the default 400-day retention period will
have an estimated 1.1TB reporting database:
(1000 * 3 * 400) + 1024 = 1201024 MB
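To make the arithmetic concrete, the following minimal Python sketch
applies both formulas. The function name and sample values are
illustrative only and are not part of any OpsMgr tooling:

def estimated_db_size_mb(agents, mb_per_agent_per_day, retention_days,
                         overhead_mb=1024):
    # Rough estimate only; actual growth varies with the monitoring
    # configuration, degree of customization, and server types.
    return agents * mb_per_agent_per_day * retention_days + overhead_mb

# Operations database: 1,000 agents, 5MB per agent per day, 7-day retention
ops_mb = estimated_db_size_mb(1000, 5, 7)        # 36024 MB, roughly 35GB

# Reporting database: 1,000 agents, 3MB per agent per day, 400-day retention
rpt_mb = estimated_db_size_mb(1000, 3, 400)      # 1201024 MB, roughly 1.1TB

print(f"Operations DB: ~{ops_mb / 1024:.1f} GB")
print(f"Reporting DB:  ~{rpt_mb / 1024 ** 2:.2f} TB")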
It is important to
understand that these estimates are rough guidelines only and can vary
widely depending on the types of servers monitored, the monitoring
configuration, the degree of customization, and other factors.
Defining Capacity Limits
As with any
system, OpsMgr includes some hard limits that should be taken into
account before deployment begins. Surpassing these limits could call
for the creation of new management groups and should therefore be
accounted for in the design plan. These limits are as follows (a short
sizing-check sketch follows the list):
Operations
database— OpsMgr operates on a
principle of centralized, rather than distributed, collection of data.
All event logs, performance counters, and alerts are sent to a single
centralized database, so there can be only one
operations database per management group. A backup and
high-availability strategy for the OpsMgr database is therefore highly
recommended to protect it from outage. It is also recommended to keep
this database under 50GB to improve efficiency and reduce alert latency.
Management servers—
OpsMgr does not have a hard-coded limit on management servers per
management group. However, it is recommended to keep the environment
to between three and five management servers. Each management server can
support approximately 2,000 managed agents.
Gateway servers— OpsMgr does not have a hard-coded limit on
gateway servers per management group. However, it is recommended to
deploy a gateway server for every 200 non-trusted domain members.
Agents— Each management server can
theoretically support up to 2,000 monitored agents. In most
configurations, however, it is wise to keep the number of agents per
management server well below this theoretical maximum, although the
level can be scaled upward with more robust hardware, if necessary.
Administrative consoles—
OpsMgr does not enforce a hard limit on the number of Web console and
Operations console instances; however, a large number of concurrent
consoles can introduce performance and scalability problems.
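As referenced previously, the following Python sketch shows how a
proposed design might be checked against these recommended limits. It is
a hypothetical helper, not an OpsMgr tool, and the example values are
illustrative:

def check_design(agents, management_servers, gateway_servers,
                 nontrusted_members, est_ops_db_gb):
    # Flag any design values that exceed the recommended guidelines.
    warnings = []
    if est_ops_db_gb > 50:
        warnings.append("Operations database exceeds the recommended 50GB limit.")
    if agents > management_servers * 2000:
        warnings.append("More than ~2,000 agents per management server.")
    if not 3 <= management_servers <= 5:
        warnings.append("Recommended range is three to five management servers.")
    if nontrusted_members > gateway_servers * 200:
        warnings.append("Fewer than one gateway server per 200 non-trusted members.")
    return warnings or ["Design is within the recommended limits."]

# Example: 4,000 agents, 3 management servers, 1 gateway, 350 non-trusted members
for message in check_design(4000, 3, 1, 350, est_ops_db_gb=40):
    print(message)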
Defining System Redundancy
In addition to
the scalability built in to OpsMgr, redundancy is built in to the
components of the environment. Knowing how to deploy and place OpsMgr
components correctly is important to building a redundant environment.
The main components of OpsMgr can be
made redundant through the following methods:
Management
servers— Management servers are
automatically redundant, and agents fail over and fail back between them
automatically. Simply install additional management servers
for redundancy. In addition, the Root Management Server (RMS) acts as a
management server and participates in this fault tolerance.
SQL databases— The SQL database servers hosting the
databases can be made redundant using SQL clustering, which is based on
Windows clustering. This supports failover and failback.
Root Management Server— The RMS can be made redundant using Windows
clustering. This supports failover and failback.
Having multiple
management servers deployed across a management group gives an
environment a certain level of redundancy. If a single
management server experiences downtime, another management server within
the management group takes over responsibility for the
monitored servers. For this reason, including multiple management
servers is wise when high uptime is a priority.
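As a conceptual illustration only, the failover behavior resembles the
following Python sketch. The server names and the reachability check are
hypothetical stand-ins; real agents negotiate failover automatically:

import random

MANAGEMENT_SERVERS = ["OMMS01", "OMMS02", "OMMS03"]   # hypothetical names

def reachable(server):
    # Stand-in for a real connectivity/health check; simulates
    # occasional downtime for demonstration purposes.
    return random.random() > 0.2

def pick_management_server(primary, candidates):
    # Prefer the primary management server; fail over to any reachable peer.
    if reachable(primary):
        return primary
    for server in candidates:
        if server != primary and reachable(server):
            return server
    raise RuntimeError("No management server reachable")

print(pick_management_server("OMMS01", MANAGEMENT_SERVERS))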
The first management server installed
in the management group is called the Root Management Server. Only one
Root Management Server can exist in a management group, and it hosts the
software development kit (SDK) service and the Configuration service. All OpsMgr
consoles communicate with the Root Management Server, so its availability is
critical. In large-scale environments, the Root Management Server should
leverage Microsoft Cluster technology to provide high availability for
this component.
Because there can be
only a single OpsMgr database per management group, the database is
subsequently a single point of failure and should be protected from
downtime. Utilizing Windows Server 2008 clustering or third-party
fault-tolerance solutions for SQL databases helps to mitigate the risk
involved with the OpsMgr database.
Monitoring Nondomain Member Considerations
Agents in the DMZ, in workgroups, and in non-trusted domains
require special configuration; in particular, they require certificates
to establish mutual authentication. Operations Manager 2007 requires
mutual authentication, that is, the server authenticates to the client
and the client authenticates to the server, to ensure that the monitoring
communications are not compromised. Without mutual authentication, an
attacker can execute a man-in-the-middle attack and impersonate either
the client or the server. Mutual authentication is thus a security
measure designed to protect clients, servers, and sensitive AD domain
information, all of which could be exposed to an attacker who subverted
the powerful management infrastructure. However, OpsMgr relies on Active
Directory Kerberos for mutual authentication, which is not available to
nondomain members.
Note
Edge Transport role
servers are commonly placed in the DMZ and are by definition not domain
members, so almost every Exchange Server 2010 environment needs to
deploy certificate-based authentication.
In the absence of AD,
trusts, and Kerberos, OpsMgr 2007 R2 can use X.509 certificates to
establish mutual authentication. These can be issued by any PKI,
such as a Microsoft Windows Server 2008 Enterprise CA.
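Conceptually, this certificate-based mutual authentication works like
mutual TLS: each side presents a certificate that the other validates
against a trusted CA before monitoring data flows. The following Python
sketch illustrates the concept only; the file paths and host name are
hypothetical, and the actual OpsMgr channel (TCP port 5723) is not
implemented with this code:

import socket
import ssl

# Hypothetical paths to the issuing CA certificate and this host's
# certificate and private key
CA_CERT, MY_CERT, MY_KEY = "ca.pem", "agent.pem", "agent.key"

# Build a client context that both presents our certificate and verifies
# the server's certificate against the issuing CA (mutual authentication).
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
context.load_verify_locations(CA_CERT)      # trust the issuing CA
context.load_cert_chain(MY_CERT, MY_KEY)    # present our own certificate

with socket.create_connection(("mgmtserver.example.com", 5723)) as sock:
    # The handshake fails unless each side can authenticate the other,
    # which is what blocks man-in-the-middle impersonation.
    with context.wrap_socket(sock, server_hostname="mgmtserver.example.com") as tls:
        print("Mutually authenticated; peer:", tls.getpeercert()["subject"])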