OpsMgr’s
simple installation and relative ease of use often belie the potential
complexity of its underlying components. This complexity can be managed
with a solid understanding of the advanced concepts of OpsMgr design and implementation.
Understanding OpsMgr Deployment Scenarios
As previously mentioned,
OpsMgr components can be divided across multiple servers to distribute
load and ensure balanced functionality. This separation allows OpsMgr
servers to come in four potential “flavors,” depending on the OpsMgr
components held by those servers. The four OpsMgr server types are as
follows:
Operations database server—
An operations database server is simply a member server with SQL Server
installed for the OpsMgr operations database. No other OpsMgr
components are installed on this server. The SQL Server component can be
installed with default options and with the system account used for
authentication. Data in this database is kept for 7 days by default.
Reporting database server—
A reporting database server is simply a member server with SQL Server
and SQL Server Reporting Services installed. This database stores data
collected through the monitoring rules for a much longer period than the
operations database and is used for reporting and trend analysis. This
database requires significantly more drive space than the operations
database. Data in this database is kept for 13 months by default.
Management server—
A management server is the communication point for both management
consoles and agents. Effectively, a management server does not have a
database and is often used in large OpsMgr implementations that have a
dedicated database server. Often, in these configurations, multiple management servers are used in a single management group to provide scalability and to support a larger number of managed nodes.
All-in-one server—
An all-in-one server is effectively an OpsMgr server that holds all
OpsMgr roles, including the database roles. Consequently, single-server OpsMgr configurations use one server for all OpsMgr operations.
Multiple Management Groups
As
previously defined, an OpsMgr management group is a logical grouping of monitored servers that share a single OpsMgr operations database, one or more management servers, and a unique management group name. Each management group operates completely independently of other management groups, although management groups can be configured in a hierarchical structure, with a top-level management group able to see “connected” lower-level management groups.
The concept of
connected management groups allows OpsMgr to scale beyond artificial
boundaries and also gives a great deal of flexibility when combining
OpsMgr environments. However, certain caveats must be taken into account. Because each management group is an island in itself, each must be manually configured with its own individual settings. In environments with a large number of customized rules, for example, such manual configuration would create a great deal of redundant work in the creation, administration, and troubleshooting of multiple management groups.
Deploying Geographic-Based Management Groups
Based on the factors
outlined in the preceding section, it is preferable to deploy OpsMgr in a
single management group. However, in some situations, an organization
needs to divide its OpsMgr environment into multiple management groups.
The most common reason for division of OpsMgr management groups is
division along geographic lines. In situations in which wide area
network (WAN) links are saturated or unreliable, it might be wise to
separate large “islands” of WAN connectivity into separate management
groups.
Simply being separated across
slow WAN links is not enough reason to warrant a separate management
group, however. For example, small sites with few servers would not
warrant the creation of a separate OpsMgr management group, with the
associated hardware, software, and administrative costs. However, if
many servers exist in a distributed but generally well-connected geographical area, that might justify the creation of a separate management group. For example, an organization with several sites across the United States might divide its OpsMgr environment into separate East Coast and West Coast management groups to roughly approximate its WAN infrastructure.
Smaller sites that are
not well connected but are not large enough to warrant their own
management group should have their event monitoring throttled to avoid
being sent across the WAN during peak usage times. The downside to this approach, however, is that the response time to critical events is increased.
Deploying Political or Security-Based Management Groups
The less-common method
of dividing OpsMgr management groups is by political or security lines.
For example, it might become necessary to place financial servers in a separate management group to maintain the security of the finance environment and to allow for a distinct set of administrators.
Politically, if administration is not centralized within an organization, management groups can be established to divide OpsMgr management into separate spheres of control, keeping each OpsMgr management zone under its own security model.
As previously
mentioned, a single management group is the most efficient OpsMgr
environment and provides for the least amount of redundant setup,
administration, and troubleshooting work. Consequently, artificial
OpsMgr division along political or security lines should be avoided, if
possible.
Sizing the OpsMgr Database
Depending on several factors, such as the type of data collected, the length of time that collected data will be kept, and the amount of database grooming that is scheduled, the size of the OpsMgr database will grow or shrink accordingly. It is important to monitor the size of the database to ensure that it does not grow well beyond acceptable bounds. OpsMgr can be configured to monitor itself, supplying advance notice of database problems and capacity thresholds. This type of strategy is highly recommended because OpsMgr can easily collect event information faster than it can groom it out of the database.
The size of the operations database can be estimated through the following formula:
Number of agents x 5MB x retention days + 1024MB overhead = estimated database size (in MB)
For example, an OpsMgr
environment monitoring 1,000 servers with the default 7-day retention
period will have an estimated 35GB operations database:
(1000 * 5 * 7) + 1024 = 36024 MB
The size of the reporting database can be estimated through the following formula:
Number of agents x 3MB x retention days + 1024MB overhead = estimated database size (in MB)
The same environment
monitoring 1,000 servers with the default 400-day (13-month) retention period will
have an estimated 1.1TB reporting database:
(1000 * 3 * 400) + 1024 = 1201024 MB
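Both formulas reduce to the same simple calculation for any agent count and retention period. The following Python sketch is illustrative only; the per-agent daily figures (5MB and 3MB) and the 1024MB overhead come from the formulas above, and the function name is arbitrary:

def estimate_db_size_mb(agents, mb_per_agent_per_day, retention_days, overhead_mb=1024):
    # Rough OpsMgr database size estimate, in megabytes
    return agents * mb_per_agent_per_day * retention_days + overhead_mb

# Operations database: 1,000 agents, 5MB per agent per day, 7-day retention
ops_mb = estimate_db_size_mb(1000, 5, 7)      # 36,024MB, roughly 35GB
# Reporting database: 1,000 agents, 3MB per agent per day, 400-day retention
rpt_mb = estimate_db_size_mb(1000, 3, 400)    # 1,201,024MB, roughly 1.1TB
print("Operations DB: %d MB (~%.1f GB)" % (ops_mb, ops_mb / 1024.0))
print("Reporting DB: %d MB (~%.2f TB)" % (rpt_mb, rpt_mb / 1024.0 / 1024.0))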
It is important to
understand that these estimates are rough guidelines only and can vary
widely depending on the types of servers monitored, the monitoring
configuration, the degree of customization, and other factors.
Defining Capacity Limits
As with any system, OpsMgr
includes some hard limits that should be taken into account before
deployment begins. Surpassing these limits could be cause for the creation of new management groups and should therefore be accounted for in a design plan. These limits are as follows:
Operations database—
OpsMgr operates on a principle of centralized, rather than distributed, collection of data. All event logs, performance counters, and alerts are sent to a single, centralized database, and there can consequently be only a single operations database per management group. A backup and high-availability strategy for the OpsMgr database is therefore highly recommended to protect it from outage. It is also recommended to keep this database under 50GB to improve efficiency and reduce alert latency.
Management servers—
OpsMgr does not have a hard-coded limit of management servers per
management group. However, it is recommended to limit the environment to between three and five management servers. Each management server can support approximately 2,000 managed agents.
Gateway servers—
OpsMgr does not have a hard-coded limit of gateway servers per
management group. However, it is recommended to deploy a gateway server
for every 200 nontrusted domain members.
Agents—
Each management server can theoretically support up to 2,000 monitored agents. In most configurations, however, it is wise to limit the number of agents per management server, although the levels can be scaled upward with more robust hardware, if necessary. A rough capacity-planning example based on these guidelines follows this list.
Administrative consoles—
OpsMgr does not enforce a hard limit on the number of Web console and Operations console instances; however, deploying a large number of consoles might introduce performance and scalability problems.
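As a rough illustration of how these guidelines translate into server counts, the following sketch applies the approximately-2,000-agents-per-management-server and 200-members-per-gateway figures from the list above to a hypothetical environment. The minimum of two management servers reflects the redundancy discussion in the next section and is an assumption, not a hard requirement:

import math

def plan_management_group(agents, nontrusted_members,
                          agents_per_ms=2000, members_per_gateway=200):
    # Rough server-count estimate based on the guidelines above;
    # the floor of two management servers is an assumption made for redundancy
    management_servers = max(2, math.ceil(agents / agents_per_ms))
    gateway_servers = math.ceil(nontrusted_members / members_per_gateway)
    return management_servers, gateway_servers

# Hypothetical environment: 3,500 agents and 450 nontrusted domain members
ms, gw = plan_management_group(3500, 450)
print("Management servers: %d, Gateway servers: %d" % (ms, gw))  # 2 and 3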
Defining System Redundancy
In addition to the scalability built in to OpsMgr, redundancy is built in to the components of the environment. Understanding how to deploy and place OpsMgr components correctly is important to achieving this redundancy. The main components of OpsMgr can be made redundant through the following methods:
Management servers—
Management servers are automatically redundant, and agents fail over and fail back between them automatically. Simply install additional management servers for redundancy. In addition, the Root Management Server (RMS) acts as a management server and participates in this fault tolerance.
SQL databases—
The SQL database servers hosting the databases can be made redundant
using SQL clustering, which is based on Windows clustering. This
supports failover and failback.
Root Management Server— The RMS can be made redundant using Windows clustering. This supports failover and failback.
Having multiple management servers deployed across a management group allows an environment to achieve a certain level of redundancy. If a single management server experiences downtime, another management server within the management group takes over the responsibilities for the monitored servers in the environment. For this reason, if high uptime is a priority, it is wise to deploy multiple management servers in the environment.
The
first management server in the management group is called the Root Management Server. Only one Root Management Server can exist in a management group, and it hosts the software development kit (SDK) service and the Configuration service. All OpsMgr consoles communicate with the Root Management Server, so its availability is critical. In large-scale environments, the Root Management Server should leverage Microsoft Cluster technology to provide high availability for this component.
Because there can be only a
single OpsMgr operations database per management group, the database is a single point of failure and should be protected from downtime. Utilizing Windows Server 2008 R2 failover clustering or third-party fault-tolerance solutions for SQL databases helps to mitigate the risk involved with the OpsMgr database.
Monitoring Nondomain Member Considerations
Agents in the DMZ, in workgroups, and in nontrusted domains require special configuration; in particular, they require certificates to establish mutual authentication. Operations Manager 2007 R2 requires mutual authentication, that is, the server authenticates to the client and the client authenticates to the server, to ensure that the monitoring communications are not compromised. Without mutual authentication, it is possible for an attacker to execute a man-in-the-middle attack and impersonate either the client or the server. Thus, mutual authentication is a security measure designed to protect clients, servers, and sensitive Active Directory domain information, which would otherwise be exposed to attack through the highly privileged management infrastructure. However, OpsMgr relies on Active Directory Kerberos for mutual authentication, which is not available to nondomain members.
Note
Workgroup servers, public web servers, and Microsoft Exchange Edge Transport role servers are commonly placed in the DMZ and, for security reasons, are not domain members, so almost every Windows Server 2008 R2 environment will need to deploy certificate-based authentication.
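As an illustrative sketch only, a typical certificate request for an agent or gateway in the DMZ asks for both the Server Authentication and Client Authentication enhanced key usages so that each side can authenticate the other. The subject name, file names, and key length below are placeholder values, and the exact enrollment steps depend on the certification authority in use:

; RequestConfig.inf - sample certreq policy file for an OpsMgr agent
; (the subject name and key length are placeholder values)
[NewRequest]
Subject="CN=dmzserver.example.com"
Exportable=TRUE
KeyLength=2048
KeySpec=1
KeyUsage=0xf0
MachineKeySet=TRUE
[EnhancedKeyUsageExtension]
; Server Authentication and Client Authentication
OID=1.3.6.1.5.5.7.3.1
OID=1.3.6.1.5.5.7.3.2

certreq -new RequestConfig.inf CertRequest.req
certreq -accept IssuedCert.cer

After the certification authority issues the certificate and certreq -accept installs it in the local computer store, the MOMCertImport support tool from the OpsMgr installation media is run on the monitored server to associate the certificate with the OpsMgr agent. Consult the Operations Manager documentation for the exact MOMCertImport syntax and for distributing the CA root certificate to both ends of the connection.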