OpsMgr’s simple installation and
relative ease of use often belie the potential complexity of its
underlying components. This complexity can be managed with a working
knowledge of the advanced concepts of OpsMgr design and implementation.
Understanding OpsMgr Deployment Scenarios
As previously mentioned, OpsMgr components
can be divided across multiple servers to distribute load and ensure
balanced functionality. This separation allows OpsMgr servers to come in
four potential “flavors,” depending on the OpsMgr components held by
those servers. The four OpsMgr server types are as follows:
Operations database server— An operations database
server is simply a member server with SQL Server 2005 installed to host
the OpsMgr operations database. No other OpsMgr components are installed on
this server. The SQL Server 2005 component can be installed with default
options and with the system account used for authentication. Data in
this database is kept for 7 days by default.
Reporting database server— A reporting database server is simply a member
server with SQL Server 2005 and SQL Server Reporting Services installed.
This database stores data collected through the monitoring rules for a
much longer period than the operations database and is used for
reporting and trend analysis. It requires significantly more
drive space than the operations database. Data in this database
is kept for 400 days (roughly 13 months) by default.
Management server— A management server is the communication point for
both management consoles and agents. A management server
does not host a database itself and is often used in large OpsMgr
implementations that have a dedicated database server. In these
configurations, multiple management servers are often used in a single
management group to provide scalability and to handle a large number of
managed nodes.
All-in-one server—
An all-in-one server is effectively an OpsMgr server that holds all
OpsMgr roles, including the databases. Consequently,
single-server OpsMgr configurations use one server for all OpsMgr
operations.
Multiple Management Groups
As previously defined, an
OpsMgr management group is a logical grouping of monitored servers that
is managed by a single OpsMgr SQL database, one or more management
servers, and a unique management group name. Each management group
operates completely independently of other management groups,
although management groups can be configured in a hierarchical structure, with a
top-level management group able to see “connected” lower-level
management groups.
The concept
of connected management groups allows OpsMgr to scale beyond the
boundaries of a single management group and also gives a great deal of
flexibility when combining OpsMgr environments. However, certain caveats
must be taken into account. Because each management group is an island in
itself, each must consequently be configured manually with its own settings. In
environments with a large number of customized rules, for example, this manual configuration
creates a great deal of redundant work in the creation, administration,
and troubleshooting of multiple management groups.
Deploying Geographic-Based Management Groups
Based on the
factors outlined in the preceding section, it is preferable to deploy
OpsMgr in a single management group. However, in some situations an
organization needs to divide its OpsMgr environment into multiple
management groups. The most common reason for division of OpsMgr
management groups is division along geographic lines. In situations in
which wide area network (WAN) links are saturated or unreliable, it
might be wise to separate large “islands” of WAN connectivity into
separate management groups.
Being separated across slow WAN links is not by itself enough reason to
warrant a separate management group, however. For example, small sites
with few servers would not justify a separate OpsMgr management group,
with its associated hardware, software, and administrative costs.
However, if many servers exist in a distributed but generally
well-connected geographical area, that might justify creating a separate
management group. For example, an organization with sites spread across
the United States might divide its OpsMgr environment into East Coast
and West Coast management groups to roughly approximate its WAN
infrastructure.
Smaller sites that are not
well connected but are not large enough to warrant their own management
group should have their event monitoring throttled so that monitoring
traffic is not sent across the WAN during peak usage times. The downside
to this approach, however, is that reaction time to critical events
increases.
Deploying Political or Security-Based Management Groups
The less common method
of dividing OpsMgr management groups is along political or security lines.
For example, it might become necessary to place financial servers in
a separate management group to maintain the security of the finance
environment and to allow for a separate set of administrators.
Politically, if
administration is not centralized within an organization, management
groups can be established to divide OpsMgr management into separate
spheres of control, keeping each OpsMgr management zone under its own
security model.
As previously
mentioned, a single management group is the most efficient OpsMgr
environment and provides for the least amount of redundant setup,
administration, and troubleshooting work. Consequently, artificial
OpsMgr division along political or security lines should be avoided, if
possible.
Sizing the OpsMgr Database
The OpsMgr database grows or shrinks depending on several factors, such
as the type of data collected, the length of time that collected data is
kept, and the amount of database grooming that is scheduled. It is
important to monitor the size of the database to ensure that it does not
grow beyond an acceptable size. OpsMgr can be configured to monitor
itself, supplying advance notice of database problems and capacity
thresholds. This strategy is highly recommended because OpsMgr can
easily collect event information faster than it can groom it out.
The size of the
operations database can be estimated through the following formula:
Number of agents x 5MB x retention days + 1,024MB overhead = estimated database size (in MB)
For example, an OpsMgr
environment monitoring 1,000 servers with the default 7-day retention
period will have an estimated 35GB operations database:
(1000 * 5 * 7) + 1024 = 36024 MB
The size of the
reporting database can be estimated through the following formula:
Number of agents x 3MB x retention days + 1,024MB overhead = estimated database size (in MB)
The same environment
monitoring 1,000 servers with the default 400-day retention period will
have an estimated 1.1TB reporting database:
(1000 * 3 * 400) + 1024 = 1201024 MB
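To make the arithmetic concrete, the following minimal Python sketch
applies both formulas. The function name and sample values are
illustrative only and are not part of any OpsMgr tooling:

def estimated_db_size_mb(agents, mb_per_agent_per_day, retention_days,
                         overhead_mb=1024):
    # Rough estimate only; actual growth varies with the monitoring
    # configuration, degree of customization, and server types.
    return agents * mb_per_agent_per_day * retention_days + overhead_mb

# Operations database: 1,000 agents, 5MB per agent per day, 7-day retention
ops_mb = estimated_db_size_mb(1000, 5, 7)        # 36024 MB, roughly 35GB

# Reporting database: 1,000 agents, 3MB per agent per day, 400-day retention
rpt_mb = estimated_db_size_mb(1000, 3, 400)      # 1201024 MB, roughly 1.1TB

print(f"Operations DB: ~{ops_mb / 1024:.1f} GB")
print(f"Reporting DB:  ~{rpt_mb / 1024 ** 2:.2f} TB")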
It is important to
understand that these estimates are rough guidelines only and can vary
widely depending on the types of servers monitored, the monitoring
configuration, the degree of customization, and other factors.
Defining Capacity Limits
As with any
system, OpsMgr includes some hard limits that should be taken into
account before deployment begins. Surpassing these limits could call
for the creation of new management groups and should therefore be
accounted for in the design plan. These limits are as follows (a short
sizing-check sketch follows the list):
Operations
database— OpsMgr operates on a
principle of centralized, rather than distributed, collection of data.
All event logs, performance counters, and alerts are sent to a single
centralized database, so there can be only one
operations database per management group. A backup and
high-availability strategy for the OpsMgr database is therefore highly
recommended to protect it from outage. It is also recommended to keep
this database under 50GB to improve efficiency and reduce alert latency.
Management servers—
OpsMgr does not have a hard-coded limit on management servers per
management group. However, it is recommended to keep the environment
to between three and five management servers. Each management server can
support approximately 2,000 managed agents.
Gateway servers— OpsMgr does not have a hard-coded limit on
gateway servers per management group. However, it is recommended to
deploy a gateway server for every 200 non-trusted domain members.
Agents— Each management server can
theoretically support up to 2,000 monitored agents. In most
configurations, however, it is wise to keep the number of agents per
management server well below this theoretical maximum, although the
level can be scaled upward with more robust hardware, if necessary.
Administrative consoles—
OpsMgr does not enforce a hard limit on the number of Web console and
Operations console instances; however, a large number of concurrent
consoles can introduce performance and scalability problems.
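As referenced previously, the following Python sketch shows how a
proposed design might be checked against these recommended limits. It is
a hypothetical helper, not an OpsMgr tool, and the example values are
illustrative:

def check_design(agents, management_servers, gateway_servers,
                 nontrusted_members, est_ops_db_gb):
    # Flag any design values that exceed the recommended guidelines.
    warnings = []
    if est_ops_db_gb > 50:
        warnings.append("Operations database exceeds the recommended 50GB limit.")
    if agents > management_servers * 2000:
        warnings.append("More than ~2,000 agents per management server.")
    if not 3 <= management_servers <= 5:
        warnings.append("Recommended range is three to five management servers.")
    if nontrusted_members > gateway_servers * 200:
        warnings.append("Fewer than one gateway server per 200 non-trusted members.")
    return warnings or ["Design is within the recommended limits."]

# Example: 4,000 agents, 3 management servers, 1 gateway, 350 non-trusted members
for message in check_design(4000, 3, 1, 350, est_ops_db_gb=40):
    print(message)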
Defining System Redundancy
In addition to
the scalability built in to OpsMgr, redundancy is built in to the
components of the environment. Knowing how to deploy and place OpsMgr
components correctly is important to building a redundant environment.
The main components of OpsMgr can be
made redundant through the following methods:
Management
servers— Management servers are
automatically redundant, and agents fail over and fail back between them
automatically. Simply install additional management servers
for redundancy. In addition, the Root Management Server (RMS) acts as a
management server and participates in this fault tolerance.
SQL databases— The SQL database servers hosting the
databases can be made redundant using SQL clustering, which is based on
Windows clustering. This supports failover and failback.
Root Management Server— The RMS can be made redundant using Windows
clustering. This supports failover and failback.
Having multiple
management servers deployed across a management group gives an
environment a certain level of redundancy. If a single
management server experiences downtime, another management server within
the management group takes over responsibility for the
monitored servers. For this reason, including multiple management
servers is wise when high uptime is a priority.
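As a conceptual illustration only, the failover behavior resembles the
following Python sketch. The server names and the reachability check are
hypothetical stand-ins; real agents negotiate failover automatically:

import random

MANAGEMENT_SERVERS = ["OMMS01", "OMMS02", "OMMS03"]   # hypothetical names

def reachable(server):
    # Stand-in for a real connectivity/health check; simulates
    # occasional downtime for demonstration purposes.
    return random.random() > 0.2

def pick_management_server(primary, candidates):
    # Prefer the primary management server; fail over to any reachable peer.
    if reachable(primary):
        return primary
    for server in candidates:
        if server != primary and reachable(server):
            return server
    raise RuntimeError("No management server reachable")

print(pick_management_server("OMMS01", MANAGEMENT_SERVERS))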
The first management server installed
in the management group is called the Root Management Server. Only one
Root Management Server can exist in a management group, and it hosts the
software development kit (SDK) service and the Configuration service. All OpsMgr
consoles communicate with the Root Management Server, so its availability is
critical. In large-scale environments, the Root Management Server should
leverage Microsoft Cluster technology to provide high availability for
this component.
Because there can be
only a single OpsMgr database per management group, the database is
subsequently a single point of failure and should be protected from
downtime. Utilizing Windows Server 2008 clustering or third-party
fault-tolerance solutions for SQL databases helps to mitigate the risk
involved with the OpsMgr database.
Monitoring Nondomain Member Considerations
Agents in the DMZ, in workgroups, and in non-trusted domains
require special configuration; in particular, they require certificates
to establish mutual authentication. Operations Manager 2007 requires
mutual authentication, that is, the server authenticates to the client
and the client authenticates to the server, to ensure that the monitoring
communications are not compromised. Without mutual authentication, an
attacker can execute a man-in-the-middle attack and impersonate either
the client or the server. Mutual authentication is thus a security
measure designed to protect clients, servers, and sensitive AD domain
information, all of which could be exposed to an attacker who subverted
the powerful management infrastructure. However, OpsMgr relies on Active
Directory Kerberos for mutual authentication, which is not available to
nondomain members.
Note
Edge Transport role
servers are commonly placed in the DMZ and are by definition not domain
members, so almost every Exchange Server 2010 environment needs to
deploy certificate-based authentication.
In the absence of AD,
trusts, and Kerberos, OpsMgr 2007 R2 can use X.509 certificates to
establish mutual authentication. These can be issued by any PKI,
such as a Microsoft Windows Server 2008 Enterprise CA.
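Conceptually, this certificate-based mutual authentication works like
mutual TLS: each side presents a certificate that the other validates
against a trusted CA before monitoring data flows. The following Python
sketch illustrates the concept only; the file paths and host name are
hypothetical, and the actual OpsMgr channel (TCP port 5723) is not
implemented with this code:

import socket
import ssl

# Hypothetical paths to the issuing CA certificate and this host's
# certificate and private key
CA_CERT, MY_CERT, MY_KEY = "ca.pem", "agent.pem", "agent.key"

# Build a client context that both presents our certificate and verifies
# the server's certificate against the issuing CA (mutual authentication).
context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
context.load_verify_locations(CA_CERT)      # trust the issuing CA
context.load_cert_chain(MY_CERT, MY_KEY)    # present our own certificate

with socket.create_connection(("mgmtserver.example.com", 5723)) as sock:
    # The handshake fails unless each side can authenticate the other,
    # which is what blocks man-in-the-middle impersonation.
    with context.wrap_socket(sock, server_hostname="mgmtserver.example.com") as tls:
        print("Mutually authenticated; peer:", tls.getpeercert()["subject"])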