Agent Restart Recovery
Agents send a heartbeat every 60 seconds by default, contacting their management server to check for new rules and upload data. On the Root Management Server, there is a Health Service Watcher corresponding to each managed agent. If the Health Service Watcher for an agent detects three missed heartbeats in a row (that is, 3 minutes without a heartbeat), it executes a pair of diagnostics:
First, the Health Service Watcher attempts to ping the agent.
Second, the Health Service Watcher checks to see if the Health Service is running on the agent.
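The heartbeat threshold described above can be modeled with a few lines of Python. This is only an illustrative sketch of the arithmetic (60-second interval, three consecutive misses), not how Operations Manager implements it; the function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Values taken from the text; the names are illustrative assumptions.
HEARTBEAT_INTERVAL = timedelta(seconds=60)  # default heartbeat interval
MISSED_THRESHOLD = 3                        # consecutive misses before diagnostics run


def missed_heartbeats(last_heartbeat: datetime, now: datetime,
                      interval: timedelta = HEARTBEAT_INTERVAL) -> int:
    """Count whole heartbeat intervals that have elapsed since the last heartbeat."""
    elapsed = now - last_heartbeat
    if elapsed < timedelta(0):
        return 0  # clock skew: treat a heartbeat "from the future" as no misses
    return elapsed // interval


def should_run_diagnostics(last_heartbeat: datetime, now: datetime) -> bool:
    """True once the agent has been silent for three full intervals (3 minutes)."""
    return missed_heartbeats(last_heartbeat, now) >= MISSED_THRESHOLD
```

With the defaults, an agent that last reported 179 seconds ago has missed two heartbeats and is left alone; at 180 seconds the diagnostics would run.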
An alert is then generated for each diagnostic that fails. If the agent is reachable via ping but the Health Service is stopped, there is a recovery to restart the Health Service. This allows the agent to recover automatically when its Health Service has stopped.
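The decision just described (one alert per failed diagnostic, and a restart only when the agent answers ping but its Health Service is stopped) can be sketched as a small Python function. This is a simplified model for illustration, not the product's code; the class, function, and alert strings are hypothetical, and `recovery_enabled` defaults to `False` to mirror the recovery being disabled by default.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WatcherOutcome:
    """Result of the two diagnostics (hypothetical structure for illustration)."""
    alerts: List[str] = field(default_factory=list)
    restart_health_service: bool = False


def evaluate_diagnostics(ping_ok: bool, service_running: bool,
                         recovery_enabled: bool = False) -> WatcherOutcome:
    outcome = WatcherOutcome()
    # One alert per failed diagnostic.
    if not ping_ok:
        outcome.alerts.append("agent unreachable by ping")
    if not service_running:
        outcome.alerts.append("Health Service not running")
    # The restart recovery only applies when the agent is reachable,
    # the service is stopped, and the recovery has been enabled.
    outcome.restart_health_service = (
        recovery_enabled and ping_ok and not service_running
    )
    return outcome
```

For example, `evaluate_diagnostics(True, False, recovery_enabled=True)` yields a single alert and requests a restart, whereas an unreachable agent produces alerts but no restart attempt.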
The Restart Health Service Recovery is disabled by default. To enable it, create an override for the Health Service Watcher objects by executing the following steps:
1. Open the Operations Manager 2007 R2 console.
2. Select the Authoring space.
3. Expand the Management Pack Objects node.
4. Select the Monitors node.
5. Select View, Scope.
6. Type health service watcher in the Look For field and click the View All Targets option button.
7. Select the Health Service Watcher target. Don’t pick the ones with additional information in parentheses.
8. Click OK.
9. Type Heartbeat Failure in the Look For field and click Find Now.
10. Right-click the Health Service Heartbeat Failure aggregate rollup node and select Overrides, Override Recovery, Restart Health Service, and For All Objects of Class: Health Service Watcher.
11. Check the Override box next to Enabled and set the value to True.
12. In the Select Destination Management Pack pull-down menu, select the appropriate override management pack. If none exists, create a new management pack named “Operations Manager MP Overrides” by clicking New.
Note
Never use the Default Management Pack for overrides. Always create an override management pack that corresponds to each imported management pack.
13. Click OK to save the override.
Now, if the Health Service is stopped on an agent, the Root Management Server will automatically attempt to restart it.