Manage Alert and Notification Volume

Topics marked with * relate to features available only in vFabric Hyperic.

Manage Alerting for Optimal Visibility into Problems

The purpose of alerting is to speed the process of detecting and resolving problems. Rapid detection and response can be compromised when multiple alerts fire as a result of the same problem, or when responders are inundated by repetitive alert notifications. Excessive alerts and notifications are less likely when:

  • A given problem or root cause results in one, rather than many, alerts.

  • An alert status of "unfixed" indicates a problem that still exists and needs attention, rather than a transient issue that occurred, and then went away.

  • A single problem doesn't result in a firestorm of redundant notifications.

The following sections describe options for controlling the volume of alerts and notifications.

Prevent Multiple Alerts for the Same Problem

When the volume of fired alerts is high, prioritizing and resolving problems is harder. You can reduce the overall volume of fired alerts without sacrificing visibility if you limit the number of times a given alert definition fires an alert for the same incident.

  • Use repeating escalations - Assign an escalation that repeats until the alert is fixed. An alert that is in escalation cannot re-fire. Using repeating escalations for all alerts is highly recommended, and is the best way to control alert volume in Hyperic.

  • Fire one alert then disable the definition - You can configure an alert definition to fire once and disable itself until that alert is marked fixed. When the alert is marked "fixed", the alert definition is re-enabled. Note that if you have vFabric Hyperic, you can define an associated recovery alert to automatically fix the alert when the triggering condition is no longer true.
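The fire-once behavior described above can be pictured with a small model. This is only an illustrative sketch, not Hyperic's implementation or API: the `AlertDefinition` class, its fields, and its method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AlertDefinition:
    """Hypothetical model of a fire-once alert definition (not Hyperic's API)."""
    name: str
    enabled: bool = True
    fired: list = field(default_factory=list)  # status of each fired alert

    def evaluate(self, condition_met: bool) -> bool:
        """Fire an alert if the definition is enabled and the condition holds."""
        if not (self.enabled and condition_met):
            return False
        self.fired.append("unfixed")
        self.enabled = False  # stay disabled until the alert is marked fixed
        return True

    def mark_fixed(self) -> None:
        """Mark the outstanding alert fixed and re-enable the definition."""
        if self.fired and self.fired[-1] == "unfixed":
            self.fired[-1] = "fixed"
        self.enabled = True


definition = AlertDefinition("cpu-high")
# The same incident is evaluated three times; only the first check fires.
results = [definition.evaluate(True) for _ in range(3)]
definition.mark_fixed()
# After the fix, a new incident can fire again.
refired = definition.evaluate(True)
```

In this model, one incident produces one alert regardless of how many times the condition is evaluated, which is the volume reduction the option is meant to provide.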

Disable all Alert Notifications

If the volume of notifications exceeds manageable levels, you can disable alert notifications globally. This option stops all alert notifications immediately, including those resulting from escalations in progress.

  1. Click the Administration tab.

  2. Click HQ Server Settings.

  3. In the "Global Alert Properties" section, click the Alert Notifications OFF or ON control.

The change takes effect immediately. No alert notifications will be issued when OFF is selected. Escalations currently in progress will be terminated.

Hierarchical Alerting Prevents a Cascade of Alerts in Resource Hierarchies

Available only in vFabric Hyperic


Hierarchical alerting prevents a single root cause in the same resource hierarchy from causing a cascade of alerts to fire.

When hierarchical alerting is enabled, the alert evaluation process takes into account the availability and alert status of a resource's parent. Specifically, when an agent reports that a resource with an active alert definition is unavailable, Hyperic checks the availability of the resource's parent in the resource hierarchy. Hyperic will fire an alert for the child resource only if:

  • the parent is available, or

  • the parent is unavailable, and there is not an enabled, single-condition alert definition on its Availability metric.

Hierarchical alerting takes advantage of Hyperic's knowledge of the platform-server-service resource hierarchy, obtained via the auto-discovery process. For example, before firing an alert for a service, Hyperic checks the availability and alert status of its parent server. Similarly, before firing an alert for a server, Hyperic checks the availability and alert status of its parent platform.
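The decision rule above can be expressed as a short function. This is a sketch under stated assumptions rather than Hyperic's code: the `Resource` class and `should_fire_child_alert` function are hypothetical names, and the treatment of a resource with no parent is an assumption, because the text only describes resources that have one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical resource in the platform-server-service hierarchy."""
    name: str
    available: bool
    # True when an enabled, single-condition alert definition exists
    # on this resource's Availability metric.
    has_availability_alert: bool = False
    parent: Optional["Resource"] = None


def should_fire_child_alert(child: Resource) -> bool:
    """Decide whether an unavailable child resource should fire its alert."""
    parent = child.parent
    if parent is None:
        return True  # assumption: nothing to consult, so the alert fires
    if parent.available:
        return True  # the parent is up, so the child's problem is its own
    # The parent is down: fire only if the parent has no alert covering it.
    return not parent.has_availability_alert


platform = Resource("platform", available=False, has_availability_alert=True)
server = Resource("server", available=False, parent=platform)
suppressed = should_fire_child_alert(server)      # parent down and alerting

platform.has_availability_alert = False
unmonitored = should_fire_child_alert(server)     # parent down, no alert on it

platform.available = True
parent_up = should_fire_child_alert(server)       # parent available
```

The middle case shows why the second condition exists: if the parent is down but nothing would alert on it, suppressing the child's alert would hide the problem entirely.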

Hierarchical alerting is a global behavior that applies to all resources in inventory; it is enabled by default. You enable or disable hierarchical alerting in the "Global Alert Settings" section of the HQ Server Settings page, accessible from the Administration tab in the vFabric Hyperic user interface. The change takes effect immediately.

Configure Network Host Dependencies for Hierarchical Alerting

Available only in vFabric Hyperic

You can extend the reach of hierarchical alerting beyond the basic platform-server-service hierarchy to top-level platforms, that is, the network devices or virtual hosts upon which operating system platforms depend.

To enable Hyperic to consider a top-level platform's availability and alert status before firing an alert for a dependent resource, you must define the relationship between the top-level platform and the operating system platforms that depend on it. To do so, use the Network Host Dependency Manager, available in the "Plugins" section on the Administration tab of the vFabric Hyperic user interface. The help page for the Network Host Dependency Manager provides instructions.

vSphere Resource Relationships in vFabric Hyperic

If you manage vSphere resources using the new vSphere plugin, do not use the Network Host Dependency Manager to configure dependencies for vSphere resources. vSphere resource types will be removed from the Network Host Dependency Manager pulldown menus in a future release. For information about the vSphere virtual resource hierarchy, see vSphere.

Set a Notification Throttle

Available only in vFabric Hyperic

You can configure the Hyperic Server to throttle back alert notifications in the event of an alert storm. You configure the maximum number of notifications that Hyperic will issue within a fifteen-second interval. When the threshold is reached, Hyperic stops sending individual alert notifications and instead sends a summary of alert activity to designated recipients every ten minutes. When the volume of notifications falls below the specified threshold, Hyperic resumes sending individual notifications.
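The throttling behavior can be approximated with a sliding-window counter. This is a rough sketch only: the source does not say how Hyperic measures the fifteen-second interval, so the sliding window, the `NotificationThrottle` class, and the choice to hold back notifications beyond the configured maximum are all assumptions. The ten-minute summary is represented here only by a counter of held-back notifications.

```python
from collections import deque


class NotificationThrottle:
    """Hypothetical sliding-window throttle (not Hyperic's implementation)."""

    WINDOW_SECONDS = 15  # interval length taken from the text above

    def __init__(self, max_per_window: int):
        self.max_per_window = max_per_window
        self._recent = deque()   # timestamps of notifications in the window
        self.held_for_summary = 0  # notifications deferred to the next summary

    def handle(self, now: float) -> str:
        """Return "send" for an individual notification, else "summarize"."""
        # Discard timestamps that have aged out of the window.
        while self._recent and now - self._recent[0] >= self.WINDOW_SECONDS:
            self._recent.popleft()
        self._recent.append(now)
        if len(self._recent) > self.max_per_window:
            self.held_for_summary += 1
            return "summarize"
        return "send"


throttle = NotificationThrottle(max_per_window=3)
# Five notifications arrive one second apart: an alert storm.
storm = [throttle.handle(t) for t in (0, 1, 2, 3, 4)]
# Much later the volume has dropped, so individual sending resumes.
later = throttle.handle(30)
```

With a maximum of three, the first three notifications in the storm are sent individually and the remaining two are held for the summary; once the window empties, individual notifications resume.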

Notification throttling is disabled by default. You configure it on the HQ Server Settings page, available from the Administration tab.

Enable or Disable all Alert Definitions

You can disable or enable alert definitions globally, if you want to turn alerting on or off for all resources in inventory.

  1. Click the Administration tab.

  2. Click HQ Server Settings.

  3. In the "Global Alert Properties" section, click the Alerts ON or OFF control.

The change takes effect immediately. No alerts will be fired for any resource when OFF is selected. Escalations currently in progress will be completed.