vRealize Operations Manager provides groups of metrics for selected hosts. Each group displays the most relevant metrics for the host to help you monitor your environment.

To display metric groups, select a host in the Environment Overview, and then select the All Metrics tab.

To display the metrics contained within a group, click the plus sign next to the group. You can double-click a group to populate the chart window with a separate chart for each of the metrics in the group. For example, double-clicking the Memory group populates the chart window with a chart for each memory metric.

CPU Metric Group

CPU|CPU contention (%)

This metric shows the percentage of time that the VMs on the ESXi host are unable to run because they are contending for access to the physical CPUs. The value shown is the average across all VMs, so it is lower than the contention experienced by the VM that is most affected by CPU contention.

Use this metric to verify whether the host can serve all its VMs efficiently. Low contention means that the VMs can access the resources they demand and run smoothly, and that the infrastructure is providing good service to the application team.

When using this metric, ensure that the number is within your expectation. Look at both the relative value and the absolute value. A relative problem is a drastic change in the value, which indicates that the ESXi host is no longer able to serve its VMs as it did before. An absolute problem means that the value itself is high. In either case, investigate why the number is high. One factor that affects this metric is CPU power management: if power management clocks the CPU down from 3 GHz to 2 GHz, the reduction in speed is accounted for, because the VM is not running at full speed.

This metric is calculated in the following way: cpu|capacity_contention / (200 * summary|number_running_vcpus)
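
As a rough illustration of this arithmetic, the following sketch plugs hypothetical raw values into the formula; the counter units are assumed to match what vRealize Operations Manager collects.

```python
def cpu_contention_percent(capacity_contention, running_vcpus):
    """Approximate CPU|CPU contention (%) from the formula above.

    capacity_contention -- raw cpu|capacity_contention value for the host
    running_vcpus       -- summary|number_running_vcpus for the host
    """
    if running_vcpus == 0:
        return 0.0
    return capacity_contention / (200 * running_vcpus)

# Hypothetical sample: a raw contention value of 4800 across 12 running vCPUs
print(cpu_contention_percent(4800, 12))  # 2.0 (%)
```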

CPU|Demand (%)

This metric shows the amount of CPU resources a VM would use if there were no CPU contention or CPU limit. This metric represents the average active CPU load for the past five minutes.

Keep this number below 100% if you set the power management to maximum.

This metric is calculated in the following way: (cpu|demandmhz / cpu|capacity_provisioned) * 100.
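
A minimal worked example of that formula, using a hypothetical host with 48,000 MHz of provisioned CPU capacity:

```python
def cpu_demand_percent(demand_mhz, capacity_provisioned_mhz):
    """Approximate CPU|Demand (%) from the formula above."""
    return (demand_mhz / capacity_provisioned_mhz) * 100

# Hypothetical host: 2 sockets x 10 cores x 2.4 GHz = 48,000 MHz provisioned,
# with the VMs demanding 31,200 MHz on average over the past five minutes.
print(cpu_demand_percent(demand_mhz=31_200, capacity_provisioned_mhz=48_000))  # 65.0 (%)
```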

Summary|Number of running VMs

This metric shows the number of running VMs at a given point in time. The data is sampled every five minutes.

A large number of running VMs can cause CPU or memory spikes because more resources are used on the host. The number of running VMs is a good indicator of how many workloads the ESXi host must juggle. Powered-off VMs are not included because they do not affect ESXi performance. A change in the number of running VMs can contribute to performance problems. A high number of running VMs on a host also means a higher concentration risk, because all of those VMs fail if the ESXi host crashes.

Use this metric to look for a correlation between spikes in the number of running VMs and spikes in other metrics such as CPU contention or memory contention.
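
One way to check for such a correlation outside the product is to export the two time series and compare them. The sketch below uses made-up 5-minute samples and the Python standard library (Python 3.10 or later) to compute a Pearson coefficient:

```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical 5-minute samples exported from the All Metrics tab
running_vms    = [22, 22, 23, 28, 28, 27, 22, 22]
cpu_contention = [1.1, 1.2, 1.3, 4.8, 5.1, 4.6, 1.4, 1.2]

# A coefficient close to 1.0 suggests the contention spikes track the VM count.
print(f"correlation: {correlation(running_vms, cpu_contention):.2f}")
```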

Summary|Number of vMotions

This metric shows the number of times a live migration (vMotion), with no VM downtime or service disruption, took place on a host in the last (x) minutes.

The number of vMotions is a good indicator of stability. In a healthy environment, this number is stable and relatively low.

When using this metric, look for a correlation between vMotions and spikes in other metrics such as CPU contention and memory contention. Although a vMotion should not create any spikes, it is likely that some spikes in memory usage, memory contention, CPU demand, and CPU contention are experienced.

Memory Metric Group

Memory|Balloon (KB)

This metric shows the total amount of memory currently claimed by the VM memory control (balloon) driver.

Use this metric to monitor how much VM memory the ESXi has reclaimed through memory ballooning.

The presence of ballooning indicates that the ESXi has been under memory pressure. ESXi activates ballooning when its consumed memory reaches a specific threshold. For example, in vRealize Operations Manager 6.0, the threshold is >98%.

When using this metric, verify whether the size of the balloon is increasing. An increase in ballooning indicates that the lack of memory is not a one-time occurrence and that the memory shortage is worsening. Also look for fluctuations, which indicate that the VM needed a ballooned-out page back. If the VM requests a ballooned-out page, this translates into a memory performance problem for the VM, because the page has to be brought back in from disk.

When the balloon target value is greater than the value shown by this metric, the ESXi host intends to reclaim more memory, so the balloon will inflate further.
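
As an illustration of the "is ballooning increasing" check described above, the sketch below averages the change between consecutive Memory|Balloon (KB) samples; the sample values are hypothetical.

```python
def balloon_growth_per_sample_kb(samples):
    """Average change between consecutive Memory|Balloon (KB) samples.

    A consistently positive result suggests the memory shortage is worsening
    rather than being a one-time occurrence.
    """
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    return sum(deltas) / len(deltas) if deltas else 0.0

# Hypothetical 5-minute samples, in KB
print(balloon_growth_per_sample_kb([0, 51_200, 102_400, 204_800]))  # ~68266.7 KB per sample
```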

Memory|Contention (%)

This metric shows the percentage of time VMs are waiting to access swapped memory.

Use this metric to monitor ESXi memory swapping. A high value indicates that the ESXi is running low on memory, and a large amount of memory is being swapped.

Memory|Usage (%)

This metric shows the amount of physical memory in use, displayed as a percentage of the total configured or available memory. This metric maps to the Consumed counter in vCenter.

When the metric displays a high value, it indicates that the ESXi is using a large percentage of available memory. Check other memory-related metrics to see if the ESXi requires more memory.

Network Metric Group

Network I/O | Aggregate of all instances | Packet Dropped (%)

This metric shows the percentage of received and transmitted packets dropped in the collection interval.

Use this metric to monitor the reliability and performance of the ESXi network. A high value indicates that the network is unreliable and that performance is degraded.
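
For reference, a drop percentage can be reproduced from the raw interval counters as in the sketch below; the values are hypothetical, and the calculation simply divides dropped packets by all packets seen in the interval.

```python
def packet_drop_percent(dropped, received, transmitted):
    """Dropped packets as a percentage of all packets in the collection interval."""
    total = received + transmitted
    if total == 0:
        return 0.0
    return dropped / total * 100

# Hypothetical interval counters
print(packet_drop_percent(dropped=150, received=98_000, transmitted=52_000))  # 0.1 (%)
```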

Network I/O | Aggregate of all instances | Packet Received per second

This metric shows the number of packets received in the collection interval.

Use this metric to monitor the network usage of the ESXi.

Network I/O | Aggregate of all instances | Packet Transmitted per second

This metric shows the number of packets transmitted during the collection interval.

Use this metric to monitor the network usage of the ESXi.

Storage Metric Group

Datastore I/O|Average observed virtual machine disk I/O workload

Storage adapter|Aggregate of all instances|Read latency (ms)

This metric shows the average amount of time required for a read operation by all the storage adapters.

Use this metric to monitor the read operation of the storage adapter. A high value indicates that the ESXi is experiencing storage read operation slowness.

The total latency is the sum of kernel latency and device latency.

Storage adapter|Aggregate of all instances|Write latency (ms)

This metric shows the average amount of time required for a write operation by all the storage adapters.

Use this metric to monitor the write operation performance of the storage adapter. A high value indicates that the ESXi is experiencing storage write operation slowness.

The total latency is the sum of the kernel latency and device latency.