Monitoring the GemFire Distributed System Using GFMon

Use GFMon to monitor GemFire system health, alerts, and other information.

GFMon allows you to manage and monitor a GemFire distributed system in real time. You can view the following information:
  • Overall health indicators of the distributed system
  • Overall system information
  • Information on various configuration aspects of the distributed system
  • Aggregate and detailed information on various operational aspects of the distributed system
  • System alerts that allow you to take the appropriate corrective action
  • Summary of alerts that have occurred across the distributed system
GFMon also provides the following features:
  • A rich graphical user interface (GUI) with information on memory and CPU usage across the servers in the distributed system
  • Ability to create custom alerts by specifying system statistic thresholds

GFMon uses JMX to retrieve information from the distributed system. GFMon does not use the GemFire Health MBeans.
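As an illustration of what retrieving information over JMX involves, the following minimal sketch opens a standard JMX remote connection to an agent and prints the number of registered MBeans. It is not GFMon's actual code. The class name, the method name, and the "/jmxconnector" path in the service URL are assumptions made for the example; check the agent's own configuration for the real URL.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class AgentConnectionSketch {

    // Builds an RMI-based JMX service URL for the given agent host and port.
    // The "/jmxconnector" suffix is an assumption about how the agent is registered.
    static JMXServiceURL agentUrl(String host, int port) throws MalformedURLException {
        return new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxconnector");
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: AgentConnectionSketch <agent-host> <agent-port>");
            return;
        }
        JMXServiceURL url = agentUrl(args[0], Integer.parseInt(args[1]));
        // The connector is closed automatically when the block exits.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            System.out.println("MBeans registered with the agent: " + mbeans.getMBeanCount());
        }
    }
}
```

Running the class with the agent host and port as arguments attempts the connection; without arguments it only prints a usage line.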

Left Panel

The left panel helps you to navigate through various sections of the GFMon tool.

  • Click the specific option in the GFMon section to navigate across the various information panels.
  • Use the Quick Connect section to specify the Agent Host and the Agent Port to connect to. If the agent host name does not comply with RFC 952, as mandated by RMI, GFMon fails to connect to the agent. After entering the host name and port, click Connect to connect to the distributed system. Once connected, the label of the Connect button changes to Disconnect. To disconnect from the distributed system, click Disconnect.
    Note: When you disconnect from the agent in GFMon or if the JMX agent shuts down, the data in the panels is not cleared. When you connect to any distributed system again, the panels are cleared and populated with the new data.
  • The System Status section displays the state of the distributed system to which you are connected. The following colors are displayed:
    • Green. The system is functioning normally.
    • Yellow. User-defined statistic alerts or system alerts with severity Warning have been raised.
    • Red. System alerts with severity Error or Severe have been raised, or a member has crashed.
    To reset the GFMon System Status indicator, click Reset State. For more information, hover the mouse pointer over any of the lights displayed in the System Status section.
Tool-tip: When you place the mouse pointer over a GFMon user interface (UI) element, a tool-tip appears with additional information about that element.
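
Because a host name that does not comply with RFC 952 prevents Quick Connect from reaching the agent, it can help to check a name before entering it. The sketch below is a simplified check, not the validation that RMI or GFMon performs: it requires each dot-separated label to start with a letter, end with a letter or digit, and contain only letters, digits, and hyphens. It does not enforce the length limits in RFC 952, and it is meant for host names only, not IP addresses. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class HostNameCheck {

    // One label: starts with a letter, ends with a letter or digit,
    // and contains only letters, digits, and hyphens in between.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?");

    // Returns true when every dot-separated label of the host name matches LABEL.
    static boolean looksRfc952Compliant(String host) {
        if (host == null || host.isEmpty()) {
            return false;
        }
        // A limit of -1 keeps trailing empty labels, so "host." is rejected.
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (String host : args) {
            System.out.println(host + " -> " + looksRfc952Compliant(host));
        }
    }
}
```

Under this check, a name containing an underscore (a common cause of the problem) is rejected, while a name made of letters, digits, and inner hyphens is accepted.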

Summary Panel

The top-most section in the Overview, Data, Members, or Alerts panel displays the host and port of the JMX admin agent to which GFMon is connected or from which it has disconnected. The numbers of servers, gateways, and clients in the distributed system are also displayed in this summary panel.

Overview Panel

The Overview panel provides aggregated system health information for all members in the distributed system, the agent that you are connected to, a summary view of alerts from all members in the distributed system, and basic member information.

  • Alerts! The Alerts! table displays a summary of alerts triggered by events occurring in the distributed system. The table displays both System alerts and the Statistics alerts defined by the user. To clear an alert in the Alerts! section, right-click it and select Clear. Double-click any alert in Overview Panel -> Alerts! or Member Panel -> Member Alerts Viewer to see more information about it. Depending on the alert type, one of the following pop-up windows is displayed.
  • System Alert pop-up. When you double-click a system alert, a pop-up window with the following information is displayed:
    • Member Names. The names of the members for which the system alert has been raised.
    • First Time. The time at which the alert was raised for the first time.
    • Last Time. The time at which the alert was raised for the last time.
    • Severity. The severity of the alert: Warning, Error, or Severe.
    • Detail Message. The detail log message for the alert.

    System alerts include log entries that are tagged as Severe, Error, or Warning. System alerts also include alerts indicating unexpected disconnection of a member from the distributed system, unexpected disconnection of the JMX admin agent, and errors occurring in GFMon.
  • Statistics Alert pop-up. When you double-click a statistic alert, a pop-up window with the following information is displayed:
    • Member Name. The name of the member for which the alert has been raised.
    • First Time. The time at which the alert was raised for the first time.
    • Last Time. The time at which the alert was raised for the last time.
    • Count. The number of times this alert has been raised.
    • Statistic Name. The names of different statistics selected for that alert definition.
    • Last Value. The value for the statistic when the alert was raised for the last time.
    • Mean Value. The mean value of the statistic, calculated as the sum of the values recorded each time the alert was raised, divided by the count for the alert.
    • Range. The minimum and maximum values of the statistic between First Time and Last Time.

  • Information Alert. When you double-click an information alert, which is shown in red, a pop-up window with the following information is displayed:
    • Member Name. The name of the member for which this alert has been raised, if applicable.
    • Alert Time. The date and time of the alert.
    • Severity. The level of severity of the alert.
    • Detail Message. The information alert message in detail.

    Alerts are aggregated; that is, the event triggering the alert is shown as a single entry, and repeat occurrences update only the Last Raised Time column. Distinct Statistic alerts defined by the user are aggregated for each member. System alerts are aggregated across the members by the type of event that triggers the alert. The color in the System Status section in the Left Panel changes to yellow or red when alerts occur.

    You may reset the status to green to acknowledge that the alerts have been examined. On reset, aggregation of the existing alert entries in the table stops and the entries are greyed out. New alerts, even if they are caused by the same events that triggered earlier alerts, are displayed as new rows in the table. You may still click the greyed rows to view their details.

    The maximum number of rows in this and other tables in GFMon can be configured from the Preferences Panel.

  • System Memory Usage. This section displays the aggregate memory used across all active servers. The histogram on the right renders the memory used as a percentage of the total available memory across the servers over time. The vertical bar on the left shows a snapshot of the current memory usage. If the System Memory Usage data is not available, the X-axis is labeled "Heap Memory Data is not Available".
  • Resources Utilization Summary. This section provides an animated bar chart of individual memory and CPU usage on each member in the distributed system. The CPU usage bar and memory bar for each node are grouped together.
  • Members Summary. Displays a tabular view of vital member statistics such as:
    • ID. The Member ID is a unique auto-generated value that identifies the member in the distributed system.
    • Name. The name of the member. You can provide this name through the file when starting GemFire; otherwise, the name is inferred from the member ID.
    • Host. The host on which the member is running.
    • Heap Usage. Displays the used heap memory of the member VM as a percentage of the maximum heap memory.
    • CPU Usage. Displays the percentage of the process CPU utilization.
    • Clients. Number of clients in the cache server, if applicable.
    • Uptime. Total time the node has been up and running.
    Note: The Members Summary section displays the member ID for each member. Other information panels display the member name. If you do not configure a member name, GFMon constructs the member name from the member ID.

    Double-click on a row in the Members Summary section to open the Details for member screen. The Details for member screen provides additional information about each member.

  • Rate of Cache Operations. This graph plots the get and put operations occurring per second over the period of time specified.
  • Regions Configurations. This table lists the attributes of the regions in the distributed cache. The table includes region attributes like name, data policy, disk attributes, and backup persistence. The table also shows entry count and entry size.
  • Gateway Hub. If the member that you have selected is a Gateway, the Gateway Hub section is displayed. This section displays the listening port and the Gateway ID. The End Points table displays the following:
    • ID. The ID of the Gateway end point
    • Host/Port. The host name and port of the Gateway end point
    • Connected. Whether the gateway is connected
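
The fields shown in the Statistics Alert pop-up described earlier in this panel (First Time, Last Time, Count, Last Value, Mean Value, and Range) can all be maintained incrementally as an alert is raised repeatedly. The following minimal sketch shows one way to do that for a single member and statistic. It is an illustration of the arithmetic only, not GFMon's implementation; the class and field names are hypothetical.

```java
class StatAlertAggregate {
    final String memberName;
    final String statisticName;
    long firstTime;      // time the alert was first raised (milliseconds)
    long lastTime;       // time the alert was last raised (milliseconds)
    int count;           // number of times the alert has been raised
    double lastValue;    // statistic value when the alert was last raised
    double sum;          // running sum of values, used for the mean
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;

    StatAlertAggregate(String memberName, String statisticName) {
        this.memberName = memberName;
        this.statisticName = statisticName;
    }

    // Folds one more occurrence of the alert into the single aggregated entry.
    void record(long timeMillis, double value) {
        if (count == 0) {
            firstTime = timeMillis;
        }
        lastTime = timeMillis;
        lastValue = value;
        count++;
        sum += value;
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    // Mean Value: sum of the recorded values divided by the alert count.
    double mean() {
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

Resetting the system status, as described above, would correspond to retiring the current aggregate and starting a new one for later occurrences.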

Data Panel

The Data panel provides a bird's eye view of the GemFire data regions using pie charts. The Data panel also provides details such as the scope and data policy of each region across the members.

Note: The All Regions table lists Total Memory and % Memory only for regions that have a cap set on memory use. For partitioned regions, this is set using the local-max-memory partitioned attribute. For other regions, this is set by specifying lru-memory-size eviction for the region. When memory use is not limited, these memory statistics are not captured in GemFire and are not displayed in GFMon.
The Region Members Details Viewer provides a tabular view of the following region attributes:
  • Scope. For any region that is not partitioned, the region scope determines whether, and how, region data is distributed between the local cache and the rest of the distributed system.
  • Data Policy. The data-policy attribute for each member determines which data is stored in the local cache.
  • Interest Policy. For each member, the interest-policy defines the entry operations that are delivered to the local cached region.
  • Disk Attributes. Disk attributes determine where, and how, region data is overflowed or persisted to disk. You can define attribute templates inside the <cache> and <region> elements and assign IDs for later retrieval.
  • Cache Loader. A cache loader automatically loads data from an outside source, such as a database. In a distributed region that is not partitioned, one member may host the cache server for the entire distributed region. Loading into a partitioned region requires a cache loader in every partition.

Members Panel

The Members panel displays the characteristics of each member in the distributed system and the alerts specific to the selected member.

  • The bar chart in the Members panel displays the current number of clients, CPU usage percentage, and queues for each member in the distributed system.
  • Member Summary Viewer. This table lists member attributes such as ID, Name, Total Regions, Root Regions, Clients, Queues, Total memory (in MB), Gets/sec, Puts/sec, Threads, and Network Usage (KB). The member ID is a unique auto-generated value that identifies the member.
  • Member's Clients. This table lists the details of the clients connected to each member. Select a member in the Member Summary Viewer table to view the following attributes for each of the member’s clients:
    • Client ID. A unique auto-generated ID that identifies the client
    • Client Name. A short client name extracted from the client ID. Displays 'N/A' if the name cannot be extracted from the client ID.
    • Host. The name of the host the client is running on
    • Queue Size. The queue size for this client on the server that this client is connected to
    • Gets. The number of gets/sec that the client is executing on the cache
    • Puts. The number of puts/sec that the client is executing on the cache
    • Cache Misses. The number of times a get operation on the client cache resulted in the data being fetched from a server because it was not already present in the client cache
    • CPU Usage. The CPU usage of the client
    • Threads. The total number of threads in the client application
    • Cache Listener Invocations. The number of times the cache listener has been invoked
    Client statistics are displayed only if statistic sampling and time statistics are enabled in the GemFire distributed system. You can enable these by setting the corresponding properties in the file.
  • Member Alerts Viewer. This table displays the Definition ID and Time for the alerts for the member selected in the Member Summary Viewer. You can configure alerts for a member in the Alerts panel. This section displays the alerts that originated on the member since the last reset. For details, double-click an alert.
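
For the client statistics requirement noted under Member's Clients, the sketch below prints the two settings in properties-file form. The property names statistic-sampling-enabled and enable-time-statistics are the ones GemFire commonly uses for statistic sampling and time statistics, but they are an assumption here, since this guide does not list them; verify them against the documentation for your GemFire version. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

class ClientStatsProperties {

    // Returns the two settings assumed to enable client statistics.
    static Properties clientStatisticsSettings() {
        Properties props = new Properties();
        props.setProperty("statistic-sampling-enabled", "true"); // assumed property name
        props.setProperty("enable-time-statistics", "true");     // assumed property name
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Print the settings in properties-file syntax so they can be copied
        // into the member's properties file.
        StringWriter out = new StringWriter();
        clientStatisticsSettings().store(out, "Enable client statistics for GFMon");
        System.out.print(out);
    }
}
```

Collecting time statistics adds some overhead on the members, so it is usually worth enabling them only where the client tables are actually needed.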

Alerts Panel

You can create custom alerts in the Alerts panel. Custom alerts in GFMon allow you to receive a notification when a specified system statistic of an active member in a distributed system reaches a specified threshold. The GemFire JMX admin agent notifies GFMon (and the RMI clients) when the statistic in a custom alert reaches its threshold. For more information about receiving e-mail alerts, see the GemFire documentation.

For a detailed description of system statistics, see the information on Statistics in the GemFire documentation.

Create Custom Alerts

To create custom alerts in GFMon, follow these steps:
  1. In the Create Alert Definitions section, enter a name for the alert.
  2. In the Select Statistics section, click Add.... The Choose Stats window is displayed.
  3. From the drop-down menu, choose the statistic type. The available statistics for the type you have chosen are displayed in the Statistic Name section.
  4. Select the Statistic Name(s) and click the right-arrow to move them to the Statistics Type (Statistic name) section. If you want to deselect any values, select the values in the Statistics Type (Statistic name) section and click the left-arrow button.
  5. Click Select. The statistics that you have selected are displayed in the Select Statistics table. You can add more values to the table by repeating the above steps. To remove a value, select the check-box in the Remove column and click Remove.
  6. If you have selected more than one statistic value, you must apply a function to the statistics to create alerts:
    1. In the Trigger section, select the check-box beside Apply Function. The drop-down list is populated with the available functions.
    2. Select the function that you want to use and specify either the Number (which the statistic value must be greater than or less than) or the Range (within which the statistic value must fall).
    Note: You can only create a function when you have multiple statistics selected.
  7. Click Save to save the alert definition. The alert name is now displayed in the Available Alert Definitions section.
  8. To modify an alert definition, select the alert from the Available Alert Definitions section and click Edit. You can change the alert definition by editing the desired parameter. To delete an alert definition, select the alert from the Available Alert Definitions section and click Delete.
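
The trigger described in step 6 can be pictured as two stages: a function combines the selected statistic values into one number, and that number is compared with the Number or Range you specified. The sketch below illustrates this under stated assumptions. The function names (SUM, AVERAGE, MIN, MAX) are hypothetical, since this guide does not list the entries in the drop-down, and treating the Range bounds as inclusive is also an assumption.

```java
class AlertTriggerSketch {

    // Hypothetical function names; the actual drop-down entries may differ.
    enum Combiner { SUM, AVERAGE, MIN, MAX }

    // Combines the values of the selected statistics into a single number.
    static double combine(Combiner combiner, double... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one statistic value is required");
        }
        double sum = 0;
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        switch (combiner) {
            case SUM:     return sum;
            case AVERAGE: return sum / values.length;
            case MIN:     return min;
            default:      return max;
        }
    }

    // Number trigger: the value must be greater than the threshold.
    static boolean exceeds(double value, double number) {
        return value > number;
    }

    // Number trigger: the value must be less than the threshold.
    static boolean fallsBelow(double value, double number) {
        return value < number;
    }

    // Range trigger: the value must fall within the range (bounds assumed inclusive).
    static boolean withinRange(double value, double low, double high) {
        return value >= low && value <= high;
    }
}
```

For example, with two selected statistics reporting 40 and 70, a SUM function yields 110, which would trigger an alert defined as greater than 100.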

All alerts are displayed in the Alerts! section of the Overview panel.

The GFMon Event Viewer table displays warnings that are logged in the GFMon log file. You can view the internal warnings and node information in the GFMon Event Viewer and see the log file for details. Double-click any item in the table to get more information about the event. The Info Type scroll-bar allows you to scroll through all the events in the table. Use the Message scroll-bar to scroll through the complete message.

By default, the GFMon Event Viewer displays only the last 100 events. Go to the General tab in the Preferences panel to configure preferences:
  • To change the number of events viewable in the GFMon Event Viewer, change the Maximum number of rows viewable in a table preference.
  • To change the logging severity/detail level of the GFMon log file, change the Logging Level preference.

The following events are logged in the event viewer:

Event Type                   Purpose
Node Joined                  When a member joins the system.
Node Crashed                 When a member crashes.
Node Left                    When a member leaves the system.
Alert!                       When a system alert occurs.
Internal Error               When a GFMon error occurs.
Internal Warning             When a GFMon warning occurs.
Reconnecting to the Agent!   When GFMon attempts to reconnect to the JMX agent. This happens when the JMX agent shuts down unexpectedly.
Shutdown of the Agent!       When an attempt to reconnect to the JMX agent fails.

Use the Clear and Clear All buttons at the bottom-right corner of the GFMon Event Viewer panel to clear the selected event or to clear all events.

Preferences Panel

The Preferences panel allows you to set connection and general preferences and save them for future use. On Windows platforms, these preferences are saved in the system registry under the HKEY_CURRENT_USER\Software\JavaSoft\Prefs\Gemstone\GemFire Monitor\2.5\<Logged-In-User-Name> node. On Linux platforms, these preferences are saved in the /home/<Logged-In-User-Name>/.java/.userPrefs/GemStone/GemFire Monitor/<encoded_version_info>/<Logged-In-User-Name>/prefs.xml file. The name of the encoded_version_info directory looks similar to _!$)!.g!w.

You can use the Connections tab to create connection preferences and store them for future use.

Create Connection Preferences

To create and use connection preferences:
  1. In the Specify your connection by host name or IP address field, enter the system name, host name, and port number.
  2. Click Save. The information that you entered is populated in the Connect to stored sessions section.
    Note: If you connect to a new agent, it is automatically added to the Connect to stored sessions section. Unsaved connections are marked with an appended * (asterisk) in the Status column.
  3. Highlight the row of the system name and click Connect or Disconnect as required.
    Note: You can connect to only one distributed system at a time.

You can use the General tab to configure the following GFMon preferences:
  • Log File Directory. The directory where you want the GFMon log files to be stored. You can browse for the directory you want. Make sure the GFMon application has read/write access to the log files.

    The default log file directory is <parent-directory>/GemStone/GemFire/GFMon/logs. The <parent-directory> is either the user's home directory or the current working directory; GFMon checks them in that order and uses the first one it can access.

    The log file name format is yyyy-MM-dd-HH-mm-ss-sss_<random_number>_gfmon.log. For example, 2009-01-19-19-10-36-234_8632816_gfmon.log is located in the log file directory. For each instance of GFMon, a new log file is created with a different name that depends on the timestamp and the random number.

  • Logging Level. Select the logging level from the options in the drop-down menu. The default value is info. The valid logging levels are all, finest, finer, fine, config, info, warning, error, severe, and none. The level all logs messages of every severity, while the level severe logs only messages at the severe level.

    Setting log-level to one of the ordered levels causes all messages of that level and greater severity to be printed. Choosing a less verbose level reduces system resource consumption while still providing some logging information for failure analysis.

  • Refresh Interval. The number of milliseconds between updates for the data displayed in GFMon views. If GFMon connects to an admin agent running GemFire version 6.0 or later, set the GFMon refresh interval to the same value as the agent’s refresh-interval property. In systems running GemFire 6.6.2 or earlier, the agent's default refresh interval is 5 seconds. In systems running GemFire 6.6.3, the agent's default refresh interval is 15 seconds. For more information about configuring the refresh-interval property in the JMX admin agent, see the topic Starting an Admin Agent with a Non-Default Statistics Refresh Interval.
    Note: The GFMon refresh interval is specified in milliseconds and the JMX Agent refresh interval is specified in seconds.
    Note: The refresh interval configured in the Preferences panel does not affect the refresh interval of the GemFire JMX admin agent. GemFire versions prior to GemFire 6.0 do not support the refresh-interval property for the admin agent.
  • Number of quick connect (host/port) entries to keep in history. The number of entries the Agent host and Agent port fields must retain in the Quick Connect section.
  • X scale time range for Time Charts (in minutes). The time range, in minutes, that the time charts plot along the X axis.
  • Reconnect. How GFMon attempts to reconnect if the connection is lost: the number of attempts (Number of retries field) and the interval between attempts (Reconnect Interval in milliseconds field).
  • Maximum number of rows in a table. The maximum number of rows that are kept in a table. For example, you can configure the number of entries in the Alerts! table in the Overview panel and the GFMon Event Viewer table in the Alerts panel.
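
To make the log file name format under Log File Directory concrete, the following sketch builds a name of the form yyyy-MM-dd-HH-mm-ss-sss_<random_number>_gfmon.log. It is an illustration, not GFMon's own code. It assumes that the final "sss" field is the millisecond part of the timestamp (as the example name suggests), which Java's SimpleDateFormat writes as "SSS". The class and method names are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class LogFileNameSketch {

    // Builds a log file name from a start time and a random number.
    // "SSS" is the SimpleDateFormat pattern for milliseconds.
    static String logFileName(Date startTime, long randomNumber, TimeZone zone) {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss-SSS");
        format.setTimeZone(zone);
        return format.format(startTime) + "_" + randomNumber + "_gfmon.log";
    }

    public static void main(String[] args) {
        long random = (long) (Math.random() * 10_000_000L);
        System.out.println(logFileName(new Date(), random, TimeZone.getDefault()));
    }
}
```

Because the name depends on both the timestamp and the random number, two GFMon instances started at the same moment still write to different files.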

After selecting the option or changing the value that you require, click Reset or Apply as applicable. Except for changes to Log File Directory and Refresh Interval, the changes you make take effect immediately. Changes to Refresh Interval take effect the next time you connect to the agent. Changes to Log File Directory take effect only the next time you launch GFMon.
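
Because the GFMon Refresh Interval is entered in milliseconds while the agent's refresh-interval property is in seconds, matching the two requires a unit conversion. The short sketch below shows that conversion; the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

class RefreshIntervalSketch {

    // Converts the agent's refresh interval (seconds) to the value
    // to enter in the GFMon Refresh Interval preference (milliseconds).
    static long gfmonRefreshMillis(long agentRefreshSeconds) {
        return TimeUnit.SECONDS.toMillis(agentRefreshSeconds);
    }

    public static void main(String[] args) {
        System.out.println(gfmonRefreshMillis(5));   // agent default of 5 seconds
        System.out.println(gfmonRefreshMillis(15));  // agent default of 15 seconds
    }
}
```

For the two agent defaults mentioned above, 5 seconds and 15 seconds, the matching GFMon values are 5000 and 15000 milliseconds.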

Status Bar

The status bar at the bottom of your GFMon window provides information and alerts about the session.

These are the alert icons and their meanings:
  • This icon indicates a JMX connection
  • These icons indicate an alert
  • This icon indicates a background job
  • This icon is displayed after you connect to the distributed system. Click on it to see the Progress view panel, with information about agent connections and the alerts that you save.

Error Notifications Displayed by GFMon

During the following failure situations, GFMon displays a pop-up window reporting a fatal error:
  • The JMX agent that GFMon is connected to shuts down. A pop-up window titled Fatal Error is displayed with the message "JMX Agent has shut down".
  • GFMon is running with a gemfire.jar file that is incompatible with the version of the gemfire.jar file used by the running JMX Agent. In this condition, a pop-up window titled Fatal Error is displayed. For example, if you run GFMon with the gemfire.jar file of GemFire version 5.7 to monitor a GemFire version 5.7.1 or GemFire version 5.8 distributed system, the message is:
    GFMon could not connect to the GemFire Distributed 
    System. This version of GFMon is compatible with GemFire 
    Enterprise 5.7. GFmon encountered a Class Mismatch. 
    Please shutdown GFMon and restart with a compatible 
    GemFire version
  • GFMon encounters an error during run-time while retrieving data from the JMX admin agent. A pop-up window titled Fatal Error is displayed with a message depending on the error that has occurred.
To fix the issue, close the pop-up window and perform the following procedure:
  1. Shut down GFMon.
  2. Reset the environment variable GEMFIRE to point to the GemFire product location from which the admin agent is running.
  3. Restart GFMon.