Building a Metric Data Warehouse

Hyperic's retention strategy for measurement data is it to store the minimum amount of data that enables it to pinpoint when change in performance or availability occur. Detailed measurement data is stored for a limited period of time - two days, by default - after which the data is compressed and archived as hourly averages with highs and lows. You can configure Hyperic to keep detailed measurement data for longer, up to a maximum of 7 days.

To support requirements for trend analysis over a longer time frame, vFabric Hyperic provides the MetricDataReplicator class, which you can use to replicate uncompressed measurement data in a secondary database.

Metric Replication Strategy Overview

Detailed steps for creating and populating a secondary database for detailed metrics are provided in the sections that follow. This is a summary of the approach:

  • A secondary database instance is configured to store detailed measurement data replicated from the primary Hyperic database. The secondary database contains one table, EAM_MEASUREMENT_DATA. This guide currently only covers MySQL, adjustments will need to be made for Oracle and PostgreSQL.

  • The secondary database has a database link to the primary Hyperic database, and five views that point to the primary Hyperic database for resource inventory data. The resource inventory data does not physically reside on the secondary database. The database link to the main database allows views on the secondary database to access inventory data in the primary Hyperic database. These are the views that are required on the secondary database:

    • EAM_PLATFORM

    • EAM_SERVER

    • EAM_SERVICE

    • EAM_RESOURCE

    • EAM_MEASUREMENT_TEMPL

    • EAM_MEASUREMENT

For documentation on these database tables, see Hyperic Database Table Schemas.

Instructions for Establishing Secondary MySQL Database for Metrics

These sections below have instructions for configuring a secondary Hyperic database for metrics on MySQL.

Note: MySQL's query optimizer has limitations that have negative impact on the the performance of Hyperic's metric replicator class. This performance degradation can have a ripple effect on the performance of your primary Hyperic database during metric replication as such the replicator class should be ran against a slave mysql db for optimum performance. This document is written around the use of the slave DB.

Set Up the Secondary Database

Perform the steps below to create the secondary database and configure access to your primary Hyperic database. All of the steps in this section apply to the secondary database.

Another option would be to use MySQL federated tables

  1. Install your secondary MySQL Database Server and setup replication.

  2. Create user and database for warehouse

    CREATE DATABASE hqdata;
    GRANT ALL ON hqdata.* to hqdata identified by 'password';
    FLUSH PRIVILEGES;
    USE hqdata;
  3. Create table for measurements

    create table EAM_MEASUREMENT_DATA
    (
                TIMESTAMP bigint,
                MEASUREMENT_ID int,
                VALUE numeric(24, 5),
                primary key (TIMESTAMP, MEASUREMENT_ID)
    );
  4. Create view measurements, replace hqdb with your Hyperic Db name

    create view EAM_MEASUREMENT 
     as select ID,
                VERSION_COL,  
                INSTANCE_ID,  
                TEMPLATE_ID,  
                MTIME,  
                ENABLED,  
                COLL_INTERVAL,  
                DSN  
                from hqdb.EAM_MEASUREMENT;
  5. Create view for platforms, replace hqdb with your Hyperc DB name

    create view EAM_PLATFORM
     as select ID,
               VERSION_COL,
               FQDN,
               CERTDN,
               DESCRIPTION,
               CTIME,
               MTIME,
               MODIFIED_BY,
               LOCATION,
               COMMENT_TEXT,
               CPU_COUNT,
               PLATFORM_TYPE_ID,
               CONFIG_RESPONSE_ID,
               AGENT_ID,
               RESOURCE_ID
               from hqdb.EAM_PLATFORM;
  6. Create view for servers, replace hqdb with your Hypierc DB name

    create view EAM_SERVER as
      select ID,
               VERSION_COL,
               DESCRIPTION,
               CTIME,
               MTIME,
               MODIFIED_BY,
               LOCATION,
               PLATFORM_ID,
               AUTOINVENTORYIDENTIFIER,
               RUNTIMEAUTODISCOVERY,
               WASAUTODISCOVERED,
               SERVICESAUTOMANAGED,
               AUTODISCOVERY_ZOMBIE,
               INSTALLPATH,
               SERVER_TYPE_ID,
               CONFIG_RESPONSE_ID,
               RESOURCE_ID
               from hqdb.EAM_SERVER;
  7. Create view for services, replace hqdb with your Hyperic DB name

    create view EAM_SERVICE as
      select ID,
               VERSION_COL,
               DESCRIPTION,
               CTIME,
               MTIME,
               MODIFIED_BY,
               LOCATION,
               AUTODISCOVERY_ZOMBIE,
               SERVICE_RT,
               ENDUSER_RT,
               PARENT_SERVICE_ID,
               SERVER_ID,
               SERVICE_TYPE_ID,
               CONFIG_RESPONSE_ID,
               RESOURCE_ID
               from hqdb.EAM_SERVICE;
  8. Create view for resources, replace hqdb with your Hyperic Db name

    create view EAM_RESOURCE
     as select ID,
               VERSION_COL,
               RESOURCE_TYPE_ID,
               INSTANCE_ID,
               SUBJECT_ID,
               PROTO_ID,
               NAME,
               SORT_NAME,
               FSYSTEM,
               MTIME 
               from hqdb.EAM_RESOURCE;
  9. Create view for measurement templates, replace hqdb with your Hyperic DB name

    create view EAM_MEASUREMENT_TEMPL as
      select ID,
               VERSION_COL,
               NAME,
               ALIAS,
               UNITS,
               COLLECTION_TYPE,
               DEFAULT_ON,
               DEFAULT_INTERVAL,
               DESIGNATE,
               TEMPLATE,
               PLUGIN,
               CTIME,
               MTIME,
               MONITORABLE_TYPE_ID,
               CATEGORY_ID
               from hqdb.EAM_MEASUREMENT_TEMPL;
  10. OPTIONAL: Create view for Availability data, replace hqdb with your Hyperic DB name

    create view HQ_AVAIL_DATA_RLE
     as select MEASUREMENT_ID,
                STARTIME,     
                ENDTIME,      
                AVAILVAL      
                from hqdb.HQ_AVAIL_DATA_RLE;

    Availability isn't replicated as it is stored differently and doesn't have compression. This view is provided as an example if you want to query a single db for availability data.

Test the New Views

To verify the views you created work, you can run a query that lists all servers in the database. Enter the following query at the mysql prompt of the secondary database and to list all of the servers. This query runs against the hqdb database.

SELECT * FROM EAM_SERVER;

Set up the Metric Data Replicator

Once your secondary database is ready to store and view the Hypierc data, you must set up the metric_replicator.properties file with the appropriate parameters. Create a directory where the replicator files will be stored, for instance:
/usr/hyperic/replicator

Property Setting

Description

pri_user=hqadmin

Primary database username

pri_pass=hqadmin

Primary database password

pri_url=jdbc:mysql://<ipaddress>:<port>/hqdb

Connection string for primary server, including IP Address and port

sec_user=hqdata

Secondary database username

sec_pass=password

Secondary database password

sec_url=jdbc:mysql://<ipaddress>:<port>/hqdata?

Connection string for secondary server, including IP Address and port.

interval

Time interval in minutes. Use to specify the interval the script runs at. Default: 90

time_chunk

Amount of minutes to do during this run. Default: 90

batch_size

Number of metrics do to in a batch. Default: 2000

Create log4j Properties File

Create a file called log4j.properties in the replicator directory you created in the previous step and paste this text into the file:

log4j.rootLogger=DEBUG, R
log4j.appender.R=org.apache.log4j.ConsoleAppender
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d \[PRIVATE:%t\] %-5p %c - %m%n

Create Script to Run the Replication Process

Create a file called run.sh, which will be the script that runs the replication process. Copy the commands shown below, changing the values of JAVA_HOME and SERVER_HOME to point to your Java and Hyperic Server installations respectively.

#!/bin/bash

#update these params
SERVER_HOME="/opt/vmware/vfabric/hyperic/server/active/hq-engine/hq-server"

# props file which configures the replicator
PROPS="metric_replicator"

HQ_ROOT="$SERVER_HOME/webapps/ROOT"

# path to the prop files
# update to reflect appropriate database
PROP_FILES="."
DB_PKGS="$HQ_ROOT/WEB-INF/lib/mysql-connector-java-commercial-5.1.10.jar"
HQ_PKGS="$HQ_ROOT/WEB-INF/lib/hq-common-4.5.jar"
LOG_PKGS="$HQ_ROOT/WEB-INF/lib/commons-logging-1.0.4.jar:$HQ_ROOT/WEB-INF/lib/log4j-1.2.14.jar"
HQEE_PKGS="$HQ_ROOT/WEB-INF/lib/hqee-server-4.5.jar"
HQUTIL_PKGS="$HQ_ROOT/WEB-INF/lib/hq-util-4.5.jar"
PKGS="$PROP_FILES:$DB_PKGS:$HQ_PKGS:$HQEE_PKGS:$LOG_PKGS:$HQUTIL_PKGS"
ARGS="$LOG_ARGS -Dreplprops=$PROPS -Djdbc.drivers=com.mysql.jdbc.Driver -cp $PKGS"
JAVA="$JAVA_HOME/bin/java"

set -x

$JAVA $ARGS com.hyperic.hq.measurement.shared.MetricDataReplicator

Run the Replication Process

To run the replication process, open a terminal window and enter:

run.sh

Verify the Replication Process Results

Run the queries in the following sections to verify that the replication process succeeded.

Query 1 - Show all the disk stats

Run the following query to show all metrics whose name contains the string "disk", replacing the your.platform.nam below with a valid platform name in your resource inventory.

SELECT p.fqdn, 
r.name, 
t.name, 
d.value, 
d.timestamp 
from EAM_MEASUREMENT_TEMPL t, 
EAM_MEASUREMENT m, 
EAM_MEASUREMENT_DATA d,  
EAM_SERVER s, 
EAM_PLATFORM p, 
EAM_RESOURCE r 
where t.id = m.template_id AND 
m.id = d.measurement_id AND 
p.id = s.platform_id AND 
r.instance_id = m.instance_id AND 
lower(p.fqdn) = 'your.platform.name' AND 
lower(r.name) LIKE '%Mount%'  
ORDER BY d.timestamp DESC;

Query 2 - Component Service Usage Information for Servers and Services

Once the appropriate join has been made to access the server or service layer, filtering by server or service name or metric template is the easiest way to select specific metrics of interest per server or service. (Meaning "server" and "service", as defined in the HQ inventory model.) For example, JBoss is a server while the individual web applications running within the container are services. Metrics specific to the JBoss container are available from the server layer while the internal web applications are available via the service layer.

SELECT platform.fqdn, 
resource.name, 
template.name, 
data.value, 
data.timestamp 
FROM EAM_MEASUREMENT_TEMPL template, 
EAM_MEASUREMENT measurement, 
EAM_MEASUREMENT_DATA data, 
EAM_SERVICE service, 
EAM_SERVER server, 
EAM_PLATFORM platform, 
EAM_RESOURCE resource 
WHERE 1=1 AND 
template.id = measurement.template_id AND 
measurement.id = data.measurement_id AND 
platform.id = server.platform_id AND 
server.id = service.server_id AND 
service.id = measurement.instance_id AND 
lower(platform.fqdn) = 'your.platform.name' AND 
lower(resource.name) like '%jboss%' 
AND lower(template.name) like '%transaction count%' 
ORDER BY data.timestamp desc;