You use configuration parameters to control how aurora_mon starts, stops, and monitors applications, including how often it checks an application and what action it takes when the application fails. You specify configuration parameters as key-value pairs.

Use caution when modifying parameters. Do not modify the name and descr parameters, and do not modify the app_priority parameter, because it represents start-order dependencies between applications. Take a snapshot backup of the virtual machine before modifying parameters so that you can revert if necessary.

The parameters you are most likely to modify are the following.

heartbeat_period

heartbeat_fail_action

heartbeat_ignore_fail_count

app_restart_retry_count

After you modify a parameter, you must stop and restart aurora_mon for your changes to take effect.

Aurora_mon Parameters

Parameter

Description

name (required)

Name of the application. A short representative name that can contain the following characters: a-z, A-Z, 0-9, _ (underscore), and - (dash). The name cannot contain whitespace. You use this name to invoke commands on aurora_mon for this application.
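The character rule above can be expressed as a regular expression. The following is a minimal sketch, not part of aurora_mon: the function name is hypothetical, rejecting an empty name is an assumption, and no length limit is enforced because the text does not state one.

```python
import re

# Allowed characters from the description: a-z, A-Z, 0-9, underscore, dash.
# Assumption: the name must contain at least one character.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_app_name(name: str) -> bool:
    """Return True if name contains only the characters allowed for the name parameter."""
    return NAME_PATTERN.fullmatch(name) is not None
```

For example, is_valid_app_name("disk_monitor-01") returns True, and is_valid_app_name("disk monitor") returns False because of the space.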

descr (required)

A longer but concise description of the application. The description is displayed in the CMS UI.

app_priority (optional, defaults to 0)

A number from 0 to 99 that represents the global start/stop priority of the application in relation to other applications that aurora_mon monitors. Applications are started in priority order (0 is the highest priority, 99 is the lowest). An application with a lower priority is started only after all applications with a higher priority have been started. Applications are stopped in the reverse order: all lower-priority applications are stopped before an application with a higher priority is stopped. If a priority is not specified, it defaults to 0 (highest priority).
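The ordering rule can be illustrated with a short sketch. It assumes each application is represented as a dictionary with a name key and an optional app_priority key; that representation and the function names are hypothetical and are not how aurora_mon stores its configuration.

```python
from itertools import groupby


def _priority(app):
    # A missing app_priority defaults to 0, the highest priority.
    return app.get("app_priority", 0)


def start_order(apps):
    """Return application names grouped by priority, highest priority (0) first."""
    ordered = sorted(apps, key=_priority)
    return [[app["name"] for app in group] for _, group in groupby(ordered, key=_priority)]


def stop_order(apps):
    """Return the priority groups in reverse start order, lowest priority first."""
    return list(reversed(start_order(apps)))
```

Each inner list holds applications that share a priority. The text does not say how aurora_mon orders applications within the same priority, so the sketch simply keeps them in input order.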

app_start_cmd (required)

Command you use, such as any program, script, or executable file, to start the application. The start command is successful if it exits with a zero exit code. If the command does not complete in 300 seconds, it is forcibly terminated.

If required, you can use the -o and -e options of the aurora_mon daemon to capture stdout/stderr; otherwise, it is redirected to /dev/null. To run the command as a specified user, you must have an su -c wrapper or have set the setuid bit of the application.

If you do not require a start command, you can use a command that exits with a zero exit code, for example /bin/true. An example of this is where the application is monitoring the amount of disk space on a mount point. There is no application to start.
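The success rule described above (zero exit code within 300 seconds, output discarded by default) can be sketched as follows. This illustrates the semantics only and is not the aurora_mon implementation; the function name and the list-of-arguments command format are assumptions.

```python
import subprocess

CMD_TIMEOUT_SECONDS = 300  # limit stated in the parameter description


def run_app_command(cmd, timeout=CMD_TIMEOUT_SECONDS):
    """Run cmd (a list of arguments) and return True only if it exits with code 0 in time."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,  # mirrors the default redirect to /dev/null
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return False
    return result.returncode == 0
```

On a system that provides /bin/true, run_app_command(["/bin/true"]) returns True, which is why that command works as a placeholder when there is nothing to start.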

app_stop_cmd (required)

Command you use, such as any program, script, or executable file, to stop the application. You use this command typically during system shutdown or when restarting applications. The stop command is successful if it exits with a zero exit code. The command must shut down the application cleanly (remove all processes, files, locks, and so on) so that a subsequent start command executes without problems. If the command does not complete in 300 seconds, it is forcibly terminated.

If required, you can use the -o and -e options to have the aurora_mon daemon capture stdout/stderr; otherwise, it is redirected to /dev/null. To run the command as a specified user, you must have an su -c wrapper or have set the setuid bit of the application.

If you do not require a stop command, you can use a command that exits with a zero exit code, for example /bin/true. An example of this is where the application is monitoring the amount of disk space on a mount point. There is no application to stop.

heartbeat_check_cmd (required)

Command (any program, script, or executable file) that checks whether the application is alive. The heartbeat ping is successful (the application is considered alive) if the command exits with a zero exit code.

If required, you can use the -o and -e options to have the aurora_mon daemon capture stdout/stderr; otherwise, it is redirected to /dev/null. To run the command as a specified user, you must have an su -c wrapper or have set the setuid bit of the application.

heartbeat_period (optional, defaults to 30)

The time in seconds between heartbeat pings (heartbeat_check_cmd is issued every heartbeat_period seconds). The value can be from 1 to 600 seconds, and defaults to 30 seconds if not specified. A new heartbeat_check_cmd is not issued until the previous one finishes.
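The range and default can be checked before a value is used. The following is a minimal sketch that assumes the configuration has already been read into a dictionary of key-value pairs; the function name is hypothetical, and raising an error for an out-of-range value is an assumption, because the text does not say how aurora_mon handles one.

```python
DEFAULT_HEARTBEAT_PERIOD = 30  # seconds, used when the parameter is not specified
MIN_HEARTBEAT_PERIOD = 1
MAX_HEARTBEAT_PERIOD = 600


def resolve_heartbeat_period(config):
    """Return heartbeat_period in seconds, applying the default and the 1-600 range."""
    raw = config.get("heartbeat_period")
    if raw is None:
        return DEFAULT_HEARTBEAT_PERIOD
    period = int(raw)
    if not MIN_HEARTBEAT_PERIOD <= period <= MAX_HEARTBEAT_PERIOD:
        # Assumption: out-of-range values are rejected rather than clamped.
        raise ValueError(f"heartbeat_period must be 1-600 seconds, got {period}")
    return period
```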

heartbeat_ignore_fail_count (optional, defaults to 0)

Specifies the number of consecutive heartbeat_check_cmd failures to ignore before the application is considered to have failed. For example, if heartbeat_ignore_fail_count is 3, the first three consecutive failures are ignored, and the application is considered to have failed when heartbeat_check_cmd fails a fourth consecutive time. This reduces the possibility of a false positive caused by intermittent application problems or transient network problems that make heartbeat_check_cmd fail.
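The counting rule can be sketched as a small function over a sequence of heartbeat results. This is an illustration under the interpretation given in the example (the failure after the ignored ones triggers the failed state); the function name and the use of booleans to represent results are assumptions.

```python
def is_app_failed(results, ignore_fail_count=0):
    """Return True once consecutive failures exceed ignore_fail_count.

    results is an iterable of booleans in time order; True means the
    heartbeat check exited with a zero exit code.
    """
    consecutive_failures = 0
    for succeeded in results:
        if succeeded:
            # Any success resets the run of consecutive failures.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures > ignore_fail_count:
                return True
    return False
```

With ignore_fail_count set to 3, three failures in a row return False and a fourth returns True. A success in between resets the count.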

app_restart_retry_count (optional, defaults to 3); app_restart_retry_freq (optional, defaults to 10 minutes)

The number of times aurora_mon attempts to restart an application after a failure, and the period of time that aurora_mon waits between restart attempts. For example, if app_restart_retry_count is 3 and app_restart_retry_freq is 10 minutes, aurora_mon makes up to three attempts to restart the application and waits 10 minutes after a failed attempt before trying again.
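The retry behavior can be sketched as a loop. This is a minimal sketch, not the aurora_mon implementation: it assumes the wait happens between attempts rather than before the first one, and the function name, the restart callable, and the injectable sleep parameter are all hypothetical.

```python
import time


def restart_with_retries(restart, retry_count=3, retry_freq_seconds=600, sleep=time.sleep):
    """Call restart() up to retry_count times and return True on the first success.

    restart is a callable that returns True if the application restarted.
    Assumption: the wait of retry_freq_seconds applies only between attempts.
    """
    for attempt in range(1, retry_count + 1):
        if restart():
            return True
        if attempt < retry_count:
            sleep(retry_freq_seconds)
    return False
```

Passing sleep as a parameter makes it possible to exercise the loop without actually waiting 10 minutes between attempts.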

heartbeat_fail_action (optional, defaults to RESTART_APP)

The action taken when an application is considered to have failed (that is, when heartbeat_check_cmd has failed more than heartbeat_ignore_fail_count consecutive times). The following values are acceptable:

JUST_ALERT. Send an alert only.

RESTART_APP. Restart the application (attempt app_restart_retry_count times, waiting app_restart_retry_freq between attempts).

RESTART_VM. Restart the virtual machine by stopping the application monitoring SDK heartbeat that the virtual machine sends to the underlying VMware HA service. The HA virtual machine properties of the cluster determine the virtual machine restart interval and counts.

RESTART_APP_THEN_VM. Attempt to restart the application app_restart_retry_count times. If the application cannot be restarted, reset the virtual machine using the Guest and HA Application Monitoring SDK.
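The four actions can be summarized as a dispatch function. This is a sketch only: the function and callback names are hypothetical, the alert-only value is assumed to be spelled JUST_ALERT with an underscore like the other values, and whether aurora_mon also sends an alert for the restart actions is not stated, so the sketch does not.

```python
def handle_failure(action, send_alert, restart_app, restart_vm):
    """Dispatch a heartbeat_fail_action value to the matching callbacks.

    restart_app is expected to return True if the application came back
    (for example, after app_restart_retry_count attempts).
    """
    if action == "JUST_ALERT":  # assumed spelling of the alert-only value
        send_alert()
    elif action == "RESTART_APP":
        restart_app()
    elif action == "RESTART_VM":
        restart_vm()
    elif action == "RESTART_APP_THEN_VM":
        # Fall back to a virtual machine reset only if the restart failed.
        if not restart_app():
            restart_vm()
    else:
        raise ValueError(f"unknown heartbeat_fail_action: {action}")
```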