monitor

infx has a simple status monitoring system that uses three statuses: good, warn and error.

status

infx scans through the instance statistics and sets a status for everything in the instance.

This includes storage spaces, logical logs, user sessions as well as other metrics.

You can set the individual thresholds and what items need to be ignored.

module status

A module status is determined by the status of all the items in it.

For example, the overall checkpoint status is determined by the status assigned to each individual checkpoint.

When any item is “warn”, infx sets the overall status to “warn”.

When any item is “error”, infx sets the overall status to “error”.

Otherwise, the overall status is “good”.
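The roll-up rule above can be sketched in a few lines of Python. This is an illustration only, not infx's own code: the function name combine_status is hypothetical, and the sketch assumes that “error” takes precedence over “warn” when both are present.

```python
# Severity order assumed for this sketch: good < warn < error.
SEVERITY = {"good": 0, "warn": 1, "error": 2}

def combine_status(item_statuses):
    """Return the worst status among the items, or "good" if there are none."""
    return max(item_statuses, key=SEVERITY.__getitem__, default="good")

print(combine_status(["good", "warn", "good"]))   # warn
print(combine_status(["warn", "error", "good"]))  # error
print(combine_status([]))                         # good
```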

instance status

The module statuses are combined to produce an overall status for the instance.

module      status             description
checkpoint  checkpoint status  based on duration of checkpoints
chunk       chunk status       based on chunk flags
dbspace     dbspace status     based on free space and storage space flags
DR          dri status         based on status of connected HDR, RSS, and SDS servers
fs          fs status          based on free space in file systems
host        host status        combined instance status and file system status
inst        instance status    combine user, onlinelog, service, dbspace, chunk, value
onlinelog   onlinelog status   based on patterns matched in the instance message log
service     service status     based on latest infx service execution
users       users status       based on user session flags
value       value status       based on mode, virtual memory, read ahead, cache, logical logs and checkpoints

Full details about each status and its thresholds: sub-infx-alert.ini

overall host status

The overall status of each instance is then combined with the file system status to produce an overall status for the host.

details

This section describes how each module status is determined, and which thresholds and other settings can be configured.

checkpoint status

The checkpoint status is based on the checkpoint times stored in the instance checkpoint tracing.

Specify warning and error levels based on the total time of the checkpoint.
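As a rough sketch of how such levels could be applied, the Python below classifies each checkpoint by its total time and keeps the worst result. The threshold values, the unit (seconds) and the function names are assumptions made for illustration; they are not taken from infx.

```python
SEVERITY = {"good": 0, "warn": 1, "error": 2}

def classify_duration(seconds, warn_secs, error_secs):
    """Map one checkpoint's total time to a status (assumed: >= comparison)."""
    if seconds >= error_secs:
        return "error"
    if seconds >= warn_secs:
        return "warn"
    return "good"

def checkpoint_status(durations, warn_secs=10.0, error_secs=30.0):
    """Overall checkpoint status is the worst status of any single checkpoint."""
    statuses = [classify_duration(d, warn_secs, error_secs) for d in durations]
    return max(statuses, key=SEVERITY.__getitem__, default="good")

print(checkpoint_status([0.4, 2.1, 12.5]))  # warn
```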

chunk status

The chunk status is “error” if any chunk is down.

dbspace status

This status is based on the status and amount of free space in each storage space.

infx sets the class to “error” if the storage space is down, or “warn” if the storage space flags indicate a problem.

Warning and error statuses can be set based on the percentage of free space.

Set defaults for all storage spaces with name="*".

Setting warn="-1" for a storage space means no warning will be generated; with an error threshold of 1, an error will be generated at 1%.

Warnings and errors can also be ignored entirely for an individual storage space, for example the dbspace named “archive”.
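The threshold behaviour described above can be sketched as follows. The dictionary layout is a stand-in invented for this example and is not infx's configuration syntax; the space names other than “archive”, the default percentages, the <= comparison, and the choice to report an ignored space as “good” are all assumptions. Storage space flags (down, or other problems) are not modelled here.

```python
# Hypothetical thresholds in percent free; "*" holds the defaults.
THRESHOLDS = {
    "*": {"warn": 10, "error": 5},
    "logdbs": {"warn": -1, "error": 1},  # -1 disables the warning
    "archive": {"ignore": True},          # never warn or error
}

def dbspace_status(name, pct_free, thresholds=THRESHOLDS):
    """Return the free-space status for one storage space."""
    rule = thresholds.get(name, thresholds["*"])
    if rule.get("ignore"):
        return "good"
    error, warn = rule.get("error", -1), rule.get("warn", -1)
    if error >= 0 and pct_free <= error:
        return "error"
    if warn >= 0 and pct_free <= warn:
        return "warn"
    return "good"

print(dbspace_status("rootdbs", 8))  # warn (defaults apply)
print(dbspace_status("logdbs", 3))   # good (warning disabled)
print(dbspace_status("logdbs", 1))   # error
print(dbspace_status("archive", 0))  # good (ignored)
```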

dri status

On a primary server, the dri status is based on the status of all connected servers. It is “good” if all servers are connected and active, “error” if any are disconnected, and “warn” if the log position of any server falls too far behind.

On a secondary server, the dri status is based on the status of the connection to the primary server. It is “error” if the primary is disconnected.
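A minimal sketch of these rules, assuming a hypothetical Secondary record and a made-up lag measure (log_lag_pages) with an arbitrary warning threshold; how infx actually measures log position lag is not described here.

```python
from dataclasses import dataclass

@dataclass
class Secondary:
    name: str
    connected: bool
    log_lag_pages: int  # hypothetical measure of how far behind the server is

def dri_status_primary(servers, warn_lag_pages=1000):
    """Status seen from the primary: a disconnect outranks a lagging server."""
    if any(not s.connected for s in servers):
        return "error"
    if any(s.log_lag_pages > warn_lag_pages for s in servers):
        return "warn"
    return "good"

def dri_status_secondary(primary_connected):
    """Status seen from a secondary: only the link to the primary matters."""
    return "good" if primary_connected else "error"

servers = [Secondary("hdr1", True, 20), Secondary("rss1", True, 5000)]
print(dri_status_primary(servers))   # warn
print(dri_status_secondary(False))   # error
```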

file system status

The file system status is based on the amount of free space in each file system.

Specify warning and error levels based on the percentage of free space.
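To make the percentage concrete, the sketch below computes free space for a path with Python's standard shutil.disk_usage and compares it to two levels. The levels (20% and 10%) and the function names are assumptions for illustration, not infx defaults.

```python
import shutil

def fs_percent_free(path):
    """Percentage of the file system holding `path` that is free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.free / usage.total

def fs_status(pct_free, warn_pct=20.0, error_pct=10.0):
    """Map a percent-free figure to a status (assumed: <= comparison)."""
    if pct_free <= error_pct:
        return "error"
    if pct_free <= warn_pct:
        return "warn"
    return "good"

pct = fs_percent_free(".")
print(f"{pct:.1f}% free -> {fs_status(pct)}")
```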

host status

You can specify instances to ignore when determining the host status. See the instance status section for how to ignore modules within the instance.

You can also ignore the file system status when determining the host status. See the file system status section for how to ignore individual file systems.

instance status

You can specify which modules to ignore when determining the overall instance status. See the individual module sections for how to ignore parts of the module.
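The effect of ignoring modules, instances or the file system status can be sketched as a filtered roll-up. The function and parameter names are hypothetical, as are the instance names; the sketch assumes the worst remaining status wins.

```python
SEVERITY = {"good": 0, "warn": 1, "error": 2}

def worst(statuses):
    return max(statuses, key=SEVERITY.__getitem__, default="good")

def instance_status(module_statuses, ignore_modules=()):
    """Combine module statuses, skipping any ignored modules."""
    return worst(s for m, s in module_statuses.items() if m not in ignore_modules)

def host_status(instance_statuses, fs_status, ignore_instances=(), ignore_fs=False):
    """Combine instance statuses with the file system status."""
    statuses = [s for n, s in instance_statuses.items() if n not in ignore_instances]
    if not ignore_fs:
        statuses.append(fs_status)
    return worst(statuses)

modules = {"users": "warn", "dbspace": "good", "chunk": "good"}
print(instance_status(modules))                            # warn
print(instance_status(modules, ignore_modules={"users"}))  # good
print(host_status({"prod": "warn", "test": "error"}, "good",
                  ignore_instances={"test"}))              # warn
```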

message log status

You configure the onlinelog alert status by mapping messages in the online message log to classes.

When you set the attribute case="no", a case-insensitive pattern match is used.

When you set the attribute ignore="yes", infx will ignore this error or warning when determining the message log status. The class is still used for display purposes.
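The mapping can be sketched with Python's re module. The rule layout, the patterns and the log lines below are invented for this example and do not reflect infx's configuration format; the sketch also assumes that the first matching rule wins and that unmatched lines count as “good”.

```python
import re

SEVERITY = {"good": 0, "warn": 1, "error": 2}

# Hypothetical rules: "case": False stands for case="no", "ignore": True for ignore="yes".
RULES = [
    {"pattern": r"disk full", "cls": "error", "case": False, "ignore": False},
    {"pattern": r"long query", "cls": "warn", "case": True, "ignore": True},
]

def classify_line(line, rules=RULES):
    """Return (class, counts_toward_status) for the first matching rule."""
    for rule in rules:
        flags = 0 if rule["case"] else re.IGNORECASE
        if re.search(rule["pattern"], line, flags):
            return rule["cls"], not rule["ignore"]
    return "good", False

def onlinelog_status(lines):
    """Worst class among matched lines whose rule is not ignored."""
    counted = [cls for cls, counts in map(classify_line, lines) if counts]
    return max(counted, key=SEVERITY.__getitem__, default="good")

log = ["12:00:01 Disk FULL on device", "12:00:05 long query detected"]
print([classify_line(l)[0] for l in log])  # ['error', 'warn']
print(onlinelog_status(log))               # error
print(onlinelog_status(log[1:]))           # good (the warn rule is ignored)
```

An ignored rule still yields its class for display, but it is left out when the overall message log status is computed.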

service status

This status is based on the most recent execution of each service, within the last twenty-four hours.

If any service failed to complete, then the service status will be “warn”.

If any service has reported an error, the service status will be “error”.

You can specify which services to ignore when determining the overall status.
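A sketch of this logic, under stated assumptions: each run is a (name, start time, outcome) tuple, and the outcome labels "ok", "incomplete" and "error" as well as the service names are made up for the example.

```python
from datetime import datetime, timedelta

SEVERITY = {"good": 0, "warn": 1, "error": 2}
OUTCOME_TO_STATUS = {"ok": "good", "incomplete": "warn", "error": "error"}

def service_status(runs, now, ignore=()):
    """Use only the latest run of each service within the last 24 hours."""
    cutoff = now - timedelta(hours=24)
    latest = {}
    for name, started, outcome in runs:
        if name in ignore or started < cutoff:
            continue
        if name not in latest or started > latest[name][0]:
            latest[name] = (started, outcome)
    statuses = [OUTCOME_TO_STATUS[outcome] for _, outcome in latest.values()]
    return max(statuses, key=SEVERITY.__getitem__, default="good")

now = datetime(2024, 1, 2, 12, 0)
runs = [
    ("backup", datetime(2024, 1, 1, 13, 0), "error"),      # superseded by a later run
    ("backup", datetime(2024, 1, 2, 1, 0), "incomplete"),
    ("stats", datetime(2024, 1, 2, 3, 0), "ok"),
    ("purge", datetime(2023, 12, 30, 3, 0), "error"),      # older than 24 hours
]
print(service_status(runs, now))                      # warn
print(service_status(runs, now, ignore={"backup"}))   # good
```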

users status

This module produces an overall status of the database sessions.

Specify patterns that match what the session is waiting for.

The first step sets any session that is waiting to the “warn” status. Next, known conditions are mapped back to “ok”.

You might consider mapping some items to “error”.

You can also map session classes based on the decoded session flags.

You can ignore user sessions based on the user name, host the session is from, or database the session is connected to.
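The two-step mapping and the ignore lists can be sketched as below. The wait names ("net-read", "lock"), the user and database names, and the rule layout are placeholders invented for this example; the sketch assumes later rules override earlier ones and that an overall class of “ok” is reported as “good”. Session flags are not modelled.

```python
import re

SEVERITY = {"ok": 0, "warn": 1, "error": 2}

# Rules are applied in order; a later match overrides an earlier one.
WAIT_RULES = [
    (r".+", "warn"),         # step 1: any session that is waiting
    (r"^net-read$", "ok"),   # step 2: known harmless waits (made-up name)
    (r"^lock$", "error"),    # optional: waits treated as errors (made-up name)
]

# Hypothetical ignore lists keyed by session field.
IGNORE = {"user": {"monitor"}, "host": set(), "database": {"scratch"}}

def session_class(wait):
    cls = "ok"  # a session that is not waiting stays "ok"
    for pattern, mapped in WAIT_RULES:
        if wait and re.search(pattern, wait):
            cls = mapped
    return cls

def users_status(sessions):
    worst = "ok"
    for s in sessions:
        if any(s[field] in names for field, names in IGNORE.items()):
            continue
        cls = session_class(s["wait"])
        if SEVERITY[cls] > SEVERITY[worst]:
            worst = cls
    return "good" if worst == "ok" else worst

sessions = [
    {"user": "app", "host": "web1", "database": "sales", "wait": "net-read"},
    {"user": "app", "host": "web2", "database": "sales", "wait": "buffer"},
    {"user": "monitor", "host": "mon1", "database": "sales", "wait": "lock"},
]
print(users_status(sessions))  # warn (the "lock" session belongs to an ignored user)
```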

value status

A small number of instance metrics can have thresholds set. Specify warning and error thresholds for each value.

value              description
threads_ready_tot  the number of threads ready and waiting for cpu
profile_cacheread  the percentage of reads from cache
profile_cachewrite the percentage of writes to cache
profile_rautil     the percentage of pages read ahead that were utilized
seg_virt_perc      the percent of memory free in the virtual segment
ll_remain          the percent of the logical logs that remain free for use i.e. backed up and don’t contain a current transaction

For example, type="falling" means the alert is raised when the value falls below the threshold.
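The difference between a falling threshold and its opposite can be sketched as follows. Only "falling" comes from the text above; the name "rising" for the opposite direction, the function name and the sample numbers are assumptions made for this illustration.

```python
def value_status(value, warn, error, kind="rising"):
    """Compare a metric to its thresholds in the given direction."""
    if kind == "falling":
        # Alert when the value drops below a threshold (e.g. percent free).
        if value < error:
            return "error"
        if value < warn:
            return "warn"
    else:
        # Assumed counterpart: alert when the value climbs above a threshold.
        if value > error:
            return "error"
        if value > warn:
            return "warn"
    return "good"

# ll_remain: percent of logical logs still free, so lower is worse.
print(value_status(12, warn=20, error=10, kind="falling"))  # warn
# threads_ready_tot: threads waiting for cpu, so higher is worse.
print(value_status(3, warn=5, error=10))                    # good
```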