9. status monitor

infx has a simple status monitoring system that uses three statuses:

 status  description
 good  within limits
 warn  attention required
 error  action required

You set up monitoring by defining thresholds and classes for the items to be monitored.

infx combines the individual item classes to produce an overall status. There are special warning and error classes that infx uses in determining this status.

When items are mapped to the "warn" class, infx sets the overall status to "warn".

When items are mapped to the "error" class, infx sets the overall status to "error".

Otherwise, it is "good".

The default settings are here: sub-infx-alert.ini

status summary

The overall status of each module is combined to produce an overall status for the instance.

If any module status is "warn", the instance status will be "warn". Likewise, if any module status is "error", the instance status will be "error". Otherwise, it is "good".

The overall status of each instance is then combined with the file system status to produce an overall status for the host.


 module  status  description
 checkpoint  checkpoint status    based on duration of checkpoints
 chunk  chunk status    based on chunk flags
 dbspace  dbspace_status    based on free space and storage space flags  
 DR  dri_status  based on status of connected HDR, RSS, and SDS servers
 fs  fs_status    based on free space in file systems  
 host  host_status     combined instance status and file system status  
 inst  instance_status    combined users, onlinelog, service, dbspace, chunk and value status
 onlinelog  onlinelog_status    based on patterns matched in the instance message log  
 service  service_status    based on latest infx service execution  
 users  users_status    based on user session flags  
 value  value_status    based on mode, virtual memory, read ahead, cache, logical logs and checkpoints  

checkpoint status

The checkpoint status is based on the checkpoint times stored in the instance checkpoint tracing.

Specify warning and error levels based on the total time of the checkpoint.

[[ checkpoint_alert warn="10" error="15" ]]

chunk status

This will be "error" if any chunk file is down.

dbspace status

This status is based on the status and amount of free space in each storage space.

infx sets the class to "error" if the storage space is down, or "warn" if the storage space flags indicate a problem.

Specify warning and error levels based on the percentage of free space.

[[ dbspace_alert name="*" warn="10" error="5" ]]
[[ dbspace_alert name="scratch" warn="-1" error="1" ]]
[[ dbspace_alert name="archive" ignore="yes" ]]

Set defaults for all storage spaces with name="*".

Specify a warning level of "-1" to suppress the warning for the "scratch" storage space; an error is still generated when its free space reaches 1%.

When you set the attribute ignore="yes", no warning or error will be generated for that storage space. 
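
The free space rules above can be illustrated with a short Python sketch. It is not infx's implementation: the function name `dbspace_class`, the rule layout, the precedence of a named rule over the "*" defaults, and the "at or below" comparison are all assumptions made for illustration. It also covers only the free space thresholds, not the storage space flags.

```python
# Hypothetical sketch of how the dbspace_alert rules above might be resolved.
RULES = {
    "*":       {"warn": 10, "error": 5},
    "scratch": {"warn": -1, "error": 1},
    "archive": {"ignore": True},
}

def dbspace_class(name, free_pct, rules=RULES):
    """Classify a storage space by its percentage of free space."""
    # Assumption: a rule for the exact name overrides the "*" defaults.
    rule = {**rules.get("*", {}), **rules.get(name, {})}
    if rule.get("ignore"):
        return "good"
    # Assumption: a level is reached when free space is at or below it.
    # A level of -1 can never be reached, so it disables that check.
    if free_pct <= rule.get("error", -1):
        return "error"
    if free_pct <= rule.get("warn", -1):
        return "warn"
    return "good"

print(dbspace_class("rootdbs", 8))   # warn
print(dbspace_class("scratch", 3))   # good
print(dbspace_class("scratch", 1))   # error
print(dbspace_class("archive", 0))   # good
```

Here "rootdbs" is a made-up storage space name that falls through to the "*" defaults.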

dri status

On a primary server, this status is based on the status of all connected servers. It will be "good" if all servers are connected and active, "error" if any server is disconnected, and "warn" if the log position of a server falls too far behind.

On a secondary server, this status is based on the status of the connection to the primary server. It will be "error" if the primary is disconnected.

file system status

The file system status is based on the amount of free space in each file system.

Specify warning and error levels based on the percentage of free space.

[[ fs_alert name="*" warn="10" error="5" ]]
[[ fs_alert name="/scratch" ignore="yes" ]]

host status

You can specify instances to ignore when determining the host status. See the instance status section for how to ignore modules within the instance.

[[ host_alert instance="test" ignore="yes" ]]

You can also ignore the file system status when determining the host status. See the file system status section for how to ignore individual file systems.

[[ host_alert module="fs" ignore="yes" ]]

instance status

You can specify which modules to ignore when determining the overall instance status. See the individual module sections for how to ignore parts of the module.

[[ instance_alert module="onlinelog" ignore="yes" ]]
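
The roll-up from module status to instance status to host status, including the ignore settings, might look like the following Python sketch. The function names and data shapes are hypothetical, and treating "error" as more severe than "warn" is an assumption drawn from the descriptions in this section rather than from infx itself.

```python
# Hypothetical sketch; data shapes and function names are illustrative.
SEVERITY = {"good": 0, "warn": 1, "error": 2}

def worst(statuses):
    """Return the most severe status, or "good" when nothing is left to combine."""
    return max(statuses, key=SEVERITY.__getitem__, default="good")

def instance_status(module_statuses, ignored_modules=()):
    """Combine module statuses, skipping modules marked ignore="yes"."""
    return worst(s for m, s in module_statuses.items() if m not in ignored_modules)

def host_status(instances, fs_status, ignored_instances=(), ignore_fs=False):
    """Combine instance statuses with the file system status."""
    statuses = [s for name, s in instances.items() if name not in ignored_instances]
    if not ignore_fs:
        statuses.append(fs_status)
    return worst(statuses)

prod = instance_status(
    {"onlinelog": "error", "dbspace": "warn", "users": "good"},
    ignored_modules={"onlinelog"},
)
print(prod)  # warn
print(host_status({"prod": prod, "test": "error"}, "good",
                  ignored_instances={"test"}))  # warn
```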

message log status

You configure the onlinelog alert status by mapping messages in the online message log to classes. 

[[ onlinelog_alert class="error" pattern="error" case="no" ]]
[[ onlinelog_alert class="warn" pattern="backup is needed" case="no" ]]
[[ onlinelog_alert class="warn" pattern=" aborted" case="no" ]]
[[ onlinelog_alert class="warn" pattern="DR: Turned off" ]]
[[ onlinelog_alert class="warn" pattern="DR: Cannot" ]]
[[ onlinelog_alert class="warn" pattern="DR secondary:" ]]
[[ onlinelog_alert class="warn" pattern="DR: Send error" ]]
[[ onlinelog_alert class="error" pattern="assert" case="no" ]]

When you set the attribute case="no", a case-insensitive pattern match is used.

When you set the attribute ignore="yes", infx will ignore this error or warning when determining the message log status. The class is still used for display purposes.
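
As a rough illustration of the case attribute, the Python sketch below classifies a single message log line. Several points are assumptions rather than documented infx behavior: that patterns are regular expressions, that matching is case-sensitive unless case="no" is set, and that a later matching rule overrides an earlier one. The rule list is a subset of the example above and the log lines are invented.

```python
import re

# Hypothetical rule list mirroring some of the onlinelog_alert lines above.
RULES = [
    {"cls": "error", "pattern": "error", "case": False},
    {"cls": "warn", "pattern": "backup is needed", "case": False},
    {"cls": "warn", "pattern": "DR: Turned off", "case": True},
    {"cls": "error", "pattern": "assert", "case": False},
]

def classify_line(line, rules=RULES):
    """Return the class of the last rule whose pattern matches, else None."""
    matched = None
    for rule in rules:
        # case=False stands for case="no": ignore case when matching.
        flags = 0 if rule["case"] else re.IGNORECASE
        if re.search(rule["pattern"], line, flags):
            matched = rule["cls"]
    return matched

print(classify_line("Assert Failed: No Exception Handler"))  # error
print(classify_line("DR: Turned off on secondary server"))   # warn
print(classify_line("dr: turned off"))                       # None
```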

service status

This status is based on the most recent execution of each service, within the last twenty-four hours.

If any service failed to complete, then the service status will be "warn".

If any service has reported an error, the service status will be "error".

You can specify which services to ignore when determining the overall status.

[[ service_alert service="onstat" ignore="yes" ]]
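
The service rules can be sketched in Python as follows. The record layout (a list of dicts with "service", "started", "completed" and "error" keys) and the function name are hypothetical; infx's own data structures are not described here. The sketch keeps only the most recent run of each service within the last twenty-four hours and then applies the "warn" and "error" rules.

```python
from datetime import datetime, timedelta

def service_status(runs, now, ignored=()):
    """Combine the latest run of each service from the last 24 hours."""
    latest = {}
    for run in runs:
        too_old = now - run["started"] > timedelta(hours=24)
        if run["service"] in ignored or too_old:
            continue
        prev = latest.get(run["service"])
        if prev is None or run["started"] > prev["started"]:
            latest[run["service"]] = run
    if any(r["error"] for r in latest.values()):
        return "error"
    if any(not r["completed"] for r in latest.values()):
        return "warn"
    return "good"

now = datetime(2024, 1, 2, 12, 0)
runs = [
    # Ran one hour ago and did not complete.
    {"service": "onstat", "started": datetime(2024, 1, 2, 11, 0),
     "completed": False, "error": False},
    # Ran 26 hours ago, so it falls outside the window.
    {"service": "backup", "started": datetime(2024, 1, 1, 10, 0),
     "completed": True, "error": True},
]
print(service_status(runs, now))                      # warn
print(service_status(runs, now, ignored={"onstat"}))  # good
```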

users status

This module produces an overall status of the database sessions.

Specify patterns that match what the session is waiting for:

[[ users_alert class="warn" waitfor="[a-z0-9]" ]]
[[ users_alert class="ok" waitfor="sm_read" ]]
[[ users_alert class="ok" waitfor="netnorm" ]]
[[ users_alert class="note" waitfor="running" ]]

The first rule sets any session that is waiting to the "warn" class. The rules that follow map known conditions back to "ok".
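
Because a broad rule comes first and narrower rules follow, the order of the rules matters. The Python sketch below shows one way this could work, with every rule applied in order and the last match deciding the class. Treating the waitfor values as regular expressions and using "ok" as the class of a session that matches no rule are assumptions.

```python
import re

# The four waitfor rules above, in order, as (class, pattern) pairs.
WAITFOR_RULES = [
    ("warn", "[a-z0-9]"),
    ("ok", "sm_read"),
    ("ok", "netnorm"),
    ("note", "running"),
]

def session_class(waitfor, rules=WAITFOR_RULES, default="ok"):
    """Apply every rule in order; the last matching rule decides the class."""
    cls = default
    for rule_cls, pattern in rules:
        if re.search(pattern, waitfor):
            cls = rule_cls
    return cls

print(session_class("sm_read"))  # ok
print(session_class("lock"))     # warn
print(session_class("running"))  # note
print(session_class(""))         # ok
```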

You might consider mapping some wait conditions to "error":

[[ users_alert class="error" waitfor="lock" ]]
[[ users_alert class="error" waitfor="logbuff" ]]
[[ users_alert class="error" waitfor="trans" ]]

You can also map session classes based on the decoded flags:

[[ users_alert class="ok" flagmean="backup" ]]
[[ users_alert class="ok" flagmean="critical" ]]
[[ users_alert class="xok" flagmean="btree" ]]
[[ users_alert class="ok" flagmean="reading" ]]
[[ users_alert class="ok" flagmean="monitor" ]]
[[ users_alert class="warn" flagmean="wait" ]]
[[ users_alert class="warn" flagmean="recovery" ]]

You can ignore user sessions based on the user name, the host the session comes from, or the database the session is connected to:

[[ users_alert user="informix" ignore="yes" ]]
[[ users_alert host="testapp" ignore="yes" ]]
[[ users_alert dbname="testdb" ignore="yes" ]]

value status

A small number of instance metrics can have alert thresholds set. Specify warning and error thresholds for each value.

 value  description
 threads_ready_tot   the number of threads ready and waiting for cpu
 profile_cacheread  the percentage of reads from cache
 profile_cachewrite  the percentage of writes to cache
 profile_rautil  the percentage of pages read ahead that were utilized
 seg_virt_perc  the percent of memory free in the virtual segment
 ll_remain  the percent of the logical logs that remain free for use, i.e. logs that are backed up and do not contain a current transaction

Example.

[[ value_alert value="threads_ready_tot" warn="4" error="8" ]]
[[ value_alert value="ll_remain" type="falling" warn="95" error="60" ]]
[[ value_alert value="seg_virt_perc" warn="95" error="98" ]]
[[ value_alert value="profile_cacheread" type="falling" warn="98" error="95" ]]
[[ value_alert value="profile_cachewrite" type="falling" warn="95" error="90" ]]
[[ value_alert value="profile_rautil" type="falling" warn="98" error="95" ignore="yes" ]]
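
The difference between the default thresholds and type="falling" can be sketched in Python as below. The function name is hypothetical, and the exact comparison is an assumption: by default an alert fires when the value rises to the threshold or above, and with type="falling" when it drops to the threshold or below.

```python
def value_class(value, warn, error, falling=False):
    """Classify a metric against its warn and error thresholds."""
    if falling:
        # type="falling": lower values are worse.
        reached = lambda level: value <= level
    else:
        # Default: higher values are worse.
        reached = lambda level: value >= level
    if reached(error):
        return "error"
    if reached(warn):
        return "warn"
    return "good"

# threads_ready_tot with warn="4" error="8"
print(value_class(5, warn=4, error=8))                    # warn
# ll_remain with type="falling" warn="95" error="60"
print(value_class(70, warn=95, error=60, falling=True))   # warn
print(value_class(50, warn=95, error=60, falling=True))   # error
```

With a falling type, the warn threshold needs to be higher than the error threshold for the warning to appear first.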