Preventive Medicine

IBM Health Checker adds features to maintain best practices

Illustration by Coco Masuda

Catching anomalies before they become big problems is vital to maintaining a healthy system. IBM Health Checker for z/OS*, a framework that simplifies and automates the identification of configuration issues, is designed to do just that. By continuously comparing active values and settings to those suggested by IBM or defined by you, Health Checker alerts you to deviations from best practices before they affect availability or cause outages. The resulting detailed messages also suggest actions to address exceptions.

Health Checker consists of the framework that manages functions and the health checks that evaluate specific settings and definitions. The architecture of the framework supports the nearly 200 checks written by IBM, along with scores more by ISVs and many specialized ones by customers themselves.

Health Checker and Health Checks

“There’s a distinction between Health Checker and health checks,” says Ulrich (Ulli) Thiemann, advisory software engineer, IBM. “Health Checker itself provides the framework for health checks that look for particular sysplex settings, configuration settings or best practices on a system and report back through this framework. The framework schedules these checks on an automatic basis based on default check attributes, which can also be defined by users. Health Checker grew out of critical situations or outages where people noticed sometimes it’s just a little setting here or there that was inconsistent with the rest of the system but caused big trouble.”

In addition to preventing problems, Health Checker is also used to help with migration, Thiemann notes. “There’s a particular subset of health checks called migration checks. They help customers migrate from release to release by checking certain things that might have changed to see if the customer’s system is ready for this upgrade or if there are any migration actions they still need to figure in,” he says.

Health Checker started as a small set of batch jobs and was later offered as a downloadable tool. It has officially been a product since z/OS V1R7. Now at V1R13, Health Checker has been part of the base system for several releases, and there’s no charge to acquire or use it.

Activation is straightforward and quick. The user’s guide has a short “quick start” section, and customers can get started right away with a few basic steps that a systems programmer would already be familiar with, according to Thiemann. “Optional configurations are there for security, for example, so you can define certain profiles and fine-grain the control over particular checks and make it your own if you want to. But the basic setup is just pretty straightforward,” he adds. He estimates setup would take less than a day.

How It Works

Health Checker is a dynamic, live framework. It transparently keeps a list of known health checks, schedules and runs them, and provides consistent check-message interfaces that route output to the console, SYSLOG and message buffer.
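The division of labor described above (a framework that keeps a registry of checks, runs them on their intervals and routes their messages) can be sketched in a few lines of Python. This is only a conceptual illustration, not how Health Checker is implemented: the `Framework` and `Check` classes, the `tick` method, and the `DEMO` owner and check names are all hypothetical, and real checks are written against the Health Checker APIs rather than in Python.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Check:
    """One health check (hypothetical model, not the real check definition)."""
    owner: str
    name: str
    interval_s: int                           # how often the check should run
    routine: Callable[[], Tuple[bool, str]]   # returns (ok, message text)
    next_due: int = 0                         # time of the next scheduled run
    buffer: List[str] = field(default_factory=list)  # per-check message buffer


class Framework:
    """Keeps the list of known checks, runs them when due, routes output."""

    def __init__(self) -> None:
        self.checks: Dict[Tuple[str, str], Check] = {}
        self.console: List[str] = []          # stands in for console/SYSLOG

    def register(self, check: Check) -> None:
        self.checks[(check.owner, check.name)] = check

    def tick(self, now: int) -> None:
        """Run every check whose interval has elapsed at time `now`."""
        for check in self.checks.values():
            if now < check.next_due:
                continue
            ok, text = check.routine()
            check.buffer.append(text)         # successes and exceptions alike
            if not ok:                        # only exceptions reach the console
                self.console.append(f"CHECK({check.owner},{check.name}): {text}")
            check.next_due = now + check.interval_s


# Usage: two hypothetical checks with different intervals.
fw = Framework()
fw.register(Check("DEMO", "ALWAYS_OK", 60, lambda: (True, "setting matches")))
fw.register(Check("DEMO", "ALWAYS_BAD", 30, lambda: (False, "setting deviates")))
for now in (0, 30, 60):
    fw.tick(now)
```

The point of the sketch is the separation: a check only evaluates a setting and returns a result, while scheduling and message routing stay in the framework, so every check reports in the same way.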

Each check has its own message buffer, where the systems programmer can find more detail on both exceptions and successful runs. The message buffer can be viewed in the Health Checker SDSF panel (SDSF CK), where you can also view and maintain the health checks. Third-party vendors offer checks and interfaces, or you can write your own using the official Health Checker APIs, which are the basis for both the SDSF CK panel and the OMEGAMON* monitor. You can also use typical automation products to create alert programs that key off the standardized Health Checker message IDs and send exceptions by email, pager alert or even tweet.

So that you’re not overrun by alerts about every minor exception, Health Checker assigns messages different severities. Systems programmers can act on critical alerts immediately, while the others can wait until regular business hours.
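As a rough illustration of automation that keys off message IDs and severities, the Python sketch below classifies console lines and picks a notification route. It rests on assumptions the article does not spell out: the mapping of `HZS0001I`, `HZS0002E` and `HZS0003E` to low, medium and high severity, and the `CHECK(owner,name)` line layout, should be verified against the Health Checker messages documentation. The check names, the `route` policy and the route labels are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed mapping of exception message IDs to severity; verify against
# the Health Checker documentation before relying on it.
SEVERITY_BY_ID = {
    "HZS0001I": "LOW",
    "HZS0002E": "MEDIUM",
    "HZS0003E": "HIGH",
}

# Assumed line layout: "<message id> CHECK(<owner>,<name>)..."
LINE_RE = re.compile(r"^(HZS\d{4}[A-Z])\s+CHECK\(([^,]+),([^)]+)\)")


def classify(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (severity, owner, name) for an exception line, else None."""
    match = LINE_RE.match(line.strip())
    if match is None or match.group(1) not in SEVERITY_BY_ID:
        return None
    return SEVERITY_BY_ID[match.group(1)], match.group(2), match.group(3)


def route(line: str, business_hours: bool) -> str:
    """Hypothetical policy: page for HIGH, otherwise wait for business hours."""
    hit = classify(line)
    if hit is None:
        return "ignore"                 # not a Health Checker exception
    severity = hit[0]
    if severity == "HIGH":
        return "page"                   # critical: notify immediately
    return "email" if business_hours else "queue"
```

For example, `route("HZS0003E CHECK(DEMO,SAMPLE_HIGH):", business_hours=False)` returns `"page"`, while a low-severity line arriving at night is queued until business hours. In practice this logic would live in an automation product's rules rather than a standalone script.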

Health checks are designed to be short-running so they don’t use up CPU or other resources for a long time, and the system will typically spread them out. “Normally, there’s no performance hit with Health Checker. The framework was tested for being quite scalable so the 200 checks we have or even hundreds on a regular system should not have a big impact there at all,” Thiemann says. And if you’re concerned, one of the new functions, SYNCVAL, lets you schedule checks when more system resources are available.
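The article does not describe how SYNCVAL is specified, so the following Python sketch illustrates only the general idea of pinning a check's interval to a wall-clock anchor so that its runs fall at predictable, quieter times. The function name `next_synced_run`, its parameters and the `hh:mm` anchor format are assumptions made for illustration, not the actual SYNCVAL syntax or algorithm.

```python
from datetime import datetime, timedelta


def next_synced_run(now: datetime, interval: timedelta, sync_hhmm: str) -> datetime:
    """Return the first run strictly after `now` on the grid
    anchor + k * interval, where anchor is the most recent hh:mm."""
    hh, mm = map(int, sync_hhmm.split(":"))
    anchor = now.replace(hour=hh, minute=mm, second=0, microsecond=0)
    if anchor > now:
        anchor -= timedelta(days=1)     # use yesterday's anchor instead
    periods = (now - anchor) // interval + 1   # whole intervals elapsed, plus one
    return anchor + periods * interval


# Usage: an 8-hour check anchored at 02:00, evaluated at 14:10.
example = next_synced_run(datetime(2012, 1, 1, 14, 10), timedelta(hours=8), "02:00")
```

With an 8-hour interval anchored at 02:00, runs land at 02:00, 10:00 and 18:00 each day regardless of when the check was first started, which is what makes it possible to steer a check toward off-peak periods.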

“DOM control allows the check itself to delete the previous exception message when it thinks it’s appropriate.” —Ulrich Thiemann, advisory software engineer, IBM

Tami Deedrick is the former managing editor of IBM Systems Magazine, Power Systems edition.


