
Stress or Load Testing: Painless Problem Prediction


I got my first taste of stress testing—often called load testing—in the late 1970s. Stress testing also refers to a cardiac diagnostic procedure and its fundamental meaning holds true either way: Find the point at which the activity level causes an entity to fail. Stress testing was a novel idea when I first worked with it; we used the IBM Field Developed Program Volume Test. It was pretty basic, using the MVS spool rather than emulating a network, but it drove transactions at higher volumes than users or programmers could.

Then IBM announced the fully supported program product Teleprocessing Network Simulator, which drove activity through a 3705 Communications Controller—providing true network emulation and error generation—and could capture real transactions that were converted into volume testing scripts. This allowed reproduction of a full day’s transaction activity (in the millions) which could then be run against a new software release. It was also a lot of work—I once worked over 48 consecutive hours preparing a test—but the results were gratifying both in terms of very clean upgrades and valuable insight into system limits and what parts were the most heavily strained.

Later, I got involved in stress testing with another client for both a website and mainframe. The two systems interacted heavily in order taking. An outside company provided the majority of test preparation and execution, while we prepared and operated the processes between the two processors and resolved problems.

Both scenarios had two objectives that were satisfied and provided great value. They were to:

  1. Identify volume problems and the upgrades needed to address them
  2. Resolve problems that arose

What follows are the steps in a stress test.

Establishing and Defining the Environment

The system used for a stress test included the same processor(s), disk and tape drives, software, etc., that were used during a normal workday. The exception to this was new software release(s) that were part of the test. In stress tests I’ve engineered, the entire system(s) were offline; the primary objective was to verify a new release under high volume. Stress tests were always held on holiday weekends, giving us three days rather than two; given the preparation, aborted tests and problem research involved, that extra day made a huge difference.

Establishing and Defining the Workload

A mechanism for generating automated, high-volume, real-world input is a prerequisite. Asking several thousand users to come in on a day off to replicate a previous day is impossible, especially if the activity is phone-driven. An automated way of recording activity (called a transaction capture) and regenerating it as transactions is essential, and it’s included in most testing products (let’s call the product STRESSOR). Capture usually works by writing each input data stream to a file as it arrives in a processor, along with additional data such as timestamps and other logistics.
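As a rough illustration of transaction capture, here is a minimal Python sketch that appends each arriving input to a file along with a timestamp. The record layout, the `terminal` field and the function names are assumptions made for this example; they are not taken from any real testing product.

```python
import io
import json
import time


def capture_transaction(log, terminal_id, payload, clock=time.time):
    """Append one inbound transaction to the capture log as a JSON line."""
    record = {
        "ts": clock(),             # arrival timestamp
        "terminal": terminal_id,   # hypothetical source identifier
        "payload": payload,        # raw input data stream, as text
    }
    log.write(json.dumps(record) + "\n")
    return record


def read_capture(log):
    """Yield captured records back in the order they were written."""
    for line in log:
        if line.strip():
            yield json.loads(line)


# Capture two inputs into an in-memory file, then read them back.
# The fixed clock values stand in for real arrival times.
capture_log = io.StringIO()
capture_transaction(capture_log, "T001", "ORDER 12345 QTY 2", clock=lambda: 100.0)
capture_transaction(capture_log, "T002", "INQ 98765", clock=lambda: 100.5)
capture_log.seek(0)
captured = list(read_capture(capture_log))
```

Writing one record per line keeps the capture append-only, so a day’s activity can be collected without holding it in memory.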

Once the transaction inputs are captured, they’re converted to scripts that represent screens or windows. STRESSOR contains utilities to do this; once conversion is complete, the scripts can be further modified to include error or other conditions. The scripts are then syntax checked, any errors are corrected, and the scripts are loaded for testing.
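The conversion step might look something like the following Python sketch. It assumes each captured record carries a timestamp, a terminal identifier and a payload; the step layout, the `speedup` parameter and all function names are hypothetical, and a real product’s script format will differ.

```python
def to_script(records, speedup=1.0):
    """Turn captured records into replay steps with relative delays."""
    steps = []
    previous_ts = None
    for rec in sorted(records, key=lambda r: r["ts"]):
        # Delay before this step, compressed by the speedup factor.
        delay = 0.0 if previous_ts is None else (rec["ts"] - previous_ts) / speedup
        steps.append({"delay": delay, "terminal": rec["terminal"], "send": rec["payload"]})
        previous_ts = rec["ts"]
    return steps


def inject_error(steps, index, bad_payload):
    """Return a copy of the script with one step replaced by erroneous input."""
    modified = [dict(step) for step in steps]
    modified[index]["send"] = bad_payload
    return modified


def check_script(steps):
    """Basic syntax check: return a list of problems (empty if none)."""
    problems = []
    for i, step in enumerate(steps):
        if step["delay"] < 0:
            problems.append(f"step {i}: negative delay")
        if not step["send"]:
            problems.append(f"step {i}: empty input")
    return problems


records = [
    {"ts": 100.0, "terminal": "T001", "payload": "ORDER 12345 QTY 2"},
    {"ts": 100.5, "terminal": "T002", "payload": "INQ 98765"},
    {"ts": 102.5, "terminal": "T001", "payload": "ORDER 12346 QTY 1"},
]
script = to_script(records, speedup=2.0)   # replay at twice the captured pace
broken = inject_error(script, 1, "")       # deliberately empty input
problems = check_script(broken)
```

Storing relative delays rather than absolute times is what lets the same script be replayed faster than the original day.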

Initializing the Test Data

Two types of stress tests exist: inquiry-only and update. Inquiry-only is simpler: because no data changes, it can be run against production data. These tests provide only capacity analysis; they don’t thoroughly test new software or most business functions. Conversely, update stress tests exercise all software and functions, and since production data is sacrosanct, the test is run against a data clone. Full production data backups, which are usually time-consuming, are run just prior to transaction capture. A second production data backup is created just prior to the stress test, and the first backups are then loaded onto disk to create the test configuration, so the test data matches the state the captured transactions originally ran against. After a system restart that brings up STRESSOR and possibly some new software, the stress test is ready.

Determining and Enabling the Monitor(s)

Monitors (e.g., Resource Measurement Facility, OMEGAMON, SYSVIEW) capture data for reporting and interactively display performance metrics such as response time, CPU and channel utilization, paging rates, and I/O rates and response times in real time. All monitors should be initialized and active before STRESSOR initiates the stress test and begins injecting transactions into the system. Usually several individuals man the workstations, each tracking different metrics and always watching for indications of a system overload. If pushed too hard, the system may stall and require a restart, ruining the stress test. The objective is to push a system as hard as possible without overwhelming it.
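Watching for overload amounts to comparing sampled metrics against limits. The Python sketch below shows the idea; the metric names and limit values are invented for illustration and don’t correspond to any particular monitor’s output.

```python
# Hypothetical alert limits; real values depend on the installation.
LIMITS = {
    "cpu_pct": 95.0,
    "response_ms": 2000.0,
    "paging_per_sec": 500.0,
}


def overload_warnings(sample, limits=LIMITS):
    """Return the names of metrics in `sample` that exceed their limit."""
    return sorted(
        name for name, limit in limits.items()
        if sample.get(name, 0.0) > limit
    )


# One sample as a monitor might report it (values are made up).
sample = {"cpu_pct": 97.5, "response_ms": 850.0, "paging_per_sec": 620.0}
alerts = overload_warnings(sample)
```

In practice the people at the workstations make this judgment from several screens at once; a check like this only flags the obvious cases.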

Debugging Problems and Test Runs

It shouldn’t be any surprise if something goes wrong during a stress test. In fact, it should be expected. That’s one of the stress test objectives: If something’s going to break, it’s much better for it to happen during a stress test than on the first day of production. Furthermore, work on the problem can begin immediately, which may mean calls to the vendor and time to develop, ship and test a fix. A problem may stop a stress test in its tracks and require a reschedule, but a quick fix may allow the test to continue and finish on time. Because no one knows what may fail, the internal support staff for all software and hardware components should be on call.

Running the Stress Test

Starting STRESSOR involves submitting a job or entering a command. Startup messages are monitored to verify everything is correct. Parameters and commands control the speed of transaction submission; start slow and increase speed as long as things look good. Keep an eye on the system log, stay in touch with the team members manning the monitors, and keep going until you reach a preplanned level or the point at which any more volume will cause issues. If nothing breaks, run to completion; that’s the most thorough test possible.
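The start-slow-and-increase approach can be sketched as a simple ramp loop in Python. The function name, its parameters and the stand-in health check are all assumptions for illustration; in a real test, the health check would be the judgment of the people watching the monitors and the system log.

```python
def ramp_up(start_rate, step, target_rate, is_healthy):
    """Raise the transaction rate stepwise until the target is reached
    or the system stops looking healthy.

    Returns (last_good_rate, reached_target).
    """
    if step <= 0:
        raise ValueError("step must be positive")
    rate = min(start_rate, target_rate)
    last_good = 0
    while True:
        if not is_healthy(rate):
            return last_good, False      # back off to the last good rate
        last_good = rate
        if rate >= target_rate:
            return last_good, True       # preplanned level reached
        rate = min(rate + step, target_rate)


# Stand-in health check: pretend the system copes with up to 60 tx/sec.
def healthy_up_to_60(rate):
    return rate <= 60


stopped_early = ramp_up(10, 20, 100, healthy_up_to_60)
reached_goal = ramp_up(10, 20, 50, healthy_up_to_60)
```

Returning the last good rate, rather than the rate that caused trouble, gives a usable figure for the system’s limit even when the preplanned level is never reached.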

If a stress test has problems, there may be time to try again. It will be necessary to shut STRESSOR down, restore data to the disk drives as was done for the previous test, perform any other necessary re-initialization, and go through all the startup steps. It may be advisable to re-IPL or reboot the system to ensure a clean start.

When testing is complete, the backup taken just prior to the test, which contains current production data, is restored. New software is either backed out or, if it’s going into production, left live. The system is then reinitialized and opened to users.

Positive Impact

Stress testing takes time and effort and doesn’t come cheap, but neither do software defects or capacity shortfalls, and those are much more disruptive than a test. Stress testing often pays big dividends by identifying system or application problems and performance bottlenecks before they reach production. It can have a very positive impact on availability, usability and responsiveness.

Jim Schesvold can be reached at jschesvold@mainframehelp.com.


