i Can … Measure Disk Response Times

May 19, 2010

In today's computing environment, disk response time is a critical factor in understanding your system's performance. Processor speed has improved significantly over many years, while disk I/O performance has not kept pace. Solid state drives (SSDs) have the potential to change this, but the reality is that spinning disks are, and will remain for some time, a major factor in system performance.

Thus, it’s important to understand the role of disk performance, and one critical measure is whether slow disk operations are occurring. Too many slow disk operations can degrade overall system performance.

In the 6.1 release of IBM i, the capability was added to collect what we call “disk response time groups.” Disk response times are measured by the low-level disk I/O driver in the Licensed Internal Code (LIC); that is, the LIC measures the time between sending a disk I/O request and receiving the corresponding response. Each response time group covers a range of response times, and for each range the LIC keeps a count of the I/O operations whose response time fell into that range. Because the response times are measured within the LIC, they apply to all disk operations, whether to internal disk or to external storage.

This function was also made available for V5R4 through PTFs (the latest in the chain is SI37286). In 6.1, the response time groups are fields in the QAPMDISK file.

The 6.1 response time groups are as follows:

Range 1: 0 to < 1 ms
Range 2: 1 ms to < 16 ms
Range 3: 16 ms to < 64 ms
Range 4: 64 ms to < 256 ms
Range 5: 256 ms to < 1,024 ms
Range 6: >= 1,024 ms
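To make the counting concrete, here is a minimal Python sketch. It assumes we already have a list of individual response times in milliseconds (in reality the LIC does this counting internally; the function names and sample values are hypothetical and not part of any IBM interface). Each response time is assigned to one of the six 6.1 ranges, treating each listed boundary as an exclusive upper limit, and one counter is kept per range.

```python
from bisect import bisect_right

# Exclusive upper bounds of 6.1 ranges 1-5, in milliseconds.
# Anything at or above the last bound (1,024 ms) falls into range 6.
RANGE_BOUNDS_MS_61 = [1, 16, 64, 256, 1024]


def range_index_61(response_ms: float) -> int:
    """Return the 1-based 6.1 range number for one response time in ms."""
    # bisect_right counts the bounds that are <= response_ms, so a value equal
    # to a bound lands in the next range (bounds are exclusive upper limits).
    return bisect_right(RANGE_BOUNDS_MS_61, response_ms) + 1


def count_by_range_61(response_times_ms):
    """Hypothetical helper: keep one counter per range, like a histogram."""
    counts = [0] * (len(RANGE_BOUNDS_MS_61) + 1)
    for t in response_times_ms:
        counts[range_index_61(t) - 1] += 1
    return counts


if __name__ == "__main__":
    samples = [0.4, 0.9, 1.0, 5.2, 15.9, 16.0, 70.0, 300.0, 2000.0]  # made-up data
    print(count_by_range_61(samples))  # prints [2, 3, 1, 1, 1, 1]
```

Only the per-range counts are kept, not the individual times, which is what keeps this kind of collection cheap enough to run all the time.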

In the 7.1 release, additional disk response time groups were added to provide more granularity. In particular, smaller response time groups were defined to better account for the faster response times of I/O operations to SSDs; these new groups are measured in microseconds rather than milliseconds. For the new groups, the counts are accumulated separately for read and write operations. The combination of finer granularity (11 ranges) and separate read and write counts results in a total of 22 response time group fields in the 7.1 release. The 7.1 response time groups are in a new file, QAPMDISKRB. Note that the response time groups added in the 6.1 release continue to be supported in 7.1 (i.e., 7.1 has two sets of response time groups).

The 7.1 response time groups are as follows:

Range 1: 0 to < 15 us
Range 2: 15 us to < 250 us
Range 3: 250 us to < 1,000 us
Range 4: 1,000 us to < 4,000 us
Range 5: 4,000 us to < 8,000 us
Range 6: 8,000 us to < 16,000 us
Range 7: 16,000 us to < 64,000 us
Range 8: 64,000 us to < 256,000 us
Range 9: 256,000 us to < 500,000 us
Range 10: 500,000 us to < 1,024,000 us
Range 11: >= 1,024,000 us
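The sketch below applies the same idea to the 7.1 groups, again with hypothetical names and made-up sample operations. It keeps separate read and write counters for each of the 11 ranges, giving the 2 × 11 = 22 counters described earlier. Because every 6.1 boundary (1, 16, 64, 256 and 1,024 ms) also appears as a 7.1 boundary, the detailed counts can be rolled up into the six 6.1 ranges; the mapping table is derived from the two range lists, not from any IBM-documented interface.

```python
from bisect import bisect_right
from collections import Counter

# Exclusive upper bounds of 7.1 ranges 1-10, in microseconds.
# Anything at or above 1,024,000 us falls into range 11.
RANGE_BOUNDS_US_71 = [15, 250, 1_000, 4_000, 8_000, 16_000,
                      64_000, 256_000, 500_000, 1_024_000]

# 6.1 range (1-6) that contains each 7.1 range (1-11). This works because
# each 6.1 boundary is also a 7.1 boundary, so no 7.1 range straddles one.
RANGE_71_TO_61 = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2,
                  7: 3, 8: 4, 9: 5, 10: 5, 11: 6}


def range_index_71(response_us: float) -> int:
    """Return the 1-based 7.1 range number for one response time in us."""
    return bisect_right(RANGE_BOUNDS_US_71, response_us) + 1


def count_by_range_71(ops):
    """ops: iterable of (kind, response_us), kind being 'read' or 'write'.

    Returns a Counter keyed by (kind, range number): 2 x 11 = 22 possible keys.
    """
    counts = Counter()
    for kind, response_us in ops:
        counts[(kind, range_index_71(response_us))] += 1
    return counts


def roll_up_to_61(counts_71):
    """Merge read and write counts and map them onto the six 6.1 ranges."""
    counts_61 = [0] * 6
    for (_kind, r71), n in counts_71.items():
        counts_61[RANGE_71_TO_61[r71] - 1] += n
    return counts_61


if __name__ == "__main__":
    ops = [("read", 10), ("read", 300), ("write", 300),
           ("write", 5_000), ("read", 20_000), ("write", 2_000_000)]
    print(roll_up_to_61(count_by_range_71(ops)))  # prints [3, 1, 1, 0, 0, 1]
```

The roll-up illustrates why the 7.1 groups lose nothing relative to the 6.1 groups: the coarser view can always be reconstructed from the finer one, but not the other way around.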

Beginning with the 7.1 release, the Performance Data Investigator provides IBM-supplied charts of the disk response time groups. The following screen capture shows the new charts and tables available in the Performance Data Investigator for disk response time analysis. Note that there are two sets of charts: one in the “Detailed” folder and one in the “Disk Response Time” folder. The “Detailed” charts are based on the finer-grained 7.1 response time groups, while the charts in the “Disk Response Time” folder use the more general response time groups introduced in 6.1.

[Figure 1: Performance Data Investigator charts and tables for disk response time analysis]

Finally, below is an example screen capture of the 7.1 “Disk I/O Rates Overview - Detailed” chart, which shows a histogram of the disk response time groups. You can use the flyover to see the count for each response time group.

[Figure 2: 7.1 “Disk I/O Rates Overview - Detailed” chart]

If you see long response times, perform a more detailed analysis to understand and correct their cause. Disk response times longer than 10 milliseconds should be investigated; disk response times longer than 100 milliseconds are considered poor. For more information on disk performance, refer to section 8.2.3 and Appendix A of “End to End Performance Management on IBM i.”
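The 10 ms and 100 ms guidelines do not fall on 7.1 range boundaries, so the counts alone cannot say exactly how many operations exceeded them. A minimal sketch of one way to handle this, assuming you have the 11 counts for a disk unit (read, write, or both added together), is to compute a lower and an upper bound: ranges that start at or above the threshold count toward both bounds, while the one range that straddles the threshold counts only toward the upper bound. The function name and the counts below are hypothetical.

```python
# Inclusive lower bounds of 7.1 ranges 1-11, in microseconds.
RANGE_LOWER_US_71 = [0, 15, 250, 1_000, 4_000, 8_000, 16_000,
                     64_000, 256_000, 500_000, 1_024_000]


def ops_above_threshold(counts, threshold_us, lower_bounds=RANGE_LOWER_US_71):
    """Return (low, high) bounds on how many I/Os took threshold_us or longer.

    counts[i] is the number of I/Os in range i + 1. Ranges starting at or
    above the threshold certainly qualify; a range that straddles the
    threshold may contribute some, all or none, so it only raises `high`.
    """
    low = high = 0
    for i, n in enumerate(counts):
        start = lower_bounds[i]
        end = lower_bounds[i + 1] if i + 1 < len(lower_bounds) else float("inf")
        if start >= threshold_us:
            low += n
            high += n
        elif end > threshold_us:
            high += n  # straddling range: contents are unknown in detail
    return low, high


if __name__ == "__main__":
    counts = [100, 500, 300, 50, 20, 10, 5, 3, 1, 0, 0]  # made-up counts
    print(ops_above_threshold(counts, 10_000))   # prints (9, 19)
    print(ops_above_threshold(counts, 100_000))  # prints (1, 4)
```

In this made-up example, between 9 and 19 of the 989 operations took 10 ms or longer, and between 1 and 4 took 100 ms or longer; the gap between the bounds is the price of working from grouped counts rather than individual timings.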
