

Monitoring Hypervisor Activity


Whenever I do a performance deep dive for a client, I'm asked, "How can I see what the hypervisor is doing?" (This happens 100 percent of the time.) My first answer is always, "Use lparstat -H."

Before getting into the benefits of the lparstat command, a few words about the hypervisor. When we think of performance analysis, we automatically consider a system's physical resources. These resources are grouped into four main categories: CPU, memory, networking and storage. Each of these categories has specific hardware and software associated with it; e.g., the networking subsystem has Ethernet adapters and the TCP/IP stack, while storage has disk drives, adapters and the logical volume manager or some other organizing scheme. However, there's one entity that's shared by every component in a System p server: the POWER Hypervisor.

The hypervisor is firmware that lives in flash memory on the service processor of a System p box. Conceptually, it sits between the operating system (which can be AIX, Linux or IBM i) and the hardware. The hypervisor's duties include virtualization functions, the sharing of processors between logical partitions, dynamic LPAR (DLPAR) operations that allow the movement of resources between partitions, and the actual initialization and configuration of these systems. Basically, the hypervisor owns all the resources in a system and is responsible for allocating them.

The hypervisor is arguably the most important component in any System p box. Take away the hypervisor, and all your carefully architected LPARs become useless. With this in mind, it's clearly useful to be able to monitor the hypervisor's activities in any environment. There are no dials you can turn or switches you can flip to tune the hypervisor, so take this article as strictly informative. In this two-part series, we'll first look at the very simple form of the lparstat command, lparstat -H, that I cited in the introduction.

As you likely know, lparstat is great for reporting on CPU and memory. But if you invoke it as root with the -H flag, you'll see a comprehensive list of hypervisor activity, usually in one or two easy-to-read screens. Try this on one of your systems. The calls you see will depend on your hardware platform, so the calls displayed on a POWER5 system may not be the same as those on a POWER8.

You can also get a running tally of hypervisor calls by telling lparstat to run with an interval and a count, just as you would with vmstat or iostat, like this:

	lparstat -H 5 10

This is especially useful to run on LPARs in a busy Power Systems frame. For example, you can look at the amount of time those LPARs are spending in CPU cede calls, adjust your CPU capacity plan to favor the busier partitions, and take away unneeded CPU from partitions that exhibit light activity. This form of the lparstat command produces output like this:

           Detailed information on Hypervisor Calls

Hypervisor                  Number of    %Total Time   %Hypervisor   Avg Call    Max Call
  Call                        Calls         Spent      Time Spent    Time(ns)    Time(ns)
remove                           0            0.0           0.0          0           0
read                             0            0.0           0.0          0           0
nclear_mod                       0            0.0           0.0          0           0
page_init                       80            0.0           9.4       1067        1531
clear_ref                        0            0.0           0.0          0           0
protect                          0            0.0           0.0          0           0
put_tce                          0            0.0           0.0          0           0
h_put_tce_indirect               0            0.0           0.0          0           0
xirr                             0            0.0           0.0          0           0
eoi                              0            0.0           0.0          0           0
ipi                              0            0.0           0.0          0           0
…lines omitted…
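If you want a quick ranking of which calls dominate, the interval output above can be filtered with standard text tools. The sketch below runs against a small hypothetical sample modeled on the lparstat -H table (call names and figures are illustrative only); on a live LPAR you'd pipe the real command through the same filter instead.

```shell
# Hypothetical sample rows modeled on the lparstat -H table above;
# on AIX you would pipe the live output: lparstat -H 5 10 | <filter>
sample='page_init                       80            0.0           9.4       1067        1531
cede                          1200            3.2          78.1       2400        9800
prod                           300            0.1           4.0        500        1200'

# Sort by %Hypervisor Time Spent (4th column), busiest call first.
echo "$sample" | sort -k4,4nr |
    awk '{ printf "%-18s %6s%% of hypervisor time\n", $1, $4 }'
```

Because lparstat's columns are whitespace-separated, sort and awk can key on them directly; the same one-liner works on the real command's output once the header lines are skipped.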

So where do you find information on these hypervisor calls? While IBM documentation on this topic is thin, the list of common calls is fairly brief. What follows are some of the most widely used calls, along with their definitions. (Note that the name of each call is given in the form presented by the lparstat -H command. A more precise naming convention is given in part two.)

cede: A "cede" occurs when a virtual CPU with no useful work to perform enters a wait state, and gives, or cedes, its CPU capacity to another virtual processor (VP) until such time as useful work appears.

confer: A "confer" allows a VP to give its cycles to one, or all, of the other VPs in its LPAR.

prod: A "prod" makes a specific virtual processor runnable.

enter: An "enter" adds an entry into the page frame table. (A page table is a data structure, found in virtually every computer, that stores the mappings between virtual and physical memory addresses. Virtual addresses are used by the processes that reference memory; physical addresses are used by the RAM subsystem.)
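The mapping a page table maintains can be pictured as a simple lookup from virtual page numbers to physical frame numbers. This toy model (made-up values, bash associative array) is strictly a mental aid; the real page frame table lives in hypervisor-managed memory and is manipulated through the enter, read and remove calls described here.

```shell
# Toy page-table model: virtual page number -> physical frame number.
# All values are hypothetical; a real page frame table also carries
# protection and reference bits per entry (see clear_ref and protect).
declare -A page_table
page_table[0x10]=0x7f2    # an "enter": add a mapping
page_table[0x11]=0x0a3

echo "virtual page 0x10 -> physical frame ${page_table[0x10]}"   # a "read"
unset 'page_table[0x11]'                                         # a "remove"
```

The hypervisor sits in the middle precisely because these tables resolve to physical RAM, which is shared among all LPARs in the frame.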

read: A "read" lists the contents of specific page table entries (PTEs).

remove: A "remove" invalidates entries in the page table.

bulk_remove: This call invalidates up to four entries in the page table.

get_ppp: This call returns the LPAR's performance parameters.

set_ppp: This call allows the LPAR to modify its entitled CPU capacity percentage and variable capacity weight within certain limits.
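You can see the parameters that get_ppp and set_ppp deal in by running lparstat -i on the LPAR, which reports entitled capacity and variable capacity weight among other settings. So that it runs anywhere, the sketch below filters a hypothetical captured copy of that output; on AIX you would pipe lparstat -i directly.

```shell
# Hypothetical excerpt of `lparstat -i` output (values made up);
# on AIX, use the live command: lparstat -i | grep -i capacity
lparstat_i='Entitled Capacity                          : 2.00
Variable Capacity Weight                   : 128
Mode                                       : Uncapped'

echo "$lparstat_i" | grep -i 'capacity'
```

Watching these values before and after a DLPAR operation is a quick way to confirm that a capacity change actually took effect.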

clear_ref: This call clears the reference bit in a specific PTE in the LPAR's page frame table.

protect: This call sets protection bits in a specific PTE.

eoi: An "eoi" calls an interrupt reset function.

ipi: This call generates an interprocessor interrupt, which is a type of interrupt where one processor may interrupt another in a multiprocessor system if the processor that does the interrupting requires action from the other processor.

cppr: This call sets a CPU's current interrupt priority.

migrate_dma: This call serializes the sending of logical LAN messages to allow for page migration.

send_logical_lan: This call sends a logical LAN message.

add_logical_lan_buf: This call adds receive buffers to the logical LAN receive buffer pool (correlate with netstat -v output in a VIO client).

xirr: This call reports on virtual interrupts.

PURR: The Processor Utilization Resource Register (PURR) counts the timebase ticks apportioned to each virtual CPU or SMT thread, which is how a shared processor's time gets divided among the threads that use it. PURR is consulted by all manner of performance tools.

This list is by no means comprehensive, but it gives you an idea of what hypervisor activity actually is and how it can be measured. If you're wondering about any particular hypervisor function, Google is a good place to start.

In part two of this series, I'll discuss the kernel trace. Properly formatted, this will tell more than you'll ever need to know about what the hypervisor is doing in any given LPAR.

Mark J. Ray has been working with AIX for 23 years, 18 of which have been spent in performance. His mission is to make the diagnosis and remediation of the most difficult and complex performance issues easy to understand and implement. Mark can be reached at mjray@optonline.net





