

SMT on z/VM Increases Efficiency and Capacity


Simultaneous multithreading (SMT) is the capability for a single physical processor core to run multiple streams of instructions at the same time. A stream of instructions is called a thread, and the threads running on a physical processor core share some of its hardware resources. The IBM z13 supports up to two threads per IFL processor core. In March 2015, z/VM 6.3.0 added support for the IBM z13 SMT facility on up to 32 physical IFL processor cores. APARs VM65586 and VM65696 are required for z/VM to exploit the multithreading feature on the z13. See the full list of z/VM APARs.

How SMT Works

In a single-threaded environment, like an IBM zEnterprise EC12 (zEC12) processor core, the thread has exclusive access to the registers, timers, instruction execution pipelines, address translator, translation lookaside buffer (TLB), Level 1 (L1) cache and Level 2 (L2) cache (see Figure 1). When the core is multithreaded, like the z13 IFL processor core, the instruction execution pipelines, address translator, TLB, L1 cache and L2 cache are shared by both threads. The program status word, registers, timers and translations remain exclusive to each thread. Because of this sharing, each of the two threads runs slower than a single thread would in non-SMT mode, but together the two threads increase overall core capacity.

Sometimes the threads collide and have to take turns using the core resources. At other times they complement each other. A cache miss, a delay that occurs when the processor references data or instructions that aren’t already in the data cache or instruction cache, is a good example. When one thread of a core is waiting for a cache miss to be resolved, the second thread of the same core can run and make progress. This can increase the overall capacity of the core even though the individual threads on the core run slower. However, because the threads share the L1 and L2 caches, they may experience more cache misses, which limits the increase in overall capacity. Because the behavior of one thread affects the performance of the other, the amount of additional capacity provided by SMT can vary greatly.
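The trade-off between slower threads and higher core capacity can be illustrated with a toy model. The sketch below is not a model of the z13: the function name `simulate`, the one-instruction-per-cycle pipeline, the round-robin pick and the fixed run and miss lengths are all assumptions made for illustration, and the model ignores the extra cache misses that cache sharing can cause.

```python
def simulate(num_threads, run_cycles, miss_cycles, total_cycles):
    """Toy core: one instruction issues per cycle, from one ready thread.

    Each thread executes run_cycles instructions, then stalls for
    miss_cycles cycles on a (hypothetical) cache miss. A stalled thread
    does not need the pipeline, so another thread may use it.
    Returns the number of instructions completed by each thread.
    """
    run_left = [run_cycles] * num_threads   # instructions until next miss
    stall_left = [0] * num_threads          # cycles left in current miss
    completed = [0] * num_threads
    next_pick = 0                           # round-robin starting point

    for _ in range(total_cycles):
        # Pick the first thread that is not waiting on a cache miss.
        issued = None
        for offset in range(num_threads):
            t = (next_pick + offset) % num_threads
            if stall_left[t] == 0:
                issued = t
                break

        # Outstanding cache misses make progress every cycle.
        for t in range(num_threads):
            if stall_left[t] > 0:
                stall_left[t] -= 1

        if issued is not None:
            completed[issued] += 1
            run_left[issued] -= 1
            if run_left[issued] == 0:       # thread now hits a cache miss
                stall_left[issued] = miss_cycles
                run_left[issued] = run_cycles
            next_pick = (issued + 1) % num_threads

    return completed


one_thread = simulate(1, run_cycles=4, miss_cycles=4, total_cycles=80)
two_threads = simulate(2, run_cycles=4, miss_cycles=4, total_cycles=80)
print("single thread:", one_thread, "total", sum(one_thread))
print("two threads:  ", two_threads, "total", sum(two_threads))
```

With these assumed parameters, each of the two threads completes fewer instructions than the single thread does alone, while the core as a whole completes more.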

SMT on z/VM

How does this all look to z/VM and a guest running there? z/VM still provides virtual CPUs to guests (see Figure 2). z/VM still dispatches virtual CPUs on logical processors. When the z/VM in an LPAR is instructed not to exploit SMT, PR/SM provides logical processors for the partition and dispatches one logical processor on a physical core at a time. This is unchanged from historical non-SMT behavior.

When the z/VM LPAR exploits SMT (see Figure 3), PR/SM provides logical processors for the partition and groups them into logical cores. There are two logical processors per logical core. PR/SM then dispatches one logical core on a physical core at a time. Virtual processors of the same guest may be dispatched on the same logical core and benefit from the shared L1 and L2 caches. The virtual processors may also be dispatched on different logical cores.

There may be a need to increase the number of virtual CPUs of a Linux guest in an SMT environment. Figure 4 represents the instruction execution rate (IER) of a single-threaded IFL core and a multithreaded IFL core. The numbers are for illustrative purposes and don’t represent a real workload. When a thread is running on a core that has SMT disabled, the IER is 10. When SMT is enabled, the IER for each individual thread is 7, but the two threads together yield an IER of 14, thus increasing the core capacity. Figure 5 shows a single Linux guest with one virtual CPU defined, using 100 percent of that virtual CPU. In the SMT-disabled case, the virtual CPU can execute 10 operations, but only 7 operations with SMT enabled. Giving that guest a second virtual CPU allows additional work to be completed (see Figure 6).
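The arithmetic behind Figures 4 through 6 can be written out as a short sketch. It uses only the illustrative IER values from the text (10 and 7); the function names are hypothetical, and the guest calculation assumes every virtual CPU is fully busy and is dispatched on its own thread.

```python
def core_capacity(per_thread_ier, threads_per_core):
    """Combined instruction execution rate of one core."""
    return per_thread_ier * threads_per_core


def guest_capacity(per_thread_ier, virtual_cpus):
    """Work a guest can complete if each virtual CPU runs at 100 percent
    on its own thread (an assumption made for this illustration)."""
    return per_thread_ier * virtual_cpus


NON_SMT_IER = 10    # illustrative value from the article, SMT disabled
SMT_THREAD_IER = 7  # illustrative value from the article, SMT enabled

print("core, SMT disabled:", core_capacity(NON_SMT_IER, 1))      # 10
print("core, SMT enabled: ", core_capacity(SMT_THREAD_IER, 2))   # 14

print("1-way guest, SMT disabled:", guest_capacity(NON_SMT_IER, 1))     # 10
print("1-way guest, SMT enabled: ", guest_capacity(SMT_THREAD_IER, 1))  # 7
print("2-way guest, SMT enabled: ", guest_capacity(SMT_THREAD_IER, 2))  # 14
```

The core gains capacity (14 versus 10), but a guest limited to one virtual CPU sees only the slower thread (7 versus 10) until it is given a second virtual CPU.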

Yielding Best Results

The z/VM control program makes an effort to maximize the SMT benefit:

  • The z/VM LPAR is configured to run with HiperDispatch vertical polarization mode by default. Vertical polarization provides tighter core affinity and therefore better cache affinity.
  • There is one dispatch vector per IFL core. Since z/VM tends to place work of the same guest onto one dispatch vector, there is limited cache penalty.
  • z/VM makes an effort to dispatch the virtual CPU on the same thread of a core, as long as the virtual CPU stays in the core’s dispatch vector.
  • The virtual machine minor time slice default value is increased from five milliseconds to 10 milliseconds, giving the dispatched virtual CPU longer access to the built up and reusable data in L1 and L2.

SMT in a z/VM LPAR is not turned on automatically. To enable it:

  1. Apply the appropriate z/VM 6.3.0 service level.
  2. On an IBM z13 mainframe, define a z/VM LPAR with IFL cores.
  3. Include the following statement in the system configuration file:
    • MULTITHREAD ENABLE
  4. IPL the z/VM LPAR.

SMT is transparent to the Linux guest and therefore the guest is unaware it’s running on multi-threaded cores. View a list of tested Linux Platforms for the z13. Live Guest Relocation, the capability to move a running virtual server from one z/VM image to another within the single system image (SSI), is allowed between a non-SMT LPAR and an SMT LPAR as long as they are joined in the same SSI cluster.

How does one know whether SMT is providing benefit? Results will vary and are very much workload dependent. Initial testing completed by the IBM z/VM Performance group showed that workloads with highly parallel activity and no single point of serialization yielded the best results. Workloads with a single point of serialization didn’t show a benefit with SMT, and workload adjustment should be considered for them. For example, if a virtual 1-way guest that is using 90 percent of a processor in a non-SMT environment is moved onto a z/VM LPAR with SMT enabled, one would predict that it needs more than 100 percent of a CPU. With only one virtual CPU, this guest may become the workload bottleneck. Adding a second virtual processor to the Linux guest may eliminate the bottleneck.
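As a rough planning aid, the 90 percent example can be turned into a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not an IBM sizing method: it reuses the illustrative IER values from Figure 4 (10 without SMT, 7 per thread with SMT) as the slowdown ratio, and the function names are hypothetical. Real ratios are workload dependent and have to be measured.

```python
import math


def smt_cpu_demand(non_smt_utilization, non_smt_ier, smt_thread_ier):
    """Estimate CPU demand under SMT, as a fraction of one CPU.

    non_smt_utilization: fraction of one processor used without SMT.
    The ratio non_smt_ier / smt_thread_ier is the assumed per-thread
    slowdown when SMT is enabled.
    """
    return non_smt_utilization * non_smt_ier / smt_thread_ier


def virtual_cpus_needed(cpu_demand):
    """Smallest whole number of virtual CPUs that covers the demand."""
    return max(1, math.ceil(cpu_demand))


demand = smt_cpu_demand(0.90, non_smt_ier=10, smt_thread_ier=7)
print(f"estimated demand under SMT: {demand:.0%} of a CPU")
print("virtual CPUs needed:", virtual_cpus_needed(demand))
```

Under these assumed numbers the estimate exceeds 100 percent of a CPU, which is why the 1-way guest would need a second virtual CPU, provided the workload can actually run in parallel on it.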

Measuring workload throughput and response time is the best way to know whether SMT is providing value to your workload. This entails gathering performance data when SMT is disabled and when SMT is enabled and comparing the results. Workload adjustment should be considered where appropriate. For a detailed performance analysis completed by the IBM z/VM Performance group, see the z/VM SMT Performance Report.
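A comparison of the two measurement runs might be summarized as in the following sketch. The `Measurement` class, the `percent_change` helper and the sample values are hypothetical placeholders, not results from any IBM test; substitute the throughput and response time collected from your own SMT-disabled and SMT-enabled runs.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    throughput: float        # e.g., transactions per second
    response_time_ms: float  # e.g., average response time


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100 * (after - before) / before


def compare(smt_disabled, smt_enabled):
    """Summarize how the workload changed when SMT was enabled."""
    return {
        "throughput_change_pct": percent_change(
            smt_disabled.throughput, smt_enabled.throughput),
        "response_time_change_pct": percent_change(
            smt_disabled.response_time_ms, smt_enabled.response_time_ms),
    }


# Placeholder values for illustration only.
baseline = Measurement(throughput=1000, response_time_ms=20)
with_smt = Measurement(throughput=1200, response_time_ms=25)

for name, value in compare(baseline, with_smt).items():
    print(f"{name}: {value:+.1f}%")
```

Looking at both numbers together matters: higher throughput with a longer response time is the pattern one would expect when the core does more work overall while each thread runs slower.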

Increasing Efficiency

SMT is a new technology on the IBM mainframe platform. It has the potential to increase the capacity available to z/VM on the IBM z13 processor. z/VM support for SMT is easy to exploit, making it a natural choice for workloads that can benefit from the technology. The actual performance benefit varies based on workloads and configurations. The z/VM SMT Performance Report referenced above is the best place to start.

Xenia Tkatschow is an Advisory Software Engineer at IBM focusing on mainframe virtualization performance.
