

How Simultaneous Multithreading Could Affect System z

A previous installment in this series, “Addressing the Plateauing of CPU Speeds From Another Angle,” alluded to a technology that can help increase total system capacity but actually hurts single-engine speed. That technology—simultaneous multithreading (SMT)—has been available on Intel and IBM Power Systems processors for many years. Now, IBM is sending signals that SMT might be incorporated into a future System z processor.

So, what is SMT? It’s a processor design technique that provides an additional level of parallelism. As the name implies, it enables an instruction processor, called a core by the engineers, to simultaneously execute instructions from multiple instruction streams. Before looking at how that’s accomplished, let’s first consider how the introduction of SMT into System z processors will affect mainframe workloads.

Pluses—and Potential Problems

In essence, SMT makes a single core seem to z/OS to be multiple CPUs so it can simultaneously execute multiple z/OS units of work (tasks and SRBs). This can increase the throughput capability of the core without greatly increasing chip area or power consumption. For example, enabling the core to execute two instruction streams simultaneously (called SMT2) nominally increases the throughput of a core by around 40 percent. Of course, the increase for any particular workload depends on a number of factors. There’s no denying this is a good thing, but two potential problems exist with employing SMT.

First, while the combined throughput of the two threads can be 1.4 times the throughput achievable with a single thread, each of the two threads runs at only about 70 percent of the speed the core delivers when executing a single thread. As the aforementioned article discussed, the leveling off of complementary metal-oxide semiconductor (CMOS) processor speed must be dealt with as certain workloads grow in the future. These workloads depend upon ever-faster processors to support their growth because they can’t be parallelized well enough to exploit multiple CPUs. Further reducing thread speed by exploiting SMT will make things even more difficult for them. So, assuming the use of SMT will be optional when it’s provided on System z, it shouldn’t be activated until analysis shows there’s no workload that will be unable to meet its performance objectives at the reduced CPU speed.
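The arithmetic behind those two figures is worth making explicit. A short sketch, using only the nominal numbers quoted above (the 40 percent gain is the article’s ballpark figure, not a measurement of any particular machine):

```python
# SMT2 arithmetic implied by the nominal figures in the text:
# two threads together deliver ~1.4x one full-speed thread,
# so each thread runs at ~0.7x the core's single-thread speed.
SMT_GAIN = 0.40                              # nominal SMT2 throughput gain

combined_throughput = 1 + SMT_GAIN           # 1.4x, relative to one thread
per_thread_speed = combined_throughput / 2   # 0.7x per thread

# Consequence for a CPU-bound unit of work: a task that takes
# 10 seconds of CPU on a dedicated core needs about 14.3 seconds
# of CPU on one SMT2 thread.
single_thread_seconds = 10.0
smt_thread_seconds = single_thread_seconds / per_thread_speed  # ~14.29
```

This is why the combined gain and the per-thread loss are two sides of the same coin: the 40 percent the core gains in aggregate is paid for by the 30 percent each individual thread gives up.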

However, the slower execution threads might be a benefit for some workloads because there are more execution threads to run on. Much of the elapsed time of a transaction isn’t spent running on a CPU but waiting to do so. More execution threads mean more opportunities to be dispatched, reducing the time spent waiting to run. The reduction in wait time can often compensate for the reduced speed while running, particularly for transactions that don’t do a lot of processing but have to take many trips through the dispatcher. Some installations have taken advantage of this phenomenon by “upgrading” to a sub-capacity processor with more, but slower, engines than the model they’re moving from. A system run in SMT mode looks very much like a system with more, slower processors, so it may be advantageous for some workloads.
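The tradeoff can be illustrated with a toy queueing model. This is my own sketch, not an IBM sizing method: it compares the classic M/M/c mean response time (via the Erlang C formula) for two full-speed engines against four SMT2-style threads, each running at the nominal 70 percent speed, under the same arrival rate:

```python
from math import factorial

def mmc_response_time(lam, mu, c):
    """Mean response time (queue wait + service) for an M/M/c queue.

    lam: arrival rate; mu: service rate per server; c: number of servers.
    Requires lam < c * mu for a stable queue.
    """
    a = lam / mu                      # offered load in Erlangs
    rho = a / c                       # per-server utilization
    assert rho < 1, "queue is unstable"
    # Erlang C: probability an arriving transaction must wait
    top = a**c / factorial(c)
    bottom = (1 - rho) * sum(a**k / factorial(k) for k in range(c)) + top
    p_wait = top / bottom
    wq = p_wait / (c * mu - lam)      # mean time spent waiting to run
    return wq + 1 / mu                # waiting plus running

# Same workload served two ways: 2 full-speed engines
# vs. 4 SMT threads, each at 70 percent speed.
base = mmc_response_time(lam=1.6, mu=1.0, c=2)   # ~2.78 time units
smt  = mmc_response_time(lam=1.6, mu=0.7, c=4)   # ~1.64 time units
```

In this example the SMT-like configuration wins even though every individual request runs more slowly, because the drop in dispatcher wait time outweighs the longer service time, which is exactly the phenomenon described above for short, dispatch-heavy transactions.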

The second concern with exploiting SMT is variability. Although the nominal speed-up with SMT is said to be 40 percent, that’s merely the midpoint of a wide range of speed-up values. Historically, we’ve had to deal with variability because the actual throughput of a processor varies by workload and, to some extent, with the instantaneous workload mix. This adds to the challenges of capacity planning and providing fair and consistent chargeback. With SMT, the variability rises to a new level.

If an SMT core is constantly busy but running only one thread, what percent is it utilized? By one view, it’s 100 percent busy because it’s never idle; by another, it’s only 50 percent busy because it uses only half the threads. Neither is correct: utilizing the other thread nominally adds only 40 percent more throughput, so the core is about 70 percent busy, with only about 30 percent of its capacity remaining. The situation is aggravated by the fact that 140 percent is only a ballpark figure, and the value for the system under consideration could be quite different.
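That 70 percent answer falls out of a simple capacity model. A sketch, again assuming the nominal 40 percent SMT2 gain (the function and its parameters are illustrative, not a real RMF or SMF metric):

```python
def smt2_core_utilization(one_thread_frac, two_thread_frac, smt_gain=0.40):
    """Effective utilization of an SMT2 core, in capacity terms.

    one_thread_frac: fraction of time exactly one thread is executing.
    two_thread_frac: fraction of time both threads are executing.
    Work is measured in single-thread units; with both threads running,
    the core delivers (1 + smt_gain) units of work per unit of time.
    """
    capacity = 1 + smt_gain                      # best case: both threads busy
    work_done = one_thread_frac * 1.0 + two_thread_frac * capacity
    return work_done / capacity

# A core that is never idle but always runs just one thread:
# 1.0 / 1.4, i.e. roughly 71 percent utilized.
u = smt2_core_utilization(one_thread_frac=1.0, two_thread_frac=0.0)
```

Note how sensitive the answer is to smt_gain: if the true gain for a given workload mix were 20 percent rather than 40, the same single-threaded core would already be about 83 percent utilized, which is the variability problem in miniature.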

Bob Rogers worked on mainframe system software for 43 years at IBM before retiring as a Distinguished Engineer in 2012.


