

IBM Researchers Maximize Apache Spark’s Capabilities

IBM Austin Research Laboratory
Illustration by Lonnie Busch

“For example, we have accelerated an end-to-end Spark workload that predicts adverse drug reactions between a pair of drugs using a machine learning model (logistic regression). The application developer invoked the existing code on our GPU-enabled Spark infrastructure with the GPU-accelerated logistic regression library. The GPU-accelerated workload demonstrated 30x gain for the logistic regression model building phase and 4x gain when all phases of the Spark workload were considered.”
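The gap between the 30x phase gain and the 4x end-to-end gain follows from Amdahl's law. The sketch below is illustrative and not part of the researchers' work. It assumes that only the model-building phase was accelerated and that every other phase ran at its original speed. Under that assumption, it estimates what share of the original runtime model building must have taken for a 30x phase speedup to give 4x overall.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the original runtime is sped up by factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def implied_fraction(overall: float, part: float) -> float:
    """Invert Amdahl's law: the runtime fraction the accelerated phase must occupy
    for a per-phase speedup `part` to produce an end-to-end speedup `overall`."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / part)


# Assumption (not stated in the article): only model building was accelerated.
p = implied_fraction(overall=4.0, part=30.0)
print(f"model building ~ {p:.0%} of original runtime")
print(f"check: {amdahl_speedup(p, 30.0):.2f}x overall")
```

Under this assumption, model building would have taken roughly 78% of the original runtime. The remaining phases would then cap the end-to-end gain at about 1 / (1 - p), or roughly 4.5x, however fast the logistic regression step becomes.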

In the Spark GPU architecture, each node can have multiple GPUs. The Spark system fetches the data from the file system, stores it in memory and dispatches it to those GPUs.

The usual approach to big data is a scale-out architecture, which spreads the data across a large number of nodes so that each node's share fits in memory. Spark fetches the data from the file system, puts it in host memory and then copies it to the GPU for the numerical calculations. The GPUs can exploit the POWER* platform’s faster CPU-GPU connection, NVLink, to access data sets larger than their own memory can hold.

For big data workloads, this means businesses can work with large data sets and still accelerate the computation. “We believe this is a new area where GPUs can address the problems associated with big data analytics and demonstrate their worth,” Bordawekar explains.

Continued Commitment

Bordawekar and Rellermeyer are among the 3,500 IBM researchers and developers who have worked on Spark-related projects since the company announced its commitment to Apache Spark in June 2015. IBM has built Spark into more than 30 offerings, including IBM BigInsights for Apache Hadoop and IBM Analytics on Apache Spark. As IBM continues to spur open-source innovation, its researchers remain committed to advancing the growing analytics ecosystem.

Juliet Stott is a freelance journalist based in York, England.
