
Maturing Linux Technology Evolves to Handle Big Data Workloads With PowerLinux


Illustration by Viktor Koen

Linux has grown up over the last dozen years. Once a sassy young upstart, it has moved beyond its original use as a basic file-serving, Web-serving and open-source platform to handle mission-critical workloads and analytics such as big data.

Paired with IBM Power Systems technology, Linux becomes a real powerhouse for companies. Watson, the IBM supercomputer that bested two “Jeopardy!” champions in 2011, is perhaps the best example of how Linux has morphed into a robust operating system that can handle big data with ease.

Big data, the explosion of data volume and variety combined with data processing velocity and veracity, is not a short-lived trend. It’s a concept being used by all types of industries to find answers to complex problems, and it’s here to stay. Many companies are trying to find the best way to approach the idea of big data. IBM’s Scott Handy, vice president of PowerLinux Strategy and Business Development, suggests approaching it from a workload point of view.

“When you look at big data, it’s better to look at the workload itself. One of the workload options is based around an open-source project called Apache Hadoop,” Handy says. Hadoop uses simple programming models that enable the distributed processing of huge data sets across many servers. The technology was initially developed to serve large Internet workloads, such as those at Google, Twitter and Facebook, that needed to search massive amounts of data in a short time.

Search engines and social media were early adopters, but it didn’t take long for other industries to see this technology could solve a slew of problems confounding their businesses. “The technology presented a new set of possibilities that weren’t evident before,” Handy says.

As a result, big data is being used for new applications that draw from data at rest or from data in motion. IBM InfoSphere BigInsights analyzes data at rest and can pull data from multiple sources, while InfoSphere Streams analyzes data in motion. Clients can use either one, or both, to pull data for queries.
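The difference between querying data at rest and data in motion can be sketched generically in Python. This is not the BigInsights or Streams API; the function names are hypothetical, and the "query" is simply a count of records that contain a term. A batch query sees the whole stored data set and returns one answer, while a streaming query updates its answer as each record arrives.

```python
from typing import Iterable, Iterator, List


def batch_count(stored_records: List[str], term: str) -> int:
    """Data at rest: the full data set is already stored, so scan it once and return one answer."""
    return sum(term in record for record in stored_records)


def streaming_count(events: Iterable[str], term: str) -> Iterator[int]:
    """Data in motion: records arrive one at a time, so keep a running answer up to date."""
    count = 0
    for event in events:
        if term in event:
            count += 1
        yield count  # emit the current answer after each arriving record
```

For the records `["linux on power", "hadoop", "linux streams"]`, `batch_count` returns `2`, while `streaming_count` yields `1, 1, 2`, one intermediate answer per arriving record.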

Once the data is ready, Hadoop takes a query and divides the data among multiple servers for parallel processing. It could be 10 servers or 100 or 1,000. Using its MapReduce module, Hadoop gives each server a piece of the data and a query to run against that piece. Once the servers have answered the query, MapReduce combines their answers into a reduced set. Hadoop has built-in redundancy: it stores two additional copies of each piece of data on two other servers, in case one of the servers goes down. The Hadoop Distributed File System (HDFS) spans the servers and provides access to the application data. The queries themselves are written in scripting languages.
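To make the split, map, reduce and replication steps concrete, here is a minimal single-process sketch in Python. It does not use Hadoop: the four-server cluster, the server names, the block size and the round-robin replica placement are hypothetical simplifications, and the "query" is just a count of records that mention a term. A real Hadoop job runs its map tasks in parallel on the servers that hold the data.

```python
from collections import Counter

# Hypothetical four-server cluster; these names and helpers are illustrative, not Hadoop APIs.
SERVERS = ["server-1", "server-2", "server-3", "server-4"]
REPLICATION = 3  # the original block plus two additional copies


def split_into_blocks(records, block_size):
    """Divide the data set into fixed-size pieces, one per map task."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_replicas(num_blocks, servers, replication):
    """Store each block on `replication` distinct servers, chosen round-robin."""
    return {
        b: [servers[(b + k) % len(servers)] for k in range(replication)]
        for b in range(num_blocks)
    }


def map_phase(block, term):
    """Map: emit (term, 1) for every record in this block that mentions the term."""
    needle = term.lower()
    return [(term, 1) for record in block if needle in record.lower()]


def reduce_phase(pairs):
    """Reduce: sum the counts emitted by all map tasks into one small result."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_query(records, term, block_size=2, failed=frozenset()):
    """Run the map tasks (sequentially here) and reduce their output.

    Returns the reduced result and the server chosen to handle each block.
    """
    blocks = split_into_blocks(records, block_size)
    placement = place_replicas(len(blocks), SERVERS, REPLICATION)
    pairs, assignments = [], {}
    for b, block in enumerate(blocks):
        # Pick the first replica whose server is still up.
        live = [s for s in placement[b] if s not in failed]
        if not live:
            raise RuntimeError(f"every replica of block {b} is unavailable")
        assignments[b] = live[0]
        pairs.extend(map_phase(block, term))
    return reduce_phase(pairs), assignments
```

For example, `run_query(["Linux on Power", "Hadoop on Linux", "data in motion"], "linux")` returns `{"linux": 2}` along with the server chosen for each block; passing `failed={"server-2"}` shows a surviving replica on another server taking over, with the same answer.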

Brand sentiment is a typical query companies run on Hadoop. By searching across blogs, Twitter and Facebook, a company can discover public sentiment. The query looks for a brand name, grabs the sentence before and the sentence after, and searches for positive and negative attributes. “The company can then create sales and marketing programs to accentuate the positive and try to defuse the negative, and that works across all industries,” Handy says.
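The brand-sentiment query described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the query any particular company runs: the positive and negative word lists are made up, the sentence splitter is naive, and overlapping context windows around nearby mentions may count a word twice. In a Hadoop job, logic like this would run inside the map step on each server's share of the posts.

```python
import re

# Hypothetical word lists; a real sentiment query would use richer lexicons or a trained model.
POSITIVE = {"love", "great", "fast", "reliable"}
NEGATIVE = {"hate", "slow", "broken", "expensive"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def brand_context(text, brand):
    """For each sentence mentioning the brand, yield it with the sentences before and after."""
    sentences = split_sentences(text)
    needle = brand.lower()
    for i, sentence in enumerate(sentences):
        if needle in sentence.lower():
            yield sentences[max(0, i - 1): i + 2]


def score_mentions(text, brand):
    """Count positive and negative words in the context window around each brand mention."""
    positive = negative = 0
    for window in brand_context(text, brand):
        words = re.findall(r"[a-z']+", " ".join(window).lower())
        positive += sum(w in POSITIVE for w in words)
        negative += sum(w in NEGATIVE for w in words)
    return positive, negative
```

For the post "I bought a new phone. The AcmePhone is great and fast. Shipping was slow. Unrelated sentence here.", `score_mentions(post, "AcmePhone")` returns `(2, 1)`: "great" and "fast" count as positive, "slow" (from the following sentence) as negative, and the fourth sentence falls outside the window.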

The usefulness of big data appeals to nearly every industry. “This is new to many people who think it’s just the technology used for Google searches. We’re finding new uses across every industry,” he explains.

Governments are using it to scan the Web and look for potential terrorist activity. Financial services firms are using it for more refined fraud detection. When you swipe your card, the credit card company must decide in real time whether to authorize the transaction. Using this technology, the company can look at seven years of your activity instead of the typical 30 days, which improves fraud detection. In retail, the technology is being used to match customer searches to merchandise in stock or on sale, as well as to discover brand sentiment. Healthcare companies use it to match symptoms to medicines, to discover drug interactions and to give doctors better suggestions about which diseases or health issues the symptoms may indicate.
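Why a longer history helps can be shown with a deliberately simple, hypothetical rule in Python: flag a purchase if its merchant category never appears in the cardholder's lookback window. Real card issuers use far more sophisticated models; the rule, the function name and the sample history below are assumptions made only to illustrate the 30-day versus seven-year comparison.

```python
from datetime import date, timedelta

THIRTY_DAYS = 30
SEVEN_YEARS = 7 * 365  # approximate; ignores leap days


def is_suspicious(history, txn_date, category, window_days):
    """Toy rule: flag a purchase whose merchant category never appears in the lookback window.

    `history` is a list of (date, category) tuples for one cardholder.
    """
    start = txn_date - timedelta(days=window_days)
    seen = {cat for day, cat in history if start <= day < txn_date}
    return category not in seen


# Hypothetical cardholder who buys airline tickets roughly once a year.
history = [
    (date(2019, 6, 1), "airline"),
    (date(2021, 6, 5), "airline"),
    (date(2022, 6, 1), "airline"),
    (date(2023, 5, 10), "groceries"),
    (date(2023, 5, 20), "groceries"),
    (date(2023, 5, 30), "groceries"),
]
purchase_day = date(2023, 6, 2)

print(is_suspicious(history, purchase_day, "airline", THIRTY_DAYS))  # True
print(is_suspicious(history, purchase_day, "airline", SEVEN_YEARS))  # False
```

With only 30 days of data, the annual airline purchase looks unfamiliar and gets flagged; with seven years, the same purchase matches an established pattern.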

“A lot of times you don’t know what you’re looking for when you start the query. You just know there’s a vast amount of information, and if you look for patterns, you might be able to come up with some conclusions about the question you are trying to solve,” Handy says. Companies find they need to hire for a new role called a data scientist, an expert responsible for determining how best to look for patterns in the data.

Shirley S. Savage is a Maine-based freelance writer. Shirley can be reached at savage.shirley@comcast.net.


