The Power of Investigative Analytics With Hadoop and BigInsights

The more that is known about a particular issue, situation, product, organization or individual, the greater the likelihood of a better decision and business outcome.

Today, the majority of business analytics are based on information stored in enterprise data warehouses fed mainly by transaction and operational systems. This data and its origins are rich in value, trusted and well understood.

Used by 92 of the top 100 global banks, the top 10 insurance organizations and 23 of the top 25 U.S. retailers, the z Systems platform holds a significant amount of the world’s business-critical information. It is estimated that 80 percent of the world’s corporate data resides or originates on mainframes, and 91 percent of CIOs surveyed said that new customer-facing applications are accessing the mainframe. The same CIOs said that 55 percent of all enterprise applications need the mainframe to complete transactions, and 89 percent said mainframe workloads are increasing and becoming more varied.

While valuable, this data on its own provides just one view of the world. The big data paradigm focuses on combining it with other information sources, such as social media, Web logs, email, documents, multimedia, text messages and sensor data, to provide a richer, more complete view that augments our knowledge of the world.

New Tech on the Block

New technologies such as Hadoop use the MapReduce paradigm to process massive volumes of differently structured data in parallel across potentially hundreds or thousands of nodes. The analysis of a seemingly unmanageable data volume is broken down into small, discrete jobs, and the reduced result sets are then combined to produce the complete answer.
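To make the idea concrete, here is a minimal single-machine sketch of the map, shuffle and reduce steps for a word count. It is written in plain Python, not against Hadoop's own Java MapReduce API; the function names and sample splits are illustrative assumptions, and local threads stand in for the cluster nodes on which Hadoop would actually schedule tasks.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_split(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key across every split."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits: List[List[str]]) -> Dict[str, int]:
    # Each split stands in for a block of data held on a different node;
    # in this sketch the map tasks run on local threads, not cluster nodes.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_split, splits))
    return reduce_groups(shuffle(mapped))


if __name__ == "__main__":
    splits = [
        ["claims filed in branch A", "claims paid"],
        ["claims filed online"],
    ]
    print(word_count(splits))
```

Because each map task sees only its own split, the map step can run where the data lives; only the much smaller grouped results have to be brought together for the reduce step.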

IBM InfoSphere BigInsights elevates Hadoop to an enterprise-ready, business-critical analytics solution. IBM InfoSphere BigInsights for Linux on z Systems, combined with the IBM InfoSphere z Systems Connector for Hadoop, gives customers two key advantages:

  • Data stored on z Systems can remain on the platform for analysis under the security of z Systems, while DB2, IMS and VSAM data can be moved efficiently from z/OS to Hadoop clusters running in Linux on z Systems partitions or on off-platform clusters
  • Organizations with sensitive information can ensure it remains secure by keeping it on z Systems or moving it there from less secure environments

IBM InfoSphere BigInsights on the mainframe allows clients to have the best of both worlds. They can continue to benefit from the security and reliability of the mainframe for processing critical data, but they can simultaneously take advantage of the rich tooling that exists in Hadoop without compromising the security of operational systems. By being able to combine mainframe data with data from other sources, organizations can obtain a more complete view of their business and often gain insights that can help them improve efficiencies, find new revenue opportunities or reduce costs.

In the sections that follow, we discuss how Hadoop is deployed on the mainframe from an architectural standpoint, the software capabilities in InfoSphere BigInsights, and some of the unique considerations that need to be taken into account when running Hadoop on the z Systems mainframe.

Deploying Hadoop

There may be significant advantages to running Hadoop or IBM InfoSphere BigInsights on the mainframe depending on customer requirements, including:

  • Hadoop applications can exist within the z Systems security perimeter
  • Clients can leverage mainframe technologies, including HiperSockets, to securely access production data and move that data to and from Hadoop for processing
  • Clients can realize the management advantages of running Hadoop on a private cloud infrastructure, gaining configuration flexibility and virtualized storage while avoiding the need to deploy and manage discrete cluster nodes and a separate network infrastructure
  • Clients can extend z Systems governance to hybrid Hadoop implementations

While Linux on z Systems can run natively on mainframe central processors, customers typically deploy Linux environments on IBM’s Integrated Facility for Linux (IFL) processors (see Figure 1). Using z/VM virtualization technology, one or more IFL processors can be allocated to LPARs, and each z/VM LPAR can run one or more Hadoop nodes, with the underlying system resources mapped to the LPAR by the Processor Resource/Systems Manager (PR/SM). Alternatively, resources from multiple LPARs can be aggregated and used by a single Hadoop cluster node.

Storage resources can be virtualized in the same way from a mainframe-attached unit such as an IBM System Storage DS8000 series array. Because these subsystems offer all-flash or hybrid flash configurations, they can typically deliver better I/O performance than the local disks normally used in commodity-based clusters.

When deploying a Hadoop cluster, at least one host is normally allocated to the role of master node, while the other hosts are referred to as data nodes. In a five-node cluster, for example, one node may run key services such as the BigInsights web console, the Hadoop Distributed File System (HDFS) NameNode and the MapReduce JobTracker. The data nodes typically support the distributed HDFS storage and the various parallel frameworks on Hadoop, such as MapReduce, HBase and Big SQL.
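The master/data-node split for a five-node cluster can be sketched as a simple role plan. This is a hypothetical illustration only: the host names (`zlnx01` to `zlnx05`), the helper `plan_cluster` and the exact service lists are assumptions made for the example, and in practice the BigInsights installer assigns services to nodes.

```python
from typing import Dict, List

# Hypothetical service lists for illustration; not an actual BigInsights configuration.
MASTER_SERVICES = ["BigInsights web console", "HDFS NameNode", "MapReduce JobTracker"]
DATA_SERVICES = ["HDFS DataNode", "MapReduce TaskTracker", "HBase RegionServer", "Big SQL worker"]


def plan_cluster(hosts: List[str]) -> Dict[str, List[str]]:
    """Assign master services to the first host and data-node services to the rest."""
    if len(hosts) < 2:
        raise ValueError("need one master node and at least one data node")
    plan: Dict[str, List[str]] = {hosts[0]: list(MASTER_SERVICES)}
    for host in hosts[1:]:
        plan[host] = list(DATA_SERVICES)
    return plan


if __name__ == "__main__":
    # Five hypothetical Linux guests, for example z/VM virtual machines running on IFLs.
    hosts = [f"zlnx{n:02d}" for n in range(1, 6)]
    for host, services in plan_cluster(hosts).items():
        print(f"{host}: {', '.join(services)}")
```

Keeping the coordinating services on one node and the storage and compute services on the others is what lets the data nodes be added or removed as capacity needs change.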

Hadoop Everywhere

It is common to find many Hadoop clusters or installations within a single organization. For example, there might be one for marketing, another for security analysis and one for business analytics. Where to deploy BigInsights often depends on where the data originates and how the data is classified. InfoSphere BigInsights enables the organization to choose the location that best meets its requirements. If the data is highly sensitive, BigInsights should be deployed on the mainframe; if the data is less sensitive, BigInsights could be deployed on another platform.

One of the biggest challenges with Hadoop is efficiently loading large amounts of data into the Hadoop system. The IBM InfoSphere z Systems Connector for Hadoop addresses this problem. Through a graphical interface, the user can point and click to select a data source on the mainframe, such as a database or log file, then point and click to select BigInsights or another Hadoop distribution installed in the enterprise, and the data is loaded directly into HDFS. No programming is required: conversion between code pages is handled automatically, and the data is formatted for HDFS. Data can be moved into the cluster manually or on a predefined schedule.
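The code page conversion that the connector automates can be illustrated with a small sketch. It assumes a hypothetical fixed-width record layout and EBCDIC code page 037 (Python's built-in `cp037` codec); real mainframe data may use other code pages or binary fields such as packed decimal, which this sketch does not handle, and it does not show how the connector itself works.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical fixed-width record layout: (field name, length in bytes).
LAYOUT: List[Tuple[str, int]] = [("customer", 10), ("branch", 4), ("status", 1)]
RECORD_LEN = sum(length for _, length in LAYOUT)


def ebcdic_record_to_csv(record: bytes, codepage: str = "cp037") -> str:
    """Decode one EBCDIC fixed-width record and return it as a CSV line."""
    if len(record) != RECORD_LEN:
        raise ValueError(f"expected {RECORD_LEN} bytes, got {len(record)}")
    # cp037 is a single-byte code page, so byte offsets equal character offsets.
    text = record.decode(codepage)
    fields, offset = [], 0
    for _, length in LAYOUT:
        fields.append(text[offset:offset + length].rstrip())  # drop blank padding
        offset += length
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(fields)  # quotes fields if needed
    return buf.getvalue()


if __name__ == "__main__":
    # Build a sample record the way it might be stored in a z/OS data set.
    sample = "SMITH     0042A".encode("cp037")
    line = ebcdic_record_to_csv(sample)
    print(line)
    print(line.encode("utf-8"))  # bytes ready to be written to a file in HDFS
```

The same bytes that read as text on z/OS would be unreadable if copied to a Linux cluster without this decoding step, which is why automatic code page handling matters when loading mainframe data into Hadoop.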

Mainframe for Analytics

The z13 mainframe has evolved into a powerful and efficient hybrid transaction and analytics processing platform, offering the highest commercially available security classification (EAL 5+), 99.999 percent availability, scalability and a commitment to system integrity. The new z13 is designed to be faster, smarter, more secure and more integrated than any other system.

While the platform’s heritage is its ability to handle massive transaction volumes, z Systems has also become a platform for business-critical analytics, offering co-location of data, low-latency feeds to data warehouses and data marts, and the same qualities of service extended to a wide range of analytics. Together, InfoSphere BigInsights for Linux on z Systems and the InfoSphere z Systems Connector for Hadoop offer a unique, production-ready combination for investigative-style analytics based on Apache Hadoop.

Whether on or off the mainframe, InfoSphere BigInsights gives organizations the power and flexibility to analyze the volumes of data they store for deeper insight and understanding. The InfoSphere z Systems Connector for Hadoop helps to securely and efficiently move data to BigInsights or other Hadoop implementations, allowing organizations to choose where to analyze data based on its security classification and the business benefit.

Mark Simmonds is an IT architect and senior product marketing manager for z Systems, focused on big data, analytics, mobile and information governance for the IBM z Systems portfolio.
