Fueling the HPC Transformation with AI
November 12, 2018
Written by David Turek
As the annual Supercomputing conference celebrates its 30th birthday in Dallas this week, I’m reminded how far supercomputing has come, and how exciting the HPC industry is right now. With the Big Data boom, the immense amount of information represents tremendous opportunity for researchers who have new fuel for their projects. But it also provides a new set of challenges, as the boon of information requires new operations to maximize and create insight.
Today, we’re exploring new ways not only to infuse HPC with AI practices to uncover those new insights, but also to create better tools for nearly every stage of the modern HPC workflow. The goal is to empower researchers to fathom the seemingly unfathomable and to bring order to the chaos that the data deluge has created.
The leaps high performance computing has made in computing power don’t always correlate to improved insights, and we’re examining ways for researchers to apply advanced analytics to design better experiments. One such tool is Bayesian methodology, a proven principle of mathematics that analyzes what I know and suggests what I should do next, thereby helping eliminate simulations that are unlikely to yield desired results from experiment designs.
We’ve worked with customers in pharmaceuticals, chemistry, and materials science, and we have observed that the application of Bayesian principles has reduced the number of simulations by as much as 75% while increasing the accuracy of answers. In an era where Moore’s Law doesn’t have the kick it once had, this is a dramatic result, and these techniques could be the path to radically reduced hardware cost and deeper insight through a combination of classic HPC and modern analytics techniques.
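To make the idea of "analyze what I know, suggest what I should do next" concrete, here is a minimal sketch of Bayesian optimization in plain Python. It is an illustration only, not the method used in the work described above: the `simulate` function is a hypothetical stand-in for an expensive simulation, and the Gaussian-process model, the kernel length scale, and the upper-confidence-bound rule are all assumptions chosen for brevity.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs (assumed length scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def posterior(xs, ys, x_new, noise=1e-6):
    """Zero-mean Gaussian-process posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)   # K^-1 y
    v = solve(K, k)        # K^-1 k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(rbf(x_new, x_new) - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
    return mean, math.sqrt(var)

def simulate(x):
    """Hypothetical stand-in for an expensive simulation; best result at x = 0.3."""
    return -(x - 0.3) ** 2

def bayesian_search(candidates, budget, kappa=2.0):
    """Run at most `budget` simulations, choosing each one from what is known so far."""
    xs = [candidates[0], candidates[-1]]   # two initial runs at the extremes
    ys = [simulate(x) for x in xs]
    while len(xs) < budget:
        remaining = [c for c in candidates if c not in xs]

        def ucb(c):
            # Upper confidence bound: favor promising or still-uncertain settings.
            mean, std = posterior(xs, ys, c)
            return mean + kappa * std

        nxt = max(remaining, key=ucb)
        xs.append(nxt)
        ys.append(simulate(nxt))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], len(xs)

candidates = [i / 40 for i in range(41)]   # 41 possible simulation settings
best_x, runs = bayesian_search(candidates, budget=10)
print(f"best setting found: {best_x}, simulations run: {runs} of {len(candidates)}")
```

In this toy setup the search runs 10 of the 41 candidate simulations rather than sweeping all of them. The numbers are specific to the made-up objective and should not be read as a reproduction of the customer results mentioned above.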
Currently, we’re working to encapsulate this capability in an appliance that can be installed adjacent to an existing cluster of any architecture to improve its processing capability[i]. In its current form, the appliance would be preprogrammed so researchers only need to tell the systems to exchange data and the Bayesian appliance would design smarter simulation instructions for the primary cluster. But this is just the first step in making simulation more intelligent, and we’re already seeing strong ecosystem support in building intelligent simulation solutions.
“Penguin and IBM, both members of the OpenPOWER Foundation, have been working together to provide our HPC clients with high value solutions,” said Phil Pokorny, CTO of Penguin Computing. “Intelligent Simulation is something we think holds great potential for enhancing the capability of our solutions and we look forward to working with IBM to assess areas of application.”
“Cray and IBM have collaborated on a number of HPC opportunities over the last few years by combining technologies from both companies to provide our clients the best possible value,” adds Joseph George, Executive Director of Alliances at Cray. “We look forward to working with IBM to evaluate opportunities for the inclusion of Intelligent Simulation tools in collaborative solutions across an array of application domains.”
Cognitive Discovery for HPC
While advanced analytics methods like Bayesian Optimization can design smarter experiments, they still rely on traditional HPC techniques to perform the work. In addition, it is an accepted fact that unstructured data prep and ingestion can take up to 80% of a researcher’s time, and Bayesian Optimization doesn’t address that primary problem.
Through collaborations with many customers in oil and gas, materials, manufacturing, and more, we’re investigating new tools to help them with data ingestion at scale. These integrated tools are being designed to better help amass a catalogue of scientific data, and then automatically turn the data into a “knowledge graph”, a visual representation of the data’s relationships. IBM researchers have documented that they have used these unreleased tools to build a knowledge graph of 40 million scientific documents in only 80 hours, a rate of 500,000 documents per hour. The research tools can ingest and interpret data formatted as PDFs, handwritten notebooks, spreadsheets, pictures, and more. The tool is being designed to help bring order to chaotic data, and contribute to establishing a corporate memory for all the HPC work an organization has ever performed, something of critical importance as employees retire or leave.
It has deep search capabilities built in that allow exploration of very complicated queries against the knowledge graph, along with relevance ratings of the search results for the desired query.
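As a rough illustration of what a knowledge graph with relevance-rated search can look like, the sketch below stores a few facts as labeled edges and ranks entities against a query. Everything in it is hypothetical: the facts, the `build_graph` and `search` helpers, and the simple term-matching score are invented for this example and do not describe how the IBM research tools work.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) facts, as if extracted from documents.
triples = [
    ("polymer A", "synthesized_from", "monomer B"),
    ("polymer A", "has_property", "high tensile strength"),
    ("monomer B", "supplied_by", "vendor C"),
    ("alloy D", "has_property", "high tensile strength"),
]

def build_graph(facts):
    """Index each entity by its outgoing and incoming labeled edges."""
    graph = defaultdict(list)
    for subj, rel, obj in facts:
        graph[subj].append((rel, obj))
        graph[obj].append(("inverse_" + rel, subj))
    return graph

def search(graph, terms):
    """Rank entities by the fraction of query terms found in their name or edges."""
    scores = {}
    for entity, edges in graph.items():
        text = " ".join([entity] + [f"{rel} {other}" for rel, other in edges]).lower()
        hits = sum(1 for t in terms if t.lower() in text)
        if hits:
            scores[entity] = hits / len(terms)
    # Highest relevance first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

graph = build_graph(triples)
results = search(graph, ["tensile", "monomer"])
for entity, relevance in results:
    print(f"{relevance:.2f}  {entity}")
```

Here "polymer A" ranks first because it is linked to both a tensile-strength property and a monomer, while entities connected to only one of the two terms score lower. A production system would use far richer extraction and ranking than this term count.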
To help these tools be applied across a broad variety of business use cases, they can be used to create vertical, domain-specific applications. For example, at the ACS in Boston earlier this year, we showcased just such a tool called IBM RXN, which predicts the outcome of organic chemical reactions. This tool is available on the web at no cost to use on the IBM Zurich system. In the context of HPC, this technology presents a unified approach to complement existing simulations with data-inspired analytics. And it can, in some cases, even displace classic mod-sim completely.
Begin Exploring Cognitive HPC with IBM Systems
We’re examining ways to bring these capabilities into our existing suite of AI-driven products, including the IBM Power Systems AC922 server and IBM ESS storage, the building blocks of Summit and Sierra, and our industry-leading enterprise deep learning toolkit, PowerAI. Users interested in getting up and running quickly can also explore our new Accelerated Compute Platform offering, which combines compute, storage, and networking with preloaded software in a plug-and-play rack.
This story first appeared on the IBM THINK Blog.
[i] Statements regarding IBM’s future direction and intent are subject to change or withdrawal without notice and represent goals and objectives only.