IBM Systems

Deep learning performance breakthrough

January 16, 2018

Have you noticed that interest in artificial intelligence (AI) has really taken off in the last year or so? A lot of that interest is fueled by deep learning. Deep learning has revolutionized the way we use our phones, bringing us new applications such as Google Voice and Apple’s Siri, which are based on AI models trained using deep learning.

Learning the way a human learns

Deep learning is a relatively new machine learning method based on neural networks: the model learns, and becomes more accurate, as we feed it more data. A widely accepted principle of deep learning is shown on the left-hand side of the chart below: deep learning–based AI models achieve much higher accuracy than traditional machine learning methods, but they need much more training data to reach that accuracy. On the right, we show the results of a Google Trends search on “deep learning”, which shows how many more people have been seeking information about deep learning over the last several years.

Figure: Deep learning has revolutionized machine learning

Deep learning enables AI models to learn in much the same way humans do: through experience and perception, on an ongoing basis. Just as we teach a baby to recognize dogs and cats by showing them lots of pictures and real pets, we can teach a deep learning–based AI model to recognize images and patterns by feeding it lots of data.
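To make “learning from examples” concrete, here is a minimal sketch in Python: a tiny neural network with one hidden layer that is never given the rule it must learn, only labeled examples. The task (XOR on two inputs), the layer size, learning rate, epoch count, and the function name train_xor are illustrative choices, nowhere near the scale of the models discussed in this post.

```python
import math
import random


def train_xor(hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer network on XOR with full-batch gradient descent.

    Returns (initial_loss, final_loss, predict), where predict(x) gives the
    trained network's output in (0, 1) for a 2-element input x.
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # Parameters: input->hidden weights and biases, hidden->output weights and bias.
    w1 = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(hidden)]
        z = sum(w2[j] * h[j] for j in range(hidden)) + b2
        return h, 1.0 / (1.0 + math.exp(-z))  # sigmoid output

    def loss():
        # Mean binary cross-entropy over the four training examples.
        eps = 1e-12
        total = 0.0
        for x, t in data:
            _, y = forward(x)
            total -= t * math.log(y + eps) + (1.0 - t) * math.log(1.0 - y + eps)
        return total / len(data)

    initial = loss()
    n = len(data)
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, t in data:
            h, y = forward(x)
            dz = y - t  # gradient of cross-entropy w.r.t. the sigmoid's input
            gb2 += dz
            for j in range(hidden):
                gw2[j] += dz * h[j]
                dh = dz * w2[j] * (1.0 - h[j] ** 2)  # backpropagate through tanh
                gb1[j] += dh
                gw1[j][0] += dh * x[0]
                gw1[j][1] += dh * x[1]
        # Gradient-descent update using the gradient averaged over the batch.
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / n
            b1[j] -= lr * gb1[j] / n
            w1[j][0] -= lr * gw1[j][0] / n
            w1[j][1] -= lr * gw1[j][1] / n
        b2 -= lr * gb2 / n

    return initial, loss(), lambda x: forward(x)[1]
```

Calling train_xor() returns a final loss lower than the initial one: the loss drops as the network is repeatedly shown the four labeled examples, even though nothing in the code encodes the XOR rule itself.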

Similarly, banks today mostly use rule-based systems for fraud detection, in which a rule specifies a set of conditions that triggers a fraud alert. Instead, a bank can use the past few years of credit card transactions to train a deep learning model that becomes more accurate the more data it is given. In fact, even after the AI model is deployed into production, it can keep learning from the millions of credit card transactions that the bank processes every day. The advantage of this approach is that the model automatically learns to recognize new situations that might be fraudulent based on experience, rather than relying on a data scientist to write a new rule for each type of situation.
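The contrast between a hand-written rule and a model that keeps learning can be sketched in a few lines of Python. To keep it short, the learned model below is a single logistic-regression unit trained with online stochastic gradient descent, not a deep network; the rule, its threshold, the feature encoding, the transaction fields (amount, foreign, night), and the class and function names are all hypothetical.

```python
import math


def rule_based_alert(txn):
    """Hypothetical hand-written rule: flag large purchases made abroad."""
    return txn["amount"] > 1000 and txn["foreign"]


def features(txn):
    """Hypothetical encoding: amount in thousands, foreign flag, night-time flag."""
    return [txn["amount"] / 1000.0,
            1.0 if txn["foreign"] else 0.0,
            1.0 if txn["night"] else 0.0]


class OnlineFraudScorer:
    """Logistic regression trained one labeled transaction at a time (online SGD)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def score(self, x):
        """Estimated probability that the transaction with features x is fraudulent."""
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, is_fraud):
        """One stochastic gradient step on the log-loss for a newly labeled transaction."""
        err = self.score(x) - (1.0 if is_fraud else 0.0)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


if __name__ == "__main__":
    # Hypothetical history: small night-time purchases turn out to be fraud,
    # a pattern the hand-written rule above never checks for.
    history = [({"amount": 50, "foreign": False, "night": True}, True),
               ({"amount": 60, "foreign": False, "night": False}, False)] * 200
    scorer = OnlineFraudScorer(n_features=3)
    for txn, label in history:
        scorer.update(features(txn), label)
    suspicious = {"amount": 55, "foreign": False, "night": True}
    print(rule_based_alert(suspicious), scorer.score(features(suspicious)) > 0.5)
```

The rule only ever catches what someone thought to write down, while the scorer's decision boundary moves every time update() is called with a newly labeled transaction, which is the property the paragraph above describes for a model kept in production.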

Deep learning has dozens of uses across every industry, ranging from retail analytics to drone video analytics to medical image analysis that assists clinicians with diagnoses. Businesses can now use these kinds of advanced machine learning methods to extract insights from the data they have been collecting in their data lakes over the last few years. But to use these AI methods, a business needs to process massive amounts of data, and that requires IT infrastructure that is up to the task.

Combining the right hardware and software for deep learning

What does meeting the performance demands of deep learning require? First, we don’t think you can do it without accelerators. For example, our IBM Power System AC922 server comes with four NVIDIA Tesla V100 GPU accelerators specifically for this purpose. We partnered with NVIDIA to embed a superhighway interconnect in our processors that connects the server’s CPUs and GPUs to handle all the data movement involved in deep learning. Called NVIDIA NVLink, this superhighway transfers data up to 5.6 times faster than the CUDA host-device bandwidth of tested x86 platforms[1].
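To see what a bandwidth ratio means for data movement, here is a rough back-of-the-envelope model. It assumes transfers are purely bandwidth-bound (it ignores latency and any overlap of copies with computation), and the symbols D (bytes moved), B (sustained host-to-device bandwidth), and t (transfer time) are introduced here only for illustration:

```latex
t = \frac{D}{B}
\qquad\Longrightarrow\qquad
\frac{t_{\text{x86}}}{t_{\text{NVLink}}}
  = \frac{D / B_{\text{x86}}}{D / B_{\text{NVLink}}}
  = \frac{B_{\text{NVLink}}}{B_{\text{x86}}} \approx 5.6,
\qquad
t_{\text{NVLink}} \approx \frac{t_{\text{x86}}}{5.6} \approx 0.18\, t_{\text{x86}}
```

Under those assumptions, a hypothetical copy that keeps the x86 host-device path busy for 10 seconds would take roughly 1.8 seconds over NVLink.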

A lot of the “magic” behind our deep learning performance also comes from our software framework, PowerAI, which is designed for deployment on IBM Power Systems. PowerAI is an enterprise software distribution of some of the most popular open source machine learning and deep learning frameworks, curated, tested and packaged for easy deployment. We tune these open source frameworks for Power Systems hardware to deliver optimized deep learning performance and faster deployment. PowerAI also makes deep learning easier to use by helping data scientists prepare their data and manage the deep learning process.

Breaking through the IT infrastructure wall

We have found that most of the popular open source deep learning frameworks scale very well across the GPU accelerators within a single server, but they scale poorly across multiple servers in a data center. As a result, training effectively stays confined to one server, and a single training run can take weeks. Because training AI models is an iterative process, data scientists waste valuable time waiting for their experiments to finish. At IBM, we built an innovative software library, the distributed deep learning (DDL) library, which lets training take advantage of hundreds of servers.
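The basic idea behind scaling training across servers can be sketched with synchronous data parallelism: each worker computes a gradient on its own shard of the batch, the gradients are averaged with an allreduce, and every replica applies the same update. This is a generic, in-process simulation of that technique on a one-parameter linear model, not the DDL library’s implementation, whose communication internals this post does not describe; the function names grad_mse, allreduce_mean, and data_parallel_step are hypothetical.

```python
def grad_mse(w, batch):
    """Gradient of mean squared error for the one-parameter model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def allreduce_mean(values):
    """In-process stand-in for an allreduce: every worker receives the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def data_parallel_step(w, batch, n_workers, lr):
    """One synchronous data-parallel SGD step; returns each worker's updated weight."""
    if len(batch) % n_workers:
        raise ValueError("batch size must be divisible by n_workers")
    size = len(batch) // n_workers
    shards = [batch[i * size:(i + 1) * size] for i in range(n_workers)]
    # In a real cluster each worker (server or GPU) computes its gradient concurrently.
    local_grads = [grad_mse(w, shard) for shard in shards]
    averaged = allreduce_mean(local_grads)
    # Every replica applies the same averaged gradient, so all copies stay identical.
    return [w - lr * g for g in averaged]


if __name__ == "__main__":
    batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # toy data from y = 3x
    w = 0.0
    for _ in range(50):
        w = data_parallel_step(w, batch, n_workers=4, lr=0.01)[0]
    print(round(w, 6))  # converges to the true slope
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so adding workers splits the per-step computation without changing the result of each step; the cost that remains is the allreduce itself, which is why the communication layer matters so much when scaling to many servers.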

IBM POWER9 with NVIDIA Tesla V100 delivers a 3.7 times reduction in AI model training time compared with tested x86 systems[2]. This is key because AI model training is extremely demanding and requires optimized infrastructure to help organizations of all sizes succeed in the era of AI.
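Reading the 3.7 times figure as a ratio S of training wall-clock times T on the two systems (an interpretation, since the benchmark in [2] measures iterations of one specific model), the arithmetic works out as follows:

```latex
S = \frac{T_{\text{x86}}}{T_{\text{POWER9}}} = 3.7
\qquad\Longrightarrow\qquad
T_{\text{POWER9}} = \frac{T_{\text{x86}}}{3.7} \approx 0.27\, T_{\text{x86}}
```

For example, a hypothetical training run that takes 74 hours on the tested x86 system would take about 20 hours, roughly 73% less wall-clock time, assuming the speedup measured in [2] carries over to that workload.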

With these hardware and software optimizations, we have democratized access to modern AI methods such as deep learning to enable a wide variety of organizations to get started with AI. Contact us to get started, learn more about deep learning and IBM PowerAI, and request a free trial.

[1] Results are based on IBM internal measurements running the CUDA H2D Bandwidth Test.

Hardware: Power AC922; 32 cores (2 x 16c chips), POWER9 with NVLink 2.0; 2.25 GHz, 1024 GB memory, 4x Tesla V100 GPU; Ubuntu 16.04. S822LC for HPC; 20 cores (2 x 10c chips), POWER8 with NVLink; 2.86 GHz, 512 GB memory, Tesla P100 GPU.

Competitive hardware: 2x Xeon E5-2640 v4; 20 cores (2 x 10c chips) / 40 threads; Intel Xeon E5-2640 v4; 2.4 GHz; 1024 GB memory, 4x Tesla V100 GPU; Ubuntu 16.04.

[2] Results are based on IBM internal measurements running 1000 iterations of an enlarged GoogLeNet model on an enlarged ImageNet dataset (2560×2560).

Hardware: Power AC922; 40 cores (2 x 20c chips), POWER9 with NVLink 2.0; 2.25 GHz, 1024 GB memory, 4x Tesla V100 GPU; Pegas 1.0. Competitive stack: 2x Xeon E5-2640 v4; 20 cores (2 x 10c chips) / 40 threads; Intel Xeon E5-2640 v4; 2.4 GHz; 1024 GB memory, 4x Tesla V100 GPU; Ubuntu 16.04.

Software: Chainer v3 / LMS / Out of Core with CUDA 9 / cuDNN 7 with patches found at https://github.com/cupy/cupy/pull/694 and https://github.com/chainer/chainer/pull/3762