CRNCH Lectures – Center for Research into Novel Compute Hierarchies

CRNCH Distinguished Lectures

Rob Schreiber – Distinguished Engineer, Cerebras Systems, Inc.

November 6th, 1 PM, Bluejeans (link to follow)

Wafer Scale Computing: What it Means for AI and What it May Mean for HPC

Bio: Rob Schreiber is a Distinguished Engineer at Cerebras Systems, Inc., where he works on architecture and programming of systems for accelerated training of deep neural networks. Before Cerebras he taught at Stanford and RPI and worked at NASA, at startups, and at HP. Schreiber’s research spans sequential and parallel algorithms for matrix computation, compiler optimization for parallel languages, and high performance computer design. With Moler and Gilbert, he developed the sparse matrix extension of Matlab. He created the NAS CG parallel benchmark. He was a designer of the High Performance Fortran language. Rob led the development at HP of a system for synthesis of custom hardware accelerators. He has helped pioneer the exploitation of photonic signaling in processors and networks. He is an ACM Fellow, a SIAM Fellow, and was awarded, in 2012, the Career Prize from the SIAM Activity Group in Supercomputing.

Abstract: Deep learning needs more performance than CPUs provide, and the demand is growing faster than Moore’s Law. GPUs and specialized AI-optimized accelerators can boost performance over general-purpose CPUs, but they do not meet the computational demands of AI, so something else is needed. Wafer-scale computing is proving to be part of the solution. Cerebras Systems has developed and delivered (to US Dept. of Energy labs and GlaxoSmithKline) a reliable, manufacturable wafer-scale chip and system, the CS-1, aimed at deep learning training and inference. The largest chip ever made, the Cerebras Wafer-Scale Engine, is 60 times larger than the largest CPU and GPU chips. On it there are 400,000 compute cores that provide petaflops of performance, 18 gigabytes of fast SRAM memory with over ten petabytes of bandwidth, and a communication network with 50 petabits of bandwidth. I will present the Cerebras system, discuss the technical problems concerning yield, packaging, cooling, and delivery of electrical power that had to be solved to make it possible, and talk about the programming models that are possible and in use now for training.