Xiangyu Li
Pacific Northwest National Laboratory
About: TBD
TBD: TBD
Bronson Messer
Oak Ridge National Laboratory
About: Bronson Messer is a Distinguished Staff Scientist and Director of Science for the Oak Ridge Leadership Computing Facility (OLCF) at ORNL. He is also Joint Faculty Professor in the Department of Physics & Astronomy at the University of Tennessee. Before becoming a staff member at ORNL, he was a postdoctoral research associate in the Department of Astronomy & Astrophysics at the University of Chicago. Dr. Messer is a Fellow of the American Physical Society and he recently served on the APS Committee on Informing the Public (2018-2020). In 2020, he was awarded the Department of Energy Secretary’s Honor Award for his part in enabling the COVID-19 High Performance Computing Consortium. Dr. Messer holds undergraduate and graduate degrees from the University of Tennessee, earning his PhD in physics in 2000.
Co-Design for Science In The Age of Industrial AI: Co-design of large compute systems has been a standard feature of large DOE procurements over the past two decades. Significant innovations that now form the basis of the bulk of data processing in the public and private sectors were aided by (or, indeed, owe their existence almost entirely to) such co-design efforts. Now that scientific computing is clearly in the follower role for technology innovation, we have given considerable thought to co-design efforts spanning the entire scientific computing ecosystem. I will describe some of those considerations and ideas.
Hsien-Hsin Lee
Intel
About: TBD
TBD: TBD
Greg Diamos
TensorWave
About: TBD
TBD: TBD
Vijay Reddi
Harvard
About: TBD
TBD: TBD
Minsik Cho
Apple
About: TBD
TBD: TBD
Charith Mendis
University of Illinois
About: TBD
TBD: TBD
Dong Hyuk Woo
Samsung
About: Dong Hyuk Woo is a Senior Vice President at Samsung Semiconductor, where he leads the AGI Computing Lab. Prior to joining Samsung, Dr. Woo was an early architect of Google’s Tensor Processing Units (TPUs). As program co-founder, architect, and compiler lead of Google’s Edge TPU program, he drove the successful commercialization of machine learning applications on multiple generations of Pixel phones. Dr. Woo holds a B.S. in Electrical Engineering from Seoul National University and M.S. and Ph.D. degrees in Electrical and Computer Engineering from the Georgia Institute of Technology. His work has been recognized with several awards, including the Award of Minister of Information and Communication of Republic of Korea and IEEE Micro’s Top Picks. He has been granted over 30 patents related to machine learning acceleration.
Rethinking Computing: A Memory-Centric Perspective on the Era of Large Models: As large models continue to proliferate across diverse services and use cases, they are reshaping datacenter economics and redefining the computing landscape. In this talk, we will take a memory-centric perspective to explore the far-reaching implications of this transformation. We will review the major developments of the past decade, tracing how system trade-offs have shifted and highlighting the emerging challenges in scaling traditional computing paradigms. To address these challenges, we will discuss memory-centric architectures and software solutions that can unlock scalable AI computing capabilities. Finally, we will outline key research directions that must be pursued to bring memory-centric computing to fruition at scale.
William Chapman
Sandia National Labs
About: William’s research utilizes neurally inspired principles, such as computation through dynamics, local learning, and in-memory computation, to design methods for coordinated computation across temporal and spatial scales. He is currently a researcher in Neuromorphic Computing at Sandia National Laboratories, where he is principal investigator of multiple projects that incorporate physics-informed priors, graph connectivity, large-scale neural data analysis, and emerging computational platforms.
Beyond spikes: Intracellular dynamics for mixed-signal neuromorphic computation: Neuromorphic research broadly aims to incorporate biologically inspired algorithms and hardware principles into engineered systems, utilizing ideas such as local connectivity, temporal processing, and event-driven communication. However, much neuromorphic research has focused on spiking neural networks (SNNs) built from a simple leaky integrate-and-fire model, which acts as a drop-in replacement for activation functions in artificial neural networks. In this talk, I will discuss how more complex intracellular dynamics can expand the computational expressivity of individual neurons, trading universality for highly efficient and modular analog computation. Through these simplified models of intracellular dynamics, I will discuss two examples of custom CMOS-based hardware that enable efficient edge processing. Finally, I will close with ongoing work on automated methods, being developed in collaboration with Georgia Tech, for incorporating the intrinsic dynamics of emerging devices into large-scale physical neural networks.
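As a minimal sketch of the leaky integrate-and-fire model the abstract contrasts against (parameter names and values here are illustrative assumptions, not taken from the talk), the neuron reduces to a single leaky state variable with threshold-and-reset dynamics:

```python
def lif_step(v, i_in, dt=1e-3, tau=20e-3, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    # One Euler step of dv/dt = (-(v - v_rest) + i_in) / tau (all values illustrative).
    v = v + dt * (-(v - v_rest) + i_in) / tau
    if v >= v_thresh:          # threshold crossing: emit a spike and reset the membrane
        return v_reset, True
    return v, False

# Drive one neuron with a constant input and count spikes over 1 s of simulated time.
v, n_spikes = 0.0, 0
for _ in range(1000):
    v, spiked = lif_step(v, i_in=1.5)
    n_spikes += spiked
print("spikes in 1 s:", n_spikes)
```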
Tyler Simon
LPS
About: TBD
TBD: TBD
Steve Reinhardt
AMD
About: Steven K. Reinhardt is a Senior Fellow in the AI Group at AMD. He returned to AMD in 2023 after spending seven years at Microsoft managing a team doing DNN inference on FPGAs for Bing. In his previous stint at AMD, he spent 8½ years at AMD Research, primarily on exascale projects. Prior to that, he was on the faculty at the University of Michigan. Steve is an IEEE Fellow and an ACM Distinguished Scientist. He has a PhD from the University of Wisconsin, an MS from Stanford, and a BS from Case Western Reserve University.
Optimizing for Efficient AI across Hardware, Software, and Models: AI is likely the most impactful technology of our lifetimes. Efforts to fully realize the promise of AI have led to deployments of computing power that are unprecedented in scale. Ultimately, the primary constraint on these deployments is power. Increasing the power efficiency of AI compute systems requires a holistic perspective, employing techniques that cross the hardware, software, and ML model boundaries. This talk showcases examples of these techniques.
Prashant Nair
d-Matrix
About: Prashant J. Nair is a Senior Principal Engineer at d-Matrix Inc., where he is the lead architect of the 3D-memory architecture for their upcoming accelerators. He is also an Associate Professor at the University of British Columbia (UBC), where he leads the Systems and Architectures (STAR) Lab, and an Affiliate Fellow of the Quantum Algorithms Institute. His research focuses on memory architectures and systems for AI/ML workloads. Dr. Nair’s recognitions include the 2024 TCCA Young Architect Award (the highest early-career honor in computer architecture), the 2025 DSN Test of Time Award, the HPCA 2023 Best Paper Award, a MICRO 2024 Best Paper nomination, and the HPCA 2025 Distinguished Artifact Award. Over the past decade, he has published more than 45 papers in top-tier venues. While still an Assistant Professor, he was inducted into all three computer architecture halls of fame: ISCA, MICRO, and HPCA.
Scaling the Memory Wall: Towards 3D-DRAM-based Accelerators for Efficient Generative Inference: Generative AI now underpins search, digital assistants, and media applications, making inference cost a first-order design constraint. Unlike traditional compute-bound workloads, large language and speech models are typically limited by memory bandwidth and capacity rather than raw arithmetic throughput. Their inference cost is therefore driven as much by data movement as by compute, and hinges on the memory system’s design. This concern is especially acute during autoregressive decoding, which must repeatedly stream model weights and key–value (KV) caches at high bandwidths and low latencies while also providing enough capacity to support long context windows and several concurrent users. To make matters worse, these demands are accelerating: state-of-the-art models now exceed hundreds of billions of parameters, context windows are expanding from 4K to 128K tokens and beyond, and mixture-of-experts designs introduce additional irregularity in memory access patterns. Today’s memory technologies thus force difficult trade-offs: SRAM can deliver extremely high bandwidth, but at prohibitive area cost and with severe capacity limits, while HBM offers higher capacity but remains constrained by achievable bandwidth and I/O power. Closing this gap will require a fundamental rethinking of how memory is integrated with accelerator logic.
In this talk, I will introduce our upcoming memory-centric accelerator, which vertically integrates logic with 3D-stacked DRAM to deliver SRAM-level bandwidth and HBM-class capacity while substantially reducing energy consumption. I will describe how the architectural challenges are addressed through workload-aware channel mapping, optimized power management, topology-preserving redundancy, and thermal-aware reliability mechanisms, enabling the practical deployment of 3D-DRAM. Evaluations using models such as Llama-3.1, DeepSeek-V3, Canary, and Whisper show that our accelerator achieves significantly higher throughput and responsiveness compared to HBM-based alternatives. I will conclude by examining the broader implications for computer architecture, particularly how advanced logic-memory integration through hybrid bonding and multi-high stacking can reshape inference cost structures and enable the next generation of trillion-parameter models.
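As a rough back-of-the-envelope illustration of the bandwidth-bound decoding argument above (all model sizes, KV-cache dimensions, and bandwidth figures below are illustrative assumptions, not numbers from the talk or from d-Matrix hardware):

```python
# Rough roofline-style bound on autoregressive decode throughput (illustrative numbers only).
params           = 70e9        # hypothetical model size, in parameters
bytes_per_param  = 2           # bf16/fp16 weights
layers, kv_heads, head_dim = 80, 8, 128                    # hypothetical KV-cache shape
kv_bytes_per_tok = layers * kv_heads * head_dim * 2 * 2    # K and V, 2 bytes per element
context_len      = 32_000      # tokens already resident in the KV cache
mem_bw           = 3.35e12     # bytes/s, roughly one HBM3-class device

# At batch size 1, each generated token must stream the weights plus the whole KV cache.
bytes_per_token = params * bytes_per_param + context_len * kv_bytes_per_tok
tokens_per_sec  = mem_bw / bytes_per_token
print(f"bandwidth-bound decode rate ~ {tokens_per_sec:.1f} tokens/s")
```

Under these assumptions the decode rate is a few tens of tokens per second regardless of how much arithmetic throughput the accelerator provides, which is the sense in which inference hinges on the memory system.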
Prasanth Chatarasi
IBM
About: Prasanth Chatarasi is a Staff Research Scientist at IBM T.J. Watson Research Center, where he leads research and development of code generation for IBM’s Spyre accelerator. He focuses on compiler optimizations, dataflow architectures, and hardware–software co-design for next-generation AI systems. Prasanth received his Ph.D. in Computer Science from the Georgia Institute of Technology, where he worked on advancing compilation techniques for general-purpose and domain-specific high-performance systems, and completed his M.S. thesis at Rice University on polyhedral optimizations for explicitly parallel programs. He has published in major compiler and architecture venues, holds multiple patents in compiler technology, and actively collaborates with academic partners on various aspects of compiler research.
Inside IBM Spyre: Dataflow Architectures and Compiler Innovations for Efficient AI: Achieving high efficiency in modern AI systems hinges on the tight co-design of algorithms, compilers, and hardware accelerators. In this talk, I highlight recent advances at IBM across all three dimensions through the introduction of the IBM Spyre AI accelerator. Spyre adopts a dataflow architecture and decoupled access-execute principles to deliver high compute throughput (TOPS) and energy efficiency (TOPS/W), leveraging wide datapaths and hierarchical on-chip scratchpad memories to sustain dense compute arrays. This talk also presents some of the innovative compiler techniques developed for the Spyre accelerator. These techniques demonstrate that co-designing compiler optimizations around dataflow architectural constraints is key to unlocking both performance and programmability in next-generation AI accelerators.
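As a generic illustration of the kind of constraint such a compiler must honor (a sketch under assumed sizes, not IBM’s code or Spyre’s actual programming model), a matrix multiply can be blocked so that each working set fits a fixed scratchpad budget, with tile staging (the access side) separated from compute on resident tiles (the execute side):

```python
import numpy as np

def tiled_matmul(a, b, scratchpad_bytes=256 * 1024, dtype_size=4):
    """Blocked matmul whose tile size is chosen so the A-, B-, and C-tiles
    together fit an assumed scratchpad budget (illustrative only)."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    # Pick a square-ish tile T such that 3 * T * T * dtype_size <= scratchpad_bytes.
    t = int((scratchpad_bytes / (3 * dtype_size)) ** 0.5)
    c = np.zeros((m, n), dtype=a.dtype)
    for i in range(0, m, t):
        for j in range(0, n, t):
            for p in range(0, k, t):
                # "Access" phase: stage tiles into the scratchpad (modeled as slices here).
                a_tile = a[i:i+t, p:p+t]
                b_tile = b[p:p+t, j:j+t]
                # "Execute" phase: dense compute on resident tiles.
                c[i:i+t, j:j+t] += a_tile @ b_tile
    return c

a = np.random.rand(512, 384).astype(np.float32)
b = np.random.rand(384, 256).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
```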
John Shalf
Lawrence Berkeley National Laboratory
About: TBD
TBD: TBD
Humphrey Shi
Georgia Institute of Technology and NVIDIA
About: TBD
TBD: TBD
Willow Ahrens
Georgia Institute of Technology
About: TBD
TBD: TBD
Ada Gavrilovska
Georgia Institute of Technology
About: TBD
TBD: TBD
Abhijit Chatterjee
School of ECE, Georgia Institute of Technology
About: Abhijit Chatterjee is a Professor in the School of ECE at Georgia Tech and a Fellow of the IEEE. He received his Ph.D. in electrical and computer engineering from the University of Illinois at Urbana-Champaign in 1990. He has received seven Best Paper Awards. His work on self-healing chips was cited by the Wall Street Journal in 1992. In 1995, he was named a Collaborating Partner in NASA’s New Millennium project. He has received awards for Outstanding Research and Technology Transfer from the Georgia Tech Packaging Research Center. In 2007, his group received the Margarida Jacome Award for work on adaptive RF systems from the Berkeley Gigascale Research Center. Dr. Chatterjee has authored over 480 papers, holds 22 patents, and has supervised over 50 Ph.D. dissertations. He co-founded Ardext Technologies Inc., a mixed-signal test solutions company, and served as its chief scientist from 2000 to 2002. His research interests include trustworthy machine learning and adaptive real-time mixed-signal systems.
Algorithmic Techniques for Reliable AI on Imperfect Hardware: In this talk, we study the problem of designing error-resilient AI systems where errors can stem from: (a) soft errors in the computation of matrix-vector multiplications and neuron activations, (b) malicious trojan and adversarial security attacks, and (c) the effects of manufacturing process variations on analog Compute-in-Memory (CiM) modules that can degrade neural model accuracy. The core principle of error detection and correction relies on embedded neuron checks using invariants derived from the statistics of nominal neuron activation patterns, as well as algorithmic encoding techniques. Errors are corrected using probabilistic methods, due to the difficulties involved in exact error diagnosis. The effects of manufacturing process variations on analog CiM modules are handled through compact tests from which neural model performance can be assessed using learning techniques. Analog CiM modules can subsequently be tuned using digital (software-driven) tuning techniques to compensate for process variability effects. Experimental results are presented for a variety of AI compute paradigms: CNNs, reinforcement learning algorithms, language models, hyperdimensional computing primitives, generative adversarial networks, and spiking networks.
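As a minimal sketch of the invariant-check idea described above (a generic illustration under assumed statistics and thresholds, not the specific checks used in this work), per-neuron statistics can be calibrated on nominal activations and large deviations flagged at inference time:

```python
import numpy as np

def calibrate(activations):
    """Fit simple per-neuron invariants (mean and std) from nominal activations.
    `activations` is a (num_samples, num_neurons) array collected on clean inputs."""
    return activations.mean(axis=0), activations.std(axis=0) + 1e-8

def check(activation, mean, std, k=6.0):
    """Flag a possible error when any neuron deviates more than k standard
    deviations from its nominal statistics (k = 6 is an illustrative choice)."""
    z = np.abs(activation - mean) / std
    return bool((z > k).any())

# Calibrate on clean activations, then detect a bit-flip-like corruption.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(10_000, 256))
mean, std = calibrate(clean)

sample = rng.normal(0.0, 1.0, size=256)
print("clean flagged:", check(sample, mean, std))       # expected: False
sample[17] += 100.0                                      # simulate a soft error
print("corrupted flagged:", check(sample, mean, std))   # expected: True
```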