ASPLOS 2021: Workshops and Tutorials


Monday, April 12
Morning: ESP
Afternoon: ILLIXR: Illinois Extended Reality Testbed

Tuesday, April 13
Morning: YArch

Wednesday, April 14
Morning: AIBench; ML Performance: Benchmarking Deep Learning Systems
Afternoon: Dynamic Data-Race Prediction

Thursday, April 15
Morning: NOPE; LATTE; WORDS
Afternoon: Securing Processor Architectures

Friday, April 16
Morning: SCALE-Sim; High Performance Distributed Deep Learning: A Beginner’s Guide; Workshop on Systems and Architectures for Robust Software 2.0

ESP: the Open-Source Research Platform for Agile SoC Design and Programming

Time: Monday, April 12 | 8am-11am PT


The ESP open-source platform supports research on the design and programming of heterogeneous SoC architectures.

By combining a scalable modular architecture with a system-level design methodology, ESP simplifies the design of individual accelerators and automates their hardware/software integration into complete SoCs. ESP integrates third-party components, including RISC-V processors and the NVDLA accelerator, offers an automated flow for embedded machine learning accelerators, and enables rapid FPGA-based prototyping of the SoCs. With ESP, researchers in architectures, compilers, and operating systems can evaluate new ideas by running complex user applications on top of the full Linux-based software stack while invoking many different accelerators.

ILLIXR: Illinois Extended Reality Testbed

Time: Monday, April 12 | 11am-3pm PT

Augmented, Virtual, and Mixed Reality, collectively known as Extended Reality (XR), are emerging technologies that will impact most aspects of our lives, ranging from education and science to social interactions and entertainment. However, modern XR systems are unable to fully realize the potential of XR, owing to a severe performance, power, and quality gap that exists between the state-of-the-art systems of today and the idealized XR systems of tomorrow. Closing this gap and designing systems that are several orders of magnitude faster, smaller, and more energy efficient than modern systems is challenging for two main reasons: 1) the diversity and complexity of domains and tasks within XR is immense, and 2) state-of-the-art XR systems are closed source and closely guarded by a few key industry players.

In order to enable architecture and systems research in XR, we developed ILLIXR, the Illinois Extended Reality Testbed. ILLIXR is the first fully open-source full-system testbed for XR, containing several state-of-the-art XR components connected via an efficient, flexible, and modular runtime framework. ILLIXR is OpenXR compliant, enabling it to run any OpenXR-based application, including those developed using game engines. Finally, ILLIXR provides both component- and system-level metrics for both performance and quality, enabling end-to-end experimentation and optimization. ILLIXR enables research in new compute and memory primitives, SoC design, inter-accelerator communication, programming languages and OSes for domain-specific systems, real-time scheduling, device-edge-cloud partitioning of workloads, and many other aspects of designing future domain-specific systems.

In this tutorial, we will first provide an overview of the XR domain, then perform a deep dive of ILLIXR and its components, runtime, and metrics, and finally show a hands-on demo of ILLIXR, including setup, build process, and actual usage with real applications.

YArch 2021

Time: Tuesday, April 13 | 7am-3pm PT

The third Young Architect Workshop (YArch ’21, pronounced “why arch”) will provide a forum for junior graduate students and research-active undergraduates in computer architecture and related fields to present early-stage or ongoing work and receive constructive feedback from experts in the field as well as from their peers. Students will receive feedback both on their research topic in general and on their specific research directions, along with mentoring opportunities in the form of keynote talks, a panel discussion geared toward young architects, and 1-on-1 meetings with established architects. Students will also have an opportunity to receive valuable career advice from leaders in the field, to network with their peers, and to develop long-lasting, community-wide relationships.

Dynamic Data-Race Prediction: Fundamentals, Theory, and Practice

Time: Wednesday, April 14 | 11am-3pm PT

Data races are arguably the most insidious of concurrency bugs, and extensive research effort has been dedicated to detecting them effectively. Predictive race detection techniques aim to expose data races missed by traditional dynamic race detectors (such as those based on Happens-Before) by inferring data races in alternate executions of the underlying program, without re-executing it. The resulting promise of enhanced coverage in race detection has recently led to the development of many dynamic race prediction techniques.

This tutorial aims to present the foundations of race prediction in a principled manner, consolidate a decade-long line of work on dynamic race prediction, and discuss recent algorithmic advances that make race prediction efficient and practical. The tutorial also covers recent results on the complexity and hardness of reasoning about race prediction.

The techniques we will present are useful beyond data race detection and are interesting for people from programming languages, architecture and the broader systems community.
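As background for the predictive techniques above, the classical Happens-Before baseline can be sketched with vector clocks. The sketch below is purely illustrative (the trace format, function names, and simplified lock handling are our own, not the tutorial's artifacts): it reports a race whenever two conflicting accesses are unordered by the happens-before relation.

```python
def detect_races(trace, nthreads):
    """Flag unordered conflicting accesses using vector clocks.

    trace: list of (thread, op, target) with op in {"acq", "rel", "rd", "wr"}.
    Illustrative Happens-Before sketch, not a production detector.
    """
    C = [[1 if i == t else 0 for i in range(nthreads)] for t in range(nthreads)]
    L = {}      # lock -> vector clock at its last release
    last = {}   # shared var -> list of (thread, clock-at-access, is_write)
    races = []

    def hb(a, b):  # does clock a happen-before (or equal) clock b?
        return all(x <= y for x, y in zip(a, b))

    for t, op, x in trace:
        if op == "acq":
            if x in L:  # synchronize with the last releaser of lock x
                C[t] = [max(a, b) for a, b in zip(C[t], L[x])]
        elif op == "rel":
            L[x] = list(C[t])
            C[t][t] += 1  # start a new epoch for thread t
        else:
            is_write = (op == "wr")
            for u, cu, wu in last.get(x, []):
                if u != t and (is_write or wu) and not hb(cu, C[t]):
                    races.append((x, u, t))
            last.setdefault(x, []).append((t, list(C[t]), is_write))
    return races
```

A Happens-Before detector like this misses races that only surface under a different interleaving of the same trace; race prediction, the subject of the tutorial, is precisely about soundly reasoning over those alternate interleavings without re-executing the program.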

Tutorial on BenchCouncil AIBench: Scenario, Training, Inference, and Micro Benchmarks across Datacenter, HPC, IoT, and Edge

Time: Wednesday, April 14 | 7am-11am, 4pm-8pm PT


As a joint work with seventeen industry partners, AIBench is a comprehensive AI benchmark suite, distilling real-world application scenarios into AI Scenario, Training, Inference, and Micro Benchmarks across Datacenter, HPC, IoT, and Edge. AIBench Scenario benchmarks are proxies for industry-scale real-world application scenarios. Each scenario benchmark models the critical paths of a real-world application scenario as a permutation of AI and non-AI modules. Edge AIBench is an instance of the scenario benchmark suites, modeling end-to-end performance across IoT, edge, and datacenter. AIBench Training and AIBench Inference cover nineteen representative AI tasks with state-of-the-art models to guarantee diversity and representativeness. AIBench Micro provides the intensively used hotspot functions, profiled from the full AIBench benchmarks, for simulation-based architecture research. As AI training is prohibitively costly, AIBench Training provides two subsets, for repeatable benchmarking and for workload characterization, to improve affordability; they keep the benchmarks to a minimum while maintaining representativeness. Based on the AIBench Training subset for repeatable benchmarking, we provide HPC AI500 to evaluate large-scale HPC AI systems. AIoTBench implements the AI inference benchmarks on various IoT and embedded devices, emphasizing diverse lightweight AI frameworks and models. Finally, the hands-on demos illustrate how to use AIBench on the BenchCouncil Testbed, which is publicly available.

ML Performance: Benchmarking Deep Learning Systems

Time: Wednesday, April 14 | 7am-3pm PT


The current landscape of Machine Learning (ML) and Deep Learning (DL) is rife with non-uniform models, frameworks, and system stacks, and lacks standard tools and methodologies to evaluate and profile models or systems. In the absence of standard tools, the state of the practice for evaluating and comparing the benefits of proposed AI innovations (be they hardware or software) on end-to-end AI pipelines is both arduous and error-prone, stifling the adoption of those innovations in a rapidly moving field.


NOPE

Time: Thursday, April 15 | 7am-11am PT


NOPE is a workshop that discusses open, honest post-mortems of research projects which ran into unexpected limitations and resulted in lessons learned. In addition, it will offer a venue to discuss contributions that have been underappreciated over time. The goal of NOPE is to reflect on negative outcomes and uncover opportunities to move forward by learning from mistakes made during the research process.

Securing Processor Architectures

Time: Thursday, April 15 | 11am-3pm PT


This tutorial aims to teach participants about different topics in processor architecture and security, and in particular how to secure modern processor architectures. The tutorial will focus especially on threats due to information leakage (side and covert channels) as well as transient execution attacks. It will also touch upon the design of secure processor architectures and trusted execution environments (TEEs) and how they are impacted by information leakage and transient execution attacks. A number of defense strategies against the various attacks will be presented in the context of existing, and hypothesized, threats. The tutorial will also cover new research opportunities for furthering the security of processor architectures.
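To make the notion of information leakage concrete, here is a deliberately tiny, hypothetical example (our own, not from the tutorial) of a timing side channel. Instead of wall-clock time, the code counts character comparisons so the leak is deterministic: an early-exit comparison's "running time" grows with the length of the matching prefix, which an attacker can exploit one character at a time.

```python
import string

def leaky_compare(secret, guess):
    """Early-exit comparison: work done depends on the matching prefix."""
    ops = 0
    for s, g in zip(secret, guess):
        ops += 1
        if s != g:
            return False, ops
    return secret == guess, ops

def recover_first_char(secret, alphabet=string.ascii_lowercase):
    """More comparisons means a longer match: the 'time' leaks the secret."""
    pad = "\0" * (len(secret) - 1)  # keep every guess the same length
    return max(alphabet, key=lambda c: leaky_compare(secret, c + pad)[1])
```

A constant-time comparison (e.g. XOR-accumulating all characters before deciding) removes the dependence of work on secret data; the analogous reasoning at the microarchitectural level, where cache state and transient execution play the role of the early exit, is what the tutorial's defenses target.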

LATTE: Languages, Tools, and Techniques for Accelerator Design

Time: Thursday, April 15 | 7am-3pm PT


LATTE is a venue for discussion, debate, and brainstorming at the intersection of hardware acceleration and programming languages research. The focus is on new languages and tools that aim to let domain specialists, not just hardware experts, produce efficient accelerators. A full range of targets is in scope: ASICs (silicon), FPGAs, CGRAs, and future reconfigurable hardware.

WORDS 2021

Time: Thursday, April 15 | 7am-3pm PT


Two recent trends in data centers move away from the traditional computer server: cloud serverless computing, which eschews “servers” by allowing users to directly deploy fine-grained programs that are triggered by external events, and hardware resource disaggregation, which breaks a server into fine-grained, network-accessed hardware resource units. The 2nd Workshop on Resource Disaggregation and Serverless (WORDS 2021) will bring together researchers and practitioners to engage in a lively discussion on a wide range of topics in the broad definition of resource disaggregation and serverless computing. We solicit both position papers that explore new challenges and design spaces and short papers that include completed or early-stage work. The submission deadline is Feb 22.

SCALE-Sim: Systolic CNN Accelerator Simulator

Time: Friday, April 16 | 7am-11am PT


SCALE-Sim is a cycle-accurate CNN accelerator simulator that provides timing, power/energy, memory bandwidth, and memory access trace results for a specified accelerator configuration and neural network architecture. It is based on the systolic array architecture used in various accelerators such as Google’s TPU and Xilinx XDNN. It is developed jointly by ARM Research and Georgia Tech, and is open-sourced.

SCALE-Sim enables research into DNN accelerator architectures and is also suitable for system-level studies. Designing an efficient DNN accelerator is a difficult problem that requires searching an intricate trade-off space with a large number of architectural parameters. Moreover, recent DNN workloads are increasingly memory-bound due to growing model sizes. A simulation infrastructure like SCALE-Sim, which can provide cycle-accurate estimates of performance, memory accesses, and other design metrics, is therefore a vital tool for fast and reliable design cycles. Unlike related infrastructure that relies on analytical models to estimate the performance and operating cost of accelerator designs, SCALE-Sim lets designers capture the behavior of the accelerator at each cycle of operation.

The tutorial targets students, faculty, and researchers who want to (a) get detailed knowledge of how DNN accelerators work, (b) architect and instrument novel DNN accelerators, (c) study performance implications of dataflow mapping strategies and system-level integration, or (d) plug a DNN accelerator RTL into their system. The tutorial will be interactive: the audience will be asked to code up small methods or fill in code snippets to demonstrate the capabilities of the tools and associated APIs.
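To give a feel for the kind of first-order analysis a cycle-accurate simulator refines, the sketch below estimates runtime for a GEMM on a rows x cols systolic array. The folding rule and the per-fold fill/drain cost are our own simplifying assumptions for illustration; this is not SCALE-Sim's actual cost model, which tracks behavior cycle by cycle instead of using a closed-form estimate.

```python
import math

def gemm_cycles(M, K, N, rows, cols):
    """Rough cycle estimate for an (M x K) @ (K x N) GEMM on a rows x cols
    output-stationary systolic array.

    Assumption (illustrative): each fold of the output costs K accumulation
    steps plus rows + cols - 2 cycles of pipeline fill/drain.
    """
    folds = math.ceil(M / rows) * math.ceil(N / cols)
    return folds * (K + rows + cols - 2)

def pe_utilization(M, N, rows, cols):
    """Fraction of PEs holding a useful output element, ignoring fill/drain."""
    folds = math.ceil(M / rows) * math.ceil(N / cols)
    return (M * N) / (folds * rows * cols)
```

Even this toy model exposes the trade-offs the tutorial explores: a 48x48 output mapped onto a 32x32 array needs four folds yet keeps only about 56% of the PEs busy, hinting at why dataflow and tiling choices matter and why a cycle-accurate tool is needed to settle them.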

High-Performance Distributed Deep Learning: A Beginner’s Guide

Time: Friday, April 16 | 7am-11am PT

Recent advancements in Artificial Intelligence (AI) have been fueled by the resurgence of Deep Neural Networks (DNNs) and various Deep Learning (DL) frameworks like TensorFlow and PyTorch. In this tutorial, we will provide an overview of interesting trends in DNN design and how cutting-edge hardware architectures and high-performance interconnects are playing a key role in moving the field forward. We will also present an overview of different DNN architectures and DL frameworks that led to advancements in emerging application areas like image recognition, speech processing, and autonomous vehicle systems. Most DL frameworks started with a single-node design, but approaches to parallelize DNN training are being actively explored, and the DL community has adopted distributed training designs that exploit communication runtimes like gRPC, MPI, and NCCL. We highlight new challenges and opportunities for communication runtimes to exploit high-performance CPU/GPU architectures, and present some of our co-design efforts to utilize MPI for large-scale DNN training on cutting-edge CPU and GPU architectures available on modern HPC clusters. Finally, we include hands-on exercises to enable attendees to gain first-hand experience running distributed DNN training experiments on a modern GPU cluster.
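The synchronous data-parallel pattern behind most of these distributed designs can be sketched in a few lines. This is a pure-Python stand-in for illustration only: real frameworks issue the averaging step through MPI's Allreduce or NCCL's allreduce collective rather than the toy allreduce_mean below, and the per-worker gradient computations run in parallel on separate devices.

```python
def allreduce_mean(local_grads):
    """Toy allreduce: element-wise mean of one gradient vector per worker."""
    n = len(local_grads)
    return [sum(g[i] for g in local_grads) / n
            for i in range(len(local_grads[0]))]

def train_step(w, shards, grad_fn, lr=0.1):
    """One synchronous data-parallel SGD step over per-worker data shards."""
    local = [grad_fn(w, shard) for shard in shards]  # parallel in practice
    g = allreduce_mean(local)                        # the communication step
    return [wi - lr * gi for wi, gi in zip(w, g)]
```

Because every worker applies the same averaged gradient, all model replicas stay in sync after each step; the co-design work the tutorial discusses is largely about making that allreduce fast on modern interconnects and large worker counts.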

Workshop on Systems and Architectures for Robust Software 2.0

Time: Friday, April 16 | 9am-3pm PT


Unlike Software 1.0 (conventional programs), which is manually coded with hardened parameters and explicit logic, Software 2.0 programs, usually manifested as and enabled by Deep Neural Networks (DNNs), have learned parameters and implicit logic. While the systems and architecture communities have focused, rightly so, on the efficiency of DNNs, Software 2.0 exposes a unique set of challenges for robustness, safety, and resiliency. The workshop fosters an interactive discussion about the role of computer systems and architecture research in making Software 2.0 robust, safe, and resilient.