ASPLOS 2022: Workshops and Tutorials

Monday, February 28, 2022


Tuesday, March 1, 2022

VEE: Virtual (8:30-16:30)
YArch: Room 1A (9:00-18:00)
SysTEX: Room 1B (9:00-18:00)
Unikraft: Room 2A (9:00-18:00)
FireSim: Room 2B (9:00-18:00)
Stainless: Room 2C (9:00-13:00)
Internet Computer: Room 2C (14:00-18:00)
LATTE: Room 5A (9:00-18:00)
vHive: Room 5B (9:00-18:00)
CiM: Room 5C (9:00-13:00)
STONNE: Room 5C (14:00-18:00)
Coffee breaks: Foyer Garden 4 and 5 (11:00-11:30, 16:00-16:30)
Lunch break: On your own (13:00-14:00)


RSS2: Workshop on Robustness and Safe Software 2.0


The workshop is meant to foster an interactive discussion about computer systems and architecture research’s role in enabling robust, safe, and resilient Software 2.0. Ultimately, it aims to spark new discussions and insights on algorithms, architectures, and circuit/device-level design, as well as system-level integration and co-design.

Negative results, Opportunities, Perspectives, and Experiences (NOPE)


At MICRO 2019, Lynn Conway gave a keynote describing her personal story and the “techno-social” factors which led to her contributions being undervalued and unseen. Inspired by her perspective and call to action, we aim to re-brand the aims and goals of NOPE to be more inclusive of historical perspectives and descriptions of how technical concepts came to be commonplace.

5th Workshop on System Software for Trusted Execution (SysTEX 2022)


The rise of new processor hardware extensions that permit fine-grained and flexible trusted execution, such as Intel’s SGX, ARM’s TrustZone, or AMD’s SEV-SNP, introduces numerous novel challenges and opportunities for developers of secure applications. There is a pressing need for cross-cutting systems support for such Trusted Execution Environments (TEEs) that spans all layers of the software stack, from the OS through runtimes to compilers and programming models. The 5th Workshop on System Software for Trusted Execution (SysTEX) will focus on systems research challenges related to TEEs and explore new ideas and strategies for implementing trustworthy systems with TEEs. The goal of the workshop is to foster collaboration and discussion among researchers and practitioners in this field.

Young Architect Workshop (YArch)


The fourth Young Architect Workshop (YArch ’22, pronounced “why arch”) will provide a forum for junior graduate students and research-active undergraduates studying computer architecture and related fields to present early-stage or ongoing work and receive constructive feedback from experts in the field as well as from their peers. Students will receive feedback both on their research topic in general and on their specific research directions, along with mentoring opportunities in the form of keynote talks, a panel discussion geared toward young architects, and 1-on-1 meetings with established architects. Students will also have the opportunity to receive valuable career advice from leaders in the field, network with their peers, and develop long-lasting, community-wide relationships.

Languages, Tools, and Techniques for Accelerator Design (LATTE)


Scope. LATTE is a venue for discussion, debate, and brainstorming at the intersection of hardware acceleration and programming languages research. The focus is on new languages and tools that aim to let domain specialists, not just hardware experts, produce efficient accelerators. A full range of targets is in scope: ASICs (silicon), FPGAs, CGRAs, or future reconfigurable hardware. A wide variety of research topics are in scope, including but not limited to:

  • Domain-specific languages for accelerator design
  • Compilers for optimizing hardware designs
  • Verification, testing, and debugging techniques
  • Virtualization schemes for specialized & reconfigurable hardware

LATTE solicits short position papers that need not fit the mold of a traditional publication:

  • Early, in-progress research snapshots
  • Experience reports on building or deploying accelerators and the tools involved
  • Essays advocating for or against a general approach
  • Retrospectives on past efforts on tools, languages, and techniques for accelerator design
  • Calls for solutions to open challenges in the area (questions without answers)
  • Demonstrations of real systems (to be shown off in a live demo at the workshop)


STONNE: A Simulation Tool for Neural Network Engines


As the complexity of DL accelerators grows, the analytical models currently used for design-space exploration are unable to capture execution-time subtleties, leading to inexact results in many cases. This creates a need for cycle-level simulation tools that allow fast and accurate design-space exploration of DL accelerators, and rapid quantification of the efficacy of architectural enhancements during the early stages of a design. To this end, STONNE is a cycle-level microarchitectural simulation framework that can plug into any high-level DL framework as an accelerator device and perform full-model evaluation (i.e., of real, complete, unmodified DNN models) of state-of-the-art systolic and flexible DNN accelerators, both with and without sparsity support. STONNE is developed by the University of Murcia and the Georgia Institute of Technology and is open-sourced under the terms of the MIT license.
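To make the "simulator as an accelerator device" pattern concrete, here is a toy Python sketch: a framework dispatches each layer's operation to a device object, and a cycle-level simulator implements that device interface, so unmodified models run while every operation is costed. All names and the cost model are illustrative assumptions, not STONNE's actual API.

```python
# Toy sketch of a cycle-level simulator exposed as an accelerator device.
# Illustrative only: class names and the cost model are hypothetical.

class CycleLevelDevice:
    def __init__(self):
        self.total_cycles = 0  # accumulated simulated cycles

    def matmul(self, a, b):
        """Execute a matrix multiply while charging a toy cycle cost."""
        m, k, n = len(a), len(a[0]), len(b[0])
        self.total_cycles += m * k * n  # toy cost: one cycle per MAC
        return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
                for i in range(m)]

def run_model(device, x, weights):
    """Dispatch each layer to the device, as a DL framework would."""
    for w in weights:
        x = device.matmul(x, w)
    return x
```

The key property is that `run_model` (standing in for the framework) is oblivious to whether the device is real hardware or a simulator; only the device object changes.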

In this tutorial we demonstrate how STONNE enables research on DNN accelerators through several use cases, ranging from the microarchitectural networks-on-chip present in DNN accelerators to the scheduling strategies that can be used to improve energy efficiency in sparse accelerators. Further, we present OMEGA, a framework built on top of STONNE that enables the exploration of dataflows for accelerators targeting multi-phase GNN applications, which are gaining popularity in the AI and HPC communities.

Introduction to CHERI


CHERI (Capability Hardware Enhanced RISC Instructions) is a decade-plus research effort that extends modern Instruction Set Architectures (ISAs) with features for fine-grained memory protection and scalable software compartmentalization. CHERI is a hardware-software codesign effort, and, while it strives to minimize disruption to existing architecture, microarchitecture, and software, it nevertheless has implications for all these areas. This tutorial is intended to familiarize software engineers with CHERI’s architectural facilities and spatially-safe CHERI C/C++ through a mix of hands-on exercises and short presentations and discussions.
The Cambridge Computer Laboratory and SRI have instantiated CHERI in open-source research artifacts based on RISC-V, both in software emulation and in several FPGA soft-core SoCs. Excitingly, Arm has recently released the Morello prototype board, implementing a CHERI-augmented ARMv8 and giving us a many-core, multi-GHz research platform as well. This tutorial will focus on the CheriBSD/RISC-V environment, as RISC-V has a smaller ISA and slightly easier-to-explain (dis)assembly, but the adaptation of C to CHERI C is the same in a CheriBSD/Morello environment. Please see our website for prerequisites, including downloading and installing the CheriBSD software release, prior to the tutorial. The pre-packaged distribution requires a macOS or Linux machine running Docker, but we are happy to help get the stack running in other environments, too; please let us know ahead of time.

Unikraft: Specialized OSes for Safety and Performance the Easy Way


With the advance of virtualization technology and the constant demand for specialization, security, and performance, unikernels are no longer a fringe idea. In this tutorial we present Unikraft, a unikernel SDK aiming for extreme specialization. Unikraft is an open-source project with a growing and vibrant community, available on GitHub and Discord. The tutorial will be highly practical: we will provide remote access to pre-configured machines where attendees will build, configure, run, and measure Unikraft-based software components. The tutorial is planned to run for 6 hours (e.g., 10am to 5pm with a 1-hour break). Each section will consist of a short presentation or demo (10-15 minutes) followed by practical work done by each attendee on their allocated remote machine. Trainers from the Unikraft community will provide instructions and support during the tutorial.

Verifying Programs with Stainless


Stainless is a system for constructing formally verified software that is guaranteed to meet specifications for all inputs. In the past it has been used to verify data structures, file system components, and blockchain clients, as well as to formally model program semantics. The primary input format to Stainless is a subset of Scala; an experimental Rust front-end is under development. Stainless programs are compatible with the default Scala compiler and can thus be run on the Java Virtual Machine. In addition, programs designed to run with pre-allocated memory (e.g., on an embedded system) can be translated to C and processed using conventional C compilers.

To construct verified software in Stainless, developers strengthen the usual well-typed programs in a memory-safe language by providing modular specifications (“contracts”) and proof hints. Such annotations come in familiar forms (preconditions, postconditions, assertions, and invariants) but are simply expressed as pure Boolean expressions of the input language. Stainless formally proves that in all possible program executions the contracts are respected and basic safety properties hold (e.g., array indices are in bounds, integer operations do not overflow, and null values are not dereferenced). Moreover, Stainless produces counterexamples when contracts are violated, and can automatically check program termination. This tutorial will provide a hands-on introduction to Stainless through a series of guided examples. We will assume only basic programming skills; no particular background in verification or Scala is required, though a basic understanding of functional programming concepts will be helpful.
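Stainless contracts are written in its Scala input language; purely to illustrate the contract discipline (this is not Stainless syntax), the same shape can be sketched with runtime assertions in Python. The function `insert_sorted` is a hypothetical example; Stainless would prove such pre- and postconditions statically for all inputs, whereas plain assertions only check them at runtime.

```python
def insert_sorted(xs, x):
    """Insert x into an already-sorted list xs, keeping it sorted."""
    # Precondition: the input list must be sorted.
    assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

    i = 0
    while i < len(xs) and xs[i] < x:
        i += 1
    result = xs[:i] + [x] + xs[i:]

    # Postconditions: the output is sorted and one element longer,
    # expressed (as in Stainless) as pure Boolean expressions.
    assert all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    assert len(result) == len(xs) + 1
    return result
```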

Infrastructure for Computing in Memory


Computing in Memory (CiM) is a focus of research in intelligent memories. It is akin to data-centric architectures, where data is processed where it is produced or stored. CiM alleviates system bottlenecks caused by moving data from memory to the CPU for processing: as less data is moved to CPUs, fewer resources (networking, CPUs) are needed and less energy is consumed. CiM can be viewed as a special case of vector processing with a very high degree of parallelism (on the scale of millions of vector elements) and a very high degree of programmability (for adding new functions on vectors).

The scope of the tutorial is writing, running, and debugging CiM SW on GSI Gemini, an Associative Processor (APU).

The objective for participants is to become familiar with the APU SW abstraction layer, both as a vector processor and as a bit-serial processor.

The core of the tutorial is CiM SW. Participants will get hands-on experience with low-level code (such as element-wise Boolean operations in memory and CAM search operations) and intermediate-level code (such as FindMax and arithmetic functions).

The tutorial will include live coding labs, including a classical associative-memory exercise from the excellent book “Introduction to Parallel Processing: Algorithms and Architectures” by Behrooz Parhami and the well-known “Game of Life” cellular automaton.
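To give a feel for these operations, the following toy Python model sketches an element-wise Boolean operation, a CAM-style search, and a bit-serial FindMax. This is an illustrative mental model only, not the GSI Gemini APU API; all function names are hypothetical.

```python
# Toy model of an associative (CAM-style) processor: every "row" of
# memory participates in each operation in parallel. Illustrative only.

def cam_search(memory, key):
    """Return a match bit-vector: one bit per row, set where row == key."""
    return [row == key for row in memory]

def elementwise_and(a, b):
    """Element-wise Boolean AND of two bit-vectors, as done in-memory."""
    return [x and y for x, y in zip(a, b)]

def find_max(memory, bits=8):
    """Bit-serial FindMax: scan bit positions from MSB to LSB,
    narrowing the candidate set the way an associative processor would."""
    candidates = [True] * len(memory)
    for bit in reversed(range(bits)):
        hits = [c and ((v >> bit) & 1) == 1
                for c, v in zip(candidates, memory)]
        if any(hits):           # some candidate has this bit set:
            candidates = hits   # drop candidates with the bit clear
    return [v for c, v in zip(candidates, memory) if c][0]
```

Note that `find_max` takes a number of steps proportional to the word width, not the number of rows; this is the characteristic trade-off of bit-serial associative processing.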

Turbocharging Serverless Research with vHive


This tutorial provides an overview of serverless cloud computing and introduces the vHive ecosystem, a full-stack open-source framework for serverless experimentation and innovation. The tutorial seeks to educate the community about serverless computing architecture and benchmarking methodology, and to teach researchers from the computer architecture and computer systems communities to use vHive in their research. The tutorial includes a number of hands-on sessions on writing serverless applications, analyzing their performance in production and open-source serverless clouds, and instrumenting and optimizing serverless infrastructure across the whole stack.

ASTRA-sim: Enabling SW/HW Co-Design Exploration for Distributed Deep Learning Training Platforms


Modern deep learning systems rely heavily on distributed training over customized high-performance accelerators (e.g., TPUs, GPUs) connected via high-performance interconnects (e.g., NVLink). Examples today include NVIDIA’s DGX-2, Google’s Cloud TPU, and Facebook’s Zion. Deep Neural Network (DNN) training involves a complex interplay between the DNN model architecture, parallelization strategy, scheduling strategy, collective communication algorithm, network topology, and the accelerator endpoint. Collective communications (e.g., all-reduce, all-to-all, reduce-scatter, all-gather) are initiated at different phases for different parallelism approaches and play a crucial role in overall runtime if not hidden efficiently behind compute. This problem becomes paramount as recent models for NLP, such as GPT-3, and for recommendations, such as DLRM, have billions to trillions of parameters and need to be scaled across hundreds to thousands of accelerator nodes. As innovation in AI/ML models continues to grow at an accelerated rate, there is a need for a comprehensive methodology to understand and navigate this complex design space in order to (i) architect future platforms and (ii) develop novel parallelism schemes to support efficient training of future DNN models.
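All-reduce, the most common of these collectives, is often implemented as a ring: a reduce-scatter phase followed by an all-gather, taking 2(N-1) communication steps for N nodes. The following minimal Python sketch simulates a chunked ring all-reduce; it is an illustration of the algorithm, not ASTRA-sim code.

```python
def ring_all_reduce(data):
    """Simulate a chunked ring all-reduce.
    data: one equal-length vector per node (length divisible by N).
    Returns the per-node buffers; afterwards every node holds the
    element-wise sum of all N input vectors."""
    n = len(data)
    chunk = len(data[0]) // n
    buf = [list(v) for v in data]

    def get(node, c):
        return buf[node][c * chunk:(c + 1) * chunk]

    def add(node, c, vals):
        for k, v in enumerate(vals):
            buf[node][c * chunk + k] += v

    def put(node, c, vals):
        buf[node][c * chunk:(c + 1) * chunk] = vals

    # Phase 1: reduce-scatter. After n-1 steps, node i holds the
    # complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, get(i, (i - step) % n)) for i in range(n)]
        for i, c, vals in sends:   # each node passes a partial sum rightward
            add((i + 1) % n, c, vals)

    # Phase 2: all-gather. Circulate each completed chunk to every node.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, get(i, (i + 1 - step) % n))
                 for i in range(n)]
        for i, c, vals in sends:
            put((i + 1) % n, c, vals)
    return buf
```

Because each node sends only 1/N of the vector per step, the per-node traffic is roughly 2(N-1)/N times the vector size, independent of N, which is why the ring algorithm scales well on bandwidth-bound links.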

As an ongoing collaboration between Intel, Facebook, and Georgia Tech, we have been jointly developing a detailed cycle-accurate distributed training simulator called ASTRA-sim. ASTRA-sim models the co-design space described above and schedules the compute-communication interactions from distributed training over plug-and-play compute and network simulators. It enables a systematic study of bottlenecks at the software and hardware level for scaling training. It also enables end-to-end design-space exploration for running large DNN models over future training platforms. Currently, ASTRA-sim uses SCALE-sim (a Google-TPU-like simulator) as its compute model and provides a suite of network models (analytical network, Garnet from gem5, and NS3) to go from simple analytical to detailed cycle-accurate simulation of large-scale training platforms. In this tutorial, we will educate the research community about the challenges in the emerging domain of distributed training, demonstrate the capabilities of ASTRA-sim with examples, and discuss ongoing development efforts.

FireSim and Chipyard


This tutorial gives a hands-on introduction to and walk-through of FireSim and Chipyard, which together enable end-to-end architecture/systems research with RISC-V SoC generators, agile test chips, and FPGA-accelerated simulation. We will provide attendees with free access to AWS EC2 F1 instances to follow the tutorial interactively. Attendees will be able to customize an industry- and silicon-proven RISC-V microprocessor design, run their own high-performance FPGA-accelerated simulations of that design in the cloud, and learn how to push their design to silicon, guided by the FireSim and Chipyard developers.

Performant and Secure Applications on the Internet Computer


In May 2021 the Internet Computer, a web-speed smart contract and app platform, went live. Less than six months later, the Internet Computer hosts more than 12,000 smart contract apps and serves more than 100,000 users, providing a plethora of services ranging from social media and chat applications to DeFi and NFTs. Thanks to its architecture and support for new service and business models, the Internet Computer has the potential to change how we develop, host, and use apps.

In this tutorial we host sessions on developing apps that offer enhanced security on this decentralized platform, on how such smart contracts are executed securely in a sandboxed environment, and on how trust can be further increased with reproducible builds.