Program

Remote attendees: Please access the live stream of the program and asynchronous Q&A for individual papers via Whova at this URL. For the best remote experience we recommend using a laptop or desktop computer. Register for remote attendance here. Here is a guide for remote attendees.

In-person attendees: Please download the Whova app to your mobile device here. The app gives you access to the conference agenda (with links to floor maps), online networking, asynchronous Q&A for individual papers, and important announcements. Here is a guide for in-person attendees.

Links to lightning talks can be found under each paper in the program below. The conference proceedings will be freely accessible on the ACM Digital Library, with no membership or paywall, for one month starting on the first day of the conference.

ASPLOS 2023 Proceedings
Volume 1: https://doi.org/10.1145/3567955
Volume 2: https://doi.org/10.1145/3575693
Volume 3: https://doi.org/10.1145/3582016

Sunday, 6:00 PM PDT – 9:00 PM PDT: Welcome Reception

Location: Pavilion Ballroom (3rd floor)


Day 1: Monday, March 27

8:00 AM PDT – 8:40 AM PDT: Breakfast

Location: Junior Ballroom & Pavilion Ballroom (3rd floor)

8:40 AM PDT – 9:00 AM PDT: Opening Remarks

Location: Grand AB (GB level, below ground)

9:00 AM PDT – 10:00 AM PDT: Keynote 1 by Azalia Mirhoseini (Anthropic / Stanford Univ.)

Abstract
The emergence of powerful generative AI (e.g. large language / vision models) would not have been possible without recent advances in computing systems and accelerators. This talk sheds light on the important role that generative AI itself can play in designing the next generation of computing systems and hardware that in turn would fuel the next generation of AI breakthroughs. Concretely, I will discuss our work on learned optimization for hardware resource allocation and model mapping, which inspired a new and ongoing trend of policy gradient methods for solving combinatorial optimization in computer systems, a generalizable deep reinforcement learning method for chip floorplanning that saved several weeks of design cycle for Google TPUs, and an automated framework for full-stack HW/SW co-design that resulted in drastic Perf/TCO improvements in custom accelerator design. Finally, I will discuss the opportunities and challenges for future computing systems in the era of large generative models.


Bio
Azalia Mirhoseini is a Member of Technical Staff at Anthropic, and an incoming Assistant Professor of Computer Science at Stanford University. Previously, she was a Staff Research Scientist and Team Lead at Google Brain, where she co-founded the Machine Learning for Systems Team. Azalia has published more than 40 peer-reviewed papers at scientific venues such as Nature, ICML, ICLR, NeurIPS, UAI, ASPLOS, SIGMETRICS, DAC, DATE, and ICCAD. She has received a number of awards, including the MIT Technology Review 35 under 35 award, the Best Ph.D. Thesis Award at Rice, and a Gold Medal in the National Math Olympiad in Iran. Her work has been covered in various media outlets, including WIRED, CNBC, ABC News, MIT Technology Review, and IEEE Spectrum.

10:00 AM PDT – 10:20 AM PDT: Coffee Break

Location: Grand Foyer (GB level, below ground)

10:20 AM PDT – 12:00 PM PDT

Session Chair: Tushar Krishna (Georgia Inst. of Technology)
Overlap Communication with Dependent Computation via Decomposition in Large Deep Learning Models
Shibo Wang, Amit Sabne, Andy Davis, Berkin Ilbeyi, Blake Hechtman (Google); Dehao Chen (Waymo); Jinliang Wei, Karthik Srinivasa Murthy, Marcello Maggioni, Qiao Zhang, Sameer Kumar, Tongfei Guo, Yuanzhong Xu, Zongwei Zhou (Google)

Heron: Automatically Constrained High-performance Library Generation for Deep Learning Accelerators
Jun Bi (Univ. of Science and Technology of China); Qi Guo, Xiaqing Li, Yongwei Zhao (Inst. of Computing Tech., Chinese Academy of Sciences); Yuanbo Wen, Yuxuan Guo, Enshuai Zhou (Univ. of Science and Technology of China); Xing Hu, Zidong Du (Inst. of Computing Tech., Chinese Academy of Sciences); Ling Li (Inst. of Software, Chinese Academy of Sciences); Huaping Chen (Univ. of Science and Technology of China); Tianshi Chen (Cambricon Technologies)

TelaMalloc: Efficient On-Chip Memory Allocation for Production Machine Learning Accelerators
Martin Maas, Ulysse Beaugnon, Arun Chauhan, Berkin Ilbeyi (Google)

EVStore: Storage and Caching Capabilities for Scaling Embedding Tables in Deep Recommendation Systems
Daniar H. Kurniawan (Univ. of Chicago); Ruipu Wang (Beijing Univ. of Technology); Kahfi S. Zulkifli, Fandi A. Wiranata (Bandung Inst. of Technology); John Bent (Seagate Technology); Ymir Vigfusson (Emory Univ.); Haryadi S. Gunawi (Univ. of Chicago)

WACO: Learning Workload-Aware Co-optimization of the Format and Schedule of a Sparse Tensor Program
Jaeyeon Won (Massachusetts Inst. of Technology); Charith Mendis (Univ. of Illinois Urbana-Champaign); Joel Emer (Massachusetts Inst. of Technology / NVIDIA); Saman Amarasinghe (Massachusetts Inst. of Technology)

GRACE: A Scalable Graph-Based Approach To Accelerating Recommendation Model Inference
Haojie Ye (Univ. of Michigan); Sanketh Vedula (Technion); Yuhan Chen, Yichen Yang (Univ. of Michigan); Alex Bronstein (Technion); Ronald Dreslinski, Trevor Mudge, Nishil Talati (Univ. of Michigan)

Session Chair: Nadav Amit (VMware Research Group)
Cohort: Software-Oriented Acceleration for Heterogeneous SoCs
Tianrui Wei (Univ. of California, Berkeley); Nazerke Turtayeva (Univ. of California, Santa Barbara); Marcelo Orenes-Vera (Princeton Univ.); Omkar Lonkar, Jonathan Balkind (Univ. of California, Santa Barbara)

PipeSynth: Automated Synthesis of Microarchitectural Axioms for Memory Consistency
Chase Norman, Adwait Godbole (Univ. of California, Berkeley); Yatin A. Manerkar (Univ. of Michigan)

Probabilistic Concurrency Testing for Weak Memory Programs
Mingyu Gao, Soham Chakraborty, Burcu Kulahcioglu Ozkan (Technische Univ. Delft)

MC Mutants: Evaluating and Improving Testing for Memory Consistency Specifications
Reese Levine, Tianhao Guo, Mingun Cho (Univ. of California, Santa Cruz); Alan Baker, Raph Levien, David Neto (Google); Andrew Quinn, Tyler Sorensen (Univ. of California, Santa Cruz)

AtoMig: Automatically Migrating Millions Lines of Code from TSO to WMM
Martin Beck, Koustubha Bhat, Lazar Striçević (Huawei Dresden Research Center, Huawei Central Software Inst.); Geng Chen (Huawei Fundamental Software Innovation Lab, Huawei Central Software Inst.); Diogo Behrens, Ming Fu (Huawei Dresden Research Center, Huawei Central Software Inst.); Viktor Vafeiadis (Max Planck Inst. for Software Syst.); Haibo Chen (Huawei Central Software Inst., Shanghai Jiao Tong Univ.); Hermann Härtig (Technische Univ. Dresden)

Risotto: A Dynamic Binary Translator for Weak Memory Model Architectures
Redha Gouicem (Technische Univ. München); Dennis Sprokholt (Technische Univ. Delft); Jasper Ruehl (Technische Univ. München); Rodrigo C. O. Rocha (Univ. of Edinburgh); Tom Spink (Univ. of St Andrews); Soham Chakraborty (Technische Univ. Delft); Pramod Bhatotia (Technische Univ. München)

Session Chair: Trevor Carlson (National Univ. of Singapore)
Kodan: Addressing the Computational Bottleneck in Space
Bradley Denby (Carnegie Mellon Univ.); Shadi Noghabi, Krishna Chintalapudi (Microsoft Research); Ranveer Chandra (Microsoft); Brandon Lucia (Carnegie Mellon Univ.)

Space-Efficient TREC for Enabling Deep Learning on Microcontrollers
Jiesong Liu, Feng Zhang, Jiawei Guan (Renmin Univ. of China); Hsin-Hsuan Sung (North Carolina State Univ.); Xiaoguang Guo, Xiaoyong Du (Renmin Univ. of China); Xipeng Shen (North Carolina State Univ.)

STI: Turbocharge NLP Inference at the Edge via Elastic Pipelining
Liwei Guo, Wonkyo Choe, Felix Xiaozhu Lin (Univ. of Virginia)

LEGO: Empowering Chip-level functionality plug-and-play for next-generation IoT devices
Chong Zhang, Songfan Li, Yihang Song, Qianhe Meng, Minghua Chen, YanXu Bai, Li Lu (Univ. of Electronic Science and Technology of China); Hongzi Zhu (Shanghai Jiao Tong Univ.)

Transparent Runtime Change Handling for Android Apps
Zizhan Chen, Zili Shao (The Chinese Univ. of Hong Kong)

12:00 PM PDT – 1:00 PM PDT: Lunch

Location: Junior Ballroom & Pavilion Ballroom (3rd floor)

1:00 PM PDT – 2:40 PM PDT

Session Chair: Xipeng Shen (North Carolina State Univ.)
SPLENDID: Supporting Parallel LLVM-IR Enhanced Natural Decompilation for Interactive Development
Zujun Tan, Yebin Chon (Princeton Univ.); Michael Kruse, Johannes Doerfert (Argonne National Laboratory); Ziyang Xu (Princeton Univ.); Brian Homerding, Simone Campanoni (Northwestern Univ.); David I. August (Princeton Univ.)

Beyond Static Parallel Loops: Supporting Dynamic Task Parallelism on Manycore Architectures with Software-Managed Scratchpad Memories
Lin Cheng (Cornell Univ.); Max Ruttenberg, Dai Cheol Jung (Univ. of Washington); Dustin Richmond (Univ. of California, Santa Cruz); Michael Taylor, Mark Oskin (Univ. of Washington); Christopher Batten (Cornell Univ.)

Graphene: An IR for Optimized Tensor Computations on GPUs
Bastian Hagedorn, Bin Fan, Hanfeng Chen, Cris Cecka, Michael Garland, Vinod Grover (NVIDIA)

Coyote: A Compiler for Vectorizing Encrypted Arithmetic Circuits
Raghav Malik, Kabir Sheth, Milind Kulkarni (Purdue Univ.)

NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers
Jiawei Liu (Univ. of Illinois Urbana-Champaign); Jinkun Lin, Fabian Ruffy (New York Univ.); Cheng Tan (Northeastern Univ.); Jinyang Li, Aurojit Panda (New York Univ.); Lingming Zhang (Univ. of Illinois Urbana-Champaign)

TiLT: A Time-Centric Approach for Stream Query Optimization and Parallelization
Anand Jayarajan (Univ. of Toronto / Vector Inst.); Wei Zhao, Yudi Sun (Univ. of Toronto); Gennady Pekhimenko (Univ. of Toronto / Vector Inst.)

Session Chair: Mark Silberstein (Technion)
NUBA: Non-Uniform Bandwidth GPUs
Xia Zhao (Artificial Intelligence Research Center); Magnus Jahre (Norwegian Univ. of Science and Technology); Yuhua Tang (National Univ. of Defense Technology); Guangda Zhang (Artificial Intelligence Research Center); Lieven Eeckhout (Ghent Univ.)

Scoped Buffered Persistency Model for GPUs
Shweta Pandey (Indian Inst. of Science); Aditya K Kamath (Univ. of Washington); Arkaprava Basu (Indian Inst. of Science)

Skybox: Open-source Graphic Rendering on Programmable RISC-V GPUs
Blaise Tine, Varun Saxena (Georgia Inst. of Technology); Santosh Raghav Srivatsan (Georgia Inst. of Technology / North Carolina State Univ.); Joshua R. Simpson, Fadi Alzammar (California Polytechnic State Univ.); Liam Cooper, Hyesoon Kim (Georgia Inst. of Technology)

DefT: Boosting Scalability of Deformable Convolution Operations on GPUs
Edward Hanson, Mark Horton, Hai "Helen" Li, Yiran Chen (Duke Univ.)

MSCCLang: Microsoft Collective Communication Language
Meghan Cowan, Saeed Maleki, Madanlal Musuvathi, Olli Saarikivi, Yifan Xiong (Microsoft Research)

GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture
Zaid Qureshi, Vikram Sharma Mailthody (Univ. of Illinois Urbana-Champaign); Isaac Gelado (NVIDIA); Seungwon Min, Amna Masood, Jeongmin Park (Univ. of Illinois Urbana-Champaign); Jinjun Xiong (Univ. at Buffalo); CJ Newburn, Dmitri Vainbrand (NVIDIA); IHsin Chung (IBM Research); Michael Garland (NVIDIA); William Dally (NVIDIA / Stanford Univ.); Wen-mei Hwu (Univ. of Illinois Urbana-Champaign)

Session Chair: Mohammad Shahrad (Univ. of British Columbia)
Snape: Reliable and Low-Cost Computing with Mixture of Spot and On-demand VMs
Fangkai Yang, Lu Wang, Zhenyu Xu, Jue Zhang, Liqun Li, Bo Qiao (Microsoft Research); Camille Couturier, Chetan Bansal (Microsoft 365); Soumya Ram (Microsoft Azure); Si Qin (Microsoft Research); Zhen Ma (Microsoft 365); Íñigo Goiri, Eli Cortez (Microsoft Azure); Terry Yang, Victor Rühle, Saravan Rajmohan (Microsoft 365); Qingwei Lin, Dongmei Zhang (Microsoft Research)

Erms: Efficient Resource Management for Shared Microservices with SLA Guarantees
Shutian Luo (Shenzhen Inst. of Advanced Tech., Chinese Academy of Sciences / Univ. of Macau); Huanle Xu (Univ. of Macau); Kejiang Ye (Shenzhen Inst. of Advanced Tech., Chinese Academy of Sciences); Guoyao Xu, Liping Zhang, Jian He, Guodong Yang (Alibaba Group); Chengzhong Xu (Univ. of Macau, Macau SAR, China)

Ditto: End-to-End Application Cloning for Networked Cloud Services
Mingyu Liang, Yu Gan, Yueying Li (Cornell Univ.); Carlos Torres, Abhishek Dhanotia (Meta); Mahesh Ketkar (Intel); Christina Delimitrou (Massachusetts Inst. of Technology)

BeeHive: Sub-second Elasticity for Web Services with Semi-FaaS Execution
Ziming Zhao, Mingyu Wu, Jiawei Tang, Binyu Zang, Zhaoguo Wang, Haibo Chen (Shanghai Jiao Tong Univ.)

AQUATOPE: QoS-and-Uncertainty-Aware Resource Management for Multi-stage Serverless Workflows
Zhuangzhuang Zhou, Yanqi Zhang, Christina Delimitrou (Cornell Univ.)

A Generic Service to Provide In-network Aggregation for Key-value Streams
Yongchao He (Tsinghua Univ.); Wenfei Wu (Peking Univ.); Yanfang Le (Intel, Barefoot Switch Division); Ming Liu (Univ. of Wisconsin-Madison); ChonLam Lao (Harvard Univ.)

2:40 PM PDT – 3:00 PM PDT: Coffee Break

Location: Grand Foyer (GB level, below ground)

3:00 PM PDT – 3:50 PM PDT

Session Chair: Lieven Eeckhout (Ghent Univ.)
Junkyard Computing: Repurposing Discarded Smartphones to Minimize Carbon
Jennifer Switzer, Gabriel Marcano, Ryan Kastner, Pat Pannuto (Univ. of California, San Diego)

Carbon Explorer: A Holistic Framework for Designing Carbon Aware Datacenters
Bilge Acun (Meta); Benjamin Lee (Univ. of Pennsylvania / Meta); Fiodar Kazhamiaka (Stanford Univ.); Kiwan Maeng (Meta); Udit Gupta (Harvard Univ.); Manoj Chakkaravarthy (Meta); David Brooks (Harvard Univ.); Carole-Jean Wu (Meta)

Ecovisor: A Virtual Energy System for Carbon-Efficient Applications
Abel Souza, Noman Bashir, Jorge Murillo, Walid Hanafy, Qianlin Liang, David Irwin, Prashant Shenoy (Univ. of Massachusetts Amherst)

Session Chair: Adrian Sampson (Cornell Univ.)
Mapping Very Large Scale Spiking Neuron Network to Neuromorphic Hardware
Ouwen Jin, Qinghui Xing, Ying Li, Shuiguang Deng, Shuibing He, Gang Pan (Zhejiang Univ.)

HuffDuff: Stealing Pruned DNNs from Sparse Accelerators
Dingqing Yang, Prashant J. Nair, Mieszko Lis (Univ. of British Columbia)

OCCAMY: Elastically Sharing a SIMD Co-Processor Across Multiple CPU Cores
Zhongcheng Zhang (Inst. of Computing Technology, Chinese Academy of Sciences); Yan Ou (Turing Business Department, HiSilicon Technologies Company, Ltd); Ying Liu, Chenxi Wang (Inst. of Computing Technology, Chinese Academy of Sciences); Yongbin Zhou, Xiaoyu Wang (Turing Business Department, HiSilicon Technologies Company, Ltd); Yuyang Zhang, Yucheng Ouyang (Inst. of Computing Technology, Chinese Academy of Sciences); Jiahao Shan (unaffiliated); Ying Wang (Inst. of Computing Technology, Chinese Academy of Sciences); Jingling Xue (Univ. of New South Wales, Sydney); Huimin Cui, Xiaobing Feng (Inst. of Computing Technology, Chinese Academy of Sciences)

Session Chair: Mike Bond (Ohio State Univ.)
Efficient Compactions Between Storage Tiers with PrismDB
Ashwini Raina, Jianan Lu (Princeton Univ.); Asaf Cidon (Columbia Univ.); Michael J. Freedman (Princeton Univ.)

Persistent Memory Disaggregation for Cloud-Native Relational Databases
Chaoyi Ruan (Univ. of Science and Technology of China / Alibaba Cloud); Yingqiang Zhang (Alibaba Cloud); Chao Bi (Univ. of Science and Technology of China / Alibaba Cloud); Xiaosong Ma (Qatar Computing Research Inst., Hamad Bin Khalifa Univ.); Hao Chen, Feifei Li, Xinjun Yang (Alibaba Cloud); Cheng Li (Univ. of Science and Technology of China); Ashraf Aboulnaga (Qatar Computing Research Inst., Hamad Bin Khalifa Univ.); Yinlong Xu (Univ. of Science and Technology of China)

SpecPMT: Speculative Logging for Resolving Crash Consistency Overhead of Persistent Memory
Chencheng Ye (Huazhong Univ. of Science and Technology); Yuanchao Xu, Xipeng Shen (North Carolina State Univ.); Yan Sha, Xiaofei Liao, Hai Jin (Huazhong Univ. of Science and Technology); Yan Solihin (Univ. of Central Florida)

3:50 PM PDT – 5:10 PM PDT: Poster Session 1

Location: Junior Ballroom & Junior Foyer (3rd floor)

5:10 PM PDT – 6:10 PM PDT: WACI: Wild and Crazy Ideas

Location: Grand AB (GB level, below ground)

6:10 PM PDT – 7:10 PM PDT: Business Meeting

Location: Grand AB (GB level, below ground)


Day 2: Tuesday, March 28

8:00 AM PDT – 9:00 AM PDT: Breakfast

Location: Junior Ballroom & Pavilion Ballroom (3rd floor)

9:00 AM PDT – 10:00 AM PDT: Keynote 2 by Abhishek Bhattacharjee (Yale Univ.)

Abstract
Direct mind-machine teaming will help us treat brain disorders, augment the healthy brain, and shed light on how the brain as an organ gives rise to the mind. Delivering on this promise requires the design of computer systems that delicately balance the tight power, latency, and bandwidth trade-offs needed to decode brain activity, stimulate biological neurons, and control assistive devices most effectively.

This talk presents my group's design of a standardized and general computer architecture for future brain interfacing. Our design enables the treatment of several neurological disorders (most notably, epilepsy and movement disorders) and lays the groundwork for brain interfacing techniques that can help augment cognitive control and decision-making in the healthy brain. Central to our design is end-to-end hardware acceleration, from the microarchitectural to the distributed system level. Key insights are undergirded via detailed physical synthesis models and chip tape-outs in a 12nm CMOS process.


Bio
Abhishek Bhattacharjee is an Associate Professor of Computer Science at Yale University. His work on hardware optimizations for memory translation has influenced the design of TLBs in AMD CPUs, starting with the Zen 1 architecture, and in NVIDIA's GPUs, starting with the Ampere architecture. His work on software optimizations for memory translation has been shipped in the Linux OS since the 4.14 kernel. More recently, Abhishek has been building flexible and low-power architectures for brain-computer interfacing — the topic of this talk.

10:00 AM PDT – 10:20 AM PDT: Coffee Break

Location: Grand Foyer (GB level, below ground)

10:20 AM PDT – 12:00 PM PDT

Session Chair: Brandon Lucia (Carnegie Mellon Univ.)
Simulator Independent Coverage for RTL Hardware Languages
Kevin Laeufer, Vighnesh Iyer, David Biancolin, Jonathan Bachrach, Borivoje Nikolic, Koushik Sen (Univ. of California, Berkeley)

RepCut: Superlinear Parallel RTL Simulation with Replication-Aided Partitioning
Haoyuan Wang, Scott Beamer (Univ. of California, Santa Cruz)

SMAPPIC: Scalable Multi-FPGA Architecture Prototype Platform in the Cloud
Grigory Chirkov, David Wentzlaff (Princeton Univ.)

ShakeFlow: Functional Hardware Description with Latency-Insensitive Interface Combinators
Sungsoo Han, Minseong Jang, Jeehoon Kang (KAIST)

eHDL: Turning eBPF/XDP Programs into Hardware Designs for the NIC
Alessandro Rivitti (Axbryd / Tor Vergata Univ. of Rome); Roberto Bifulco (NEC Laboratories Europe); Angelo Tulumello (Axbryd / Tor Vergata Univ. of Rome); Marco Bonola (Axbryd); Salvatore Pontarelli (Sapienza Univ.)

CaT: A Solver-Aided Compiler for Packet-Processing Pipelines
Xiangyu Gao (New York Univ.); Divya Raghunathan, Ruijie Fang (Princeton Univ.); Tao Wang, Xiaotong Zhu, Anirudh Sivaraman (New York Univ.); Srinivas Narayana (Rutgers Univ.); Aarti Gupta (Princeton Univ.)

Session Chair: Jayneel Gandhi (Meta)
TeraHeap: Reducing Memory Pressure in Managed Big Data Frameworks
Iacovos G. Kolokasis, Giannos Evdorou (Univ. of Crete / ICS-FORTH); Shoaib Akram (Australian National Univ.); Christos Kozanitis (ICS-FORTH); Anastasios Papagiannis (Isovalent); Foivos Zakkak (Red Hat, Inc.); Polyvios Pratikakis, Angelos Bilas (Univ. of Crete / ICS-FORTH)

Copy-on-Pin: The Missing Piece for Correct Copy-on-Write
David Hildenbrand, Martin Schulz (Technical Univ. of Munich); Nadav Amit (VMware Research Group)

Mosaic Pages: Big TLB Reach with Small Pages
Krishnan Gosakan (Rutgers Univ.); Jaehyun Han (Univ. of North Carolina at Chapel Hill); William Kuszmaul (Massachusetts Inst. of Technology); Ibrahim Nael Mubarek, Nirjhar Mukherjee (Carnegie Mellon Univ.); Karthik Sriram (Yale Univ.); Guido Tagliavini (Rutgers Univ.); Evan West, Michael Bender (Stony Brook Univ.); Abhishek Bhattacharjee (Yale Univ.); Alex Conway (VMware Research); Martin Farach-Colton (Rutgers Univ.); Jayneel Gandhi (Meta); Rob Johnson (VMware Research); Sudarsun Kannan (Rutgers Univ.); Donald Porter (Univ. of North Carolina at Chapel Hill)

Revisiting Log-structured Merging for KV Stores in Hybrid Memory Systems
Zhuohui Duan, Jiabo Yao, Haikun Liu, Xiaofei Liao, Hai Jin, Yu Zhang (Huazhong Univ. of Science and Technology)

ABNDP: Co-Optimizing Data Access and Load Balance in Near-Data Processing
Boyu Tian, Qihang Chen, Mingyu Gao (Tsinghua Univ.)

Infinity Stream: Portable and Programmer-Friendly In-/Near-Memory Fusion
Zhengrong Wang, Christopher Liu (Univ. of California, Los Angeles); Aman Arora, Lizy John (Univ. of Texas at Austin); Tony Nowatzki (Univ. of California, Los Angeles)

Session Chair: Dimitrios Skarlatos (Carnegie Mellon Univ.)
Flexagon: A Multi-Dataflow Sparse-Sparse Matrix Multiplication Accelerator for Efficient DNN Processing
Francisco Munoz-Martinez (Universidad de Murcia); Raveesh Garg (Georgia Inst. of Technology); Michael Pellauer (NVIDIA); José L. Abellan, Manuel E. Acacio (Universidad de Murcia); Tushar Krishna (Georgia Inst. of Technology)

Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling
Toluwanimi O. Odemuyiwa (Univ. of California, Davis); Hadi Asghari-Moghaddam (Univ. of Illinois Urbana-Champaign / Meta); Michael Pellauer (NVIDIA); Kartik Hegde (Univ. of Illinois Urbana-Champaign); Po-An Tsai, Neal Crago, Aamer Jaleel (NVIDIA); John D. Owens (Univ. of California, Davis); Edgar Solomonik (Univ. of Illinois Urbana-Champaign); Joel Emer (Massachusetts Inst. of Technology / NVIDIA); Christopher Fletcher (Univ. of Illinois Urbana-Champaign)

SPADA: Accelerating Sparse Matrix Multiplication with Adaptive Dataflow
Zhiyao Li (Tsinghua Univ.); Jiaxiang Li (Northwestern Univ.); Taijie Chen (Tsinghua Univ.); Dimin Niu, Hongzhong Zheng, Yuan Xie (Alibaba DAMO Academy); Mingyu Gao (Tsinghua Univ. / Shanghai Qi Zhi Inst.)

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning
Zihao Ye (Univ. of Washington); Ruihang Lai (Carnegie Mellon Univ.); Junru Shao (OctoML); Tianqi Chen (Carnegie Mellon Univ.); Luis Ceze (Univ. of Washington)

Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs
Yaoyao Ding (Univ. of Toronto / Vector Inst.); Cody Hao Yu (Amazon Web Services); Bojian Zheng (Univ. of Toronto / Vector Inst.); Yizhi Liu, Yida Wang (Amazon Web Services); Gennady Pekhimenko (Univ. of Toronto / Vector Inst.)

The Sparse Abstract Machine
Olivia Hsu, Maxwell Strange, Ritvik Sharma (Stanford Univ.); Jaeyeon Won (Massachusetts Inst. of Technology); Kunle Olukotun (Stanford Univ.); Joel Emer (Massachusetts Inst. of Technology / NVIDIA); Mark A Horowitz, Fredrik Kjolstad (Stanford Univ.)

12:00 PM PDT – 1:40 PM PDT: Lunch

Location: Junior Ballroom & Pavilion Ballroom (3rd floor)

1:40 PM PDT – 3:20 PM PDT

Session Chair: Pedro Fonseca (Purdue Univ.)
Compilation Consistency Modulo Debug Information
Theodore Luo Wang, Yongqiang Tian, Yiwen Dong, Zhenyang Xu, Chengnian Sun (Univ. of Waterloo)

Where Did My Variable Go? Poking Holes in Incomplete Debug Information
Cristian Assaiante, Daniele Cono D'Elia, Giuseppe Antonio Di Luna, Leonardo Querzoni (Sapienza Univ. of Rome)

Protecting Data Integrity of Web Applications With Database Constraints Inferred From Application Code
Haochen Huang, Bingyu Shen, Li Zhong, Yuanyuan Zhou (Univ. of California, San Diego)

VClinic: A Portable and Efficient Framework for Fine-grained Value Profilers
Xin You, Hailong Yang, Kelun Lei, Zhongzhi Luan, Depei Qian (Beihang Univ.)

Vidi: Record Replay for Reconfigurable Hardware
Gefei Zuo, Jiacheng Ma (Univ. of Michigan); Andrew Quinn (Univ. of California, Santa Cruz); Baris Kasikci (Univ. of Michigan / Google)

DrGPUM: Guiding Memory Optimization for GPU-accelerated Applications
Mao Lin (Univ. of California, Merced); Keren Zhou (Rice Univ.); Pengfei Su (Univ. of California, Merced)

Session Chair: Huaicheng Li (Virginia Tech)
Disaggregated RAID Storage in Modern Datacenters
Junyi Shu, Ruidong Zhu, Yun Ma, Gang Huang, Hong Mei, Xuanzhe Liu, Xin Jin (Peking Univ.)

Reconfigurable Virtual Memory for FPGA-Driven I/O
Joshua Landgraf, Matthew Giordano, Esther Yoon (Univ. of Texas at Austin); Christopher J. Rossbach (Univ. of Texas at Austin / Katana Graph)

LeaFTL: A Learning-based Flash Translation Layer for Solid-State Drives
Jinghan Sun, Shaobo Li (Univ. of Illinois Urbana-Champaign); Yunxin Sun (ETH Zürich); Chao Sun, Dejan Vucinic (Western Digital Research); Jian Huang (Univ. of Illinois Urbana-Champaign)

RAIZN: Redundant Array of Independent Zoned Namespaces
Thomas Kim, Jekyeom Jeon, Nikhil Arora, Huaicheng Li (Carnegie Mellon Univ.); Michael Kaminsky (BrdgAI / Carnegie Mellon Univ.); David Andersen, Gregory R. Ganger, George Amvrosiadis (Carnegie Mellon Univ.); Matias Bjørling (Western Digital)

Re-architecting I/O Caches for Emerging Fast Storage Devices
Mohammadamin Ajdari (HPDS Research / Sharif Univ. of Technology); Pouria Peykani Sani, Amirhossein Moradi, Masoud Khanalizadeh Imani (Sharif Univ. of Technology); Amir Hossein Bazkhanei (HPDS Research / Sharif Univ. of Technology); Hossein Asadi (Sharif Univ. of Technology)

Prism: Optimizing Key-Value Store for Modern Heterogeneous Storage Devices
Yongju Song (Sungkyunkwan Univ.); Wook-Hee Kim (Konkuk Univ.); Sumit Kumar Monga, Changwoo Min (Virginia Tech); Young Ik Eom (Sungkyunkwan Univ.)

Session Chair: Alexandros Daglis (Georgia Inst. of Technology)
Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks
Tushar Swamy (Stanford Univ.); Annus Zulfiqar (Purdue Univ.); Luigi Nardi (Lund Univ. / Stanford Univ.); Muhammad Shahbaz (Purdue Univ.); Kunle Olukotun (Stanford Univ.)

TensorIR: An Abstraction for Automatic Tensorized Program Optimization
Siyuan Feng (Shanghai Jiao Tong Univ.); Bohan Hou, Hongyi Jin (Carnegie Mellon Univ.); Wuwei Lin, Junru Shao (OctoML); Ruihang Lai (Carnegie Mellon Univ.); Zihao Ye (Univ. of Washington); Lianmin Zheng (Univ. of California, Berkeley); Cody Hao Yu (Amazon Web Services); Yong Yu (Shanghai Jiao Tong Univ.); Tianqi Chen (Carnegie Mellon Univ. / OctoML)

FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks
Sheng-Chun Kao (Georgia Inst. of Technology); Suvinay Subramanian (Google); Gaurav Agrawal (Microsoft); Amir Yazdanbakhsh (Google Research, Brain Team); Tushar Krishna (Georgia Inst. of Technology)

TLP: A Deep Learning-based Cost Model for Tensor Program Tuning
Yi Zhai, Yu Zhang, Shuo Liu, Xiaomeng Chu, Jie Peng, Jianmin Ji, Yanyong Zhang (Univ. of Science and Technology of China)

Mobius: Fine Tuning Large-scale Models on Commodity GPU Servers
Yangyang Feng, Minhui Xie, Zijie Tian, Shuo Wang, Youyou Lu, Jiwu Shu (Tsinghua Univ.)

Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning
Shuangyan Yang (Univ. of California, Merced); Minjia Zhang (Microsoft Research); Wenqian Dong, Dong Li (Univ. of California, Merced)

3:20 PM PDT – 3:40 PM PDT: Coffee Break

Location: Grand Foyer (GB level, below ground)

3:40 PM PDT – 4:30 PM PDT

Session Chair: Brandon Reagen (New York Univ. / Meta)
Qompress: Efficient Compilation for Ququarts Exploiting Partial and Mixed Radix Operations for Communication Reduction
Andrew Litteken, Lennart Maximilian Seifert, Jason Chadwick, Natalia Nottingham, Frederic T Chong, Jonathan Baker (Univ. of Chicago)

Verification of nondeterministic quantum programs
Yuan Feng, Yingte Xu (Univ. of Technology Sydney)

CAFQA: A classical simulation bootstrap for variational quantum algorithms
Gokul Subramanian Ravi (Univ. of Chicago); Pranav Gokhale (Super.tech); Yi Ding (Massachusetts Inst. of Technology); William Kirby (Tufts Univ.); Kaitlin Smith, Jonathan Baker (Univ. of Chicago); Peter J. Love (Tufts Univ.); Henry Hoffmann (Univ. of Chicago); Kenneth R. Brown (Duke Univ.); Frederic Chong (Univ. of Chicago)

Session Chair: Sergey Blagodurov (AMD)
In-Network Aggregation with Transport Transparency for Distributed Training
Shuo Liu, Qiaoling Wang, Junyi Zhang (Huawei Technologies Co., Ltd.); Wenfei Wu (Peking Univ.); Qinliang Lin (Huawei Technologies Co., Ltd.); Yao Liu (Sun Yat-sen Univ.); Meng Xu (Huawei Technologies Co., Ltd.); Marco Canini (King Abdullah Univ. of Science and Technology); Ray C. C. Cheung, Jianfei He (City Univ. of Hong Kong)

Rosebud: Making FPGA-accelerated Middlebox Development More Pleasant
Moein Khazraee (Massachusetts Inst. of Technology); Alex Forencich, George C. Papen (Univ. of California, San Diego); Alex C. Snoeren (Univ. of California, San Diego / Google); Aaron Schulman (Univ. of California, San Diego)

Cooperative Concurrency Control for Write-Intensive Key-Value Workloads
Mark Sutherland, Babak Falsafi (EcoCloud / EPFL); Alexandros Daglis (Georgia Inst. of Technology)

Session Chair: Chris Rossbach (Univ. of Texas at Austin / Katana Graph)
Glign: Taming Misaligned Graph Traversals in Concurrent Graph Processing
Xizhe Yin, Zhijia Zhao, Rajiv Gupta (Univ. of California, Riverside)

NosWalker: A Decoupled Architecture for Out-of-Core Random Walk Processing
Shuke Wang, MingXing Zhang (Tsinghua Univ.); Ke Yang (Tsinghua Univ. / Beijing HaiZhi XingTu Technology Co., Ltd.); Kang Chen, Shaonan Ma, Jinlei Jiang, Yongwei Wu (Tsinghua Univ.)

DecoMine: A Compilation-based Graph Pattern Mining System with Pattern Decomposition
Jingji Chen, Xuehai Qian (Purdue Univ.)

4:30 PM PDT – 6:00 PM PDT: Poster Session 2

Location: Junior Ballroom & Junior Foyer (3rd floor)

6:30 PM PDT – 11:00 PM PDT: Excursion, Banquet, and Awards: Vancouver Aquarium

  • Buses depart starting at 5:45 PM
  • Awards Ceremony: 7:00 PM – 7:30 PM in Upper Teck


Day 3: Wednesday, March 29

9:00 AM PDT – 10:00 AM PDT: Keynote 3 by Bryan Catanzaro (NVIDIA)

Abstract
ChatGPT recently became one of the fastest growing new applications in history, thanks to its intriguing text generation capabilities that are able to answer questions, write poetry, and even problem solve. Large Language Models are now being integrated in fundamental ways into products around the tech industry. The possibilities are extraordinary, but much research remains to make these systems reliable and trustworthy, as well as integrate them into applications seamlessly. Additionally, the computational challenges behind large language modeling are also quite important. Systems for training and deploying these models must be highly scalable and run at extreme efficiency, because the amount of work necessary to converge a model can be extraordinarily large. The cost of deploying these models is a barrier to their deployment and must be lowered significantly. In this talk, I'll discuss the work we have been doing at NVIDIA to optimize systems for Large Language Model training and inference, and highlight some of the challenges that remain for future work.


Bio
Bryan Catanzaro is Vice President of Applied Deep Learning Research at NVIDIA, where he leads a team of AI researchers working on chip design, audio and speech, language modeling, graphics and vision, with the goal of finding practical new ways to use AI for NVIDIA's products and workflows. DLSS, Megatron, cuDNN, Pascaline, WaveGlow and DeepSpeech are some of the projects he's helped create. Bryan received his PhD in EECS from the University of California, Berkeley.

10:00 AM PDT – 10:20 AM PDT: Coffee Break

Location: Grand Foyer (GB level, below ground)

10:20 AM PDT – 12:00 PM PDT

Session Chair: Yatin Manerkar (Univ. of Michigan)
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song (Yonsei Univ.); Jinkyu Yim (Seoul National Univ.); Jaewon Jung (Yonsei Univ.); Hongsun Jang (Seoul National Univ.); Hyung-Jin Kim (Samsung Electronics); Youngsok Kim (Yonsei Univ.); Jinho Lee (Seoul National Univ.)

DPACS: Hardware Accelerated Dynamic Neural Network Pruning through Algorithm-Architecture Co-design
Yizhao Gao, Baoheng Zhang, Xiaojuan Qi, Hayden Kwok-Hay So (Univ. of Hong Kong)

Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
Qinghao Hu, Meng Zhang (Nanyang Technological Univ.); Peng Sun (SenseTime); Yonggang Wen, Tianwei Zhang (Nanyang Technological Univ.)

ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning
Diandian Gu, Yihao Zhao, Yinmin Zhong (Peking Univ.); Yifan Xiong, Zhenhua Han, Peng Cheng, Fan Yang (Microsoft Research); Gang Huang, Xin Jin, Xuanzhe Liu (Peking Univ.)

Hyperscale Hardware Optimized Neural Architecture Search
Sheng Li, Garrett Andersen, Tao Chen, Liqun Cheng, Julian Grady, Da Huang, Quoc Le, Andrew Li, Xin Li, Yang Li, Chen Liang, Yifeng Lu, Yun Ni (Google); Ruoming Pang (Apple); Mingxing Tan (Waymo); Martin Wicke, Gang Wu, Shengqi Zhu, Parthasarathy Ranganathan, Norman P. Jouppi (Google)

DeepUM: Tensor Migration and Prefetching in Unified Memory
Jaehoon Jung (Moreh Inc.); Jinpyo Kim, Jaejin Lee (Seoul National Univ.)

Session Chair: Nathan Dautenhahn (Rice Univ.)
Decker: Attack Surface Reduction via On-demand Code Mapping
Chris Porter, Sharjeel Khan, Santosh Pande (Georgia Inst. of Technology)

Finding Unstable Code via Compiler-driven Differential Testing
Shaohua Li, Zhendong Su (ETH Zürich)

Going Beyond the Limits of SFI: Flexible Hardware-Assisted In-Process Isolation with HFI
Shravan Narayan, Tal Garfinkel (Univ. of California, San Diego); Mohammadkazem Taram (Purdue Univ.); Joey Rudek, Daniel Moghimi, Evan Johnson (Univ. of California, San Diego); Chris Fallin (Fastly); Anjo Vahldiek-Oberwagner, Michael LeMay (Intel Labs); Ravi Sahita (Rivos); Dean Tullsen, Deian Stefan (Univ. of California, San Diego)

Protect the System Call, Protect (most of) the World with BASTION
Christopher Jelesnianski, Mohannad Ismail (Virginia Tech); Yeongjin Jang (Oregon State Univ.); Dan Williams, Changwoo Min (Virginia Tech)

Characterizing and Optimizing End-to-End Systems for Private Inference
Karthik Garimella (New York Univ.); Zahra Ghodsi (Univ. of California, San Diego); Nandan Kumar Jha, Siddharth Garg, Brandon Reagen (New York Univ.)

GZKP: A GPU Accelerated Zero-Knowledge Proof System
Weiliang Ma, Xuanhua Shi, Qian Xiong (Huazhong Univ. of Science and Technology); Xiaosong Ma (Qatar Computing Research Inst., Hamad Bin Khalifa Univ.); Hai Jin, Haozhao Kuang (Huazhong Univ. of Science and Technology); Mingyu Gao (Tsinghua Univ.); Ye Zhang, Haichen Shen (Scroll Tech); Weifang Hu (Huazhong Univ. of Science and Technology)

Session Chair: Shih-Wei Li (National Taiwan Univ.)
A Prediction System Service
Zhizhou Zhang, Alvin Oliver Glova, Timothy Sherwood, Jonathan Balkind (Univ. of California, Santa Barbara)

Exit-less, Isolated, and Shared Access for Virtual Machines
Kenichi Yasukata, Hajime Tazaki, Pierre-Louis Aublin (IIJ Research Laboratory)

VDom: Fast and Unlimited Virtual Domains on Multiple Architectures
Ziqi Yuan, Siyu Hong, Rui Chang, Yajin Zhou, Wenbo Shen, Kui Ren (Zhejiang Univ.)

KIT: Testing OS-level Virtualization for Functional Interference Bugs
Congyu Liu, Sishuai Gong, Pedro Fonseca (Purdue Univ.)

Efficient Scheduler Live Update for Linux Kernel with Modularization
Teng Ma, Shanpei Chen, Yihao Wu, Erwei Deng, Zhuo Song (Alibaba Group); Quan Chen, Minyi Guo (Shanghai Jiao Tong Univ.)

Towards a Machine Learning-Assisted Kernel with LAKE
Henrique Fingler, Isha Tarte (Univ. of Texas at Austin); Hangchen Yu (Meta); Ariel Szekely (Massachusetts Inst. of Technology); Bodun Hu, Aditya Akella (Univ. of Texas at Austin); Christopher Rossbach (Univ. of Texas at Austin / Katana Graph)

12:00 PM PDT – 1:00 PM PDT: Lunch

Location: Junior Ballroom & Pavillion Ballroom (3rd floor)

1:00 PM PDT – 2:05 PM PDT

Session Chair: Sara Achour (Stanford Univ.)
Better Than Worst-Case Decoding for Quantum Error Correction
Gokul Subramanian Ravi, Jonathan Baker (Univ. of Chicago); Arash Fayyazi (Univ. of Southern California); Sophia Fuhui Lin (Univ. of Chicago); Ali Javadi-Abhari (IBM); Massoud Pedram (Univ. of Southern California); Frederic T. Chong (Univ. of Chicago)

Navigating the Dynamic Noise Landscape of Variational Quantum Algorithms with QISMET
Gokul Subramanian Ravi, Kaitlin Smith, Jonathan Baker, Tejas Kannan (Univ. of Chicago); Nate Earnest, Ali Javadi-Abhari (IBM); Henry Hoffmann, Frederic Chong (Univ. of Chicago)

FrozenQubits: Boosting Fidelity of QAOA by Skipping Hotspot Nodes
Ramin Ayanzadeh, Narges Alavisamani, Poulami Das, Moinuddin Qureshi (Georgia Inst. of Technology)

CaQR: A Compiler-assisted Approach for Qubit Reuse Through Dynamic Circuit
Fei Hua, Yuwei Jin, Yanhao Chen (Rutgers Univ.); Suhas Vittal (Georgia Inst. of Technology); Kevin Krsulich, Lev S. Bishop, John Lapeyre, Ali Javadi-Abhari (IBM); Eddy Z. Zhang (Rutgers Univ.)

Session Chair: Mark Sutherland (Oracle)
MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation
Samuel Hsia, Udit Gupta (Harvard Univ. / Meta AI); Bilge Acun, Newsha Ardalani, Pan Zhong (Meta AI); Gu-Yeon Wei (Harvard Univ. / Samsung); David Brooks (Harvard Univ. / Meta AI); Carole-Jean Wu (Meta AI)

Sigma: Compiling Einstein Summations to Locality-Aware Dataflow
Tian Zhao, Alexander Rucker, Kunle Olukotun (Stanford Univ.)

APEX: A Framework for Automated Processing Element Design Space Exploration using Frequent Subgraph Analysis
Jackson Melchert, Kathleen Feng, Caleb Donovick, Ross Daly, Ritvik Sharma, Clark Barrett, Mark Horowitz, Pat Hanrahan, Priyanka Raina (Stanford Univ.)

Stepwise Debugging for Hardware Accelerators
Griffin Berlstein, Rachit Nigam (Cornell Univ.); Chris Gyurgyik (Google); Adrian Sampson (Cornell Univ.)

Session Chair: Tony Gutierrez (AMD Research)
CommonGraph: Graph Analytics on Evolving Data
Mahbod Afarin, Chao Gao, Shafiur Rahman, Nael Abu-Ghazaleh, Rajiv Gupta (Univ. of California, Riverside)

uGrapher: High-performance Graph Operator Computation via Unified Abstraction for Graph Neural Networks
Yangjie Zhou, Jingwen Leng, Yaoxu Song, Shuwen Lu, Mian Wang, Chao Li, Minyi Guo (Shanghai Jiao Tong Univ.); Wenting Shen, Yong Li, Wei Lin, Xiangwen Liu, Hanqing Wu (Alibaba Group)

Khuzdul: Efficient and Scalable Distributed Graph Pattern Mining Engine
Jingji Chen, Xuehai Qian (Purdue Univ.)

Achieving Sub-second Pairwise Query over Evolving Graphs
Hongtao Chen, Mingxing Zhang (Tsinghua Univ.); Ke Yang (Tsinghua Univ. / Beijing HaiZhi XingTu Technology Co., Ltd.); Kang Chen (Tsinghua Univ.); Albert Zomaya (Univ. of Sydney); Yongwei Wu (Tsinghua Univ.); Xuehai Qian (Purdue Univ.)

2:10 PM PDT – 3:00 PM PDT

Session Chair: Scott Beamer (Univ. of California, Santa Cruz)
Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale
Padmapriya Duraisamy, Wei Xu, Scott Hare, Ravi Rajwar (Google); David Culler (Google / UC Berkeley); Zhiyi Xu, Jianing Fan, Chris Kennelly, Bill McCloskey, Danijela Mijailovic, Brian Morris, Chiranjit Mukherjee, Jingliang Ren, Greg Thelen, Paul Turner, Carlos Villavieja, Parthasarathy Ranganathan, Amin Vahdat (Google)

TPP: Transparent Page Placement for CXL-Enabled Tiered Memory
Hasan Al Maruf (Univ. of Michigan); Hao Wang, Abhishek Dhanotia, Johannes Weiner (Meta); Niket Agarwal (NVIDIA); Pallab Bhattacharya, Chris Petersen (Meta); Mosharaf Chowdhury (Univ. of Michigan); Shobhit Kanaujia, Prakash Chauhan (Meta)

Pond: CXL-Based Memory Pooling Systems for Cloud Platforms
Huaicheng Li (Virginia Tech / Carnegie Mellon Univ.); Daniel S. Berger (Microsoft Azure / Univ. of Washington); Lisa Hsu (unaffiliated); Dan Ernst, Pantea Zardoshti (Microsoft Azure); Stanko Novakovic (Google); Monish Shah, Samir Rajadnya (Microsoft Azure); Scott Lee (Microsoft); Ishwar Agarwal (Intel); Mark D. Hill (Microsoft Azure / Univ. of Wisconsin-Madison); Marcus Fontoura (Stone Co); Ricardo Bianchini (Microsoft Azure)

Session Chair: Haris Volos (Univ. of Cyprus)
uBFT: Microsecond-scale BFT using Disaggregated Memory
Marcos K. Aguilera, Naama Ben-David (VMware Research); Rachid Guerraoui, Antoine Murat, Athanasios Xygkis (EPFL); Igor Zablotchi (Massachusetts Inst. of Technology)

Compiling Distributed System Models with PGo
Finn Hackett, Shayan Hosseini, Renato Costa, Matthew Do, Ivan Beschastnikh (Univ. of British Columbia)

Propeller: A Profile Guided, Relinking Optimizer for Warehouse Scale Applications
Han Shen, Krzysztof Pszeniczny, Rahman Lavaee, Snehasish Kumar, Sriraman Tallam, Xinliang (David) Li (Google)

Session Chair: Lluis Vilanova (Imperial College London)
AfterImage: Leaking Control Flow and Tracking Load Operations via the Hardware Prefetcher
Yun Chen, Lingfeng Pei, Trevor E. Carlson (National Univ. of Singapore)

Hacky Racers: Exploiting Instruction-Level Parallelism to Generate Stealthy Fine-Grained Timers
Haocheng Xiao, Sam Ainsworth (Univ. of Edinburgh)

Untangle: A Principled Framework to Design Low-Leakage, High-Performance Dynamic Partitioning Schemes
Zirui Neil Zhao (Univ. of Illinois Urbana-Champaign); Adam Morrison (Tel Aviv Univ.); Christopher W. Fletcher, Josep Torrellas (Univ. of Illinois Urbana-Champaign)

3:00 PM PDT – 3:15 PM PDT: Closing Remarks

Location: Grand AB (GB level, below ground)