Centauri: Enabling Efficient Scheduling for Communication-Computation Overlap in Large Model Training via Communication Partitioning Chang Chen, Xiuhong Li, and Qianchao Zhu (Peking University); Jiangfei Duan (Chinese University of Hong Kong); Peng Sun and Xingcheng Zhang (Shanghai AI Lab); Chao Yang (Peking University) |
GIANTSAN: Efficient Memory Sanitization with Segment Folding Hao Ling (The Hong Kong University of Science and Technology); Heqing Huang (City University of Hong Kong); Chengpeng Wang, Yuandao Cai, and Charles Zhang (The Hong Kong University of Science and Technology) |
Automatic Generation of Vectorizing Compilers for Customizable Digital Signal Processors Samuel Thomas and James Bornholt (University of Texas at Austin) |
CSSTs: A Dynamic Data Structure for Partial Orders in Concurrent Execution Analysis Hünkar Can Tunç (Aarhus University); Ameya Prashant Deshmukh (Indian Institute of Technology Bombay); Berk Cirisci (Amazon Web Services); Constantin Enea (Ecole Polytechnique); Andreas Pavlogiannis (Aarhus University) |
PDIP: Priority Directed Instruction Prefetching Bhargav Reddy Godala (Princeton University); Sankara Prasad Ramesh (University of California San Diego); Gilles A. Pokam, Jared Stark, and Andre Seznec (Intel); Dean Tullsen (University of California San Diego); David I. August (Princeton University) |
ngAP: Non-blocking Large-scale Automata Processing on GPUs Tianao Ge (Hong Kong University of Science and Technology Guangzhou); Tong Zhang (Samsung); Hongyuan Liu (Hong Kong University of Science and Technology Guangzhou) |