DeepSniffer: A DNN Model Extraction Framework Based on Learning Architectural Hints

Session: Frameworks for deep learning--Layering the ML cake.

Authors: Hu (University of California, Santa Barbara); Ling Liang (University of California, Santa Barbara); Shuangchen Li (University of California, Santa Barbara); Lei Deng (University of California, Santa Barbara & Tsinghua University); Pengfei Zuo (University of California, Santa Barbara & Huazhong University of Science and Technology); Yu Ju (University of California, Santa Barbara & Tsinghua University); Xinfeng Xie (University of California, Santa Barbara); Yufei Ding (University of California, Santa Barbara); Chang Liu (Citadel Securities); Timothy Sherwood (University of California, Santa Barbara); Yuan Xie (University of California, Santa Barbara)

As deep neural networks (DNNs) continue their reach into a wide range of application domains, the neural network architecture of DNN models becomes an increasingly sensitive subject, owing both to intellectual property protection and to the risk of adversarial attacks. Previous studies have explored leveraging architecture-level events exposed on hardware platforms to extract model architecture information. These approaches have the following limitations: they require a priori knowledge of the victim model, lack robustness and generality, or obtain only incomplete information about the victim model architecture. Our paper proposes DeepSniffer, a learning-based model extraction framework that obtains the complete model architecture information without any prior knowledge of the victim model. It is robust to the architectural and system noise introduced by the complex memory hierarchy and diverse run-time system optimizations. The basic idea of DeepSniffer is to learn the relation between extracted architectural hints (e.g., volumes of memory reads/writes obtained by side-channel or bus snooping attacks) and the model's internal architecture. Taking GPU platforms as a showcase, DeepSniffer conducts model extraction by learning both the architecture-level execution features of kernels and the inter-layer temporal association information introduced by common practice in DNN design. We demonstrate experimentally that DeepSniffer works on an off-the-shelf NVIDIA GPU platform running a variety of DNN models. The extracted models are directly useful for crafting adversarial inputs. Our experimental results show that DeepSniffer achieves high model extraction accuracy and thus improves the adversarial attack success rate from 14.6%-25.5% (without network architecture knowledge) to 75.9% (with the extracted network architecture). The DeepSniffer project has been released on GitHub.
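
To make the basic idea concrete, the following is a minimal sketch of mapping per-kernel architectural hints to layer types. It is not the DeepSniffer implementation: it swaps in a simple nearest-centroid classifier for the learned model, ignores the inter-layer temporal association that the framework also exploits, and uses invented feature values. The names `TRAINING`, `classify_kernel`, and `predict_layer_sequence` are hypothetical.

```python
import math

# Hypothetical training data: (read_volume, write_volume) per kernel, in
# arbitrary units, labeled with the layer type that produced the kernel.
# These numbers are invented for illustration, not measurements.
TRAINING = {
    "conv": [(9.0, 4.0), (8.0, 5.0), (10.0, 4.5)],
    "relu": [(4.0, 4.0), (5.0, 5.0), (4.5, 4.5)],
    "fc": [(12.0, 0.5), (11.0, 1.0), (13.0, 0.8)],
}


def centroid(points):
    """Return the component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


# One centroid per layer type, "learned" from the labeled hints.
CENTROIDS = {label: centroid(pts) for label, pts in TRAINING.items()}


def classify_kernel(features):
    """Assign a kernel to the layer type with the nearest centroid."""
    return min(CENTROIDS, key=lambda label: math.dist(features, CENTROIDS[label]))


def predict_layer_sequence(trace):
    """Classify each kernel in an observed trace independently, in order."""
    return [classify_kernel(features) for features in trace]


# A hypothetical observed trace: one (read, write) pair per executed kernel.
trace = [(9.2, 4.1), (4.4, 4.6), (8.5, 4.8), (4.9, 4.9), (12.1, 0.7)]
print(predict_layer_sequence(trace))
# prints ['conv', 'relu', 'conv', 'relu', 'fc']
```

Classifying each kernel independently, as this sketch does, is exactly where noisy measurements would cause errors; using the typical ordering of layers in DNN designs as additional context is what the abstract refers to as inter-layer temporal association.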