
    Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

    Transformer-based models excel in speech recognition. Existing efforts to optimize Transformer inference, typically for long-context applications, center on simplifying attention score calculations. However, streaming speech recognition models usually process only a limited number of tokens at a time, making attention score calculation less of a bottleneck. Instead, the bottleneck lies in the linear projection layers of the multi-head attention and feedforward networks, which constitute a substantial portion of the model size and contribute significantly to computation, memory, and power usage. To address this bottleneck, we propose folding attention, a technique targeting these linear layers that significantly reduces model size and improves memory and power efficiency. Experiments on on-device Transformer-based streaming speech recognition models show that folding attention reduces model size (and corresponding memory consumption) by up to 24% and power consumption by up to 23%, all without compromising model accuracy or increasing computation overhead.
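    As a rough illustration of the bottleneck the abstract describes, consider a toy parameter count for a single Transformer layer (the sketch below uses assumed, illustrative dimensions, not figures from the paper): the attention-score computation has no learned weights at all, while the projection and feed-forward matrices hold essentially all of the layer's parameters.

        # Toy parameter count for one Transformer layer (illustrative
        # dimensions, not taken from the paper). Attention *scores* have no
        # learned weights; the linear projections and feed-forward matrices
        # hold essentially all of the layer's parameters.
        d_model, d_ff = 512, 2048

        attn_proj = 4 * d_model * d_model   # Q, K, V, and output projections
        ffn = 2 * d_model * d_ff            # the two feed-forward linear layers

        print(f"attention projections: {attn_proj:,} params")  # 1,048,576
        print(f"feed-forward layers:   {ffn:,} params")        # 2,097,152

    Any technique that shrinks these matrices, as folding attention does, therefore cuts model size and memory traffic roughly in proportion.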

    Neural Network Model Extraction Attacks in Edge Devices by Hearing Architectural Hints

    As neural networks continue their reach into nearly every aspect of software operations, the details of those networks become an increasingly sensitive subject. Even those who deploy neural networks embedded in physical devices may wish to keep the inner workings of their designs hidden -- either to protect their intellectual property or as a form of protection against adversarial inputs. The specific problem we address is how, given noisy and imperfect memory traces observed through a heavy system stack, one might reconstruct the neural network architecture, including the set of layers employed, their connectivity, and their respective dimension sizes. Considering both intra-layer architectural features and the inter-layer temporal associations introduced by empirical DNN design practice, we draw upon ideas from speech recognition to solve this problem. We show that off-chip memory address traces and PCIe events provide ample information to reconstruct such neural network architectures accurately. We are the first to propose such accurate model extraction techniques and to demonstrate an end-to-end attack experimentally on an off-the-shelf Nvidia GPU platform with a full system stack. Results show that the proposed techniques achieve high reverse-engineering accuracy and improve one's ability to conduct targeted adversarial attacks, raising the success rate from 14.6%-25.5% (without network architecture knowledge) to 75.9% (with the extracted network architecture).
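    One ingredient of such an attack can be sketched as follows (this is our own simplified illustration, not the paper's algorithm, and it ignores the inter-layer temporal decoding the authors borrow from speech recognition): if a naive kernel for a fully connected layer streams its whole weight matrix from off-chip memory once, the observed read and write volumes tightly constrain the layer's dimensions.

        # Hedged sketch: recover fully-connected layer dimensions from
        # off-chip memory traffic, assuming a naive kernel that reads the
        # full weight matrix and input vector once and writes the output
        # once (a simplification; real GPU kernels tile and cache).
        FLOAT_BYTES = 4

        def candidate_fc_dims(read_bytes, write_bytes, max_dim=8192):
            """Enumerate (in_dim, out_dim) pairs consistent with the traffic."""
            out_dim = write_bytes // FLOAT_BYTES  # output write volume
            candidates = []
            for in_dim in range(1, max_dim + 1):
                expected = (in_dim * out_dim + in_dim) * FLOAT_BYTES  # weights + input
                if expected == read_bytes:
                    candidates.append((in_dim, out_dim))
            return candidates

        # A 784 -> 128 layer: reads 784*128 weights plus 784 inputs, writes 128 outputs.
        print(candidate_fc_dims((784 * 128 + 784) * FLOAT_BYTES, 128 * FLOAT_BYTES))
        # -> [(784, 128)]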

    Multi-UAV Trajectory Optimization and Deep Learning-Based Imagery Analysis for a UAS-Based Inventory Tracking Solution

    Copyright © 2019 by the American Institute of Aeronautics and Astronautics. DOI: 10.2514/6.2019-1569. This paper presents a multi-UAV trajectory optimization and an imagery analysis technique based on Convolutional Neural Networks (CNNs) for an inventory tracking solution using a UAS platform in a large warehouse or manufacturing environment. The current inventory tracking method is a manual, time-consuming process of scanning every inventory item, and its accuracy varies with the complexity of the scanning environment. To improve scanning efficiency in both time and accuracy, this paper discusses a UAS-based inventory solution. In particular, it addresses two primary topics: multi-UAV trajectory optimization for scanning inventory items, and a multi-layer CNN architecture for identifying the tag attached to each inventory item. To demonstrate the proposed multi-UAV trajectory optimization framework, numerical simulations are conducted in a representative inventory space. The proposed CNN-based imagery analysis framework is demonstrated in a flight experiment.
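    The abstract does not spell out the network, so the following is a generic multi-layer CNN tag classifier in PyTorch; the input resolution, depth, and class count are our assumptions, purely for illustration.

        # Minimal multi-layer CNN for tag identification (a sketch; the
        # paper's actual architecture, input resolution, and class count
        # are not given in the abstract).
        import torch
        import torch.nn as nn

        class TagCNN(nn.Module):
            def __init__(self, n_classes=2):  # e.g. tag vs. no-tag
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),  # 64x64 -> 32x32
                    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),  # 32x32 -> 16x16
                )
                self.classifier = nn.Linear(32 * 16 * 16, n_classes)

            def forward(self, x):  # x: (batch, 3, 64, 64) RGB crops
                return self.classifier(self.features(x).flatten(1))

        model = TagCNN()
        print(model(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 2])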

    EVA: An Encrypted Vector Arithmetic Language and Compiler for Efficient Homomorphic Computation

    Fully-Homomorphic Encryption (FHE) offers powerful capabilities by enabling secure offloading of both storage and computation, and recent innovations in schemes and implementations have made it all the more attractive. At the same time, FHE is notoriously hard to use, with a very constrained programming model, a very unusual performance profile, and many cryptographic constraints. Existing compilers for FHE either target simpler but less efficient FHE schemes or only support specific domains where they can rely on expert-provided high-level runtimes to hide complications. This paper presents a new FHE language called Encrypted Vector Arithmetic (EVA), which includes an optimizing compiler that generates correct and secure FHE programs while hiding all the complexities of the target FHE scheme. Bolstered by our optimizing compiler, programmers can develop efficient general-purpose FHE applications directly in EVA. For example, we have developed image processing applications using EVA with very few lines of code. EVA is also designed to work as an intermediate representation that can be a target for compiling higher-level domain-specific languages. To demonstrate this, we have re-targeted CHET, an existing domain-specific compiler for neural network inference, onto EVA. Due to the novel optimizations in EVA, its programs are on average 5.3x faster than those generated by CHET. We believe EVA will enable wider adoption of FHE by making it easier to develop FHE applications and domain-specific FHE compilers.
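    To make the constrained programming model concrete, here is a toy plaintext stand-in for encrypted vector arithmetic (our illustration, not EVA's actual API): programs are built only from element-wise operations and cyclic rotations over fixed-size vectors, which is the operation set CKKS-style FHE exposes, yet it suffices for image kernels such as the blur below.

        # Toy plaintext model of the encrypted-vector programming style
        # (illustrative only; not EVA's real API). CKKS-style FHE offers
        # element-wise add/multiply and cyclic slot rotations, nothing else.

        def rotate(v, k):
            """Cyclic rotation by k slots (negative k rotates the other way)."""
            k %= len(v)
            return v[k:] + v[:k]

        def add(a, b):
            return [x + y for x, y in zip(a, b)]

        def mul_scalar(v, s):
            return [x * s for x in v]

        def blur(pixels):
            """Cyclic 3-tap box blur: an image kernel of the kind the
            abstract mentions, using only vector-wide operations."""
            neighbours = add(rotate(pixels, -1), rotate(pixels, 1))
            return mul_scalar(add(neighbours, pixels), 1 / 3)

        print(blur([0.0, 3.0, 6.0, 3.0]))  # [2.0, 3.0, 4.0, 3.0]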