29 research outputs found
Deformable Part Models are Convolutional Neural Networks
Deformable part models (DPMs) and convolutional neural networks (CNNs) are
two widely used tools for visual recognition. They are typically viewed as
distinct approaches: DPMs are graphical models (Markov random fields), while
CNNs are "black-box" non-linear classifiers. In this paper, we show that a DPM
can be formulated as a CNN, thus providing a novel synthesis of the two ideas.
Our construction involves unrolling the DPM inference algorithm and mapping
each step to an equivalent (and at times novel) CNN layer. From this
perspective, it becomes natural to replace the standard image features used in
DPM with a learned feature extractor. We call the resulting model DeepPyramid
DPM and experimentally validate it on PASCAL VOC. DeepPyramid DPM significantly
outperforms DPMs based on histograms of oriented gradients features (HOG) and
slightly outperforms a comparable version of the recently introduced R-CNN
detection system, while running an order of magnitude faster
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Transformer-based models excel in speech recognition. Existing efforts to
optimize Transformer inference, typically for long-context applications, center
on simplifying attention score calculations. However, streaming speech
recognition models usually process a limited number of tokens each time, making
attention score calculation less of a bottleneck. Instead, the bottleneck lies
in the linear projection layers of multi-head attention and feedforward
networks, constituting a substantial portion of the model size and contributing
significantly to computation, memory, and power usage.
To address this bottleneck, we propose folding attention, a technique
targeting these linear layers, significantly reducing model size and improving
memory and power efficiency. Experiments on on-device Transformer-based
streaming speech recognition models show that folding attention reduces model
size (and corresponding memory consumption) by up to 24% and power consumption
by up to 23%, all without compromising model accuracy or computation overhead