Maximizing CNN Accelerator Efficiency Through Resource Partitioning
Convolutional neural networks (CNNs) are revolutionizing machine learning,
but they present significant computational challenges. Recently, many
FPGA-based accelerators have been proposed to improve the performance and
efficiency of CNNs. Current approaches construct a single processor that
computes the CNN layers one at a time; the processor is optimized to maximize
the throughput at which the collection of layers is computed. However, this
approach leads to inefficient designs because the same processor structure is
used to compute CNN layers of radically varying dimensions.
We present a new CNN accelerator paradigm and an accompanying automated
design methodology that partitions the available FPGA resources into multiple
processors, each of which is tailored for a different subset of the CNN
convolutional layers. Using the same FPGA resources as a single large
processor, multiple smaller specialized processors increase computational
efficiency and lead to a higher overall throughput. Our design methodology
achieves 3.8x higher throughput than the state-of-the-art approach when
evaluating the popular AlexNet CNN on a Xilinx Virtex-7 FPGA. For the more
recent SqueezeNet and GoogLeNet, the speedups are 2.2x and 2.0x, respectively.
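The abstract argues that one processor shape fits layers of very different dimensions poorly. A toy cost model can illustrate this, but it is not the authors' methodology. The sketch assumes a processor is a hypothetical `tn x tm` grid of multipliers that handles `tn` input and `tm` output feature maps per cycle (one multiply-accumulate per multiplier per cycle). It ignores memory bandwidth and on-chip buffering, and the layer sizes are made up. When a layer's map count is not a multiple of the grid size, partial tiles leave multipliers idle. Splitting the same multiplier budget into processors shaped to their layers, and running them concurrently on different images, reduces that waste.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical convolutional layer dimensions (not taken from a real CNN)."""
    name: str
    n_in: int   # input feature maps
    m_out: int  # output feature maps
    rows: int   # output feature-map height
    cols: int   # output feature-map width
    k: int      # square kernel size


@dataclass(frozen=True)
class Processor:
    """Hypothetical tn x tm multiplier grid; assumes 1 MAC per multiplier per cycle."""
    tn: int  # input feature maps processed in parallel
    tm: int  # output feature maps processed in parallel

    @property
    def multipliers(self) -> int:
        return self.tn * self.tm


def macs(layer: ConvLayer) -> int:
    """Useful multiply-accumulates required by one layer."""
    return layer.n_in * layer.m_out * layer.rows * layer.cols * layer.k * layer.k


def cycles(layer: ConvLayer, p: Processor) -> int:
    """Cycles to compute a layer; partial tiles (the ceil) leave multipliers idle."""
    tiles = math.ceil(layer.n_in / p.tn) * math.ceil(layer.m_out / p.tm)
    return tiles * layer.rows * layer.cols * layer.k * layer.k


def evaluate(assignment: list[tuple[Processor, list[ConvLayer]]]) -> tuple[int, float]:
    """Return (cycles per image, multiplier utilization).

    Processors are assumed to work concurrently on different images, so the
    steady-state time per image is set by the busiest processor.
    """
    busiest = max(sum(cycles(layer, p) for layer in layers) for p, layers in assignment)
    total_mults = sum(p.multipliers for p, _ in assignment)
    useful = sum(macs(layer) for _, layers in assignment for layer in layers)
    return busiest, useful / (total_mults * busiest)


layer_a = ConvLayer("A", n_in=3, m_out=96, rows=32, cols=32, k=5)
layer_b = ConvLayer("B", n_in=96, m_out=256, rows=16, cols=16, k=3)

# One large processor (512 multipliers) computes both layers one at a time.
single = evaluate([(Processor(tn=16, tm=32), [layer_a, layer_b])])
# Two smaller processors (96 + 384 = 480 multipliers), each shaped for one layer.
multi = evaluate([
    (Processor(tn=3, tm=32), [layer_a]),
    (Processor(tn=12, tm=32), [layer_b]),
])

print(f"single: {single[0]} cycles/image, utilization {single[1]:.2f}")
print(f"multi:  {multi[0]} cycles/image, utilization {multi[1]:.2f}")
# single: 187392 cycles/image, utilization 0.67
# multi:  147456 cycles/image, utilization 0.90
```

With these hypothetical numbers, the two-processor split uses slightly fewer multipliers yet processes about 1.27x more images per unit time. The gain comes from layer A's three input maps no longer occupying a 16-wide input dimension. A real design space, with tiling, buffers, and off-chip bandwidth, is far richer than this model.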
Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval
This paper presents a new state-of-the-art for document image classification
and retrieval, using features learned by deep convolutional neural networks
(CNNs). In object and scene analysis, deep neural nets are capable of learning
a hierarchical chain of abstraction from pixel inputs to concise and
descriptive representations. The current work explores this capacity in the
realm of document analysis, and confirms that this representation strategy is
superior to a variety of popular hand-crafted alternatives. Experiments also
show that (i) features extracted from CNNs are robust to compression, (ii) CNNs
trained on non-document images transfer well to document analysis tasks, and
(iii) enforcing region-specific feature-learning is unnecessary given
sufficient training data. This work also makes available a new labelled subset
of the IIT-CDIP collection, containing 400,000 document images across 16
categories, useful for training new CNNs for document analysis.