Speeding up Convolutional Neural Networks with Low Rank Expansions
The focus of this paper is speeding up the evaluation of convolutional neural
networks. While delivering impressive results across a range of computer vision
and machine learning tasks, these networks are computationally demanding,
limiting their deployability. Convolutional layers generally consume the bulk
of the processing time, and so in this work we present two simple schemes for
drastically speeding up these layers. This is achieved by exploiting
cross-channel or filter redundancy to construct a low rank basis of filters
that are rank-1 in the spatial domain. Our methods are architecture agnostic,
and can be easily applied to existing CPU and GPU convolutional frameworks for
tuneable speedup performance. We demonstrate this with a real world network
designed for scene text character recognition, showing a possible 2.5x speedup
with no loss in accuracy, and 4.5x speedup with less than 1% drop in accuracy,
still achieving state-of-the-art on standard benchmarks.
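As a rough illustration of the rank-1 idea only (the paper's two schemes work on whole filter banks and exploit cross-channel redundancy, which this sketch does not attempt), a single 2D filter can be approximated by a sum of K separable filters taken from its SVD. Each separable filter is then applied as a vertical 1D pass followed by a horizontal 1D pass. The function names and the random test data are hypothetical.

```python
import numpy as np

def separable_approx(filt, k):
    """Return k (vertical, horizontal) 1D filter pairs whose outer
    products sum to the best rank-k approximation of `filt`."""
    u, s, vt = np.linalg.svd(filt, full_matrices=False)
    return [(u[:, i] * s[i], vt[i, :]) for i in range(k)]

def correlate2d_valid(image, filt):
    """Direct 2D 'valid' cross-correlation, used as the reference."""
    fh, fw = filt.shape
    oh, ow = image.shape[0] - fh + 1, image.shape[1] - fw + 1
    out = np.zeros((oh, ow))
    for i in range(fh):
        for j in range(fw):
            out += filt[i, j] * image[i:i + oh, j:j + ow]
    return out

def correlate_separable(image, pairs):
    """Apply each rank-1 filter as a column pass then a row pass."""
    total = None
    for v, h in pairs:
        tmp = correlate2d_valid(image, v[:, None])  # vertical 1D filter
        res = correlate2d_valid(tmp, h[None, :])    # horizontal 1D filter
        total = res if total is None else total + res
    return total

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
filt = rng.standard_normal((5, 5))

exact = correlate2d_valid(image, filt)
full = correlate_separable(image, separable_approx(filt, 5))  # full rank
low = correlate_separable(image, separable_approx(filt, 2))   # rank 2

max_err_full = np.abs(exact - full).max()
rel_err_low = np.linalg.norm(exact - low) / np.linalg.norm(exact)
```

For a single d x d filter, the direct form costs d^2 multiplications per output pixel while K separable filters cost 2Kd, so this toy version only pays off when K < d/2. The speedups reported in the abstract come from sharing a small basis across many filters and channels.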
Exemplar Based Deep Discriminative and Shareable Feature Learning for Scene Image Classification
In order to encode the class correlation and class specific information in
image representation, we propose a new local feature learning approach named
Deep Discriminative and Shareable Feature Learning (DDSFL). DDSFL aims to
hierarchically learn feature transformation filter banks to transform raw pixel
image patches to features. The learned filter banks are expected to: (1) encode
common visual patterns of a flexible number of categories; (2) encode
discriminative information; and (3) hierarchically extract patterns at
different visual levels. In particular, in each layer of DDSFL, shareable
filters are jointly learned for classes that share similar patterns.
The filters gain discriminative power by forcing features from
the same category to be close and features from different categories to be
far from each other. Furthermore, we also propose two exemplar selection
methods to iteratively select training data for more efficient and effective
learning. Based on the experimental results, DDSFL can achieve very promising
performance, and it also shows great complementary effect to the
state-of-the-art Caffe features.
Comment: Pattern Recognition, Elsevier, 201
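The discriminative criterion described in this abstract (same-category features close, different-category features far apart) can be sketched with a generic contrastive-style loss. This is an assumed stand-in, not the DDSFL objective itself, which the abstract does not spell out. The margin value, the function names, and the toy features are all hypothetical.

```python
import numpy as np

def pairwise_sq_dists(feats):
    """Squared Euclidean distance between every pair of feature rows."""
    sq = np.sum(feats ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
    return np.maximum(d2, 0.0)  # clip tiny negatives from rounding

def discriminative_loss(feats, labels, margin=1.0):
    """Assumed contrastive-style loss: pull same-class pairs together,
    push different-class pairs at least `margin` apart."""
    d2 = pairwise_sq_dists(feats)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    pull = d2[same & off_diag].mean()
    push = (np.maximum(0.0, margin - np.sqrt(d2[~same])) ** 2).mean()
    return pull + push

# Two tight clusters of toy 2D features.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels_matching = np.array([0, 0, 0, 1, 1, 1])  # labels agree with clusters
labels_mixed = np.array([0, 1, 0, 1, 0, 1])     # labels cut across clusters

loss_matching = discriminative_loss(feats, labels_matching)
loss_mixed = discriminative_loss(feats, labels_mixed)
```

Features that cluster by category give a much lower loss than features whose categories are mixed, which is the behaviour a learned filter bank would be trained toward under a criterion of this kind.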
FPM: Fine Pose Parts-Based Model with 3D CAD Models
We introduce a novel approach to the problem of localizing objects in an image and estimating their fine pose. Given exact CAD models and a few real training images with aligned models, we propose to leverage the geometric information from CAD models and the appearance information from real images to learn a model that can accurately estimate fine pose in real images. Specifically, we propose FPM, a fine pose parts-based model that combines geometric information, in the form of shared 3D parts in deformable part-based models, with appearance information, in the form of objectness, to achieve both fast and accurate fine pose estimation. Our method significantly outperforms current state-of-the-art algorithms in both accuracy and speed.
The Fastest Deformable Part Model for Object Detection
This paper solves the speed bottleneck of the deformable part model (DPM) while maintaining detection accuracy on challenging datasets. Three prohibitive steps in the cascade version of DPM are accelerated: 2D correlation between the root filter and the feature map, cascade part pruning, and HOG feature extraction. For 2D correlation, the root filter is constrained to be low rank, so that the 2D correlation can be calculated as a more efficient linear combination of 1D correlations. A proximal gradient algorithm is adopted to progressively learn the low rank filter in a discriminative manner. For cascade part pruning, a neighborhood aware cascade is proposed to capture the dependence in neighborhood regions for aggressive pruning. Instead of explicitly computing part scores, hypotheses can be pruned by the scores of their neighborhoods under a first order approximation. For HOG feature extraction, look-up tables are constructed to replace expensive calculations of orientation partition and magnitude with simpler matrix index operations. Extensive experiments show that (a) the proposed method is 4 times faster than the current fastest DPM method with similar accuracy on Pascal VOC, and (b) the proposed method achieves state-of-the-art accuracy on pedestrian and face detection tasks at frame-rate speed.
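The look-up-table step for HOG can be sketched as follows, assuming 8-bit grayscale input so that each centered-difference gradient component is an integer in [-255, 255]. The 9-bin unsigned orientation layout, the hard bin assignment, and all names are illustrative assumptions; the abstract does not give the paper's exact table design.

```python
import numpy as np

N_BINS = 9    # assumed number of unsigned orientation bins
OFFSET = 255  # shifts gradient values from [-255, 255] to [0, 510]

# Precompute orientation bin and magnitude for every possible (dy, dx).
d = np.arange(-OFFSET, OFFSET + 1)
dy_grid, dx_grid = np.meshgrid(d, d, indexing="ij")
angle = np.arctan2(dy_grid, dx_grid) % np.pi  # unsigned angle in [0, pi)
BIN_LUT = np.minimum((angle / np.pi * N_BINS).astype(np.int64), N_BINS - 1)
MAG_LUT = np.hypot(dy_grid, dx_grid)

def gradients(img):
    """Centered differences on the image interior, as signed integers."""
    img = img.astype(np.int64)
    dx = img[1:-1, 2:] - img[1:-1, :-2]
    dy = img[2:, 1:-1] - img[:-2, 1:-1]
    return dy, dx

def orientation_and_magnitude_lut(img):
    """Replace per-pixel arctan2 and sqrt with two table look-ups."""
    dy, dx = gradients(img)
    return BIN_LUT[dy + OFFSET, dx + OFFSET], MAG_LUT[dy + OFFSET, dx + OFFSET]

def orientation_and_magnitude_direct(img):
    """Reference version that evaluates arctan2 and hypot per pixel."""
    dy, dx = gradients(img)
    ang = np.arctan2(dy, dx) % np.pi
    bins = np.minimum((ang / np.pi * N_BINS).astype(np.int64), N_BINS - 1)
    return bins, np.hypot(dy, dx)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
bins_lut, mag_lut = orientation_and_magnitude_lut(img)
bins_ref, mag_ref = orientation_and_magnitude_direct(img)
```

The tables hold 511 x 511 entries each and are built once, after which every pixel needs only integer indexing. The low rank root filter mentioned in the same abstract follows the usual separable-filter argument: a rank-K 2D filter is a sum of K outer products of 1D filters, so its 2D correlation is a sum of K pairs of 1D correlations.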
Learning Everything about Anything: Webly-Supervised Visual Concept Learning
Figure 1: We introduce a fully-automated method that, given any concept, discovers an exhaustive vocabulary explaining all its appearance variations (i.e., actions, interactions, attributes, etc.), and trains full-fledged detection models for it. This figure shows a few of the many variations that our method has learned for four different classes of concepts: object (horse), scene (kitchen), event (Christmas), and action (walking).
Recognition is graduating from labs to real-world applications. While it is encouraging to see its potential being tapped, it brings forth a fundamental challenge to the vision researcher: scalability. How can we learn a model for any concept that exhaustively covers all its appearance variations, while requiring minimal or no human supervision for compiling the vocabulary of visual variance, gathering the training images and annotations, and learning the models? In this paper, we introduce a fully-automated approach for learning extensive models for a wide range of variations (e.g. actions, interactions, attributes and beyond) within any concept. Our approach leverages vast resources of online books to discover the vocabulary of variance, and intertwines the data collection and modeling steps to alleviate the need for explicit human supervision in training the models. Our approach organizes the visual knowledge about a concept in a convenient and useful way, enabling a variety of applications across vision and NLP. Our online system has been queried by users to learn models for several interesting concepts including breakfast, Gandhi, beautiful, etc. To date, our system has models available for over 50,000 variations within 150 concepts, and has annotated more than 10 million images with bounding boxes.