25 research outputs found
Scalable Object Detection using Deep Neural Networks
Deep convolutional neural networks have recently achieved state-of-the-art
performance on a number of image recognition benchmarks, including the ImageNet
Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on
the localization sub-task was a network that predicts a single bounding box and
a confidence score for each object category in the image. Such a model captures
the whole-image context around the objects but cannot handle multiple instances
of the same object in the image without naively replicating the number of
outputs for each instance. In this work, we propose a saliency-inspired neural
network model for detection, which predicts a set of class-agnostic bounding
boxes along with a single score for each box, corresponding to its likelihood
of containing any object of interest. The model naturally handles a variable
number of instances for each class and allows for cross-class generalization at
the highest levels of the network. We are able to obtain competitive
recognition performance on VOC2007 and ILSVRC2012, while using only the top few
predicted locations in each image and a small number of neural network
evaluations
Speeding up Convolutional Neural Networks with Low Rank Expansions
The focus of this paper is speeding up the evaluation of convolutional neural
networks. While delivering impressive results across a range of computer vision
and machine learning tasks, these networks are computationally demanding,
limiting their deployability. Convolutional layers generally consume the bulk
of the processing time, and so in this work we present two simple schemes for
drastically speeding up these layers. This is achieved by exploiting
cross-channel or filter redundancy to construct a low rank basis of filters
that are rank-1 in the spatial domain. Our methods are architecture agnostic,
and can be easily applied to existing CPU and GPU convolutional frameworks for
tuneable speedup performance. We demonstrate this with a real world network
designed for scene text character recognition, showing a possible 2.5x speedup
with no loss in accuracy, and 4.5x speedup with less than 1% drop in accuracy,
still achieving state-of-the-art on standard benchmarks
Object Detection Through Exploration With A Foveated Visual Field
We present a foveated object detector (FOD) as a biologically-inspired
alternative to the sliding window (SW) approach which is the dominant method of
search in computer vision object detection. Similar to the human visual system,
the FOD has higher resolution at the fovea and lower resolution at the visual
periphery. Consequently, more computational resources are allocated at the
fovea and relatively fewer at the periphery. The FOD processes the entire
scene, uses retino-specific object detection classifiers to guide eye
movements, aligns its fovea with regions of interest in the input image and
integrates observations across multiple fixations. Our approach combines modern
object detectors from computer vision with a recent model of peripheral pooling
regions found at the V1 layer of the human visual system. We assessed various
eye movement strategies on the PASCAL VOC 2007 dataset and show that the FOD
performs on par with the SW detector while bringing significant computational
cost savings.Comment: An extended version of this manuscript was published in PLOS
Computational Biology (October 2017) at
https://doi.org/10.1371/journal.pcbi.100574
Exemplar Based Deep Discriminative and Shareable Feature Learning for Scene Image Classification
In order to encode the class correlation and class specific information in
image representation, we propose a new local feature learning approach named
Deep Discriminative and Shareable Feature Learning (DDSFL). DDSFL aims to
hierarchically learn feature transformation filter banks to transform raw pixel
image patches to features. The learned filter banks are expected to: (1) encode
common visual patterns of a flexible number of categories; (2) encode
discriminative information; and (3) hierarchically extract patterns at
different visual levels. Particularly, in each single layer of DDSFL, shareable
filters are jointly learned for classes which share the similar patterns.
Discriminative power of the filters is achieved by enforcing the features from
the same category to be close, while features from different categories to be
far away from each other. Furthermore, we also propose two exemplar selection
methods to iteratively select training data for more efficient and effective
learning. Based on the experimental results, DDSFL can achieve very promising
performance, and it also shows great complementary effect to the
state-of-the-art Caffe features.Comment: Pattern Recognition, Elsevier, 201
Efficient object detection via structured learning and local classifiers
Object detection has made great strides recently. However, it is still facing two big challenges: detection accuracy and computational efficiency. In this thesis, we present an automatic efficient object detection frarnework to detect object instances ·in images using bounding boxes, which can be trained and tested easily on current personal computers. Our framework is a sliding-window based approach, and consists of two major components: (1) efficient object proposal generation, predicting possible object bounding boxes, and (2) efficient object proposal verification, classifying each bounding box in a multiclass manner.
For object proposal generation, we formulate this problem as a structured learning problem and investigate structural support vector machines (SSVMs) with our proposed scale/aspect-ratio quantization scheme and ranking constraints. A general ranking-order decomposition algorithm is developed for solving the formulation efficiently, and applied to generate proposals using a two-stage cascade. Using image gradients as features, our object proposal generation method achieves
state-of-the-art results in terms Df object recall at a low cost in computation.
For object proposal verification, we propose two locally linear and one locally nonlinear classifiers to approximate the nonlinear decision boundaries in the feature space efficiently. Inspired by the kernel trick, these classifiers map the original features into another feature space explicitly where linear classifiers are employed for classification, and thus have linear computational complexity in
both training and testing, similar to that of linear classifiers. Therefore, in general,
our classifiers can achieve comparable accuracy to kernel based classifiers at
the cost of lower computational time.
To demonstrate its efficiency and generality, our framework is applied to four different object detection tasks: VOC detection challenges, traffic sign detection, pedestrian detection, and face detection. In each task, it can perform reasonably well with acceptable detection accuracy and good computational efficiency. For instance, on VOC datasets with 20 object classes, our method achieved about 0.1 mean average precision (AP) within 2 hours of training and 0.05 second of testing a 500 x 300 pixel image using a mixture of MATLAB and C++ code on a current personal computer
Recognition and localization of relevant human behavior in videos, SPIE,
ABSTRACT Ground surveillance is normally performed by human assets, since it requires visual intelligence. However, especially for military operations, this can be dangerous and is very resource intensive. Therefore, unmanned autonomous visualintelligence systems are desired. In this paper, we present an improved system that can recognize actions of a human and interactions between multiple humans. Central to the new system is our agent-based architecture. The system is trained on thousands of videos and evaluated on realistic persistent surveillance data in the DARPA Mind's Eye program, with hours of videos of challenging scenes. The results show that our system is able to track the people, detect and localize events, and discriminate between different behaviors, and it performs 3.4 times better than our previous system
The Fastest Deformable Part Model for Object Detection
This paper solves the speed bottleneck of deformable part model (DPM), while maintaining the accuracy in de-tection on challenging datasets. Three prohibitive steps in cascade version of DPM are accelerated, including 2D cor-relation between root filter and feature map, cascade part pruning and HOG feature extraction. For 2D correlation, the root filter is constrained to be low rank, so that 2D cor-relation can be calculated by more efficient linear combi-nation of 1D correlations. A proximal gradient algorithm is adopted to progressively learn the low rank filter in a dis-criminative manner. For cascade part pruning, neighbor-hood aware cascade is proposed to capture the dependence in neighborhood regions for aggressive pruning. Instead of explicit computation of part scores, hypotheses can be pruned by scores of neighborhoods under the first order ap-proximation. For HOG feature extraction, look-up tables are constructed to replace expensive calculations of orien-tation partition and magnitude with simpler matrix index operations. Extensive experiments show that (a) the pro-posed method is 4 times faster than the current fastest DPM method with similar accuracy on Pascal VOC, (b) the pro-posed method achieves state-of-the-art accuracy on pedes-trian and face detection task with frame-rate speed. 1