3,942 research outputs found
Monocular SLAM Supported Object Recognition
In this work, we develop a monocular SLAM-aware object recognition system
that is able to achieve considerably stronger recognition performance, as
compared to classical object recognition systems that function on a
frame-by-frame basis. By incorporating several key ideas including multi-view
object proposals and efficient feature encoding methods, our proposed system is
able to detect and robustly recognize objects in its environment using a single
RGB camera in near-constant time. Through experiments, we illustrate the
utility of using such a system to effectively detect and recognize objects,
incorporating multiple object viewpoint detections into a unified prediction
hypothesis. The performance of the proposed recognition system is evaluated on
the UW RGB-D Dataset, showing strong recognition performance and scalable
run-time performance compared to current state-of-the-art recognition systems.Comment: Accepted to appear at Robotics: Science and Systems 2015, Rome, Ital
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal
algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN
have reduced the running time of these detection networks, exposing region
proposal computation as a bottleneck. In this work, we introduce a Region
Proposal Network (RPN) that shares full-image convolutional features with the
detection network, thus enabling nearly cost-free region proposals. An RPN is a
fully convolutional network that simultaneously predicts object bounds and
objectness scores at each position. The RPN is trained end-to-end to generate
high-quality region proposals, which are used by Fast R-CNN for detection. We
further merge RPN and Fast R-CNN into a single network by sharing their
convolutional features---using the recently popular terminology of neural
networks with 'attention' mechanisms, the RPN component tells the unified
network where to look. For the very deep VGG-16 model, our detection system has
a frame rate of 5fps (including all steps) on a GPU, while achieving
state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS
COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015
competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning
entries in several tracks. Code has been made publicly available.Comment: Extended tech repor
Unsupervised Network Pretraining via Encoding Human Design
Over the years, computer vision researchers have spent an immense amount of
effort on designing image features for the visual object recognition task. We
propose to incorporate this valuable experience to guide the task of training
deep neural networks. Our idea is to pretrain the network through the task of
replicating the process of hand-designed feature extraction. By learning to
replicate the process, the neural network integrates previous research
knowledge and learns to model visual objects in a way similar to the
hand-designed features. In the succeeding finetuning step, it further learns
object-specific representations from labeled data and this boosts its
classification power. We pretrain two convolutional neural networks where one
replicates the process of histogram of oriented gradients feature extraction,
and the other replicates the process of region covariance feature extraction.
After finetuning, we achieve substantially better performance than the baseline
methods.Comment: 9 pages, 11 figures, WACV 2016: IEEE Conference on Applications of
Computer Visio
DRSP : Dimension Reduction For Similarity Matching And Pruning Of Time Series Data Streams
Similarity matching and join of time series data streams has gained a lot of
relevance in today's world that has large streaming data. This process finds
wide scale application in the areas of location tracking, sensor networks,
object positioning and monitoring to name a few. However, as the size of the
data stream increases, the cost involved to retain all the data in order to aid
the process of similarity matching also increases. We develop a novel framework
to addresses the following objectives. Firstly, Dimension reduction is
performed in the preprocessing stage, where large stream data is segmented and
reduced into a compact representation such that it retains all the crucial
information by a technique called Multi-level Segment Means (MSM). This reduces
the space complexity associated with the storage of large time-series data
streams. Secondly, it incorporates effective Similarity Matching technique to
analyze if the new data objects are symmetric to the existing data stream. And
finally, the Pruning Technique that filters out the pseudo data object pairs
and join only the relevant pairs. The computational cost for MSM is O(l*ni) and
the cost for pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction
Factor. We have performed exhaustive experimental trials to show that the
proposed framework is both efficient and competent in comparison with earlier
works.Comment: 20 pages,8 figures, 6 Table
Learning to track for spatio-temporal action localization
We propose an effective approach for spatio-temporal action localization in
realistic videos. The approach first detects proposals at the frame-level and
scores them with a combination of static and motion CNN features. It then
tracks high-scoring proposals throughout the video using a
tracking-by-detection approach. Our tracker relies simultaneously on
instance-level and class-level detectors. The tracks are scored using a
spatio-temporal motion histogram, a descriptor at the track level, in
combination with the CNN features. Finally, we perform temporal localization of
the action using a sliding-window approach at the track level. We present
experimental results for spatio-temporal localization on the UCF-Sports, J-HMDB
and UCF-101 action localization datasets, where our approach outperforms the
state of the art with a margin of 15%, 7% and 12% respectively in mAP
- …