Action Recognition in Videos: from Motion Capture Labs to the Web
This paper presents a survey of human action recognition approaches based on
visual data recorded from a single video camera. We propose an organizing
framework that highlights the evolution of the area, with techniques
moving from heavily constrained motion capture scenarios towards more
challenging, realistic, "in the wild" videos. The proposed organization is
based on the representation used as input for the recognition task, emphasizing
the hypotheses assumed and thus the constraints imposed on the type of video
that each technique is able to address. Making these hypotheses and
constraints explicit makes the framework particularly useful for selecting a method, given
an application. Another advantage of the proposed organization is that it
allows the newest approaches to be categorized seamlessly alongside traditional ones, while
providing an insightful perspective of the evolution of the action recognition
task up to now. That perspective is the basis for the discussion at the end of
the paper, where we also present the main open issues in the area.
Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4
table
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and provides an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
A Comprehensive Literature Review on Convolutional Neural Networks
The fields of computer vision and image processing have dealt with the problems of visual recognition since their earliest days. Convolutional Neural Networks (CNNs) in machine learning are deep architectures built as feed-forward neural networks or perceptrons, inspired by research on visual analysis in the visual cortex of mammals such as cats. This work gives a detailed analysis of CNNs for computer vision tasks, natural language processing, fundamental sciences and engineering problems, along with other miscellaneous tasks. The general CNN structure, its mathematical intuition and working, and a brief critical commentary on the advantages and disadvantages that lead researchers to search for alternatives to CNNs are also covered. The paper also serves as an appreciation of the brain-child of past researchers for the existence of such a fecund architecture for handling multidimensional data, and of approaches to improve its performance further.
Geometric Cross-Modal Comparison of Heterogeneous Sensor Data
In this work, we address the problem of cross-modal comparison of aerial data
streams. A variety of simulated automobile trajectories are sensed using two
different modalities: full-motion video, and radio-frequency (RF) signals
received by detectors at various locations. The information represented by the
two modalities is compared using self-similarity matrices (SSMs) corresponding
to time-ordered point clouds in feature spaces of each of these data sources;
we note that these feature spaces can be of entirely different scale and
dimensionality. Several metrics for comparing SSMs are explored, including a
cutting-edge time-warping technique that can simultaneously handle local time
warping and partial matches, while also controlling for the change in geometry
between feature spaces of the two modalities. We note that this technique is
quite general, and does not depend on the choice of modalities. In this
particular setting, we demonstrate that the cross-modal distance between SSMs
corresponding to the same trajectory type is smaller than the cross-modal
distance between SSMs corresponding to distinct trajectory types, and we
formalize this observation via precision-recall metrics in experiments.
Finally, we comment on promising implications of these ideas for future
integration into multiple-hypothesis tracking systems.
Comment: 10 pages, 13 figures, Proceedings of IEEE Aeroconf 201
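As a rough illustration of why self-similarity matrices allow comparison across feature spaces of different scale and dimensionality, the sketch below builds an SSM from each time-ordered point cloud and compares the two matrices directly. The function names, the synthetic trajectories, the max-normalization and the plain Frobenius distance are all assumptions made for this example; it does not implement the time-warping technique described in the abstract and assumes both streams are sampled at the same time steps.

```python
import numpy as np


def self_similarity_matrix(points):
    """Return the pairwise Euclidean distance matrix of a time-ordered
    point cloud (one row per time step), scaled so its largest entry is 1.

    The scaling is an assumption of this sketch: it removes the overall
    scale of the feature space so that different modalities are comparable.
    """
    pts = np.asarray(points, dtype=float)
    diffs = pts[:, None, :] - pts[None, :, :]
    ssm = np.sqrt((diffs ** 2).sum(axis=-1))
    peak = ssm.max()
    return ssm / peak if peak > 0 else ssm


def ssm_distance(ssm_a, ssm_b):
    """Frobenius norm of the difference, divided by the number of time steps.

    A deliberately simple stand-in metric; it requires equal-length streams.
    """
    if ssm_a.shape != ssm_b.shape:
        raise ValueError("both SSMs must cover the same number of time steps")
    return np.linalg.norm(ssm_a - ssm_b) / ssm_a.shape[0]


# Hypothetical data: the same straight-line trajectory seen as 2-D
# "video" features and as 3-D "RF" features at a very different scale,
# plus a circular trajectory seen through the 3-D modality.
t = np.linspace(0.0, 1.0, 50)
line_video = np.column_stack([t, 2.0 * t])
line_rf = 100.0 * np.column_stack([t, t, t])
circle_rf = np.column_stack(
    [np.cos(2 * np.pi * t), np.sin(2 * np.pi * t), np.zeros_like(t)]
)

same_type = ssm_distance(
    self_similarity_matrix(line_video), self_similarity_matrix(line_rf)
)
different_type = ssm_distance(
    self_similarity_matrix(line_video), self_similarity_matrix(circle_rf)
)
print(f"same trajectory type:      {same_type:.6f}")
print(f"different trajectory type: {different_type:.6f}")
```

Because each SSM has one row and column per time step regardless of feature dimensionality, the 2-D and 3-D streams yield matrices of the same shape, and in this toy setup the distance for the matching trajectory type comes out smaller than for the mismatched one.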