Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN
The dominant object detection approaches treat each dataset separately and
fit towards a specific domain, which cannot adapt to other domains without
extensive retraining. In this paper, we address the problem of designing a
universal object detection model that exploits diverse category granularity
from multiple domains and predicts all kinds of categories in one system.
Existing works treat this problem by integrating multiple detection branches
upon one shared backbone network. However, this paradigm overlooks the crucial
semantic correlations between multiple domains, such as category hierarchy,
visual similarity, and linguistic relationship. To address these drawbacks, we
present a novel universal object detector called Universal-RCNN that
incorporates graph transfer learning for propagating relevant semantic
information across multiple datasets to reach semantic coherency. Specifically,
we first generate a global semantic pool by integrating the high-level semantic
representations of all the categories. Then an Intra-Domain Reasoning Module
learns and propagates the sparse graph representation within one dataset guided
by a spatial-aware GCN. Finally, an Inter-Domain Transfer Module is proposed to
exploit diverse transfer dependencies across all domains and enhance the
regional feature representation by attending and transferring semantic contexts
globally. Extensive experiments demonstrate that the proposed method
significantly outperforms multiple-branch models and achieves
state-of-the-art results on multiple object detection benchmarks (mAP: 49.1% on
COCO).
Comment: Accepted by AAAI2
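The graph propagation described in this abstract can be illustrated with a generic graph-convolution step. This is a minimal sketch only: it uses a plain row-normalized GCN layer, not the paper's spatial-aware GCN or its Inter-Domain Transfer Module, and the names `normalize_rows` and `gcn_layer` are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalize_rows(adj):
    """Add self-loops, then divide each row by its sum (assumed normalization)."""
    n = len(adj)
    with_loops = [
        [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)
    ]
    return [[value / sum(row) for value in row] for row in with_loops]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_hat @ features @ weights)."""
    a_hat = normalize_rows(adj)
    out = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in out]


# Two category nodes connected by one edge; each starts with a one-hot feature.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
node_features = [[1.0, 0.0], [0.0, 1.0]]
identity_weights = [[1.0, 0.0], [0.0, 1.0]]
propagated = gcn_layer(adjacency, node_features, identity_weights)
```

After one step each node's feature is the average of its own and its neighbour's, which is the sense in which semantic information is propagated along graph edges.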
Algorithmic issues in visual object recognition
This thesis is divided into two parts covering two aspects of
research in the area of visual object recognition.
Part I is about human detection in still images. Human
detection is a challenging computer vision task due to the wide
variability in human visual appearances and body poses. In this
part, we present several enhancements to human detection
algorithms. First, we present an extension to the integral
images framework to allow for constant time computation of
non-uniformly weighted summations over rectangular regions
using a bundle of integral images. Such a computational element
is commonly used in constructing gradient-based feature
descriptors, which are the most successful in shape-based human
detection. Second, we introduce deformable features as an
alternative to the conventional static features used in
classifiers based on boosted ensembles. Deformable features can
enhance the accuracy of human detection by adapting to pose
changes that can be described as translations of body features.
Third, we present a comprehensive evaluation framework for
cascade-based human detectors. The presented framework
facilitates comparison between cascade-based detection
algorithms, provides a confidence measure for results, and
deploys a practical evaluation scenario.
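For readers unfamiliar with integral images, the constant-time rectangle sum that the thesis extends can be sketched as follows. This is the standard, uniformly weighted integral image only; the thesis's bundle of integral images for non-uniform weights is not reproduced here, and the function names are hypothetical.

```python
def integral_image(img):
    """Build a summed-area table with an extra leading zero row and column."""
    height, width = len(img), len(img[0])
    sat = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            # Entry = sum of everything above (previous row) plus this row so far.
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def rect_sum(sat, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four lookups."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
table = integral_image(image)
lower_right_block = rect_sum(table, 1, 1, 3, 3)  # sums 5 + 6 + 8 + 9
```

Building the table costs one pass over the image; every later rectangle query needs only four table reads regardless of the rectangle's size.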
Part II explores the possibilities of enhancing the speed of
core algorithms used in visual object recognition using the
computing capabilities of Graphics Processing Units (GPUs).
First, we present an implementation of Graph Cut on GPUs, which
achieves up to a 4x speedup compared to a CPU
implementation. The Graph Cut algorithm has many applications
related to visual object recognition such as segmentation and
3D point matching. Second, we present an efficient sparse
approximation of kernel matrices for GPUs that can
significantly speed up kernel-based learning algorithms, which
are widely used in object detection and recognition. We present
an implementation of the Affinity Propagation clustering
algorithm based on this representation, which is about 6 times
faster than another GPU implementation based on a conventional
sparse matrix representation.
Point-cloud based 3D object detection and classification methods for self-driving applications: A survey and taxonomy
Autonomous vehicles are becoming central to the future of mobility, supported by advances in deep learning techniques. The performance of a self-driving system is highly dependent on the quality of the perception task. Developments in sensor technologies have led to an increased availability of 3D scanners such as LiDAR, allowing for a more accurate representation of the vehicle's surroundings and leading to safer systems. The rapid development and consequent rise of research studies around self-driving systems since early 2010 resulted in a tremendous increase in the number and novelty of object detection methods. After a first wave of works that essentially tried to extend known techniques from object detection in images, more recently there has been a notable development of newer works better adapted to LiDAR data. This paper addresses the existing literature on object detection using LiDAR data within the scope of self-driving and provides a systematic way of analysing it. Unlike general object detection surveys, we focus on point-cloud data, which presents specific challenges, notably its high-dimensional and sparse nature. This work introduces a common object detection pipeline and taxonomy to facilitate a thorough comparison between different techniques and, departing from it, critically examines the representation of data (critical for complexity reduction), feature extraction and finally the object detection models. A comparison between performance results of the different models is included, along with some future research challenges.
This work is supported by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project n. 037902; Funding Reference: POCI-01-0247-FEDER-037902]
Object movement identification via sparse representation
Object movement identification from videos is very challenging and has
numerous applications in sports evaluation, video surveillance, elder/child care, etc. In
this research, a model using sparse representation is presented for human activity detection
from video data. This is done using a linear combination of atoms from a dictionary and a
sparse coefficient matrix. The dictionary is created using a Spatio-Temporal Interest Points
(STIP) algorithm. The spatio-temporal features are extracted for the training video data as well
as the testing video data. The K-Singular Value Decomposition (KSVD) algorithm is used for
learning dictionaries for the training video dataset. Finally, human action is classified using
a minimum threshold residual value of the corresponding action class in the testing video dataset.
Experiments are conducted on the KTH dataset, which contains a number of actions. The current
approach performed well in classifying activities, with a success rate of 90%.
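The minimum-residual classification rule described above can be sketched as follows. This is a deliberately simplified illustration, not the paper's method: it assumes one small dictionary per action class, approximates each sample with a single best atom instead of a K-SVD-learned sparse code, and uses hypothetical names and toy data throughout.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def one_atom_residual(signal, dictionary):
    """Smallest residual norm when the signal is approximated by one atom."""
    best = math.sqrt(dot(signal, signal))  # residual if no atom is used
    for atom in dictionary:
        norm_sq = dot(atom, atom)
        if norm_sq == 0:
            continue  # skip degenerate all-zero atoms
        coeff = dot(signal, atom) / norm_sq  # least-squares coefficient
        residual = [s - coeff * a for s, a in zip(signal, atom)]
        best = min(best, math.sqrt(dot(residual, residual)))
    return best


def classify(signal, class_dictionaries):
    """Return the class label whose dictionary gives the minimum residual."""
    return min(
        class_dictionaries,
        key=lambda label: one_atom_residual(signal, class_dictionaries[label]),
    )


# Toy per-class dictionaries (hypothetical; real atoms would come from STIP features).
dictionaries = {
    "walking": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "running": [[0.0, 0.0, 1.0]],
}
predicted = classify([0.1, 0.9, 0.0], dictionaries)
```

A fuller version would replace `one_atom_residual` with a multi-atom sparse coder such as orthogonal matching pursuit, keeping the same minimum-residual decision.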