
    Universal-RCNN: Universal Object Detector via Transferable Graph R-CNN

    The dominant object detection approaches treat each dataset separately and fit towards a specific domain, which cannot adapt to other domains without extensive retraining. In this paper, we address the problem of designing a universal object detection model that exploits diverse category granularity from multiple domains and predicts all kinds of categories in one system. Existing works treat this problem by integrating multiple detection branches upon one shared backbone network. However, this paradigm overlooks the crucial semantic correlations between multiple domains, such as the category hierarchy, visual similarity, and linguistic relationships. To address these drawbacks, we present a novel universal object detector called Universal-RCNN that incorporates graph transfer learning for propagating relevant semantic information across multiple datasets to reach semantic coherency. Specifically, we first generate a global semantic pool by integrating the high-level semantic representations of all the categories. Then an Intra-Domain Reasoning Module learns and propagates a sparse graph representation within one dataset, guided by a spatial-aware GCN. Finally, an Inter-Domain Transfer Module is proposed to exploit diverse transfer dependencies across all domains and enhance the regional feature representation by attending to and transferring semantic contexts globally. Extensive experiments demonstrate that the proposed method significantly outperforms multiple-branch models and achieves state-of-the-art results on multiple object detection benchmarks (mAP: 49.1% on COCO).
    Comment: Accepted by AAAI 2020
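
    The abstract does not give implementation details, so the following is a minimal sketch of what one graph-reasoning step over region features could look like, in the spirit of the Intra-Domain Reasoning Module: regions attend to a global semantic pool, a sparse similarity graph is built, and one graph-convolution step refines the features. All function names, the top-k sparsification, and the similarity-based adjacency are illustrative assumptions, not the paper's actual method.

        import numpy as np

        def intra_domain_reasoning(region_feats, semantic_pool, W, top_k=8):
            """One illustrative graph-reasoning step over detector RoI features.

            region_feats : (N, D) RoI features from the detection backbone.
            semantic_pool: (C, D) per-category semantic embeddings (global pool).
            W            : (D, D) learnable graph-convolution weight matrix.
            """
            # Attend each region to the global semantic pool (softmax weights).
            scores = region_feats @ semantic_pool.T                  # (N, C)
            scores -= scores.max(axis=1, keepdims=True)              # stability
            attn = np.exp(scores) / np.exp(scores).sum(1, keepdims=True)
            context = attn @ semantic_pool                           # (N, D)

            # Sparse region-to-region graph: keep each region's top-k most
            # similar neighbours, then row-normalise the adjacency.
            sim = np.maximum(context @ context.T, 0.0)               # (N, N)
            cutoff = np.sort(sim, axis=1)[:, -top_k][:, None]
            adj = np.where(sim >= cutoff, sim, 0.0)
            adj /= adj.sum(axis=1, keepdims=True) + 1e-8

            # One graph-convolution step, added residually to the RoI features.
            return region_feats + np.maximum((adj @ context) @ W, 0.0)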

    Algorithmic issues in visual object recognition

    This thesis is divided into two parts covering two aspects of research in the area of visual object recognition. Part I is about human detection in still images. Human detection is a challenging computer vision task due to the wide variability in human visual appearances and body poses. In this part, we present several enhancements to human detection algorithms. First, we present an extension to the integral-images framework that allows constant-time computation of non-uniformly weighted summations over rectangular regions using a bundle of integral images. Such a computational element is commonly used in constructing gradient-based feature descriptors, which are the most successful in shape-based human detection. Second, we introduce deformable features as an alternative to the conventional static features used in classifiers based on boosted ensembles. Deformable features can enhance the accuracy of human detection by adapting to pose changes that can be described as translations of body features. Third, we present a comprehensive evaluation framework for cascade-based human detectors. The presented framework facilitates comparison between cascade-based detection algorithms, provides a confidence measure for results, and deploys a practical evaluation scenario. Part II explores the possibilities of speeding up core algorithms used in visual object recognition using the computing capabilities of Graphics Processing Units (GPUs). First, we present an implementation of Graph Cut on GPUs, which achieves up to a 4x speedup compared to a CPU implementation. The Graph Cut algorithm has many applications related to visual object recognition, such as segmentation and 3D point matching. Second, we present an efficient sparse approximation of kernel matrices for GPUs that can significantly speed up kernel-based learning algorithms, which are widely used in object detection and recognition. We present an implementation of the Affinity Propagation clustering algorithm based on this representation, which is about 6 times faster than another GPU implementation based on a conventional sparse matrix representation.
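
    As a concrete illustration of the first contribution's starting point, here is a minimal sketch of the standard integral-image trick: after one O(H*W) pass, any rectangle sum costs four lookups, and a non-uniform weighting can be approximated with a "bundle" of such images, one per weight plane. The function names and the bundle reading are illustrative assumptions; the thesis's actual construction is not specified in the abstract.

        import numpy as np

        def integral_image(img):
            """Cumulative 2D sum, padded with a zero row/column for easy indexing."""
            ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
            ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
            return ii

        def rect_sum(ii, top, left, bottom, right):
            """Sum of img[top:bottom, left:right] in O(1) via four corner lookups."""
            return (ii[bottom, right] - ii[top, right]
                    - ii[bottom, left] + ii[top, left])

        def weighted_rect_sum(bundle, weights, top, left, bottom, right):
            """Non-uniformly weighted rectangle sum from a bundle of integral
            images, one per weight plane (illustrative reading of the abstract)."""
            return sum(w * rect_sum(ii, top, left, bottom, right)
                       for w, ii in zip(weights, bundle))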

    Point-cloud based 3D object detection and classification methods for self-driving applications: A survey and taxonomy

    Autonomous vehicles are becoming central to the future of mobility, supported by advances in deep learning techniques. The performance of a self-driving system is highly dependent on the quality of the perception task. Developments in sensor technologies have led to an increased availability of 3D scanners such as LiDAR, allowing for a more accurate representation of the vehicle's surroundings and leading to safer systems. The rapid development and consequent rise of research on self-driving systems since the early 2010s has resulted in a tremendous increase in the number and novelty of object detection methods. After a first wave of works that essentially tried to extend known techniques from object detection in images, there has more recently been notable development of newer works better adapted to LiDAR data. This paper addresses the existing literature on object detection using LiDAR data within the scope of self-driving and provides a systematic way of analysing it. Unlike general object detection surveys, we focus on point-cloud data, which presents specific challenges, notably its high-dimensional and sparse nature. This work introduces a common object detection pipeline and taxonomy to facilitate a thorough comparison between different techniques and, departing from it, critically examines the representation of the data (critical for complexity reduction), feature extraction, and finally the object detection models. A comparison between the performance results of the different models is included, alongside some future research challenges.
    This work is supported by European Structural and Investment Funds in the FEDER component, through the Operational Competitiveness and Internationalization Programme (COMPETE 2020) [Project n. 037902; Funding Reference: POCI-01-0247-FEDER-037902].
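
    As an illustration of the data-representation step the survey highlights, the sketch below collapses a sparse point cloud onto a regular voxel grid before feature extraction. The grid bounds, resolution, and function name are arbitrary assumptions for the example, not values taken from any surveyed method.

        import numpy as np

        def voxelize(points, voxel_size=0.2, bounds=((-40, 40), (-40, 40), (-3, 1))):
            """Map (N, 3) LiDAR points to occupied voxel indices and point counts."""
            lo = np.array([b[0] for b in bounds], dtype=np.float64)
            hi = np.array([b[1] for b in bounds], dtype=np.float64)
            mask = np.all((points >= lo) & (points < hi), axis=1)   # crop to range
            idx = ((points[mask] - lo) / voxel_size).astype(np.int64)
            # Only occupied voxels are materialised; most grid cells stay empty,
            # which is exactly the sparsity these representations exploit.
            voxels, counts = np.unique(idx, axis=0, return_counts=True)
            return voxels, counts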

    Object movement identification via sparse representation

    Object movement identification from videos is very challenging and has numerous applications in sports evaluation, video surveillance, elder/child care, etc. In this research, a model using sparse representation is presented for human activity detection from video data. This is done using a linear combination of atoms from a dictionary and a sparse coefficient matrix. The dictionary is created using a Spatio-Temporal Interest Points (STIP) algorithm. The spatio-temporal features are extracted for the training video data as well as the testing video data. The K-Singular Value Decomposition (K-SVD) algorithm is used for learning dictionaries for the training video dataset. Finally, a human action is classified using the minimum threshold residual value of the corresponding action class in the testing video dataset. Experiments are conducted on the KTH dataset, which contains a number of actions. The current approach performed well, classifying activities with a success rate of 90%.
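
    Here is a minimal sketch of the minimum-residual classification rule described above: each action class gets its own dictionary (assumed already learned with K-SVD over STIP descriptors), a test descriptor is sparsely coded against each dictionary, and the class whose reconstruction residual is smallest wins. The OMP solver choice and all names here are illustrative assumptions.

        import numpy as np
        from sklearn.linear_model import OrthogonalMatchingPursuit

        def classify_by_residual(descriptor, class_dicts, sparsity=10):
            """Assign the class whose dictionary reconstructs the descriptor best.

            descriptor : (d,) STIP feature vector from a test video.
            class_dicts: dict mapping action label -> (d, n_atoms) dictionary.
            """
            residuals = {}
            for label, D in class_dicts.items():
                # Sparse code x with at most `sparsity` atoms: descriptor ~ D @ x
                omp = OrthogonalMatchingPursuit(n_nonzero_coefs=sparsity)
                omp.fit(D, descriptor)
                recon = D @ omp.coef_ + omp.intercept_
                residuals[label] = np.linalg.norm(descriptor - recon)
            return min(residuals, key=residuals.get)  # smallest residual wins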