
    Object Detection in 20 Years: A Survey

    Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development over the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. The paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as the technical improvements made in recent years.
    Comment: This work has been submitted to IEEE TPAMI for possible publication.
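    As a concrete illustration of the detection metrics the survey covers, the snippet below sketches the standard intersection-over-union (IoU) criterion used to decide whether a predicted box matches a ground-truth box; the corner-coordinate box format and the 0.5 threshold are conventional assumptions for illustration, not details taken from the paper.

```python
# Minimal IoU sketch; boxes are assumed to be (x1, y1, x2, y2) corner coordinates.
def iou(box_a, box_b):
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Under the common PASCAL VOC convention, IoU >= 0.5 counts as a true positive.
print(round(iou((10, 10, 50, 50), (30, 30, 70, 70)), 2))  # partial overlap -> 0.14
```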

    SINet: A Scale-insensitive Convolutional Neural Network for Fast Vehicle Detection

    Vision-based vehicle detection approaches have achieved incredible success in recent years with the development of deep convolutional neural networks (CNNs). However, existing CNN-based algorithms suffer from the problem that their convolutional features are scale-sensitive in the object detection task, while traffic images and videos commonly contain vehicles with a large variance of scales. In this paper, we delve into the source of scale sensitivity and reveal two key issues: 1) existing RoI pooling destroys the structure of small-scale objects; 2) the large intra-class distance caused by a large variance of scales exceeds the representation capability of a single network. Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detection of vehicles with a large variance of scales. First, we present a context-aware RoI pooling that maintains the contextual information and original structure of small-scale objects. Second, we present a multi-branch decision network to minimize the intra-class distance of features. These lightweight techniques add no extra time complexity yet bring a prominent improvement in detection accuracy. The proposed techniques can be equipped with any deep network architecture and keep it trainable end-to-end. Our SINet achieves state-of-the-art performance in terms of accuracy and speed (up to 37 FPS) on the KITTI benchmark and a new highway dataset, which contains a large variance of scales and extremely small objects.
    Comment: Accepted by IEEE Transactions on Intelligent Transportation Systems (T-ITS).
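    As a rough illustration of the idea behind the context-aware RoI pooling described above, the sketch below enlarges small proposals to the target grid by interpolation instead of max-pooling them down, which would destroy their structure. This is a simplified approximation under assumed interfaces (plain bilinear interpolation, a single feature map, integer RoI coordinates), not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def context_aware_roi_pool(feature_map, roi, output_size=7):
    """Illustrative sketch: feature_map is (C, H, W); roi is (x1, y1, x2, y2) in feature-map coords."""
    x1, y1, x2, y2 = (int(v) for v in roi)
    crop = feature_map[:, y1:y2 + 1, x1:x2 + 1]  # (C, h, w) region of interest
    _, h, w = crop.shape
    if h < output_size or w < output_size:
        # Small RoI: enlarge by bilinear interpolation so its spatial structure is preserved.
        crop = F.interpolate(crop.unsqueeze(0), size=(output_size, output_size),
                             mode="bilinear", align_corners=False).squeeze(0)
    else:
        # Large RoI: standard adaptive max pooling down to the target grid.
        crop = F.adaptive_max_pool2d(crop, output_size)
    return crop  # (C, output_size, output_size)

# Example: a small 4x5 proposal on a 256-channel feature map is upsampled to 7x7.
fm = torch.randn(256, 64, 64)
print(context_aware_roi_pool(fm, (10, 20, 14, 23)).shape)  # torch.Size([256, 7, 7])
```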

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4) = 2.565, p = 0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.

    On the insufficiency of laterality-based accounts of face perception and corresponding visual field asymmetries

    It has been known for nearly a century that the left half of a face is better recognized than the right half (Wolff, 1933). This left half-face advantage is commonly thought to reflect a combination of right hemisphere (RH) superiority for face recognition and a contralateral hemifield-hemisphere correspondence between the RH and the left visual field (LVF). The purpose of this set of experiments was to determine whether RH superiority for faces and contralateral hemifield-hemisphere correspondence are sufficient to explain the LVF half-face advantage. We set out four aims to accomplish this: (1) use behavioral and fMRI methods to demonstrate the LVF half-face advantage and identify its neural basis in ventral occipital-temporal cortex (VOTC); (2) use behavioral methods to show that RH superiority is insufficient to explain the LVF half-face advantage; (3) use behavioral methods to show that we perceive only one half of a face at a time; and (4), albeit not initially proposed, use methods developed to accomplish Aims 1-3 to distinguish retinotopic face representation from face-centered representation.

    In our first set of experiments (behavioral and fMRI), we identified for the first time a neural LVF half-face bias in RH face-selective cortex. We also found that the neural LVF bias in right FFA underlies the relationship between FFA laterality and the LVF half-face advantage. This revealed an explicit neural mechanism to describe the commonly assumed basis of the LVF advantage for centrally-viewed faces. In our next set of experiments (behavioral) we addressed the second aim, and found that the LVF half-face advantage is contingent upon the simultaneous presence of both an upright LVF and RVF half-face, and does not reflect inherently superior processing of LVF over RVF half-face information. This challenged the sufficiency of the mechanism we discovered in Aim 1 as an explanation of the LVF half-face advantage. In our next set of behavioral experiments (which addressed our third aim) we found that half-face identities compete for limited processing resources, and only one identity can be processed at a time. Furthermore, we found that this does not apply to faces in which half-face identities are similar enough to be perceived as a normal (i.e. non-chimeric) face. In our final set of experiments (behavioral) we addressed our additional Aim 4, and found that the LVF half-face advantage occurs regardless of the location of the face in the visual field. This suggests that faces are represented to some degree in an object-centered reference frame, and that the LVF half-face bias reflects a bias to the left half of a face, rather than a retinotopic bias to the left half of visual space.

    Object detection via a multi-region & semantic segmentation-aware CNN model

    We propose an object detection system that relies on a multi-region deep convolutional neural network (CNN) that also encodes semantic segmentation-aware features. The resulting CNN-based representation aims at capturing a diverse set of discriminative appearance factors and exhibits the localization sensitivity that is essential for accurate object localization. We exploit these properties of our recognition module by integrating it into an iterative localization mechanism that alternates between scoring a box proposal and refining its location with a deep CNN regression model. Thanks to the efficient use of our modules, we detect objects with very high localization accuracy. On the detection challenges of PASCAL VOC2007 and PASCAL VOC2012 we achieve mAP of 78.2% and 73.9% respectively, surpassing any other published work by a significant margin.
    Comment: Extended technical report -- short version to appear at ICCV 2015.
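    To make the iterative localization mechanism mentioned above more concrete, here is a minimal, hypothetical sketch of a score-then-refine loop; `score_box` and `regress_box` are placeholder callables standing in for the recognition and box-regression CNN modules, and the iteration count and threshold are illustrative assumptions rather than values from the paper.

```python
# Hypothetical sketch of iterative localization: alternate scoring and refining each proposal.
def iterative_localization(proposals, score_box, regress_box, num_iters=2, score_threshold=0.5):
    detections = []
    for box in proposals:
        score = score_box(box)              # score the initial proposal
        for _ in range(num_iters):
            if score < score_threshold:
                break                       # drop unpromising candidates early
            box = regress_box(box)          # refine the box location with the regression model
            score = score_box(box)          # re-score the refined box
        if score >= score_threshold:
            detections.append((box, score))
    return detections                       # in practice followed by non-maximum suppression
```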