Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetic
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold-weapon era. This paper extensively reviews 400+
papers on object detection in the light of its technical evolution, spanning
more than a quarter-century (from the 1990s to 2019). A number of topics are
covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and makes an
in-depth analysis of their challenges as well as technical improvements in
recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible publication.
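As a concrete illustration of the detection metrics such surveys cover, average precision is built on the Intersection-over-Union (IoU) between a predicted and a ground-truth box. A minimal sketch follows; the `(x1, y1, x2, y2)` box format is an assumption for illustration, not tied to any particular dataset:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5.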
Map-Guided Curriculum Domain Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation
We address the problem of semantic nighttime image segmentation and improve
the state of the art by adapting daytime models to nighttime without using
nighttime annotations. Moreover, we design a new evaluation framework to
address the substantial uncertainty of semantics in nighttime images. Our
central contributions are: 1) a curriculum framework to gradually adapt
semantic segmentation models from day to night through progressively darker
times of day, exploiting cross-time-of-day correspondences between daytime
images from a reference map and dark images to guide the label inference in the
dark domains; 2) a novel uncertainty-aware annotation and evaluation framework
and metric for semantic segmentation, including image regions beyond human
recognition capability in the evaluation in a principled fashion; 3) the Dark
Zurich dataset, comprising 2416 unlabeled nighttime and 2920 unlabeled twilight
images with correspondences to their daytime counterparts plus a set of 201
nighttime images with fine pixel-level annotations created with our protocol,
which serves as a first benchmark for our novel evaluation. Experiments show
that our map-guided curriculum adaptation significantly outperforms
state-of-the-art methods on nighttime sets both for standard metrics and our
uncertainty-aware metric. Furthermore, our uncertainty-aware evaluation reveals
that selective invalidation of predictions can improve results on data with
ambiguous content such as our benchmark and profit safety-oriented applications
involving invalid inputs.
Comment: IEEE T-PAMI 202
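The curriculum idea above, adapting a model through progressively darker stages while the current model's predictions guide label inference in the next domain, can be sketched generically. This is a schematic self-training loop under assumed callables, not the paper's actual map-guided procedure:

```python
def curriculum_adapt(model, stages, train_step, pseudo_label):
    """Adapt a model through a curriculum of progressively harder domains.

    stages: list of unlabeled image sets ordered easy -> hard
            (e.g., day -> twilight -> night).
    pseudo_label(model, img): infer a training target with the current model.
    train_step(model, img, target): one update; returns the updated model.
    """
    for images in stages:
        # Infer labels for the whole stage before training on it.
        targets = [pseudo_label(model, img) for img in images]
        for img, tgt in zip(images, targets):
            model = train_step(model, img, tgt)
    return model
```

The key property is that labels for each darker stage are produced by the model adapted to the previous, lighter stage.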
Attribute-aware Semantic Segmentation of Road Scenes for Understanding Pedestrian Orientations
Semantic segmentation is an interesting task for many deep learning researchers for scene understanding. However, recognizing details about objects' attributes can be more informative and also helpful for better scene understanding in intelligent-vehicle use cases. This paper introduces a method for simultaneous semantic segmentation and pedestrian attribute recognition. A modified dataset built on top of the Cityscapes dataset is created by adding classes corresponding to pedestrian orientation attributes. The proposed method extends the SegNet model and is trained using both the original and the attribute-enriched datasets. Experiments show that the proposed attribute-aware semantic segmentation approach slightly improves performance on the Cityscapes dataset, whose classes can in this case be expanded through additional training data.
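An attribute-enriched dataset of this kind effectively splits one semantic class into several attribute-specific ones. A minimal sketch of such a label remapping follows; all class IDs and the per-pixel orientation map are hypothetical illustrations, not the paper's actual annotation format:

```python
import numpy as np

# Hypothetical IDs: a base "person" class and new orientation-attribute classes.
PERSON = 11                                          # assumed train ID for "person"
PERSON_FRONT, PERSON_BACK, PERSON_SIDE = 20, 21, 22  # illustrative new classes

def expand_person_labels(label_map, orientation_map):
    """Replace generic 'person' pixels with orientation-specific class IDs.

    orientation_map holds 0 = front, 1 = back, 2 = side per pixel, as might
    be derived from per-instance attribute annotations.
    """
    out = label_map.copy()
    person = label_map == PERSON
    out[person & (orientation_map == 0)] = PERSON_FRONT
    out[person & (orientation_map == 1)] = PERSON_BACK
    out[person & (orientation_map == 2)] = PERSON_SIDE
    return out
```

Non-person pixels keep their original IDs, so a model trained on the remapped maps learns both segmentation and the orientation attribute at once.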
Recent advances in deep learning for object detection
Object detection is a fundamental visual recognition problem in computer
vision and has been widely studied in the past decades. Visual object detection
aims to find objects of certain target classes with precise localization in a
given image and assign each object instance a corresponding class label. Due to
the tremendous successes of deep learning based image classification, object
detection techniques using deep learning have been actively studied in recent
years. In this paper, we give a comprehensive survey of recent advances in
visual object detection with deep learning. By reviewing a large body of recent
related work in literature, we systematically analyze the existing object
detection frameworks and organize the survey into three major parts: (i)
detection components, (ii) learning strategies, and (iii) applications &
benchmarks. In the survey, we cover a variety of factors affecting the
detection performance in detail, such as detector architectures, feature
learning, proposal generation, sampling strategies, etc. Finally, we discuss
several future directions to facilitate and spur future research for visual
object detection with deep learning.
Keywords: Object Detection, Deep Learning, Deep Convolutional Neural Network
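Among the detection components such surveys analyze, non-maximum suppression (NMS) is the standard post-processing step that removes duplicate proposals. A minimal greedy sketch follows; the 0.5 IoU threshold is a common default, not a value from the survey:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap it
    above the threshold, and repeat on the remainder."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best, order = order[0], order[1:]
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

Returned indices reference the input list, so scores and class labels can be gathered afterwards.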
Vision-Based Semantic Segmentation in Scene Understanding for Autonomous Driving: Recent Achievements, Challenges, and Outlooks
Scene understanding plays a crucial role in autonomous driving by utilizing sensory data for contextual information extraction and decision making. Beyond modeling advances, the enabler for vehicles to become aware of their surroundings is the availability of visual sensory data, which expands vehicular perception and realizes vehicular contextual awareness in real-world environments. Research directions for scene understanding pursued by related studies include person/vehicle detection and segmentation, their transition analysis, and lane-change and turn detection, among many others. Unfortunately, these tasks seem insufficient to completely develop fully autonomous vehicles, i.e., achieving level-5 autonomy and travelling just like human-controlled cars. This latter statement is among the conclusions drawn from this review paper: scene understanding for autonomous driving cars using vision sensors still requires significant improvements. With this motivation, this survey defines, analyzes, and reviews the current achievements of the scene understanding research area, which mostly rely on computationally complex deep learning models. Furthermore, it covers the generic scene understanding pipeline, investigates the performance reported by the state of the art, reports the time-complexity analysis of avant-garde modeling choices, and highlights major triumphs and noted limitations encountered by current research efforts. The survey also includes a comprehensive discussion of the available datasets and the challenges that, even if lately confronted by researchers, still remain open to date. Finally, our work outlines future research directions to welcome researchers and practitioners to this exciting domain.
This work was supported by the European Commission through the European Union (EU) and Japan for Artificial Intelligence (AI) under Grant 957339.
PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection
3D object detection is receiving increasing attention from both industry and
academia thanks to its wide applications in various fields. In this paper, we
propose Point-Voxel Region-based Convolution Neural Networks (PV-RCNNs) for 3D
object detection on point clouds. First, we propose a novel 3D detector,
PV-RCNN, which boosts the 3D detection performance by deeply integrating the
feature learning of both point-based set abstraction and voxel-based sparse
convolution through two novel steps, i.e., the voxel-to-keypoint scene encoding
and the keypoint-to-grid RoI feature abstraction. Second, we propose an
advanced framework, PV-RCNN++, for more efficient and accurate 3D object
detection. It consists of two major improvements: sectorized proposal-centric
sampling for efficiently producing more representative keypoints, and
VectorPool aggregation for better aggregating local point features with much
less resource consumption. With these two strategies, our PV-RCNN++ is
faster than PV-RCNN, while also achieving better performance. The
experiments demonstrate that our proposed PV-RCNN++ framework achieves
state-of-the-art 3D detection performance on the large-scale and
highly-competitive Waymo Open Dataset with 10 FPS inference speed on the
detection range of 150 m × 150 m.
Comment: Accepted by International Journal of Computer Vision (IJCV), code is
available at https://github.com/open-mmlab/OpenPCDet
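Keypoint selection in point-cloud detectors of this family builds on farthest point sampling, which greedily picks well-spread points from a cloud. Below is a minimal sketch of that primitive only, started deterministically from the first point; the paper's sectorized proposal-centric variant additionally restricts sampling to sectors around proposals:

```python
import math

def farthest_point_sampling(points, k):
    """Greedily select indices of k points that are maximally spread out.

    Each step picks the point farthest from everything chosen so far.
    """
    chosen = [0]  # start from the first point for determinism
    # Distance from every point to its nearest already-chosen point.
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(points[i], points[nxt])) for i, d in enumerate(dist)]
    return chosen
```

Naive FPS is O(n·k), which is one reason efficiency-oriented variants such as proposal-centric sampling restrict the candidate set.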
A Systematic Review of Urban Navigation Systems for Visually Impaired People
Blind and visually impaired people (BVIP) face a range of practical difficulties when undertaking outdoor journeys as pedestrians. Over the past decade, a variety of assistive devices have been researched and developed to help BVIP navigate more safely and independently. In addition, research in overlapping domains is addressing the problem of automatic environment interpretation using computer vision and machine learning, particularly deep learning, approaches. Our aim in this article is to present a comprehensive review of research directly in, or relevant to, assistive outdoor navigation for BVIP. We break down the navigation area into a series of navigation phases and tasks. We then use this structure for our systematic review of research, analysing articles, methods, datasets, and current limitations by task. We also provide an overview of commercial and non-commercial navigation applications targeted at BVIP. Our review contributes to the body of knowledge by providing a comprehensive, structured analysis of work in the domain, including the state of the art, and guidance on future directions. It will support both researchers and other stakeholders in the domain in establishing an informed view of research progress.