Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and provides an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible publication
Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification
Recent work on scene classification still makes use of generic CNN features
in a rudimentary manner. In this ICCV 2015 paper, we present a novel pipeline
built upon deep CNN features to harvest discriminative visual objects and parts
for scene classification. We first use a region proposal technique to generate
a set of high-quality patches potentially containing objects, and apply a
pre-trained CNN to extract generic deep features from these patches. Then we
perform both unsupervised and weakly supervised learning to screen these
patches and discover discriminative ones representing category-specific objects
and parts. We further apply discriminative clustering enhanced with local CNN
fine-tuning to aggregate similar objects and parts into groups, called meta
objects. A scene image representation is constructed by pooling the feature
response maps of all the learned meta objects at multiple spatial scales. We
have confirmed that the scene image representation obtained using this new
pipeline is capable of delivering state-of-the-art performance on two popular
scene benchmark datasets, MIT Indoor 67 and Sun397.
Comment: To appear in ICCV 2015
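The final step of the abstract above, pooling the response maps of all meta objects at multiple spatial scales, can be illustrated with a minimal sketch. The abstract does not state the pooling operator or the scales, so max pooling and the 1x1 plus 2x2 grids used here are assumptions, and the function names `pool_response_map` and `scene_representation` are hypothetical. This is not the authors' implementation.

```python
from typing import List, Sequence


def pool_response_map(response: Sequence[Sequence[float]], grid: int) -> List[float]:
    """Max-pool one 2D response map over a grid x grid partition.

    Assumption: max pooling; the abstract does not name the operator.
    Requires grid <= height and grid <= width so that no cell is empty.
    """
    h, w = len(response), len(response[0])
    pooled = []
    for gy in range(grid):
        # Integer cell boundaries along the vertical axis.
        y0, y1 = gy * h // grid, (gy + 1) * h // grid
        for gx in range(grid):
            # Integer cell boundaries along the horizontal axis.
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            pooled.append(
                max(response[y][x] for y in range(y0, y1) for x in range(x0, x1))
            )
    return pooled


def scene_representation(
    response_maps: Sequence[Sequence[Sequence[float]]],
    grids: Sequence[int] = (1, 2),  # assumed scales: whole image, then 2x2 cells
) -> List[float]:
    """Concatenate the pooled responses of every meta-object map at every scale."""
    features: List[float] = []
    for response in response_maps:
        for grid in grids:
            features.extend(pool_response_map(response, grid))
    return features


# Hypothetical 4x4 response map of a single meta object.
example_map = [
    [0.1, 0.2, 0.0, 0.4],
    [0.3, 0.0, 0.5, 0.1],
    [0.0, 0.9, 0.2, 0.0],
    [0.6, 0.1, 0.0, 0.7],
]
```

With K meta objects and the assumed grids, the representation has K * (1 + 4) entries, one per meta object per spatial cell.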
Occlusion-Aware Instance Segmentation via BiLayer Network Architectures
Segmenting highly overlapping image objects is challenging, because there is
typically no distinction between real object contours and occlusion boundaries
on images. Unlike previous instance segmentation methods, we model image
formation as a composition of two overlapping layers, and propose Bilayer
Convolutional Network (BCNet), where the top layer detects occluding objects
(occluders) and the bottom layer infers partially occluded instances
(occludees). The explicit modeling of occlusion relationship with bilayer
structure naturally decouples the boundaries of both the occluding and occluded
instances, and considers the interaction between them during mask regression.
We investigate the efficacy of bilayer structure using two popular
convolutional network designs, namely, Fully Convolutional Network (FCN) and
Graph Convolutional Network (GCN). Further, we formulate bilayer decoupling
using the vision transformer (ViT), by representing instances in the image as
separate learnable occluder and occludee queries. Large and consistent
improvements using one/two-stage and query-based object detectors with various
backbones and network layer choices validate the generalization ability of
bilayer decoupling, as shown by extensive experiments on image instance
segmentation benchmarks (COCO, KINS, COCOA) and video instance segmentation
benchmarks (YTVIS, OVIS, BDD100K MOTS), especially for heavy occlusion cases.
Code and data are available at https://github.com/lkeab/BCNet.
Comment: Extended version of "Deep Occlusion-Aware Instance Segmentation with
Overlapping BiLayers", CVPR 2021 (arXiv:2103.12340)
Deep Learning based 3D Segmentation: A Survey
3D object segmentation is a fundamental and challenging problem in computer
vision with applications in autonomous driving, robotics, augmented reality and
medical image analysis. It has received significant attention from the computer
vision, graphics and machine learning communities. Traditionally, 3D
segmentation was performed with hand-crafted features and engineered methods
which failed to achieve acceptable accuracy and could not generalize to
large-scale data. Driven by their great success in 2D computer vision, deep
learning techniques have recently become the tool of choice for 3D segmentation
tasks as well. This has led to an influx of a large number of methods in the
literature that have been evaluated on different benchmark datasets. This paper
provides a comprehensive survey of recent progress in deep learning based 3D
segmentation covering over 150 papers. It summarizes the most commonly used
pipelines, discusses their highlights and shortcomings, and analyzes the
competitive results of these segmentation methods. Based on the analysis, it
also provides promising research directions for the future.
Comment: Under review at ACM Computing Surveys, 36 pages, 10 tables, 9 figures