Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond
State-of-the-art pedestrian detection models have achieved great success on
many benchmarks. However, these models require large amounts of annotation,
and the labeling process is usually time-consuming and labor-intensive. In this
paper, we propose a method to generate labeled pedestrian data and adapt it to
support the training of pedestrian detectors. The proposed framework is built
on a Generative Adversarial Network (GAN) with multiple discriminators, which
tries to synthesize realistic pedestrians and learn the background context
simultaneously. To handle pedestrians of different sizes, we adopt a Spatial
Pyramid Pooling (SPP) layer in the discriminator. We conduct experiments on two
benchmarks. The results show that our framework can smoothly synthesize
pedestrians onto background images of varying appearance and levels of detail.
To evaluate our approach quantitatively, we add the generated samples to the
training data of baseline pedestrian detectors and show that the synthetic
images improve the detectors' performance.
Comment: v2.0, adding supplementary material
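The abstract above adopts a Spatial Pyramid Pooling (SPP) layer so the discriminator can handle pedestrians of different sizes. As a rough illustration of the idea only (not the paper's actual implementation), the sketch below max-pools a variable-size 2-D feature map into a fixed-length vector; the pyramid levels (1, 2, 4) are an assumed configuration.

```python
def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Spatial Pyramid Pooling sketch: max-pool a variable-size 2-D
    feature map into a fixed-length vector (one max per bin per level),
    so downstream layers see the same input size regardless of scale."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:                      # an n x n grid of bins per level
        for by in range(n):
            for bx in range(n):
                y0, y1 = (by * h) // n, -(-((by + 1) * h) // n)  # ceil div
                x0, x1 = (bx * w) // n, -(-((bx + 1) * w) // n)
                pooled.append(max(feature_map[y][x]
                                  for y in range(y0, y1)
                                  for x in range(x0, x1)))
    return pooled  # length = sum(n*n for n in levels), independent of h, w

# Feature maps of different sizes yield vectors of identical length (21).
small = [[1.0, 2.0], [3.0, 4.0]]                                  # 2 x 2
large = [[float(y * 7 + x) for x in range(7)] for y in range(5)]  # 5 x 7
assert len(spp_max_pool(small)) == len(spp_max_pool(large)) == 21
```

Because the output length depends only on the pyramid levels, the discriminator can score pedestrian crops of any resolution with the same fully connected head.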
BAOD: Budget-Aware Object Detection
We study the problem of object detection from a novel perspective in which
annotation budget constraints are taken into consideration, appropriately
coined Budget Aware Object Detection (BAOD). When provided with a fixed budget,
we propose a strategy for building a diverse and informative dataset that can
be used to optimally train a robust detector. We investigate both optimization
and learning-based methods to sample which images to annotate and what type of
annotation (strongly or weakly supervised) to annotate them with. We adopt a
hybrid supervised learning framework to train the object detector from both
these types of annotation. We conduct a comprehensive empirical study showing
that a handcrafted optimization method outperforms other selection techniques
including random sampling, uncertainty sampling and active learning. By
combining an optimal image/annotation selection scheme with hybrid supervised
learning to solve the BAOD problem, we show that one can achieve the
performance of a strongly supervised detector on PASCAL-VOC 2007 while saving
12.8% of its original annotation budget. Furthermore, when a larger share of
the budget is used, it surpasses this performance by 2.0 mAP percentage points.
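The core trade-off above, spending a fixed budget across cheap weak labels and expensive strong labels, can be caricatured with a greedy value-per-cost heuristic. This is an illustrative toy with hypothetical costs and scores, not the paper's optimization or active-learning methods.

```python
def select_annotations(candidates, budget):
    """Greedy budget-aware selection sketch.

    candidates: list of (image_id, annotation_type, cost, score), where
    annotation_type is "strong" (bounding boxes) or "weak" (image-level
    tags). Picks at most one annotation per image, preferring the best
    informativeness-per-cost ratio, and never exceeds the budget."""
    ranked = sorted(candidates, key=lambda c: c[3] / c[2], reverse=True)
    chosen, spent, used = [], 0.0, set()
    for image_id, ann_type, cost, score in ranked:
        if image_id in used or spent + cost > budget:
            continue
        chosen.append((image_id, ann_type))
        spent += cost
        used.add(image_id)
    return chosen, spent

# Hypothetical pool: a strong label costs 7x a weak one (assumed ratio).
pool = [
    ("img1", "strong", 7.0, 10.0), ("img1", "weak", 1.0, 4.0),
    ("img2", "strong", 7.0, 6.0),  ("img2", "weak", 1.0, 3.0),
    ("img3", "strong", 7.0, 9.0),  ("img3", "weak", 1.0, 2.0),
]
picks, spent = select_annotations(pool, budget=9.0)
assert spent <= 9.0
```

With these numbers the greedy rule labels all three images weakly and leaves budget unspent, which is exactly the kind of suboptimality the paper's handcrafted optimization method is designed to avoid.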
cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey
The paper presents the futuristic challenges discussed in the
cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers from
several conferences and journals, including CVPR, ICCV, ECCV, NIPS, PAMI, and
IJCV.
Monocular Plan View Networks for Autonomous Driving
Convolutions on monocular dash cam videos capture spatial invariances in the
image plane but do not explicitly reason about distances and depth. We propose
a simple transformation of observations into a bird's eye view, also known as
plan view, for end-to-end control. We detect vehicles and pedestrians in the
first person view and project them into an overhead plan view. This
representation provides an abstraction of the environment from which a deep
network can easily deduce the positions and directions of entities.
Additionally, the plan view enables us to leverage advances in 3D object
detection in conjunction with deep policy learning. We evaluate our monocular
plan view network on the photo-realistic Grand Theft Auto V simulator. A
network using both a plan view and front view causes less than half as many
collisions as previous detection-based methods and an order of magnitude fewer
collisions than pure pixel-based policies.
Comment: 8 pages, 9 figures
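The projection into an overhead plan view described above can be sketched with a pinhole-camera relation, assuming a known focal length fx and principal point cx (the values below are hypothetical) and that each detection carries an estimated depth:

```python
def to_plan_view(detections, fx=1000.0, cx=640.0):
    """Project detections from the image plane into an overhead plan view.

    detections: list of (u, depth) pairs, where u is the horizontal pixel
    coordinate of a detected object and depth its estimated distance (m).
    The pinhole relation x = (u - cx) * depth / fx recovers the lateral
    offset; returns (lateral_offset, forward_distance) in metres."""
    return [((u - cx) * depth / fx, depth) for u, depth in detections]

# An object at the image centre projects straight ahead of the ego car
# (offset 0.0 m); one 200 px right of centre at 20 m lands 4.0 m right.
plan = to_plan_view([(640.0, 10.0), (840.0, 20.0)])
```

Laying detections out in metric bird's-eye coordinates like this is what lets a downstream policy network reason directly about distances instead of image-plane pixels.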
Weakly Supervised Adversarial Domain Adaptation for Semantic Segmentation in Urban Scenes
Semantic segmentation, a pixel-level vision task, has developed rapidly with
convolutional neural networks (CNNs). Training CNNs requires a large amount of
labeled data, but manual annotation is difficult. To reduce this manual effort,
several synthetic datasets have been released in recent years. However, they
still differ from real scenes, so a model trained on synthetic data (the source
domain) does not perform well on real urban scenes (the target domain). In this
paper, we propose a weakly supervised adversarial domain adaptation method to
improve segmentation performance from synthetic data to real scenes, which
consists of three deep neural networks. Specifically, a detection and
segmentation ("DS" for short) model focuses on detecting objects and predicting
the segmentation map; a pixel-level domain classifier ("PDC" for short) tries
to distinguish which domain image features come from; and an object-level
domain classifier ("ODC" for short) discriminates which domain objects come
from and predicts their classes. PDC and ODC act as the discriminators, and DS
is treated as the generator. Through adversarial learning, DS is encouraged to
learn domain-invariant features. In experiments, our proposed method sets a new
mIoU record on this problem.
Comment: To appear at TI
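The adversarial principle behind the PDC/ODC discriminators can be shown in miniature with a 1-D toy (entirely hypothetical data, not the paper's networks): a logistic domain classifier learns to separate source from target features, while the features receive the reversed gradient and drift toward being domain-indistinguishable.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adapt(source, target, steps=200, lr_d=0.5, lr_f=0.2):
    """Toy adversarial feature-alignment sketch on 1-D features.

    A logistic domain classifier (discriminator) is trained to output 1
    for source features and 0 for target features; the features then
    take a REVERSED gradient step (ascending the classifier's loss),
    so the two domains become harder to tell apart."""
    src, tgt = list(source), list(target)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # 1) Discriminator step: gradient descent on binary cross-entropy.
        gw = gb = 0.0
        for f in src:
            p = sigmoid(w * f + b); gw += (p - 1.0) * f; gb += p - 1.0
        for f in tgt:
            p = sigmoid(w * f + b); gw += p * f; gb += p
        n = len(src) + len(tgt)
        w -= lr_d * gw / n
        b -= lr_d * gb / n
        # 2) Feature step: reversed gradient (maximize the BCE loss).
        src = [f + lr_f * (sigmoid(w * f + b) - 1.0) * w for f in src]
        tgt = [f + lr_f * sigmoid(w * f + b) * w for f in tgt]
    return src, tgt

src, tgt = adapt([2.0, 2.5, 1.8], [-2.0, -1.5, -2.2])
gap = abs(sum(src) / 3 - sum(tgt) / 3)
assert gap < 4.0  # the initial gap between domain means was 4.0
```

The same push-and-pull, with segmentation losses added for the generator, is what drives DS toward domain-invariant features.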
Domain Randomization for Scene-Specific Car Detection and Pose Estimation
We address the issue of domain gap when making use of synthetic data to train
a scene-specific object detector and pose estimator. While previous works have
shown that the constraints of learning a scene-specific model can be leveraged
to create geometrically and photometrically consistent synthetic data, care
must be taken to design synthetic content which is as close as possible to the
real-world data distribution. In this work, we propose to solve domain gap
through the use of appearance randomization to generate a wide range of
synthetic objects to span the space of realistic images for training. An
ablation study of our results is presented to delineate the individual
contribution of different components in the randomization process. We evaluate
our method on the VIRAT, UA-DETRAC, and EPFL-Car datasets, where we demonstrate
that using scene-specific, domain-randomized synthetic data is better than
fine-tuning off-the-shelf models on limited real data.
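Appearance randomization as described above amounts to sampling many rendering configurations so that real imagery falls within the span of the synthetic training set. The sketch below draws such configurations; the parameter names and ranges are illustrative assumptions, not the paper's rendering pipeline.

```python
import random

def randomize_appearance(n, seed=0):
    """Appearance-randomization sketch: draw n random rendering configs
    (texture, base colour, lighting) for synthetic training objects.
    A fixed seed keeps the sampled dataset reproducible."""
    rng = random.Random(seed)
    textures = ["noise", "stripes", "checker", "solid"]
    configs = []
    for _ in range(n):
        configs.append({
            "texture": rng.choice(textures),
            "rgb": tuple(rng.random() for _ in range(3)),      # base colour
            "light_intensity": rng.uniform(0.2, 1.5),
            "light_azimuth_deg": rng.uniform(0.0, 360.0),
        })
    return configs

# One config per synthetic object instance to be rendered and composited.
configs = randomize_appearance(100)
```

An ablation like the one in the abstract would then toggle individual keys (e.g. fix the texture while randomizing lighting) to measure each component's contribution.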
Neural Person Search Machines
We investigate the problem of person search in the wild in this work. Instead
of comparing the query against all candidate regions generated in a query-blind
manner, we propose to recursively shrink the search area from the whole image
till achieving precise localization of the target person, by fully exploiting
information from the query and contextual cues in every recursive search step.
We develop the Neural Person Search Machines (NPSM) to implement such recursive
localization for person search. Benefiting from its neural search mechanism,
NPSM is able to selectively shrink its focus from a loose region to a tighter
one containing the target automatically. In this process, NPSM employs an
internal primitive memory component to memorize the query representation which
modulates the attention and augments its robustness to other distracting
regions. Evaluations on two benchmark datasets, the CUHK-SYSU Person Search
dataset and the PRW dataset, demonstrate that our method outperforms current
state-of-the-art methods under both the mAP and top-1 evaluation protocols.
Comment: ICCV 2017 camera-ready
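The recursive shrinking of the search area can be caricatured in one dimension (a toy, not the paper's neural memory-and-attention mechanism): at each step, score the two halves of the current region with a query-conditioned function and recurse into the better half.

```python
def recursive_search(score, region, min_size=1):
    """Recursive region-shrinking sketch: repeatedly split the current
    interval into left/right halves and keep whichever half scores
    higher under `score`, until the region is no wider than min_size.

    region: (lo, hi) interval over candidate positions.
    score(lo, hi): higher means the target is more likely inside."""
    lo, hi = region
    while hi - lo > min_size:
        mid = (lo + hi) // 2
        if score(lo, mid) >= score(mid, hi):
            hi = mid       # shrink focus to the left half
        else:
            lo = mid       # shrink focus to the right half
    return lo, hi

# Toy "query similarity": the target person sits at position 37 of 64.
target = 37
sim = lambda lo, hi: 1.0 if lo <= target < hi else 0.0
assert recursive_search(sim, (0, 64)) == (37, 38)
```

In NPSM the role of `sim` is played by a learned, query-conditioned network and the regions are 2-D image areas, but the coarse-to-fine localization loop has the same shape.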
Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime
This work addresses the problem of semantic image segmentation of nighttime
scenes. Although considerable progress has been made in semantic image
segmentation, it is mainly related to daytime scenarios. This paper proposes a
novel method to progressively adapt the semantic models trained on daytime
scenes, along with large-scale annotations therein, to nighttime scenes via the
bridge of twilight time -- the time between dawn and sunrise, or between sunset
and dusk. The goal of the method is to alleviate the cost of human annotation
for nighttime images by transferring knowledge from standard daytime
conditions. In addition to the method, a new dataset of road scenes is
compiled; it consists of 35,000 images ranging from daytime to twilight time
and to nighttime. Also, a subset of the nighttime images is densely annotated
for method evaluation. Our experiments show that our method is effective for
model adaptation from daytime scenes to nighttime scenes, without using extra
human annotation.
Comment: Accepted to the International Conference on Intelligent Transportation
Systems (ITSC 2018)
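The twilight-bridging idea can be caricatured with a 1-D toy (entirely hypothetical numbers, not the paper's segmentation models): a threshold classifier fit on labeled "daytime" data is re-estimated at each stage from its own pseudo-labels, tracking a gradual domain shift that a direct daytime-to-nighttime transfer would misclassify.

```python
def fit_threshold(values, labels):
    """Decision threshold at the midpoint between the two class means."""
    m0 = sum(v for v, l in zip(values, labels) if l == 0) / labels.count(0)
    m1 = sum(v for v, l in zip(values, labels) if l == 1) / labels.count(1)
    return (m0 + m1) / 2.0

def progressive_adapt(day_values, day_labels, stages):
    """Adapt a 1-D threshold classifier through intermediate domains:
    pseudo-label each unlabeled stage with the current threshold, then
    refit on those pseudo-labels -- the 'twilight bridge' in miniature."""
    t = fit_threshold(day_values, day_labels)
    for values in stages:                     # e.g. twilight, then night
        pseudo = [1 if v > t else 0 for v in values]
        t = fit_threshold(values, pseudo)
    return t

day = [0.0, 0.2, 1.0, 1.2]; labels = [0, 0, 1, 1]   # day threshold: 0.6
twilight = [0.3, 0.5, 1.3, 1.5]                     # domain drifts by +0.3
night = [0.6, 0.8, 1.6, 1.8]                        # ...and by +0.3 again
t = progressive_adapt(day, labels, [twilight, night])
assert abs(t - 1.2) < 1e-9  # tracked the drift; the day threshold (0.6)
                            # would mislabel the night sample at 0.8
```

Each intermediate stage keeps the pseudo-labels reliable because the shift per stage is small, which is precisely why twilight is a useful bridge between day and night.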
Deep Learning for Generic Object Detection: A Survey
Object detection, one of the most fundamental and challenging problems in
computer vision, seeks to locate object instances from a large number of
predefined categories in natural images. Deep learning techniques have emerged
as a powerful strategy for learning feature representations directly from data
and have led to remarkable breakthroughs in the field of generic object
detection. Given this period of rapid evolution, the goal of this paper is to
provide a comprehensive survey of the recent achievements in this field brought
about by deep learning techniques. More than 300 research contributions are
included in this survey, covering many aspects of generic object detection:
detection frameworks, object feature representation, object proposal
generation, context modeling, training strategies, and evaluation metrics. We
finish the survey by identifying promising directions for future research.
Comment: IJCV Mino
A Survey on Deep Learning Methods for Robot Vision
Deep learning has allowed a paradigm shift in pattern recognition, from using
hand-crafted features together with statistical classifiers to using
general-purpose learning procedures for learning data-driven representations,
features, and classifiers together. The application of this new paradigm has
been particularly successful in computer vision, in which the development of
deep learning methods for vision applications has become a hot research topic.
Given that deep learning has already attracted the attention of the robot
vision community, the main purpose of this survey is to address the use of deep
learning in robot vision. To achieve this, a comprehensive overview of deep
learning and its usage in computer vision is given, that includes a description
of the most frequently used neural models and their main application areas.
Then, the standard methodology and tools used for designing deep-learning based
vision systems are presented. Afterwards, a review of the principal work using
deep learning in robot vision is presented, as well as current and future
trends related to the use of deep learning in robotics. This survey is intended
to be a guide for developers of robot vision systems.