Pedestrian Detection with Wearable Cameras for the Blind: A Two-way Perspective
Blind people have limited access to information about their surroundings,
which is important for ensuring one's safety, managing social interactions, and
identifying approaching pedestrians. With advances in computer vision, wearable
cameras can provide equitable access to such information. However, the
always-on nature of these assistive technologies poses privacy concerns for
parties that may get recorded. We explore this tension from both perspectives,
those of sighted passersby and blind users, taking into account camera
visibility, in-person versus remote experience, and extracted visual
information. We conduct two studies: an online survey with MTurkers (N=206) and
an in-person experience study between pairs of blind (N=10) and sighted (N=40)
participants, where blind participants wear a working prototype for pedestrian
detection and pass by sighted participants. Our results suggest that the
perspectives of both users and bystanders, together with the factors mentioned
above, need to be carefully considered to mitigate potential social tensions.
Comment: The 2020 ACM CHI Conference on Human Factors in Computing Systems
(CHI 2020
Exploring Human Vision Driven Features for Pedestrian Detection
Motivated by the center-surround mechanism in the human visual attention
system, we propose to use average contrast maps for the challenge of pedestrian
detection in street scenes due to the observation that pedestrians indeed
exhibit discriminative contrast texture. Our main contributions are first to
design a local, statistical multi-channel descriptor in order to incorporate
both color and gradient information. Second, we introduce a multi-direction and
multi-scale contrast scheme based on grid-cells in order to integrate
expressive local variations. To address the issue of selecting the most
discriminative features for assessment and classification, we perform extensive
comparisons w.r.t. statistical descriptors, contrast measurements, and scale
structures. This way, we obtain reasonable results under various
configurations. Empirical findings from applying our optimized detector on the
INRIA and Caltech pedestrian datasets show that our features yield
state-of-the-art performance in pedestrian detection.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT
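The center-surround idea on grid cells mentioned in this abstract can be illustrated with a minimal sketch. This is not the paper's descriptor: it assumes a single grayscale channel, uses the mean intensity as the only cell statistic, and compares each cell with the average of its immediate neighbours at one scale. The function names `cell_means` and `center_surround_contrast` are hypothetical.

```python
def cell_means(image, cell):
    """Mean intensity of each non-overlapping cell x cell block (assumed statistic)."""
    rows, cols = len(image) // cell, len(image[0]) // cell
    means = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [image[r * cell + i][c * cell + j]
                     for i in range(cell) for j in range(cell)]
            row.append(sum(block) / len(block))
        means.append(row)
    return means


def center_surround_contrast(means):
    """Absolute difference between each cell and the mean of its neighbours.

    Interior cells use the 8-neighbourhood; border cells use the
    neighbours that exist.
    """
    rows, cols = len(means), len(means[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            neigh = [means[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))
                     if (rr, cc) != (r, c)]
            if neigh:
                out[r][c] = abs(means[r][c] - sum(neigh) / len(neigh))
    return out


# Toy 6x6 image with a bright 2x2 patch in the middle.
image = [[0.0] * 6 for _ in range(6)]
for i in (2, 3):
    for j in (2, 3):
        image[i][j] = 1.0

contrast = center_surround_contrast(cell_means(image, cell=2))
```

In this toy case the central cell receives the highest contrast value, which is the kind of local variation the abstract describes; a multi-direction, multi-scale scheme would repeat the comparison over several neighbourhood shapes and cell sizes.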
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of the cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed-up techniques, and the recent state-of-the-art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc., and provides an in-depth
analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
Deep Learning for Semantic Part Segmentation with High-Level Guidance
In this work we address the task of segmenting an object into its parts, or
semantic part segmentation. We start by adapting a state-of-the-art semantic
segmentation system to this task, and show that a combination of a
fully-convolutional Deep CNN system coupled with Dense CRF labelling provides
excellent results for a broad range of object categories. Still, this approach
remains agnostic to high-level constraints between object parts. We introduce
such prior information by means of the Restricted Boltzmann Machine, adapted to
our task, and train our model in a discriminative fashion, as a hidden CRF,
demonstrating that prior information can yield additional improvements. We also
investigate the performance of our approach "in the wild", without
information concerning the objects' bounding boxes, using an object detector to
guide a multi-scale segmentation scheme. We evaluate the performance of our
approach on the Penn-Fudan and LFW datasets for the tasks of pedestrian parsing
and face labelling respectively. We show superior performance with respect to
competitive methods that have been extensively engineered on these benchmarks,
as well as realistic qualitative results on part segmentation, even for
occluded or deformable objects. We also provide quantitative and extensive
qualitative results on three classes from the PASCAL Parts dataset. Finally, we
show that our multi-scale segmentation scheme can boost accuracy, recovering
segmentations for finer parts.
Comment: 11 pages (including references), 3 figures, 2 table
Adversarially Tuned Scene Generation
The generalization performance of computer vision systems trained on
computer graphics (CG) generated data is not yet satisfactory because of the
'domain shift' between virtual and real data. Although simulated data
augmented with a few real world samples has been shown to mitigate domain shift
and improve transferability of trained models, guiding or bootstrapping the
virtual data generation with the distributions learnt from target real world
domain is desired, especially in fields where annotating even a few real
images is laborious (such as semantic labeling and intrinsic images). In
order to address this problem in an unsupervised manner, our work combines
recent advances in CG (which aims to generate stochastic scene layouts coupled
with large collections of 3D object models) and generative adversarial training
(which aims to train generative models by measuring discrepancy between generated
and real data in terms of their separability in the space of a deep
discriminatively-trained classifier). Our method uses iterative estimation of
the posterior density of prior distributions for a generative graphical model.
This is done within a rejection sampling framework. Initially, we assume
uniform distributions as priors on the parameters of a scene described by a
generative graphical model. As iterations proceed, the prior distributions get
updated to distributions that are closer to the (unknown) distributions of
target data. We demonstrate the utility of adversarially tuned scene generation
on two real-world benchmark datasets (CityScapes and CamVid) for traffic scene
semantic labeling with a deep convolutional net (DeepLab). We realized
performance improvements of 2.28 and 3.14 points (using the IoU metric) between
the DeepLab models trained on simulated sets prepared from the scene generation
models before and after tuning to CityScapes and CamVid respectively.
Comment: 9 pages, accepted at CVPR 201
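For readers unfamiliar with the IoU metric cited in this abstract, the following is a minimal sketch of per-class intersection over union for semantic labeling, averaged over classes. It assumes predictions and ground truth are given as flat lists of integer class labels. The function names `iou_per_class` and `mean_iou` are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def iou_per_class(pred, target, num_classes):
    """IoU for each class: |pred == k AND target == k| / |pred == k OR target == k|.

    Returns None for a class that appears in neither list.
    """
    ious = []
    for k in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == k and t == k)
        union = sum(1 for p, t in zip(pred, target) if p == k or t == k)
        ious.append(inter / union if union else None)
    return ious


def mean_iou(pred, target, num_classes):
    """Average IoU over the classes that occur in pred or target."""
    vals = [v for v in iou_per_class(pred, target, num_classes) if v is not None]
    return sum(vals) / len(vals) if vals else 0.0


# Toy example: four pixels, two classes.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
score = mean_iou(pred, target, num_classes=2)
```

Here class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. Reported "points" are usually differences in this score expressed as a percentage.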