Review on Computer Vision Techniques in Emergency Situation
In emergency situations, actions that save lives and limit the impact of
hazards are crucial. In order to act, situational awareness is needed to decide
what to do. Geolocalized photos and video of the situations as they evolve can
be crucial in better understanding them and making decisions faster. Cameras
are almost everywhere these days, whether in smartphones, installed CCTV
cameras, UAVs or other devices. However, this poses challenges of big data and
information overload. Moreover, most of the time there are no disasters at any
given location, so humans aiming to detect sudden situations may not be as
alert as needed at any point in time. Consequently, computer vision tools can
be an excellent decision support. The range of emergencies in which computer
vision tools have been considered or used is very wide, and there is great
overlap across related emergency research. Researchers tend to focus on
state-of-the-art systems that cover the same emergency they are studying,
overlooking important research in other fields. In order to unveil this overlap,
the survey is divided along four main axes: the types of emergencies that have
been studied in computer vision, the objectives that the algorithms can address,
the type of hardware needed and the algorithms used. Therefore, this review
provides a broad overview of the progress of computer vision covering all sorts
of emergencies. Comment: 25 pages
Affordances Provide a Fundamental Categorization Principle for Visual Scenes
How do we know that a kitchen is a kitchen by looking? Relatively little is
known about how we conceptualize and categorize different visual environments.
Traditional models of visual perception posit that scene categorization is
achieved through the recognition of a scene's objects, yet these models cannot
account for the mounting evidence that human observers are relatively
insensitive to the local details in an image. Psychologists have long theorized
that the affordances, or actionable possibilities, of a stimulus are pivotal to
its perception. To what extent are scene categories created from similar
affordances? In a large-scale experiment using hundreds of scene categories,
we show that the activities afforded by a visual scene provide a fundamental
categorization principle. Affordance-based similarity explained the majority of
the structure in the human scene categorization patterns, outperforming
alternative similarities based on objects or visual features. When all models
were combined, affordances provided the majority of the predictive power in the
combined model, and nearly half of the total explained variance was captured
only by affordances. These results challenge many existing models of high-level
visual perception, and provide immediately testable hypotheses for the
functional organization of the human perceptual system.
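As an aside, the variance-partitioning analysis this abstract describes can be made concrete with a small sketch: vectorize the human categorization similarity matrix and regress it on the candidate model similarity matrices, then compare their weights and explained variance. The names and toy data below are hypothetical illustrations, not the authors' actual analysis pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 50  # hypothetical number of scene categories

# Hypothetical stand-ins for the three model similarity matrices
# (affordances, objects, visual features) and the human similarity matrix.
def sym(m):
    return (m + m.T) / 2

sim_affordance = sym(rng.random((n, n)))
sim_objects = sym(rng.random((n, n)))
sim_features = sym(rng.random((n, n)))
sim_human = 0.6 * sim_affordance + 0.2 * sim_objects + 0.2 * sim_features

def upper(m):
    # Use only the off-diagonal upper triangle so each pair is counted once.
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]

X = np.column_stack([upper(sim_affordance), upper(sim_objects), upper(sim_features)])
y = upper(sim_human)

reg = LinearRegression().fit(X, y)
print("R^2 of combined model:", reg.score(X, y))
print("weights (affordance, object, feature):", reg.coef_)
```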
Spatially Constrained Location Prior for Scene Parsing
Semantic context is an important and useful cue for scene parsing in
complicated natural images with a substantial amount of variations in objects
and the environment. This paper proposes Spatially Constrained Location Prior
(SCLP) for effective modelling of global and local semantic context in the
scene in terms of inter-class spatial relationships. Unlike existing studies
focusing on either relative or absolute location prior of objects, the SCLP
effectively incorporates both relative and absolute location priors by
calculating object co-occurrence frequencies in spatially constrained image
blocks. The SCLP is general and can be used in conjunction with various visual
feature-based prediction models, such as Artificial Neural Networks and Support
Vector Machines (SVMs), to enforce spatial contextual constraints on class
labels. Using SVM classifiers and a linear regression model, we demonstrate
that the incorporation of SCLP achieves superior performance compared to the
state-of-the-art methods on the Stanford background and SIFT Flow datasets. Comment: authors' pre-print version of an article published in IJCNN 201
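For readers unfamiliar with location priors of this kind, here is a minimal sketch of how the absolute-location component of an SCLP-style prior might be tabulated from training label maps: count class co-occurrences inside cells of a fixed spatial grid and normalize them into frequencies. The function name, grid layout and toy data are our own illustration, not the paper's exact formulation.

```python
import numpy as np

def spatial_cooccurrence_prior(label_maps, n_classes, grid=(4, 4)):
    """For each cell of a fixed spatial grid, count how often each pair of
    classes co-occurs there across the training label maps, then normalize
    the counts into co-occurrence frequencies."""
    gh, gw = grid
    prior = np.zeros((gh, gw, n_classes, n_classes))
    for lab in label_maps:                    # lab: (H, W) array of class ids
        H, W = lab.shape
        for gy in range(gh):
            for gx in range(gw):
                block = lab[gy*H//gh:(gy+1)*H//gh, gx*W//gw:(gx+1)*W//gw]
                present = np.unique(block)
                for a in present:             # tally pairwise co-occurrence
                    for b in present:
                        prior[gy, gx, a, b] += 1
    totals = prior.sum(axis=(2, 3), keepdims=True)
    return prior / np.maximum(totals, 1)      # per-block frequency tables

# Toy usage: two 8x8 label maps with 3 classes on a 2x2 grid.
maps = [np.random.randint(0, 3, (8, 8)) for _ in range(2)]
prior = spatial_cooccurrence_prior(maps, n_classes=3, grid=(2, 2))
print(prior.shape)  # (2, 2, 3, 3)
```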
Context-Aware Query Selection for Active Learning in Event Recognition
Activity recognition is a challenging problem with many practical
applications. In addition to the visual features, recent approaches have
benefited from the use of context, e.g., inter-relationships among the
activities and objects. However, these approaches require data to be labeled
and entirely available beforehand, and they are not designed to be updated
continuously, which makes them unsuitable for surveillance applications. In contrast, we
propose a continuous-learning framework for context-aware activity recognition
from unlabeled video, which has two distinct advantages over existing methods.
First, it employs a novel active-learning technique that not only exploits the
informativeness of the individual activities but also utilizes their contextual
information during query selection; this leads to significant reduction in
expensive manual annotation effort. Second, the learned models can be adapted
online as more data is available. We formulate a conditional random field model
that encodes the context and devise an information-theoretic approach that
utilizes entropy and mutual information of the nodes to compute the set of most
informative queries, which are labeled by a human. These labels are combined
with graphical inference techniques for incremental updates. We provide a
theoretical formulation of the active learning framework with an analytic
solution. Experiments on six challenging datasets demonstrate that our
framework achieves superior performance with significantly less manual
labeling. Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence (T-PAMI).
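A minimal sketch of the entropy side of such an information-theoretic query selection appears below: rank unlabeled nodes by the entropy of their marginal distributions and query the most uncertain ones. The mutual-information term between CRF nodes that the paper also uses is omitted, and the names and toy marginals are hypothetical.

```python
import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy of a discrete distribution over activity labels.
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p))

def select_queries(node_marginals, budget):
    """Rank unlabeled activity nodes by the entropy of their marginal
    distributions and return the indices of the most uncertain ones."""
    scores = [entropy(p) for p in node_marginals]
    order = np.argsort(scores)[::-1]        # most uncertain first
    return order[:budget]

# Toy usage: marginals for 5 activity nodes over 3 classes.
marginals = [np.array([0.33, 0.33, 0.34]),  # very uncertain
             np.array([0.90, 0.05, 0.05]),  # confident
             np.array([0.50, 0.50, 0.00]),
             np.array([0.20, 0.30, 0.50]),
             np.array([1.00, 0.00, 0.00])]
print(select_queries(marginals, budget=2))  # -> [0 3]
```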
SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences
Semantic scene understanding is important for various applications. In
particular, self-driving cars need a fine-grained understanding of the surfaces
and objects in their vicinity. Light detection and ranging (LiDAR) provides
precise geometric information about the environment and is thus a part of the
sensor suites of almost all self-driving cars. Despite the relevance of
semantic scene understanding for this application, there is a lack of large
datasets for this task based on an automotive LiDAR.
In this paper, we introduce a large dataset to propel research on laser-based
semantic segmentation. We annotated all sequences of the KITTI Vision Odometry
Benchmark and provide dense point-wise annotations for the complete
field-of-view of the employed automotive LiDAR. We propose three benchmark
tasks based on this dataset: (i) semantic segmentation of point clouds using a
single scan, (ii) semantic segmentation using multiple past scans, and (iii)
semantic scene completion, which requires anticipating the semantic scene in
the future. We provide baseline experiments and show that there is a need for
more sophisticated models to efficiently tackle these tasks. Our dataset opens
the door for the development of more advanced methods, but also provides
plentiful data to investigate new research directions. Comment: ICCV 2019. See teaser video at http://bit.ly/SemanticKITTI-tease
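To illustrate how such a scan is typically consumed, here is a minimal sketch of reading one SemanticKITTI scan and its point-wise labels, following the dataset's documented binary layout (float32 x, y, z, remission quadruples; one uint32 label per point with the semantic class in the lower 16 bits and the instance id in the upper 16). The file paths are placeholders.

```python
import numpy as np

def read_scan(bin_path, label_path):
    """Read one SemanticKITTI scan and its per-point annotations."""
    points = np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)
    labels = np.fromfile(label_path, dtype=np.uint32)
    semantic = labels & 0xFFFF   # lower 16 bits: semantic class id
    instance = labels >> 16      # upper 16 bits: instance id
    return points, semantic, instance

# Placeholder paths into the dataset's sequence layout.
pts, sem, inst = read_scan("sequences/00/velodyne/000000.bin",
                           "sequences/00/labels/000000.label")
print(pts.shape, np.unique(sem)[:10])
```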
Deep Learning Markov Random Field for Semantic Segmentation
Semantic segmentation tasks can be well modeled by Markov Random Field (MRF).
This paper addresses semantic segmentation by incorporating high-order
relations and a mixture of label contexts into the MRF. Unlike previous works that
optimized MRFs using iterative algorithms, we solve the MRF by proposing a
Convolutional Neural Network (CNN), namely Deep Parsing Network (DPN), which
enables deterministic end-to-end computation in a single forward pass.
Specifically, DPN extends a contemporary CNN to model unary terms and
additional layers are devised to approximate the mean field (MF) algorithm for
pairwise terms. It has several appealing properties. First, different from the
recent works that required many iterations of MF during back-propagation, DPN
is able to achieve high performance by approximating one iteration of MF.
Second, DPN represents various types of pairwise terms, making many existing
models its special cases. Furthermore, pairwise terms in DPN provide a
unified framework to encode rich contextual information in high-dimensional
data, such as images and videos. Third, DPN makes MF easier to parallelize
and speed up, thus enabling efficient inference. DPN is thoroughly evaluated
on standard semantic image/video segmentation benchmarks, where a single DPN
model yields state-of-the-art segmentation accuracies on PASCAL VOC 2012,
the Cityscapes dataset and the CamVid dataset. Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI), 2017. Extended version of our previous ICCV 2015 paper
(arXiv:1509.02634).
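For intuition, a single mean-field update for a CRF with unary and pairwise terms can be written in a few lines. The sketch below is a generic formulation with dense affinities and a Potts-style compatibility matrix, mirroring the observation that one MF iteration can already perform well; it is not DPN's actual convolutional layer design.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mean_field_step(unary, affinity, compat):
    """One mean-field update for a CRF with pairwise terms.
    unary:    (N, L) unary potentials (negative log-likelihoods)
    affinity: (N, N) pairwise connection weights between pixels
    compat:   (L, L) label compatibility (penalty) matrix"""
    q = softmax(-unary)                  # initialize beliefs from unaries
    message = affinity @ q               # aggregate neighbours' beliefs
    pairwise = message @ compat          # apply label compatibility penalty
    return softmax(-(unary + pairwise))  # re-normalized combined update

# Toy usage: 6 pixels, 3 labels, uniform affinities, Potts compatibility.
rng = np.random.default_rng(0)
unary = rng.random((6, 3))
affinity = np.ones((6, 6)) / 6
compat = 1 - np.eye(3)
print(mean_field_step(unary, affinity, compat))
```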
Joint Attention in Driver-Pedestrian Interaction: from Theory to Practice
Today, one of the major challenges that autonomous vehicles are facing is the
ability to drive in urban environments. Such a task requires communication
between autonomous vehicles and other road users in order to resolve various
traffic ambiguities. The interaction between road users is a form of
negotiation in which the parties involved have to share their attention
regarding a common objective or a goal (e.g. crossing an intersection), and
coordinate their actions in order to accomplish it. In this literature review
we aim to address the interaction problem between pedestrians and drivers (or
vehicles) from a joint attention point of view. More specifically, we will
discuss the theoretical background behind joint attention, its application to
traffic interaction and practical approaches to implementing joint attention
for autonomous vehicles.
ResUNet-a: a deep learning framework for semantic segmentation of remotely sensed data
Scene understanding of high resolution aerial images is of great importance
for the task of automated monitoring in various remote sensing applications.
Due to the large within-class and small between-class variance in pixel values
of objects of interest, this remains a challenging task. In recent years, deep
convolutional neural networks have started being used in remote sensing
applications and demonstrate state-of-the-art performance for pixel-level
classification of objects. Here we propose a reliable framework for performant
semantic segmentation of monotemporal very high resolution aerial images. Our framework consists of a
novel deep learning architecture, ResUNet-a, and a novel loss function based on
the Dice loss. ResUNet-a uses a UNet encoder/decoder backbone, in combination
with residual connections, atrous convolutions, pyramid scene parsing pooling
and multi-tasking inference. ResUNet-a infers sequentially the boundary of the
objects, the distance transform of the segmentation mask, the segmentation mask
and a colored reconstruction of the input. Each of the tasks is conditioned on
the inference of the previous ones, thus establishing a conditioned
relationship between the various tasks, as described by the
architecture's computation graph. We analyse the performance of several
flavours of the Generalized Dice loss for semantic segmentation, and we
introduce a novel variant loss function for semantic segmentation of objects
that has excellent convergence properties and behaves well even under the
presence of highly imbalanced classes. The performance of our modeling
framework is evaluated on the ISPRS 2D Potsdam dataset. Results show
state-of-the-art performance with an average F1 score of 92.9% over all
classes for our best model. Comment: Accepted for publication in the ISPRS Journal of Photogrammetry and
Remote Sensing.
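As background on the loss family discussed here, the following is a minimal sketch of the standard Generalized Dice loss with inverse-squared-volume class weights, which is what makes the loss robust to strongly imbalanced classes. The paper introduces its own variant, which this sketch does not reproduce.

```python
import numpy as np

def generalized_dice_loss(probs, onehot, eps=1e-6):
    """Generalized Dice loss over flattened predictions.
    probs, onehot: arrays of shape (N, C) for N pixels and C classes.
    Each class is weighted by the inverse square of its volume, so rare
    classes contribute as much to the loss as dominant ones."""
    w = 1.0 / (onehot.sum(axis=0) ** 2 + eps)          # per-class weights
    intersect = (w * (probs * onehot).sum(axis=0)).sum()
    union = (w * (probs + onehot).sum(axis=0)).sum()
    return 1.0 - 2.0 * intersect / (union + eps)

# Toy usage: 4 pixels, 2 classes (one rare).
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]])
onehot = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])
print(generalized_dice_loss(probs, onehot))
```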
Towards Automated Cadastral Boundary Delineation from UAV Data
Unmanned aerial vehicles (UAVs) are evolving as an alternative tool to acquire
land tenure data. UAVs can capture geospatial data at high quality and
resolution in a cost-effective, transparent and flexible manner, from which
visible land parcel boundaries, i.e., cadastral boundaries, are delineable. This
delineation is currently not automated, even though a large portion of
cadastral boundaries is marked by physical objects that are automatically
retrievable through image analysis methods. This study proposes (i) a workflow that automatically
extracts candidate cadastral boundaries from UAV orthoimages and (ii) a tool
for their semi-automatic processing to delineate final cadastral boundaries.
The workflow consists of two state-of-the-art computer vision methods, namely
gPb contour detection and SLIC superpixels that are transferred to remote
sensing in this study. The tool combines the two methods, allows a
semi-automatic final delineation and is implemented as a publicly available
QGIS plugin. The approach does not yet aim to provide a comparable alternative
to manual cadastral mapping procedures. However, the methodological development
of the tool towards this goal is described in this paper. A study with 13
volunteers investigates the design and implementation of the approach and
gathers initial qualitative as well as quantitative results. The study revealed
points for improvement, which are prioritized based on the study results and
which will be addressed in future work. Comment: Report on the current state (August 2017) of the PhD work of the first author.
Further info: https://its4land.com/automate-it-wp5
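The SLIC half of such a pipeline is straightforward to prototype with scikit-image, as sketched below; the gPb contour detection step is not shown, and the example image is merely a stand-in for a UAV orthoimage tile.

```python
from skimage import data, segmentation

# Example image stands in for a UAV orthoimage tile (placeholder input).
image = data.astronaut()

# SLIC superpixels: candidate regions whose borders tend to follow visible
# object outlines such as fences, roads or hedges.
segments = segmentation.slic(image, n_segments=300, compactness=10,
                             start_label=1)

# Boundary mask between adjacent superpixels: a raw pool of candidate
# cadastral boundary lines for later semi-automatic filtering.
boundaries = segmentation.find_boundaries(segments, mode='thick')
print(segments.max(), "superpixels;", boundaries.sum(), "boundary pixels")
```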
Learning to Detect Vehicles by Clustering Appearance Patterns
This paper studies efficient means for dealing with intra-category diversity
in object detection. Strategies for occlusion and orientation handling are
explored by learning an ensemble of detection models from visual and
geometrical clusters of object instances. An AdaBoost detection scheme is
employed with pixel lookup features for fast detection. The analysis provides
insight into the design of a robust vehicle detection system, showing promise
in terms of detection performance and orientation estimation accuracy. Comment: Preprint version of our T-ITS 2015 paper.
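A minimal sketch of the cluster-then-detect idea follows, using scikit-learn's KMeans and AdaBoostClassifier: positives are clustered by geometry, one boosted detector is trained per cluster, and the winning cluster doubles as a coarse orientation estimate. The features, geometry descriptors and cluster count are hypothetical stand-ins (the paper's pixel lookup features are replaced by generic vectors).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Hypothetical training set: feature vectors plus per-instance geometry
# (e.g. orientation, aspect ratio) for positive vehicle samples.
X_pos = rng.random((200, 32))
geometry = rng.random((200, 2))   # stand-in for orientation/shape descriptors
X_neg = rng.random((400, 32))

# Cluster positives by geometry, then train one boosted detector per cluster.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(geometry)
detectors = []
for k in range(3):
    X = np.vstack([X_pos[clusters == k], X_neg])
    y = np.hstack([np.ones((clusters == k).sum()), np.zeros(len(X_neg))])
    detectors.append(AdaBoostClassifier(n_estimators=50).fit(X, y))

# At test time, a window is scored by every cluster-specific detector; the
# maximum response wins and its cluster gives a coarse orientation estimate.
window = rng.random((1, 32))
scores = [d.predict_proba(window)[0, 1] for d in detectors]
print("best cluster:", int(np.argmax(scores)), "score:", max(scores))
```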