Deep Learning Algorithms with Applications to Video Analytics for A Smart City: A Survey
Deep learning has recently achieved very promising results in a wide range of
areas such as computer vision, speech recognition and natural language
processing. It aims to learn hierarchical representations of data by using deep
architecture models. In a smart city, a lot of data (e.g. videos captured from
many distributed sensors) need to be automatically processed and analyzed. In
this paper, we review the deep learning algorithms applied to video analytics
for smart cities, organized by research topic: object detection, object
tracking, face recognition, image classification and scene labeling.
Comment: 8 pages, 18 figures
Subsurface structure analysis using computational interpretation and learning: A visual signal processing perspective
Understanding Earth's subsurface structures has been and continues to be an
essential component of various applications such as environmental monitoring,
carbon sequestration, and oil and gas exploration. By viewing the seismic
volumes that are generated through the processing of recorded seismic traces,
researchers were able to learn from applying advanced image processing and
computer vision algorithms to effectively analyze and understand Earth's
subsurface structures. In this paper, first, we summarize the recent advances
in this direction that relied heavily on the fields of image processing and
computer vision. Second, we discuss the challenges in seismic interpretation
and provide insights and some directions to address such challenges using
emerging machine learning algorithms.
A Fast Two Pass Multi-Value Segmentation Algorithm based on Connected Component Analysis
Connected component analysis (CCA) has been heavily used to label binary
images and classify segments. However, it has not been well-exploited to
segment multi-valued natural images. This work proposes a novel multi-value
segmentation algorithm that utilizes CCA to segment color images. A user
defined distance measure is incorporated in the proposed modified CCA to
identify and segment similar image regions. The raw output of the algorithm
consists of distinctly labelled segmented regions. The proposed algorithm has a
unique design architecture that provides several benefits: 1) it can be used to
segment any multi-channel multi-valued image; 2) the distance
measure/segmentation criteria can be application-specific and 3) an absolute
linear-time implementation allows easy extension for real-time video
segmentation. Experimental demonstrations of the aforesaid benefits are
presented along with the comparison results on multiple datasets with current
benchmark algorithms. A number of possible application areas are also
identified, and results on real-time video segmentation are presented to
show the promise of the proposed method.
Comment: 9 pages, 7 figures
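As a rough illustration of the idea in the abstract above, the following is a minimal sketch of a generic two-pass connected-component labeling extended to multi-valued pixels through a user-supplied similarity test. It is not the authors' algorithm: the function name `segment`, the 4-connectivity choice, the union-find bookkeeping, and the `close_enough` threshold are all assumptions made for this example.

```python
def segment(image, similar):
    """Two-pass labeling of a 2D grid of pixel values (hypothetical sketch).

    `similar(a, b)` is a user-defined test that decides whether two
    neighbouring pixel values belong to the same region.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # union-find forest; index 0 is unused

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Pass 1: assign provisional labels and record label equivalences.
    for r in range(rows):
        for c in range(cols):
            neighbours = []
            if r > 0 and similar(image[r][c], image[r - 1][c]):
                neighbours.append(labels[r - 1][c])
            if c > 0 and similar(image[r][c], image[r][c - 1]):
                neighbours.append(labels[r][c - 1])
            if not neighbours:
                parent.append(len(parent))      # new label is its own root
                labels[r][c] = len(parent) - 1
            else:
                labels[r][c] = min(neighbours)
                for n in neighbours:
                    union(labels[r][c], n)

    # Pass 2: replace each provisional label by a compact final label.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            root = find(labels[r][c])
            labels[r][c] = remap.setdefault(root, len(remap) + 1)
    return labels


def close_enough(a, b, threshold=10):
    # Example distance measure: largest per-channel difference.
    return max(abs(x - y) for x, y in zip(a, b)) <= threshold


black, red = (0, 0, 0), (255, 0, 0)
result = segment([[black, black, red], [black, red, red]], close_enough)
```

Because similarity is checked only between adjacent pixels, regions can grow through chains of gradual change; an application-specific distance measure would replace `close_enough`.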
Towards Holistic Scene Understanding: Feedback Enabled Cascaded Classification Models
Scene understanding includes many related sub-tasks, such as scene
categorization, depth estimation, object detection, etc. Each of these
sub-tasks is often notoriously hard, and state-of-the-art classifiers already
exist for many of them. These classifiers operate on the same raw image and
provide correlated outputs. It is desirable to have an algorithm that can
capture such correlation without requiring any changes to the inner workings of
any classifier.
We propose Feedback Enabled Cascaded Classification Models (FE-CCM), which
jointly optimize all the sub-tasks while requiring only a 'black-box'
interface to the original classifier for each sub-task. We use a two-layer
cascade of classifiers, which are repeated instantiations of the original ones,
with the output of the first layer fed into the second layer as input. Our
training method involves a feedback step that allows later classifiers to
provide earlier classifiers information about which error modes to focus on. We
show that our method significantly improves performance in all the sub-tasks in
the domain of scene understanding, where we consider depth estimation, scene
categorization, event categorization, object detection, geometric labeling and
saliency detection. Our method also improves performance in two robotic
applications: an object-grasping robot and an object-finding robot.
Comment: 14 pages, 11 figures
Block Stability for MAP Inference
To understand the empirical success of approximate MAP inference, recent work
(Lang et al., 2018) has shown that some popular approximation algorithms
perform very well when the input instance is stable. The simplest stability
condition assumes that the MAP solution does not change at all when some of the
pairwise potentials are (adversarially) perturbed. Unfortunately, this strong
condition does not seem to be satisfied in practice. In this paper, we
introduce a significantly more relaxed condition that only requires blocks
(portions) of an input instance to be stable. Under this block stability
condition, we prove that the pairwise LP relaxation is persistent on the stable
blocks. We complement our theoretical results with an empirical evaluation of
real-world MAP inference instances from computer vision. We design an algorithm
to find stable blocks, and find that these real instances have large stable
regions. Our work gives a theoretical explanation for the widespread empirical
phenomenon of persistency for this LP relaxation.
GAL: A Global-Attributes Assisted Labeling System for Outdoor Scenes
An approach that extracts global attributes from outdoor images to facilitate
geometric layout labeling is investigated in this work. The proposed
Global-attributes Assisted Labeling (GAL) system exploits both local features
and global attributes. First, by following a classical method, we use local
features to provide initial labels for all super-pixels. Then, we develop a set
of techniques to extract global attributes from 2D outdoor images. They include
sky lines, ground lines, vanishing lines, etc. Finally, we propose the GAL
system that integrates global attributes in the conditional random field (CRF)
framework to improve initial labels so as to offer a more robust labeling
result. The performance of the proposed GAL system is demonstrated and
benchmarked against several state-of-the-art algorithms on a popular outdoor
scene layout dataset.
Tight Error Bounds for Structured Prediction
Structured prediction tasks in machine learning involve the simultaneous
prediction of multiple labels. This is typically done by maximizing a score
function on the space of labels, which decomposes as a sum of pairwise
elements, each depending on two specific labels. Intuitively, the more pairwise
terms are used, the better the expected accuracy. However, there is currently
no theoretical account of this intuition. This paper takes a significant step
in this direction.
We formulate the problem as classifying the vertices of a known graph
$G = (V, E)$, where the vertices and edges of the graph are labelled and correlate
semi-randomly with the ground truth. We show that the prospects for achieving
low expected Hamming error depend on the structure of the graph $G$ in
interesting ways. For example, if $G$ is a very poor expander, like a path,
then large expected Hamming error is inevitable. Our main positive result shows
that, for a wide class of graphs including 2D grid graphs common in machine
vision applications, there is a polynomial-time algorithm with small and
information-theoretically near-optimal expected error. Our results provide a
first step toward a theoretical justification for the empirical success of the
efficient approximate inference algorithms that are used for structured
prediction in models where exact inference is intractable.
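To make the terms in the abstract above concrete, here is a small sketch of a score function that decomposes as a sum of pairwise terms, maximized by brute force, together with the Hamming error of the result. This is illustrative only and is not the paper's polynomial-time algorithm: the binary labels, the +1/-1 edge observations, and the names `pairwise_score`, `best_labeling` and `hamming_error` are assumptions made for this example.

```python
from itertools import product


def pairwise_score(labels, edge_obs):
    """Sum of pairwise terms, one per edge.

    `edge_obs` maps an edge (u, v) to +1 if the (possibly noisy) observation
    says the endpoints share a label, and to -1 if it says they differ.
    """
    return sum(
        obs if labels[u] == labels[v] else -obs
        for (u, v), obs in edge_obs.items()
    )


def best_labeling(n, edge_obs):
    # Exhaustive search over all 2**n binary labelings; only viable for tiny n.
    return max(product((0, 1), repeat=n),
               key=lambda lab: pairwise_score(lab, edge_obs))


def hamming_error(predicted, truth):
    # Number of vertices whose predicted label differs from the ground truth.
    return sum(p != t for p, t in zip(predicted, truth))


# A path on three vertices: edge (0, 1) observed "same", edge (1, 2) "different".
path_obs = {(0, 1): +1, (1, 2): -1}
prediction = best_labeling(3, path_obs)
```

Edge observations alone cannot distinguish a labeling from its complement, so in this toy setup `prediction` scores the same as its bitwise flip; observations on the vertices, which the abstract also mentions, are what would break that tie.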
Online Mutual Foreground Segmentation for Multispectral Stereo Videos
The segmentation of video sequences into foreground and background regions is
a low-level process commonly used in video content analysis and smart
surveillance applications. Using a multispectral camera setup can improve this
process by providing more diverse data to help identify objects despite adverse
imaging conditions. The registration of several data sources is however not
trivial if the appearance of objects produced by each sensor differs
substantially. This problem is further complicated when parallax effects cannot
be ignored when using close-range stereo pairs. In this work, we present a new
method to simultaneously tackle multispectral segmentation and stereo
registration. Using an iterative procedure, we estimate the labeling result for
one problem using the provisional result of the other. Our approach is based on
the alternating minimization of two energy functions that are linked through
the use of dynamic priors. We rely on the integration of shape and appearance
cues to find proper multispectral correspondences, and to properly segment
objects in low contrast regions. We also formulate our model as a frame
processing pipeline using higher order terms to improve the temporal coherence
of our results. Our method is evaluated under different configurations on
multiple multispectral datasets, and our implementation is available online.
Comment: Preprint accepted for publication in IJCV (December 2018)
Scene Parsing via Dense Recurrent Neural Networks with Attentional Selection
Recurrent neural networks (RNNs) have shown the ability to improve scene
parsing through capturing long-range dependencies among image units. In this
paper, we propose dense RNNs for scene labeling by exploring various long-range
semantic dependencies among image units. Different from existing RNN based
approaches, our dense RNNs are able to capture richer contextual dependencies
for each image unit by enabling immediate connections between each pair of
image units, which significantly enhances their discriminative power. In addition,
to select relevant dependencies and suppress irrelevant ones for
each unit among the dense connections, we introduce an attention model into dense
RNNs. The attention model automatically assigns more importance to
helpful dependencies and less weight to irrelevant ones. Integrating the dense RNNs
with convolutional neural networks (CNNs), we develop an end-to-end scene
labeling system. Extensive experiments on three large-scale benchmarks
demonstrate that the proposed approach can improve the baselines by large
margins and outperform other state-of-the-art algorithms.
Comment: 10 pages. arXiv admin note: substantial text overlap with
arXiv:1801.0683
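The attentional selection described in the abstract above can be pictured with a generic softmax weighting over the dependencies of one image unit. The sketch below is a common textbook form of attention, not the authors' model: the relevance scores are assumed to be given, and the names `attention_weights` and `attend` are hypothetical.

```python
import math


def attention_weights(scores):
    """Turn raw relevance scores into positive weights that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Weighted sum of feature vectors, one vector per dependency."""
    weights = attention_weights(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features))
            for d in range(dim)]


# Two dependencies with equal scores contribute equally.
context = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

A dependency with a much higher score than the others receives a weight close to one, which is the sense in which helpful dependencies are emphasized and irrelevant ones are suppressed.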
Inserting an Edge into a Geometric Embedding
The algorithm of Gutwenger et al. to insert an edge $e$ in linear time into a
planar graph $G$ with a minimal number of crossings on $e$ is a helpful tool
for designing heuristics that minimize edge crossings in drawings of general
graphs. Unfortunately, some graphs do not have a geometric embedding $\Gamma$
such that $\Gamma + e$ has the same number of crossings as the embedding $G + e$.
This motivates the study of the computational complexity of the following
problem: Given a combinatorially embedded graph $G$, compute a geometric
embedding $\Gamma$ that has the same combinatorial embedding as $G$ and that
minimizes the crossings of $\Gamma + e$. We give polynomial-time algorithms for
special cases and prove that the general problem is fixed-parameter tractable
in the number of crossings. Moreover, we show how to approximate the number of
crossings by a factor that depends on $\Delta$, where $\Delta$ is the maximum vertex degree
of $G$.
Comment: Appears in the Proceedings of the 26th International Symposium on
Graph Drawing and Network Visualization (GD 2018)