3,875 research outputs found
Fast, Exact and Multi-Scale Inference for Semantic Image Segmentation with Deep Gaussian CRFs
In this work we propose a structured prediction technique that combines the
virtues of Gaussian Conditional Random Fields (G-CRF) with Deep Learning: (a)
our structured prediction task has a unique global optimum that is obtained
exactly from the solution of a linear system, (b) the gradients of our model
parameters are analytically computed using closed form expressions, in contrast
to the memory-demanding contemporary deep structured prediction approaches that
rely on back-propagation-through-time, (c) our pairwise terms do not have to be
simple hand-crafted expressions, as in the line of works building on the
DenseCRF, but can rather be 'discovered' from data through deep architectures,
and (d) our system can be trained in an end-to-end manner. Building on standard
tools from numerical analysis we develop very efficient algorithms for
inference and learning, as well as a customized technique adapted to the
semantic segmentation task. This efficiency allows us to explore more
sophisticated architectures for structured prediction in deep learning: we
introduce multi-resolution architectures to couple information across scales in
a joint optimization framework, yielding systematic improvements. We
demonstrate the utility of our approach on the challenging VOC PASCAL 2012
image segmentation benchmark, showing substantial improvements over strong
baselines. We make all of our code and experiments available at
https://github.com/siddharthachandra/gcrf
Comment: Our code is available at https://github.com/siddharthachandra/gcrf
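The core computational step described above reduces to solving a positive definite linear system. The following minimal NumPy sketch illustrates that step in isolation; the matrix A (pairwise terms), vector b (unary terms), and the regularizer lam are hypothetical stand-ins for the CNN-predicted potentials, not the authors' implementation.

```python
# Minimal sketch of Gaussian CRF inference as a linear system (illustrative only).
import numpy as np

def gcrf_inference(A, b, lam=1.0):
    """Return the unique global optimum x of the quadratic energy
    E(x) = 0.5 x^T (A + lam I) x - b^T x by solving (A + lam I) x = b.
    lam > 0 keeps the system positive definite, so the optimum is unique."""
    n = b.shape[0]
    return np.linalg.solve(A + lam * np.eye(n), b)

# Toy example: 4 variables with random symmetric pairwise terms.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T                   # symmetric positive semi-definite pairwise matrix
b = rng.standard_normal(4)
print(gcrf_inference(A, b))   # exact global optimum, no iterative message passing
```

For image-sized systems a dense solve is infeasible; an iterative method such as conjugate gradients would be the natural substitute.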
The use of singular functions for the approximate conformal mapping of doubly-connected domains
Let f be the function which maps conformally a given doubly-connected domain onto a circular annulus. We consider the use of two closely related methods for determining approximations to f of the form
fn(z) = z exp{ Σ_{j=1}^{n−1} aj uj(z) },
where {uj} is a set of basis functions. The two methods are respectively a variational method, based on an extremum property of the function
H(z) = f′(z)/f(z) - 1/z,
and an orthonormalization method, based on approximating the function H by a finite Fourier series sum.
The main purpose of the paper is to consider the use of the two methods for the mapping of domains having sharp corners, where corner singularities occur. We show, by means of numerical examples, that both methods are capable of producing approximations of high accuracy for the mapping of such "difficult" doubly-connected domains. The essential requirement for this is that the basis set {uj} contains singular functions that reflect the asymptotic behaviour of the function H in the neighbourhood of each "singular" corner.
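As a concrete illustration of the approximating form fn(z) = z exp{ Σ_{j=1}^{n−1} aj uj(z) }, the sketch below evaluates fn for a given basis and coefficient vector. The monomial basis uj(z) = z^j and the coefficient values are purely hypothetical; the paper's point is precisely that {uj} should be augmented with singular functions matched to the corner behaviour of H.

```python
# Evaluate f_n(z) = z * exp(sum_j a_j * u_j(z)) for an illustrative basis.
import numpy as np

def f_n(z, coeffs, basis):
    """Approximate conformal map: z times the exponential of a linear
    combination of basis functions u_j evaluated at z."""
    s = sum(a * u(z) for a, u in zip(coeffs, basis))
    return z * np.exp(s)

basis = [lambda z, j=j: z**j for j in range(1, 4)]  # u_j(z) = z^j (hypothetical)
coeffs = [0.1, -0.05, 0.02]                         # hypothetical coefficients a_j
print(f_n(1.0 + 0.5j, coeffs, basis))               # complex value of the approximant
```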
Mass Displacement Networks
Despite the large improvements in performance attained by using deep learning
in computer vision, one can often further improve results with some additional
post-processing that exploits the geometric nature of the underlying task. This
commonly involves displacing the posterior distribution of a CNN in a way that
makes it more appropriate for the task at hand, e.g. better aligned with local
image features, or more compact. In this work we integrate this geometric
post-processing within a deep architecture, introducing a differentiable and
probabilistically sound counterpart to the common geometric voting technique
used for evidence accumulation in vision. We refer to the resulting neural
models as Mass Displacement Networks (MDNs), and apply them to human pose
estimation in two distinct setups: (a) landmark localization, where we collapse
a distribution to a point, allowing for precise localization of body keypoints
and (b) communication across body parts, where we transfer evidence from one
part to the other, allowing for a globally consistent pose estimate. We
evaluate on large-scale pose estimation benchmarks, such as MPII Human Pose and
COCO datasets, and report systematic improvements when compared to strong
baselines.
Comment: 12 pages, 4 figures
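Setup (a), collapsing a distribution to a point, can be pictured with a differentiable expectation over a score map, in the spirit of a soft-argmax. The sketch below is a generic illustration of that idea, not the authors' MDN layer; the temperature beta is a hypothetical parameter.

```python
# Differentiable collapse of a heatmap to a keypoint (soft-argmax sketch).
import numpy as np

def soft_argmax(heatmap, beta=10.0):
    """Collapse an HxW score map to (row, col) coordinates as the expectation
    of a softmax-normalised distribution; beta sharpens the distribution."""
    h, w = heatmap.shape
    p = np.exp(beta * (heatmap - heatmap.max()))   # stable softmax numerator
    p /= p.sum()
    rows, cols = np.mgrid[0:h, 0:w]
    return (p * rows).sum(), (p * cols).sum()

hm = np.zeros((5, 5))
hm[1, 3] = 2.0            # toy heatmap with a single peak at (1, 3)
print(soft_argmax(hm))    # approximately (1.0, 3.0)
```

Because every operation is differentiable, gradients flow back into the network that produced the heatmap, which is what allows such geometric post-processing to be trained end-to-end.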
Pushing the Boundaries of Boundary Detection using Deep Learning
In this work we show that adapting Deep Convolutional Neural Network training
to the task of boundary detection can result in substantial improvements over
the current state-of-the-art in boundary detection.
Our first contribution combines a carefully designed loss for boundary
detection, a multi-resolution architecture, and training with external data to
improve on the detection accuracy of the current state of
the art. When measured on the standard Berkeley Segmentation Dataset, we
improve the optimal dataset scale F-measure from 0.780 to 0.808 - while human
performance is at 0.803. We further improve performance to 0.813 by combining
deep learning with grouping, integrating the Normalized Cuts technique within a
deep network.
We also examine the potential of our boundary detector in conjunction with
the task of semantic segmentation and demonstrate clear improvements over
state-of-the-art systems. Our detector is fully integrated in the popular Caffe
framework and processes a 320x420 image in less than a second.
Comment: The previous version reported large improvements w.r.t. the LPO
region proposal baseline, which turned out to be due to a wrong computation
for the baseline. The improvements are currently less important, and are
omitted. We are sorry if the reported results caused any confusion. We have
also integrated reviewer feedback regarding human performance on the BSD
benchmark
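One ingredient mentioned above, the loss design for boundary detection, has to cope with extreme class imbalance: boundary pixels are vastly outnumbered by non-boundary pixels. The sketch below shows a standard class-balanced binary cross-entropy of the kind commonly used in deep boundary detectors; the exact weighting in the paper may differ, so treat this as an assumption.

```python
# Class-balanced binary cross-entropy for boundary detection (generic recipe).
import numpy as np

def balanced_bce(pred, target, eps=1e-7):
    """pred: predicted boundary probabilities in (0, 1); target: {0, 1} labels.
    Each class is weighted by the frequency of the opposite class, so the few
    boundary pixels are not drowned out by the many non-boundary pixels."""
    pred = np.clip(pred, eps, 1 - eps)
    n_pos, n_neg = target.sum(), (1 - target).sum()
    w_pos = n_neg / (n_pos + n_neg)
    w_neg = n_pos / (n_pos + n_neg)
    loss = -(w_pos * target * np.log(pred)
             + w_neg * (1 - target) * np.log(1 - pred))
    return loss.mean()

t = np.zeros((8, 8)); t[3, :] = 1.0      # toy ground truth: one boundary row
p = np.full((8, 8), 0.1); p[3, :] = 0.7  # imperfect predicted probabilities
print(balanced_bce(p, t))
```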
Attentive Single-Tasking of Multiple Tasks
In this work we address task interference in universal networks by
considering that a network is trained on multiple tasks, but performs one task
at a time, an approach we refer to as "single-tasking multiple tasks". The
network thus modifies its behaviour through task-dependent feature adaptation,
or task attention. This gives the network the ability to accentuate the
features that are adapted to a task, while shunning irrelevant ones. We further
reduce task interference by forcing the task gradients to be statistically
indistinguishable through adversarial training, ensuring that the common
backbone architecture serving all tasks is not dominated by any of the
task-specific gradients. Results in three multi-task dense labelling problems
consistently show: (i) a large reduction in the number of parameters while
preserving, or even improving, performance, and (ii) a smooth trade-off between
computation and multi-task accuracy. We provide our system's code and
pre-trained models at http://vision.ee.ethz.ch/~kmaninis/astmt/.
Comment: CVPR 2019 Camera Ready
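The task-dependent feature adaptation described above can be pictured as a per-task channel gate over a shared backbone, in the style of squeeze-and-excitation. The sketch below is a hypothetical, minimal rendering of that idea; the module sizes, gating form, and parameterisation are assumptions rather than the authors' exact architecture.

```python
# Task-conditioned channel attention over a shared feature map (sketch).
import numpy as np

rng = np.random.default_rng(0)
C, NUM_TASKS = 16, 3
# One (C, C) gating matrix per task (hypothetical learned parameters).
task_gates = [rng.standard_normal((C, C)) * 0.1 for _ in range(NUM_TASKS)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def single_task_forward(features, task_id):
    """features: (C, H, W) shared backbone output. Channels are re-weighted
    with a task-specific gate, so the network 'single-tasks' one task at a
    time while sharing all backbone parameters."""
    pooled = features.mean(axis=(1, 2))            # squeeze: global average pool
    gate = sigmoid(task_gates[task_id] @ pooled)   # excite: task-specific weights
    return features * gate[:, None, None]

feats = rng.standard_normal((C, 8, 8))
print(single_task_forward(feats, task_id=1).shape)  # (16, 8, 8)
```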
Structural Attention Neural Networks for improved sentiment analysis
We introduce a tree-structured attention neural network for sentences and
small phrases and apply it to the problem of sentiment classification. Our
model expands the current recursive models by incorporating structural
information around a node of a syntactic tree using both bottom-up and top-down
information propagation. Also, the model utilizes structural attention to
identify the most salient representations during the construction of the
syntactic tree. To our knowledge, the proposed models achieve state-of-the-art
performance on the Stanford Sentiment Treebank dataset.
Comment: Submitted to EACL2017 for review
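To make the structural attention mechanism concrete, the sketch below scores the children of a tree node against a learned query vector and combines them with softmax weights. The dimensionality and the dot-product scoring form are assumptions for illustration, not the paper's exact parameterisation.

```python
# Attention over the children of a syntactic tree node (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
D = 8
query = rng.standard_normal(D)   # hypothetical learned attention query

def attend_children(child_vecs):
    """child_vecs: (n_children, D) child representations. Returns their
    attention-weighted sum, to be merged into the parent representation."""
    scores = child_vecs @ query
    weights = np.exp(scores - scores.max())   # stable softmax
    weights /= weights.sum()
    return weights @ child_vecs

children = rng.standard_normal((3, D))   # three child node vectors
print(attend_children(children))         # one D-dimensional summary
```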
Deep Learning for Semantic Part Segmentation with High-Level Guidance
In this work we address the task of segmenting an object into its parts, or
semantic part segmentation. We start by adapting a state-of-the-art semantic
segmentation system to this task, and show that a combination of a
fully-convolutional Deep CNN system coupled with Dense CRF labelling provides
excellent results for a broad range of object categories. Still, this approach
remains agnostic to high-level constraints between object parts. We introduce
such prior information by means of the Restricted Boltzmann Machine, adapted to
our task, and train our model in a discriminative fashion, as a hidden CRF,
demonstrating that prior information can yield additional improvements. We also
investigate the performance of our approach "in the wild", without
information concerning the objects' bounding boxes, using an object detector to
guide a multi-scale segmentation scheme. We evaluate the performance of our
approach on the Penn-Fudan and LFW datasets for the tasks of pedestrian parsing
and face labelling respectively. We show superior performance with respect to
competitive methods that have been extensively engineered on these benchmarks,
as well as realistic qualitative results on part segmentation, even for
occluded or deformable objects. We also provide quantitative and extensive
qualitative results on three classes from the PASCAL Parts dataset. Finally, we
show that our multi-scale segmentation scheme can boost accuracy, recovering
segmentations for finer parts.
Comment: 11 pages (including references), 3 figures, 2 tables
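The multi-scale scheme can be pictured as fusing per-part score maps computed at several scales by resizing them to the finest grid and averaging, as in the sketch below; the nearest-neighbour resizing and plain averaging are simplifying assumptions, not the paper's exact fusion rule.

```python
# Fuse per-part score maps from several scales onto the finest grid (sketch).
import numpy as np

def fuse_multiscale(score_maps):
    """score_maps: list of (K, H_s, W_s) per-part score maps, finest first.
    Each map is nearest-neighbour resized to the finest grid and averaged."""
    K, H, W = score_maps[0].shape
    fused = np.zeros((K, H, W))
    for m in score_maps:
        rows = np.arange(H) * m.shape[1] // H   # map fine rows to coarse rows
        cols = np.arange(W) * m.shape[2] // W   # map fine cols to coarse cols
        fused += m[:, rows][:, :, cols]
    return fused / len(score_maps)

maps = [np.random.rand(2, 16, 16), np.random.rand(2, 8, 8)]  # fine + coarse scale
print(fuse_multiscale(maps).shape)                           # (2, 16, 16)
```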
