A Weakly Supervised Approach for Estimating Spatial Density Functions from High-Resolution Satellite Imagery
We propose a neural network component, the regional aggregation layer, that
makes it possible to train a pixel-level density estimator using only
coarse-grained density aggregates, which reflect the number of objects in an
image region. Our approach is simple to use and does not require
domain-specific assumptions about the nature of the density function. We
evaluate our approach on several synthetic datasets. In addition, we use this
approach to learn to estimate high-resolution population and housing density
from satellite imagery. In all cases, we find that our approach results in
better density estimates than a commonly used baseline. We also show how our
housing density estimator can be used to classify buildings as residential or
non-residential.
Comment: 10 pages, 8 figures. ACM SIGSPATIAL 2018, Seattle, US
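The aggregation idea described in this abstract can be illustrated with a minimal sketch: per-pixel density predictions are summed within each region, and the sums are compared with the known coarse counts. This is not the authors' implementation. The function names, the nested-list data layout, and the choice of an absolute-error loss are assumptions made for illustration.

```python
# Minimal sketch of regional aggregation (hypothetical names, pure Python).
# Assumption: `density` holds per-pixel predictions and `region_ids` assigns
# each pixel to a region; both are same-shaped nested lists.

def aggregate_by_region(density, region_ids):
    """Sum the per-pixel densities that fall inside each region."""
    totals = {}
    for density_row, region_row in zip(density, region_ids):
        for value, region in zip(density_row, region_row):
            totals[region] = totals.get(region, 0.0) + value
    return totals


def region_l1_loss(density, region_ids, region_counts):
    """Absolute error between aggregated predictions and known region counts.

    The loss choice here is an assumption; the supervision signal is only
    the coarse count per region, never a per-pixel label.
    """
    totals = aggregate_by_region(density, region_ids)
    return sum(abs(totals.get(r, 0.0) - c) for r, c in region_counts.items())


if __name__ == "__main__":
    density = [[0.5, 0.5], [1.0, 2.0]]   # predicted objects per pixel
    region_ids = [[0, 0], [1, 1]]        # top row is region 0, bottom row is region 1
    print(aggregate_by_region(density, region_ids))
    print(region_l1_loss(density, region_ids, {0: 1.0, 1: 4.0}))
```

In a trainable model the same summation would be a differentiable layer, so the region-level error can propagate back to the pixel-level estimator.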
The Lovász-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks
The Jaccard index, also referred to as the intersection-over-union score, is
commonly employed in the evaluation of image segmentation results given its
perceptual qualities, its scale invariance (which lends appropriate relevance to
small objects), and its appropriate counting of false negatives, in comparison to
per-pixel losses. We present a method for direct optimization of the mean
intersection-over-union loss in neural networks, in the context of semantic
image segmentation, based on the convex Lovász extension of submodular
losses. The loss is shown to perform better with respect to the Jaccard index
measure than the traditionally used cross-entropy loss. We show quantitative
and qualitative differences between optimizing the Jaccard index per image
versus optimizing the Jaccard index taken over an entire dataset. We evaluate
the impact of our method in a semantic segmentation pipeline and show
substantially improved intersection-over-union segmentation scores on the
Pascal VOC and Cityscapes datasets using state-of-the-art deep learning
segmentation architectures.
Comment: Accepted as a conference paper at CVPR 201
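To make the Lovász extension of the Jaccard loss concrete, here is a minimal pure-Python sketch for a single foreground class. It is not the authors' code, and the function names are hypothetical. Pixel errors are sorted in decreasing order and weighted by the increments of the Jaccard loss along that ordering. A multi-class variant would, under the usual formulation, apply this per class to softmax probabilities and average the results.

```python
# Sketch of the Lovász extension of the Jaccard loss for one class.
# Assumption: `labels` are 0/1 integers and `probs` are foreground
# probabilities in [0, 1], given as flat lists of equal length.

def lovasz_grad(gt_sorted):
    """Increments of the Jaccard loss as pixels are added in sorted order."""
    total_fg = sum(gt_sorted)
    grad, seen_fg, seen_bg, previous = [], 0, 0, 0.0
    for g in gt_sorted:
        seen_fg += g
        seen_bg += 1 - g
        intersection = total_fg - seen_fg
        union = total_fg + seen_bg          # always >= 1 inside the loop
        jaccard_loss = 1.0 - intersection / union
        grad.append(jaccard_loss - previous)
        previous = jaccard_loss
    return grad


def lovasz_binary_loss(probs, labels):
    """Weight the sorted pixel errors by the Jaccard-loss increments."""
    errors = [abs(y - p) for p, y in zip(probs, labels)]
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    grad = lovasz_grad([labels[i] for i in order])
    return sum(errors[i] * g for i, g in zip(order, grad))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    print(lovasz_binary_loss([1.0, 1.0, 0.0, 0.0], labels))  # perfect prediction
    print(lovasz_binary_loss([1.0, 0.0, 1.0, 0.0], labels))  # one miss, one false alarm
```

For hard 0/1 predictions the value coincides with one minus the Jaccard index (the second call has intersection 1 and union 3), while for soft probabilities it interpolates between such values, which is what makes it usable as a training loss.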
Multi-Context Attention for Human Pose Estimation
In this paper, we propose to incorporate convolutional neural networks with a
multi-context attention mechanism into an end-to-end framework for human pose
estimation. We adopt stacked hourglass networks to generate attention maps from
features at multiple resolutions with various semantics. The Conditional Random
Field (CRF) is utilized to model the correlations among neighboring regions in
the attention map. We further combine the holistic attention model, which
focuses on the global consistency of the full human body, and the body part
attention model, which focuses on the detailed description for different body
parts. Hence our model can attend at different granularities, from
local salient regions to global semantic-consistent spaces. Additionally, we
design novel Hourglass Residual Units (HRUs) to increase the receptive field of
the network. These units are extensions of residual units with a side branch
incorporating filters with larger receptive fields, so that features at various
scales are learned and combined within the HRUs. The effectiveness of the
proposed multi-context attention mechanism and the hourglass residual units is
evaluated on two widely used human pose estimation benchmarks. Our approach
outperforms all existing methods on both benchmarks over all the body parts.
Comment: The first two authors contribute equally to this work