The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes comprises a large, diverse set of stereo
video sequences recorded in streets from 50 different cities. 5000 of these
images have high-quality pixel-level annotations; 20000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material
Training recurrent neural networks robust to incomplete data: application to Alzheimer's disease progression modeling
Disease progression modeling (DPM) using longitudinal data is a challenging
machine learning task. Existing DPM algorithms neglect temporal dependencies
among measurements, make parametric assumptions about biomarker trajectories,
do not model multiple biomarkers jointly, and need an alignment of subjects'
trajectories. In this paper, recurrent neural networks (RNNs) are utilized to
address these issues. However, in many cases, longitudinal cohorts contain
incomplete data, which hinders the application of standard RNNs and requires a
pre-processing step such as imputation of the missing values. Instead, we
propose a generalized training rule for the most widely used RNN architecture,
long short-term memory (LSTM) networks, that can handle both missing predictor
and target values. The proposed LSTM algorithm is applied to model the
progression of Alzheimer's disease (AD) using six volumetric magnetic resonance
imaging (MRI) biomarkers, i.e., volumes of ventricles, hippocampus, whole
brain, fusiform, middle temporal gyrus, and entorhinal cortex, and it is
compared to standard LSTM networks with data imputation and a parametric,
regression-based DPM method. The results show that the proposed algorithm
achieves a significantly lower mean absolute error (MAE) than the alternatives
with p < 0.05 using the Wilcoxon signed-rank test in predicting values of almost
all of the MRI biomarkers. Moreover, a linear discriminant analysis (LDA)
classifier applied to the predicted biomarker values produces a significantly
larger AUC of 0.90 vs. at most 0.84 with p < 0.001 using McNemar's test for
clinical diagnosis of AD. Inspection of MAE curves as a function of the amount
of missing data reveals that the proposed LSTM algorithm achieves the best
performance up until more than 74% missing values. Finally, it is illustrated
how the method can successfully be applied to data with varying time intervals.
Comment: arXiv admin note: substantial text overlap with arXiv:1808.0550
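The abstract does not spell out the proposed training rule. As a rough illustration of the general idea of handling missing target values, the sketch below computes an MAE that skips unobserved entries instead of imputing them. The function name `masked_mae` and the encoding of missing values as `None` or NaN are assumptions of this sketch, not details taken from the paper.

```python
import math


def masked_mae(predicted, observed):
    """Mean absolute error over entries whose observed value is present.

    Assumption of this sketch: a missing observation is encoded as None or NaN.
    Returns None when no observed entry is available.
    """
    total, count = 0.0, 0
    for p, o in zip(predicted, observed):
        # Skip missing targets so they contribute nothing to the error.
        if o is None or (isinstance(o, float) and math.isnan(o)):
            continue
        total += abs(p - o)
        count += 1
    return total / count if count else None


# Second target is missing, so only two entries are averaged.
print(masked_mae([1.0, 2.0, 4.0], [1.5, None, 3.0]))
```

Averaging only over observed entries keeps the error comparable across subjects with different amounts of missing data; whether the paper normalizes in the same way is not stated in the abstract.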
Real-Time Human Motion Capture with Multiple Depth Cameras
Commonly used human motion capture systems require intrusive attachment of
markers that are visually tracked with multiple cameras. In this work we
present an efficient and inexpensive solution to markerless motion capture
using only a few Kinect sensors. Unlike the previous work on 3D pose estimation
using a single depth camera, we relax constraints on the camera location and do
not assume a co-operative user. We apply recent image segmentation techniques
to depth images and use curriculum learning to train our system on purely
synthetic data. Our method accurately localizes body parts without requiring an
explicit shape model. The body joint locations are then recovered by combining
evidence from multiple views in real-time. We also introduce a dataset of ~6
million synthetic depth frames for pose estimation from multiple cameras and
exceed state-of-the-art results on the Berkeley MHAD dataset.
Comment: Accepted to computer robot vision 201
Gibbs Max-margin Topic Models with Data Augmentation
Max-margin learning is a powerful approach to building classifiers and
structured output predictors. Recent work on max-margin supervised topic models
has successfully integrated it with Bayesian topic models to discover
discriminative latent semantic structures and make accurate predictions for
unseen testing data. However, the resulting learning problems are usually hard
to solve because of the non-smoothness of the margin loss. Existing approaches
to building max-margin supervised topic models rely on an iterative procedure
to solve multiple latent SVM subproblems with additional mean-field assumptions
on the desired posterior distributions. This paper presents an alternative
approach by defining a new max-margin loss. Namely, we present Gibbs max-margin
supervised topic models, a latent variable Gibbs classifier to discover hidden
topic representations for various tasks, including classification, regression
and multi-task learning. Gibbs max-margin supervised topic models minimize an
expected margin loss, which is an upper bound of the existing margin loss
derived from an expected prediction rule. By introducing augmented variables
and integrating out the Dirichlet variables analytically by conjugacy, we
develop simple Gibbs sampling algorithms with no restricting assumptions and no
need to solve SVM subproblems. Furthermore, each step of the
"augment-and-collapse" Gibbs sampling algorithms has an analytical conditional
distribution, from which samples can be easily drawn. Experimental results
demonstrate significant improvements in time efficiency. The classification
performance is also significantly improved over competitors on binary,
multi-class and multi-label classification tasks.
Comment: 35 pages
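The upper-bound claim in this abstract can be checked numerically. The sketch below assumes a standard binary hinge loss max(0, c - y·f) and treats a small list of scores as stand-ins for draws from the posterior; both are assumptions of this illustration, and the function names are hypothetical. Because the hinge loss is convex in the score, Jensen's inequality implies that the expected loss is at least the loss of the expected prediction.

```python
def hinge(margin_cost, label, score):
    """Binary hinge loss max(0, c - y * f) for a label y in {-1, +1}."""
    return max(0.0, margin_cost - label * score)


def loss_of_expected_prediction(margin_cost, label, scores):
    """Hinge loss evaluated at the average score (expected prediction rule)."""
    mean_score = sum(scores) / len(scores)
    return hinge(margin_cost, label, mean_score)


def expected_loss(margin_cost, label, scores):
    """Average of per-sample hinge losses (Gibbs-style expected margin loss)."""
    return sum(hinge(margin_cost, label, s) for s in scores) / len(scores)


# Hypothetical sampled scores for one positive example.
scores = [2.0, -1.0, 0.5]
lower = loss_of_expected_prediction(1.0, 1, scores)
upper = expected_loss(1.0, 1, scores)
print(lower, upper, lower <= upper)
```

The inequality holds for any set of scores, which is why minimizing the expected loss also controls the loss of the averaged predictor.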
Mining Entity Synonyms with Efficient Neural Set Generation
Mining entity synonym sets (i.e., sets of terms referring to the same entity)
is an important task for many entity-leveraging applications. Previous work
either ranks terms based on their similarity to a given query term, or treats
the problem as a two-phase task (i.e., detecting synonymy pairs, followed by
organizing these pairs into synonym sets). However, these approaches fail to
model the holistic semantics of a set and suffer from the error propagation
issue. Here we propose a new framework, named SynSetMine, that efficiently
generates entity synonym sets from a given vocabulary, using example sets from
external knowledge bases as distant supervision. SynSetMine consists of two
novel modules: (1) a set-instance classifier that jointly learns how to
represent a permutation-invariant synonym set and whether to include a new
instance (i.e., a term) in the set, and (2) a set generation algorithm that
enumerates the vocabulary only once and applies the learned set-instance
classifier to detect all entity synonym sets in it. Experiments on three real
datasets from different domains demonstrate both effectiveness and efficiency
of SynSetMine for mining entity synonym sets.
Comment: AAAI 2019 camera-ready version
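The abstract says only that the set generation algorithm enumerates the vocabulary once and applies the learned set-instance classifier. One plausible reading is a greedy single pass, sketched below. The greedy assignment, the `threshold` parameter, and the toy scorer `toy_score` (a string-normalization stand-in for the learned classifier) are all assumptions of this sketch rather than details from the paper.

```python
def generate_sets(vocabulary, score_fn, threshold=0.5):
    """Single pass over the vocabulary, greedily growing synonym sets.

    score_fn(current_set, term) is assumed to return a probability-like score
    that `term` belongs in `current_set`.
    """
    clusters = []
    for term in vocabulary:
        best_index, best_score = None, threshold
        for i, cluster in enumerate(clusters):
            score = score_fn(cluster, term)
            if score > best_score:
                best_index, best_score = i, score
        if best_index is None:
            clusters.append([term])  # no set accepts the term: start a new one
        else:
            clusters[best_index].append(term)
    return clusters


def toy_score(cluster, term):
    """Hypothetical stand-in classifier: match after dropping dots and case."""
    def normalise(t):
        return t.replace(".", "").lower()
    return 1.0 if any(normalise(m) == normalise(term) for m in cluster) else 0.0


print(generate_sets(["USA", "U.S.A.", "Berlin", "usa", "berlin"], toy_score))
```

Each term is visited exactly once, so the number of classifier calls is bounded by the vocabulary size times the number of sets formed so far.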