Beyond Pixels: A Comprehensive Survey from Bottom-up to Semantic Image Segmentation and Cosegmentation
Image segmentation refers to the process of dividing an image into
non-overlapping, meaningful regions according to human perception, and it has
been a classic topic since the early days of computer vision. A great deal of
research has been conducted and has resulted in many applications. However,
while many segmentation algorithms exist, only a few sparse and outdated
surveys are available, and an overview of recent achievements and open issues
is lacking. We aim to provide a comprehensive review of the recent
progress in this field. Covering 180 publications, we give an overview of broad
areas of segmentation topics including not only the classic bottom-up
approaches, but also the recent development in superpixel, interactive methods,
object proposals, semantic image parsing and image cosegmentation. In addition,
we also review the existing influential datasets and evaluation metrics.
Finally, we suggest some design choices and directions for future research in
image segmentation.
Comment: submitted to Elsevier Journal of Visual Communication and Image Representation
On Modular Training of Neural Acoustics-to-Word Model for LVCSR
End-to-end (E2E) automatic speech recognition (ASR) systems directly map
acoustics to words using a unified model. Previous work mostly focuses on
end-to-end training of a single model that integrates the acoustic and
language models into a whole. Although E2E training benefits from sequence
modeling and simplified decoding pipelines, a large amount of transcribed
acoustic data is usually required, and traditional acoustic and language
modelling techniques cannot be utilized. In this paper, a novel modular
training framework for E2E ASR is
proposed to separately train neural acoustic and language models during
training stage, while still performing end-to-end inference in decoding stage.
Here, an acoustics-to-phoneme model (A2P) and a phoneme-to-word model (P2W) are
trained using acoustic data and text data respectively. A phone synchronous
decoding (PSD) module is inserted between A2P and P2W to reduce sequence
lengths without precision loss. Finally, the modules are integrated into an
acoustics-to-word model (A2W) and jointly optimized using acoustic data to
retain the advantage of sequence modeling. Experiments on a 300-hour
Switchboard task show significant improvement over the direct A2W model. The
efficiency of both training and decoding also benefits from the proposed
method.
Comment: accepted by ICASSP201
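The phone synchronous decoding (PSD) step above shortens the sequence handed from the A2P to the P2W module. A minimal sketch of the underlying collapse operation, assuming frame-level argmax phoneme labels with a CTC-style blank symbol (the frame sequence and blank index below are invented for illustration):

```python
# Sketch of PSD-style sequence shortening: given per-frame phoneme labels,
# drop blank frames and merge runs of identical labels, so the downstream
# phoneme-to-word model sees a phone-length sequence instead of a frame-length one.

def psd_collapse(frame_labels, blank=0):
    """Collapse a frame-level label sequence to a phone-level one:
    remove blanks and merge consecutive repeats."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != blank and lab != prev:
            out.append(lab)
        prev = lab
    return out

# Frame-level argmax labels (0 = blank): 10 frames collapse to 3 phone tokens.
frames = [0, 3, 3, 0, 0, 5, 5, 5, 0, 7]
print(psd_collapse(frames))  # [3, 5, 7]
```

The paper's PSD module operates on posterior lattices rather than hard labels, but the length reduction it buys the P2W module is of this kind.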
Co-Sparse Textural Similarity for Image Segmentation
We propose an algorithm for segmenting natural images based on texture and
color information, which leverages the co-sparse analysis model for image
segmentation within a convex multilabel optimization framework. As a key
ingredient of this method, we introduce a novel textural similarity measure,
which builds upon the co-sparse representation of image patches. We propose a
Bayesian approach to merge textural similarity with information about color and
location. Combined with recently developed convex multilabel optimization
methods, this leads to an efficient algorithm for both supervised and
unsupervised segmentation, which is easily parallelized on graphics hardware.
The approach provides competitive results in unsupervised segmentation and
outperforms state-of-the-art interactive segmentation methods on the Graz
Benchmark.
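The textural similarity above compares patches through their co-sparse analysis coefficients. A rough sketch of that idea, assuming a fixed toy analysis operator and comparing the co-support (the set of filters whose responses vanish) of two patches; the paper instead learns the operator and embeds the measure in a Bayesian, convex multilabel framework:

```python
# Sketch of a co-sparse textural similarity: apply an analysis operator Omega
# to each patch (Omega @ patch should be sparse for natural patches) and score
# similarity by how much the two patches' zero-patterns agree.

def analysis_coeffs(patch, operator):
    # Co-sparse analysis model: the operator response should be sparse.
    return [sum(w * x for w, x in zip(row, patch)) for row in operator]

def cosupport_similarity(p, q, operator, eps=1e-6):
    # Fraction of filters on which both patches vanish (or not) together.
    a = [abs(c) < eps for c in analysis_coeffs(p, operator)]
    b = [abs(c) < eps for c in analysis_coeffs(q, operator)]
    return sum(x == y for x, y in zip(a, b)) / len(a)

omega = [[1, -1, 0], [0, 1, -1]]   # toy finite-difference "operator"
flat = [2.0, 2.0, 2.0]             # constant patch: both filters vanish
edge = [0.0, 0.0, 3.0]             # patch with a jump
print(cosupport_similarity(flat, flat, omega))  # 1.0
print(cosupport_similarity(flat, edge, omega))  # 0.5
```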
Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings
There are two main approaches to automatically extracting affective
orientation: lexicon-based and corpus-based. In this work, we argue that these
two methods are compatible and show that combining them can improve the
accuracy of emotion classifiers. In particular, we introduce a novel variant of
the Label Propagation algorithm that is tailored to distributed word
representations, we apply batch gradient descent to accelerate the optimization
of label propagation and to make the optimization feasible for large graphs,
and we propose a reproducible method for emotion lexicon expansion. We conclude
that label propagation can expand an emotion lexicon in a meaningful way and
that the expanded emotion lexicon can be leveraged to improve the accuracy of
an emotion classifier.
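The label-propagation variant above diffuses emotion labels from seed words across a similarity graph. A bare-bones sketch of the classic propagation update, assuming a tiny hand-written graph and a made-up damping factor; the paper runs this over embedding-derived similarities and accelerates the optimization with batch gradient descent:

```python
# Sketch of label propagation: seed nodes carry emotion scores, and scores
# diffuse along degree-normalized edges while each node is pulled back toward
# its seed value by the (1 - alpha) term.

def propagate(W, seeds, alpha=0.5, iters=50):
    n = len(W)
    deg = [sum(row) or 1.0 for row in W]          # avoid division by zero
    y0 = [seeds.get(i, 0.0) for i in range(n)]    # seed scores, 0 if unlabeled
    y = y0[:]
    for _ in range(iters):
        y = [alpha * sum(W[i][j] / deg[i] * y[j] for j in range(n))
             + (1 - alpha) * y0[i] for i in range(n)]
    return y

# 3-node chain: node 0 is a positive seed, nodes 1 and 2 are unlabeled.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = propagate(W, {0: 1.0})
print(scores)  # positive mass decays with graph distance from the seed
```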
Context-Aware Query Selection for Active Learning in Event Recognition
Activity recognition is a challenging problem with many practical
applications. In addition to the visual features, recent approaches have
benefited from the use of context, e.g., inter-relationships among the
activities and objects. However, these approaches require data to be labeled
and entirely available beforehand, and they are not designed to be updated
continuously, which makes them unsuitable for surveillance applications. In
contrast, we
propose a continuous-learning framework for context-aware activity recognition
from unlabeled video, which has two distinct advantages over existing methods.
First, it employs a novel active-learning technique that not only exploits the
informativeness of the individual activities but also utilizes their contextual
information during query selection; this leads to significant reduction in
expensive manual annotation effort. Second, the learned models can be adapted
online as more data is available. We formulate a conditional random field model
that encodes the context and devise an information-theoretic approach that
utilizes entropy and mutual information of the nodes to compute the set of most
informative queries, which are labeled by a human. These labels are combined
with graphical inference techniques for incremental updates. We provide a
theoretical formulation of the active learning framework with an analytic
solution. Experiments on six challenging datasets demonstrate that our
framework achieves superior performance with significantly less manual
labeling.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)
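The query-selection step above ranks unlabeled activity instances by how informative a human label would be. A simplified sketch using only the entropy part of the criterion, with invented class marginals; the paper additionally folds in mutual information between context-linked nodes of the conditional random field:

```python
# Sketch of entropy-based active-learning query selection: nodes whose class
# marginals are most uncertain (highest entropy) are sent to the human annotator.

import math

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def select_queries(marginals, k):
    # Rank nodes by the entropy of their class marginals; query the top k.
    ranked = sorted(range(len(marginals)),
                    key=lambda i: entropy(marginals[i]), reverse=True)
    return ranked[:k]

marginals = [
    [0.95, 0.05],   # confident prediction: cheap to leave unlabeled
    [0.50, 0.50],   # maximally uncertain: worth a manual query
    [0.70, 0.30],
]
print(select_queries(marginals, 1))  # [1]
```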
Modeling Precursors for Event Forecasting via Nested Multi-Instance Learning
Forecasting events like civil unrest movements, disease outbreaks, financial
market movements and government elections from open source indicators such as
news feeds and social media streams is an important and challenging problem.
From the perspective of human analysts and policy makers, forecasting
algorithms need to provide supporting evidence and identify the causes related
to the event of interest. We develop a novel multiple-instance-learning-based
approach that jointly tackles the problems of identifying evidence-based
precursors and forecasting events into the future. Specifically, given a
collection of streaming news articles from multiple sources, we develop a nested
multiple instance learning approach to forecast significant societal events
across three countries in Latin America. Our algorithm is able to identify news
articles considered as precursors for a protest. Our empirical evaluation shows
the strengths of our proposed approaches in filtering candidate precursors,
forecasting the occurrence of events with a lead time and predicting the
characteristics of different events in comparison to several other
formulations. We demonstrate through case studies the effectiveness of our
proposed model in filtering the candidate precursors for inspection by a human
analyst.
Comment: The conference version of the paper is submitted for publication
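The nested structure above can be pictured with the standard multiple-instance assumption applied at two levels. A toy sketch, assuming each day is a bag of article scores and a pre-event period is a super-bag of days (all scores below are invented; the paper learns these scores and the nesting jointly):

```python
# Sketch of nested multiple-instance scoring: a bag (day) is as positive as
# its best instance (article), and a super-bag (period) is as positive as its
# best day. The article flagged this way doubles as the candidate precursor.

def bag_score(instance_scores):
    # Standard MIL assumption: a bag is positive if its best instance is.
    return max(instance_scores)

def superbag_score(bags):
    # Nested level: a period is positive if any day-bag is positive.
    return max(bag_score(b) for b in bags)

period = [
    [0.1, 0.2],        # day 1: routine articles
    [0.15, 0.9, 0.3],  # day 2: one strong precursor candidate
    [0.05],            # day 3
]
print(superbag_score(period))  # 0.9 (driven by the day-2 article)
```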
Spatially Constrained Location Prior for Scene Parsing
Semantic context is an important and useful cue for scene parsing in
complicated natural images with a substantial amount of variations in objects
and the environment. This paper proposes Spatially Constrained Location Prior
(SCLP) for effective modelling of global and local semantic context in the
scene in terms of inter-class spatial relationships. Unlike existing studies
focusing on either relative or absolute location prior of objects, the SCLP
effectively incorporates both relative and absolute location priors by
calculating object co-occurrence frequencies in spatially constrained image
blocks. The SCLP is general and can be used in conjunction with various visual
feature-based prediction models, such as Artificial Neural Networks and Support
Vector Machine (SVM), to enforce spatial contextual constraints on class
labels. Using SVM classifiers and a linear regression model, we demonstrate
that the incorporation of SCLP achieves superior performance compared to the
state-of-the-art methods on the Stanford background and SIFT Flow datasets.
Comment: authors' pre-print version of an article published in IJCNN 201
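The SCLP above is built from class co-occurrence frequencies counted inside spatially constrained image blocks. An illustrative sketch of that counting step, assuming a toy 4x4 label map and a 2x2 block grid (the paper's block layout and normalization differ):

```python
# Sketch of an SCLP-style prior: partition a label map into blocks and count,
# per block position, how often unordered class pairs co-occur. Block-indexed
# counts capture absolute location; pairs within a block capture relative context.

from collections import Counter
from itertools import combinations

def block_cooccurrence(labels, block):
    """labels: 2D list of class names; block: (height, width) of each block.
    Returns, per block index, a Counter of co-occurring class pairs."""
    bh, bw = block
    prior = {}
    for bi in range(0, len(labels), bh):
        for bj in range(0, len(labels[0]), bw):
            cells = {labels[i][j]
                     for i in range(bi, bi + bh)
                     for j in range(bj, bj + bw)}
            key = (bi // bh, bj // bw)
            prior[key] = Counter(frozenset(p) for p in combinations(sorted(cells), 2))
    return prior

labels = [
    ["sky", "sky", "sky", "tree"],
    ["sky", "sky", "tree", "tree"],
    ["road", "road", "grass", "grass"],
    ["road", "road", "grass", "grass"],
]
prior = block_cooccurrence(labels, (2, 2))
print(prior[(0, 1)])  # sky and tree co-occur once in the top-right block
```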
Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Access to electronic health record (EHR) data has motivated computational
advances in medical research. However, various concerns, particularly over
privacy, can limit access to and collaborative use of EHR data. Sharing
synthetic EHR data could mitigate risk. In this paper, we propose a new
approach, medical Generative Adversarial Network (medGAN), to generate
realistic synthetic patient records. Based on input real patient records,
medGAN can generate high-dimensional discrete variables (e.g., binary and count
features) via a combination of an autoencoder and generative adversarial
networks. We also propose minibatch averaging to efficiently avoid mode
collapse, and we increase learning efficiency with batch normalization and
shortcut connections. To demonstrate feasibility, we show that medGAN
generates synthetic patient records that achieve comparable performance to real
data on many experiments including distribution statistics, predictive modeling
tasks and a medical expert review. We also empirically observe a limited
privacy risk in both identity and attribute disclosure using medGAN.
Comment: Accepted at Machine Learning in Health Care (MLHC) 201
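The minibatch averaging mentioned above counters mode collapse by letting the discriminator see batch-level statistics. A small sketch of the core transformation, with made-up toy records (in medGAN this feeds a full discriminator network):

```python
# Sketch of minibatch averaging: augment each discriminator input with the
# mean of its minibatch. A collapsed generator produces a tell-tale average
# that the discriminator can easily distinguish from real-data averages.

def minibatch_average(batch):
    n, d = len(batch), len(batch[0])
    mean = [sum(row[j] for row in batch) / n for j in range(d)]
    # Each discriminator input becomes [record, batch_mean].
    return [row + mean for row in batch]

batch = [[1.0, 0.0], [0.0, 1.0]]
print(minibatch_average(batch))  # [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5]]
```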
Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
We approach structured output prediction by optimizing a deep value network
(DVN) to precisely estimate the task loss on different output configurations
for a given input. Once the model is trained, we perform inference by gradient
descent on the continuous relaxations of the output variables to find outputs
with promising scores from the value network. When applied to image
segmentation, the value network takes an image and a segmentation mask as
inputs and predicts a scalar estimating the intersection over union between the
input and ground truth masks. For multi-label classification, the DVN's
objective is to correctly predict the F1 score for any potential label
configuration. The DVN framework achieves state-of-the-art results on
multi-label prediction and image segmentation benchmarks.
Comment: Published at ICML 201
Prediction of Solar Flare Size and Time-to-Flare Using Support Vector Machine Regression
We study the prediction of solar flare size and time-to-flare using 38
features describing magnetic complexity of the photospheric magnetic field.
This work uses support vector regression to formulate a mapping from the
38-dimensional feature space to a continuous-valued label vector representing
flare size or time-to-flare. When we consider flaring regions only, we find an
average error in estimating flare size of approximately half a
\emph{geostationary operational environmental satellite} (\emph{GOES}) class.
When we additionally consider non-flaring regions, we find an increased average
error of approximately 3/4 of a \emph{GOES} class. We also consider thresholding
the regressed flare size for the experiment containing both flaring and
non-flaring regions and find a true positive rate of 0.69 and a true negative
rate of 0.86 for flare prediction. The results for both of these size
regression experiments are consistent across a wide range of predictive time
windows, indicating that the magnetic complexity features may be persistent in
appearance long before flare activity. This is supported by our larger error
rates of some 40 hr in the time-to-flare regression problem. The 38 magnetic
complexity features considered here appear to have discriminative potential for
flare size, but their persistence in time makes them less discriminative for
the time-to-flare problem.
Comment: http://iopscience.iop.org/article/10.1088/0004-637X/812/1/51/met
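The thresholding experiment above turns regressed flare sizes into binary flare/no-flare predictions and evaluates them by true positive and true negative rates. A small sketch of that evaluation step with invented scores and an invented threshold (the paper reports TPR 0.69 and TNR 0.86 on its data):

```python
# Sketch of thresholding a regressed flare size into a binary prediction and
# computing (TPR, TNR) against the true flare/no-flare labels.

def rates(y_true, y_score, thresh):
    tp = sum(t and s >= thresh for t, s in zip(y_true, y_score))
    tn = sum((not t) and s < thresh for t, s in zip(y_true, y_score))
    pos = sum(y_true)
    neg = len(y_true) - pos
    return tp / pos, tn / neg   # (true positive rate, true negative rate)

flared = [True, True, False, False, True]   # did the region flare?
size   = [2.1,  0.9,  0.4,   1.2,   1.8]    # regressed flare-size scores
print(rates(flared, size, thresh=1.0))      # (TPR, TNR) for this toy threshold
```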