Distributed Training of Structured SVM
Training structured prediction models is time-consuming. However, most
existing approaches use only a single machine; thus, the greater computing
power and larger-data capacity of multiple machines have not been exploited. In
this work, we propose an efficient algorithm for distributed training of
structured support vector machines, based on a distributed
block-coordinate descent method. Both theoretical and experimental results
indicate that our method is efficient.
Comment: NIPS Workshop on Optimization for Machine Learning, 201
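The distributed block-coordinate idea can be illustrated as follows. This is a minimal sketch, simplified to a binary linear SVM in the dual, with "machines" simulated as partitions of the dual variables and local updates combined by conservative averaging; it is not the paper's code, and all names are illustrative.

```python
import numpy as np

def distributed_dual_cd(X, y, C=1.0, n_machines=4, outer_iters=20,
                        local_iters=5, seed=0):
    """Block-coordinate ascent on the SVM dual, with the dual variables
    partitioned across simulated machines and local updates averaged."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    blocks = np.array_split(rng.permutation(n), n_machines)
    Qii = np.einsum('ij,ij->i', X, X)  # diagonal of the Gram matrix
    for _ in range(outer_iters):
        deltas = []
        for blk in blocks:  # each block would run on its own machine
            w_loc, d_alpha = w.copy(), np.zeros(n)
            for _ in range(local_iters):
                for i in blk:
                    g = y[i] * (w_loc @ X[i]) - 1.0
                    a_cur = alpha[i] + d_alpha[i]
                    a_new = np.clip(a_cur - g / Qii[i], 0.0, C)
                    w_loc += (a_new - a_cur) * y[i] * X[i]
                    d_alpha[i] = a_new - alpha[i]
            deltas.append((w_loc - w, d_alpha))
        # conservative aggregation: average the local updates
        w += sum(dw for dw, _ in deltas) / n_machines
        alpha += sum(da for _, da in deltas) / n_machines
    return w
```

In a real deployment the inner loop over each block runs in parallel, and only the aggregated update to `w` is communicated per round.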
Bayesian Optimization of Text Representations
When applying machine learning to problems in NLP, there are many choices to
make about how to represent input texts. These choices can have a big effect on
performance, but they are often uninteresting to researchers or practitioners
who simply need a module that performs well. We propose an approach to
optimizing over this space of choices, formulating the problem as global
optimization. We apply a sequential model-based optimization technique and show
that our method makes standard linear models competitive with more
sophisticated, expensive state-of-the-art methods based on latent variable
models or neural networks on various topic classification and sentiment
analysis problems. Our approach is a first step towards black-box NLP systems
that work with raw text and do not require manual tuning.
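The sequential model-based optimization loop described above can be sketched in miniature. This is an illustrative toy, not the paper's method: the representation space and the scoring function are invented stand-ins for real text-representation choices and a cross-validated classifier score, and the surrogate is a deliberately simple per-option mean with an exploration bonus.

```python
import itertools
import random

# Hypothetical representation choices; toy_score is a stand-in for a
# real cross-validated model score.
SPACE = {
    "ngrams": [1, 2, 3],
    "lowercase": [True, False],
    "tfidf": [True, False],
}

def toy_score(cfg):
    # pretend bigrams + tf-idf + lowercasing work best on this task
    return (cfg["ngrams"] == 2) * 0.3 + cfg["tfidf"] * 0.2 + cfg["lowercase"] * 0.1

def smbo(n_iter=15, seed=0):
    """Sequential model-based optimization over a discrete space: fit a
    simple surrogate to past evaluations, then evaluate the configuration
    the surrogate rates highest (falling back to unseen configs)."""
    rng = random.Random(seed)
    configs = [dict(zip(SPACE, vals)) for vals in itertools.product(*SPACE.values())]
    history = []
    for t in range(n_iter):
        evaluated = [c for c, _ in history]
        if t < 3:  # random warm-up
            cfg = rng.choice(configs)
        else:
            def surrogate(c):
                # mean observed score of configs sharing each option value,
                # plus a small bonus for rarely tried values (exploration)
                s = 0.0
                for k, v in c.items():
                    obs = [score for cc, score in history if cc[k] == v]
                    s += (sum(obs) / len(obs) if obs else 0.5)
                    s += 0.05 / (1 + len(obs))
                return s
            cfg = max(configs, key=surrogate)
            if cfg in evaluated:  # force exploration of unseen configs
                fresh = [c for c in configs if c not in evaluated]
                if fresh:
                    cfg = rng.choice(fresh)
        history.append((cfg, toy_score(cfg)))
    return max(history, key=lambda h: h[1])[0]
```

Real implementations replace the surrogate with a Gaussian process or tree-based model and the toy objective with an actual train/validate cycle.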
Distributed Block-diagonal Approximation Methods for Regularized Empirical Risk Minimization
In recent years, there has been a growing need to train machine learning models on
a huge volume of data. Designing efficient distributed optimization algorithms
for empirical risk minimization (ERM) has therefore become an active and
challenging research topic. In this paper, we propose a flexible framework for
distributed ERM training through solving the dual problem, which provides a
unified description and comparison of existing methods. Our approach requires
only approximate solutions of the sub-problems involved in the optimization
process, and is versatile enough to be applied to many large-scale machine learning
problems including classification, regression, and structured prediction. We
show that our approach enjoys global linear convergence for a broader class of
problems, and achieves faster empirical performance, compared with existing
works.
A New Smooth Approximation to the Zero One Loss with a Probabilistic Interpretation
We examine a new form of smooth approximation to the zero one loss in which
learning is performed using a reformulation of the widely used logistic
function. Our approach is based on using the posterior mean of a novel
generalized Beta-Bernoulli formulation. This leads to a generalized logistic
function that approximates the zero one loss, but retains a probabilistic
formulation conferring a number of useful properties. The approach is easily
generalized to kernel logistic regression and easily integrated into methods
for structured prediction. We present experiments in which we learn such models
using an optimization method combining gradient descent and coordinate descent
with a localized grid search so as to escape from local
minima. Our experiments indicate that optimization quality is improved when
learning meta-parameters are themselves optimized using a validation set. Our
experiments show improved performance relative to widely used logistic and
hinge loss methods on a wide variety of problems ranging from standard UC
Irvine and libSVM evaluation datasets to product review predictions and a
visual information extraction task. We observe that the approach: 1) is more
robust to outliers compared to the logistic and hinge losses; 2) outperforms
comparable logistic and max margin models on larger scale benchmark problems;
3) when combined with a Gaussian-Laplacian mixture prior on parameters, the
kernelized version of our formulation yields sparser solutions than Support
Vector Machine classifiers; and 4) when integrated into a probabilistic
structured prediction technique our approach provides more accurate
probabilities yielding improved inference and increasing information extraction
performance.
Comment: 32 pages, 7 figures, 15 tables
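The core idea of a smooth surrogate for the zero-one loss can be sketched as follows. This is a minimal stand-in, not the paper's generalized Beta-Bernoulli construction: it uses a plain steepened sigmoid of the margin, which is bounded (hence robust to outliers) and differentiable, and fits it by gradient descent on toy data.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def smooth01_loss(w, X, y, gamma=2.0):
    """sigmoid(-gamma * margin): close to 1 for badly misclassified points,
    close to 0 for confidently correct ones. Because the loss saturates,
    a single outlier contributes at most 1, unlike the hinge or logistic."""
    margins = y * (X @ w)
    return sigmoid(-gamma * margins).mean()

def fit(X, y, gamma=2.0, lr=0.5, iters=500):
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        m = y * (X @ w)
        s = sigmoid(-gamma * m)
        # d/dw sigmoid(-gamma * m) = -gamma * s * (1 - s) * y * x
        grad = -(gamma * s * (1 - s) * y) @ X / n
        w -= lr * grad
    return w
```

The loss is non-convex, which is why the abstract's combination of gradient descent with localized grid search (to escape local minima) matters in the full method.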
An end-to-end generative framework for video segmentation and recognition
We describe an end-to-end generative approach for the segmentation and
recognition of human activities. In this approach, a visual representation
based on reduced Fisher Vectors is combined with a structured temporal model
for recognition. We show that the statistical properties of Fisher Vectors make
them an especially suitable front-end for generative models such as Gaussian
mixtures. The system is evaluated both for the recognition of complex
activities and for their parsing into action units. Using a variety of video
datasets ranging from human cooking activities to animal behaviors, our
experiments demonstrate that the resulting architecture outperforms
state-of-the-art approaches for larger datasets, i.e., when a sufficient amount of
data is available for training structured generative models.
Comment: Proc. of IEEE Winter Conference on Applications of Computer Vision (WACV), 201
Segmentation Rectification for Video Cutout via One-Class Structured Learning
Recent works on interactive video object cutout mainly focus on designing
dynamic foreground-background (FB) classifiers for segmentation propagation.
However, the research on optimally removing errors from the FB classification
is sparse, and the errors often accumulate rapidly, causing significant errors
in the propagated frames. In this work, we take initial steps toward addressing
this problem, and we call this new task \emph{segmentation rectification}. Our
key observation is that the possibly asymmetrically distributed false positive
and false negative errors are handled equally in conventional methods. We
instead propose to optimally remove these two types of errors. To this end,
we propose a novel bilayer Markov Random Field (MRF) model for this new
task. We also adopt the well-established structured learning framework to learn
the optimal model from data. Additionally, we propose a novel one-class
structured SVM (OSSVM) which greatly speeds up the structured learning process.
Our method naturally extends to RGB-D videos as well. Comprehensive experiments
on both RGB and RGB-D data demonstrate that our simple and effective method
significantly outperforms the segmentation propagation methods adopted in the
state-of-the-art video cutout systems, and the results also suggest the
potential usefulness of our method in image cutout systems.
Method of Tibetan Person Knowledge Extraction
Person knowledge extraction is the foundation of the Tibetan knowledge graph
construction, which provides support for Tibetan question answering system,
information retrieval, information extraction and other researches, and
promotes national unity and social stability. This paper proposes a SVM and
template based approach to Tibetan person knowledge extraction. Through
constructing the training corpus, we build the templates based on shallow
parsing of Tibetan syntax, semantic features, and verbs. Using the
training corpus, we design a hierarchical SVM classifier to realize the entity
knowledge extraction. Finally, experimental results show that the method
yields a clear improvement in Tibetan person knowledge extraction.
Comment: 6 pages
Whole-brain Prediction Analysis with GraphNet
Multivariate machine learning methods are increasingly used to analyze
neuroimaging data, often replacing more traditional "mass univariate"
techniques that fit data one voxel at a time. In the functional magnetic
resonance imaging (fMRI) literature, this has led to broad application of
"off-the-shelf" classification and regression methods. These generic approaches
allow investigators to use ready-made algorithms to accurately decode
perceptual, cognitive, or behavioral states from distributed patterns of neural
activity. However, when applied to correlated whole-brain fMRI data these
methods suffer from coefficient instability, are sensitive to outliers, and
yield dense solutions that are hard to interpret without arbitrary
thresholding. Here, we develop variants of the Graph-constrained Elastic
Net (GraphNet), ..., we (1) extend GraphNet to include robust loss functions
that confer insensitivity to outliers, (2) equip them with "adaptive" penalties
that asymptotically guarantee correct variable selection, and (3) develop a
novel sparse structured Support Vector GraphNet classifier (SVGN). When applied
to previously published data, these efficient whole-brain methods significantly
improved classification accuracy over previously reported VOI-based analyses on
the same data while discovering task-related regions not documented in the
original VOI approach. Critically, GraphNet estimates generalize well to
out-of-sample data collected more than three years later on the same task but
with different subjects and stimuli. By enabling robust and efficient selection
of important voxels from whole-brain data taken over multiple time points
(>100,000 "features"), these methods enable data-driven selection of brain
areas that accurately predict single-trial behavior within and across
individuals.
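The GraphNet-style penalty can be sketched for the squared loss. This is a minimal sketch under stated assumptions, not the paper's solver: it combines an l1 term with a graph-Laplacian smoothness term and minimizes by proximal gradient (ISTA); the robust losses and adaptive penalties from the abstract are omitted.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def graphnet(X, y, L, lam_l1=0.1, lam_graph=1.0, iters=500):
    """Proximal-gradient (ISTA) solver for
        min_w  0.5/n ||y - X w||^2 + lam_graph * w' L w + lam_l1 ||w||_1,
    where L is a graph Laplacian encoding voxel adjacency, so coefficients
    on neighboring voxels are encouraged to vary smoothly."""
    n, d = X.shape
    w = np.zeros(d)
    # step size from a Lipschitz bound on the smooth part of the objective
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n
                  + 2 * lam_graph * np.linalg.norm(L, 2))
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n + 2 * lam_graph * (L @ w)
        w = soft_threshold(w - step * grad, step * lam_l1)
    return w
```

The l1 term yields sparse, interpretable voxel selections while the Laplacian term exploits the spatial correlation of fMRI data, which is the combination the abstract credits for stability.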
Spatial-Aware Dictionary Learning for Hyperspectral Image Classification
This paper presents a structured dictionary-based model for hyperspectral
data that incorporates both spectral and contextual characteristics of a
spectral sample, with the goal of hyperspectral image classification. The idea
is to partition the pixels of a hyperspectral image into a number of spatial
neighborhoods called contextual groups and to model each pixel with a linear
combination of a few dictionary elements learned from the data. Since pixels
inside a contextual group are often made up of the same materials, their linear
combinations are constrained to use common elements from the dictionary. To
this end, dictionary learning is carried out with a joint sparse regularizer to
induce a common sparsity pattern in the sparse coefficients of each contextual
group. The sparse coefficients are then used for classification using a linear
SVM. Experimental results on a number of real hyperspectral images confirm the
effectiveness of the proposed representation for hyperspectral image
classification. Moreover, experiments with simulated multispectral data show
that the proposed model is capable of finding representations that may
effectively be used for classification of multispectral-resolution samples.
Comment: 16 pages, 9 figures
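The joint sparse regularizer described above can be sketched for the coding step alone (dictionary fixed). This is a minimal sketch: it solves an l2,1-penalized coding problem for one contextual group by proximal gradient, where row-wise shrinkage forces all pixels in the group to share the same few dictionary atoms.

```python
import numpy as np

def joint_sparse_code(Y, D, lam=0.1, iters=300):
    """Proximal gradient for
        min_A  0.5 ||Y - D A||_F^2 + lam * sum_k ||A[k, :]||_2,
    with Y the d x m pixels of one contextual group and D a d x K
    dictionary. The row-wise (l2,1) penalty zeroes out whole rows of A,
    giving a common sparsity pattern across the group's pixels."""
    K, m = D.shape[1], Y.shape[1]
    A = np.zeros((K, m))
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    for _ in range(iters):
        G = D.T @ (D @ A - Y)
        B = A - step * G
        # group soft-threshold: shrink each row by its l2 norm
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - step * lam / np.maximum(norms, 1e-12), 0.0)
        A = shrink * B
    return A
```

In the full method the dictionary itself is also learned, and the resulting coefficients feed a linear SVM for classification.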
A Music Classification Model based on Metric Learning and Feature Extraction from MP3 Audio Files
The development of models for learning music similarity and feature
extraction from audio media files is an increasingly important task for the
entertainment industry. This work proposes a novel music classification model
based on metric learning and feature extraction from MP3 audio files. The
metric learning process considers the learning of a set of parameterized
distances employing a structured prediction approach from a set of MP3 audio
files containing several music genres. The main objective of this work is to
make it possible to learn a personalized metric for each customer. To extract the
acoustic information we use the Mel-Frequency Cepstral Coefficient (MFCC) and
make a dimensionality reduction with the use of Principal Components Analysis.
We assess the model's validity by performing a set of experiments and comparing the
training and testing results with baseline algorithms, such as K-means and Soft
Margin Linear Support Vector Machine (SVM). Experiments show promising results
and encourage the future development of an online version of the learning
model.
Comment: In a review process, I found some errors and made some changes in
methodology that improved my results. Once I finish the experiments, I will
upload the new version.