A Kernel Classification Framework for Metric Learning
Learning a distance metric from the given training samples plays a crucial
role in many machine learning tasks, and various models and optimization
algorithms have been proposed in the past decade. In this paper, we generalize
several state-of-the-art metric learning methods, such as large margin nearest
neighbor (LMNN) and information theoretic metric learning (ITML), into a kernel
classification framework. First, doublets and triplets are constructed from the
training samples, and a family of degree-2 polynomial kernel functions is
proposed for pairs of doublets or triplets. Then, a kernel classification
framework is established, which can not only generalize many popular metric
learning methods such as LMNN and ITML, but also suggest new metric learning
methods which, interestingly, can be implemented efficiently using standard
support vector machine (SVM) solvers. Two novel metric learning
methods, namely doublet-SVM and triplet-SVM, are then developed under the
proposed framework. Experimental results show that doublet-SVM and triplet-SVM
achieve competitive classification accuracies with state-of-the-art metric
learning methods such as ITML and LMNN but with significantly less training
time.
Comment: 11 pages, 7 figures
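The doublet construction described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code; in particular, the kernel form ((xa - xb)^T (xc - xd))^2 is assumed here as one natural degree-2 polynomial kernel over difference vectors:

```python
import numpy as np

def doublet_kernel(z1, z2):
    # A doublet is a pair of training samples (xa, xb).
    # Assumed degree-2 polynomial kernel on the difference vectors:
    # K(z1, z2) = ((xa - xb)^T (xc - xd))^2
    d1 = z1[0] - z1[1]
    d2 = z2[0] - z2[1]
    return float(np.dot(d1, d2)) ** 2

def make_doublets(X, y):
    # Label a doublet +1 if both samples share a class, else -1.
    # Training a kernel SVM on these labeled doublets (with the
    # kernel above, e.g. via a precomputed Gram matrix) is the kind
    # of reduction the framework describes.
    doublets, labels = [], []
    n = len(X)
    for i in range(n):
        for j in range(i + 1, n):
            doublets.append((X[i], X[j]))
            labels.append(1 if y[i] == y[j] else -1)
    return doublets, labels
```

The kernel is symmetric by construction, so the resulting Gram matrix can be handed to any standard SVM solver that accepts precomputed kernels.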
A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning whose
aim is to leverage useful information contained in multiple related tasks to
help improve the generalization performance of all the tasks. In this paper, we
give a survey of MTL. First, we classify different MTL algorithms into several
categories, including feature learning approach, low-rank approach, task
clustering approach, task relation learning approach, and decomposition
approach, and then discuss the characteristics of each approach. In order to
improve the performance of learning tasks further, MTL can be combined with
other learning paradigms including semi-supervised learning, active learning,
unsupervised learning, reinforcement learning, multi-view learning and
graphical models. When the number of tasks is large or the data dimensionality
is high, batch MTL models have difficulty handling this situation, so online,
parallel, and distributed MTL models, as well as dimensionality reduction and
feature hashing, are reviewed to reveal their computational and storage
advantages. Many real-world applications use MTL to boost their performance and
we review representative works. Finally, we present theoretical analyses and
discuss several future directions for MTL.
Diversified Hidden Markov Models for Sequential Labeling
Labeling of sequential data is a prevalent meta-problem for a wide range of
real-world applications. While the first-order Hidden Markov Model (HMM)
provides a fundamental approach for unsupervised sequential labeling, the basic
model does not perform satisfactorily when directly applied to real-world
problems such as part-of-speech (PoS) tagging and optical character
recognition (OCR). Aiming at improving performance, important
extensions of HMM have been proposed in the literature. One of the common key
features in these extensions is the incorporation of proper prior information.
In this paper, we propose a new extension of HMM, termed diversified Hidden
Markov Models (dHMM), which utilizes a diversity-encouraging prior over the
state-transition probabilities and thus facilitates more dynamic sequential
labelings. Specifically, the diversity is modeled by a continuous
determinantal point process prior, which we apply to both unsupervised and
supervised scenarios. Learning and inference algorithms for dHMM are derived.
Empirical evaluations on benchmark datasets for unsupervised PoS tagging and
supervised OCR confirmed the effectiveness of dHMM, with competitive
performance to the state of the art.
Comment: 14 pages, 12 figures
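To give intuition for what a diversity-encouraging prior over the state-transition probabilities rewards, here is a toy log-determinant score on the Gram matrix of transition rows. This is our simplification for illustration only; the continuous determinantal point process prior in the paper is considerably more involved:

```python
import numpy as np

def transition_diversity(A):
    # A is a row-stochastic state-transition matrix; each row is one
    # state's transition distribution. A determinantal-style score
    # rewards matrices whose rows are mutually dissimilar: identical
    # rows make the row Gram matrix singular (score -inf), while
    # well-separated rows give a larger log-determinant.
    G = A @ A.T  # Gram matrix of the transition rows
    sign, logdet = np.linalg.slogdet(G)
    return logdet if sign > 0 else float('-inf')
```

Maximizing the posterior under such a prior therefore pushes the learned HMM toward states with distinct transition behavior, which is the "more dynamic" labeling effect the abstract describes.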
Patterns for Learning with Side Information
Supervised, semi-supervised, and unsupervised learning estimate a function
given input/output samples. Generalization of the learned function to unseen
data can be improved by incorporating side information into learning. Side
information is data that belongs to neither the input space nor the output
space of the function, but that carries useful information for learning it. In this
paper we show that learning with side information subsumes a variety of related
approaches, e.g. multi-task learning, multi-view learning and learning using
privileged information. Our main contributions are (i) a new perspective that
connects these previously isolated approaches, (ii) insights about how these
methods incorporate different types of prior knowledge, and hence implement
different patterns, (iii) facilitating the application of these methods in
novel tasks, as well as (iv) a systematic experimental evaluation of these
patterns in two supervised learning tasks.
Comment: The first two authors contributed equally to this work.
Learning to Hash for Indexing Big Data - A Survey
The explosive growth in big data has attracted much attention in designing
efficient indexing and search methods recently. In many critical applications
such as large-scale search and pattern matching, finding the nearest neighbors
to a query is a fundamental research problem. However, the straightforward
solution using exhaustive comparison is infeasible due to the prohibitive
computational complexity and memory requirement. In response, Approximate
Nearest Neighbor (ANN) search based on hashing techniques has become popular
due to its promising performance in both efficiency and accuracy. Prior
randomized hashing methods, e.g., Locality-Sensitive Hashing (LSH), explore
data-independent hash functions with random projections or permutations.
Although such methods enjoy elegant theoretical guarantees on search quality in
certain metric spaces, the performance of randomized hashing has been shown to
be insufficient in many real-world applications. As a remedy, new approaches
incorporating data-driven learning methods in the development of advanced hash
functions have
emerged. Such learning to hash methods exploit information such as data
distributions or class labels when optimizing the hash codes or functions.
Importantly, the learned hash codes are able to preserve, in the hash code
space, the proximity of neighboring data in the original feature space. The
goal of this paper is to provide readers with a systematic understanding of the
insights, pros, and cons of the emerging techniques. We provide a comprehensive
survey of the learning to hash framework and representative techniques of
various types, including unsupervised, semi-supervised, and supervised. In
addition, we summarize recent hashing approaches utilizing deep learning
models. Finally, we discuss future directions and trends of
research in this area.
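The data-independent baseline mentioned above, LSH with random projections, fits in a few lines. This is a minimal sketch; the function names and the bit count are ours, not from any specific method in the survey:

```python
import numpy as np

def random_projection_hash(X, n_bits, seed=0):
    # Data-independent LSH: project each point onto n_bits random
    # Gaussian directions and keep only the sign, giving a binary code.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_bits))
    return (X @ W >= 0).astype(np.uint8)

def hamming(a, b):
    # Hamming distance between two binary codes.
    return int(np.count_nonzero(a != b))
```

Nearby points tend to fall on the same side of most random hyperplanes and so receive similar codes; learning-to-hash methods replace the random matrix W with one optimized from data distributions or labels.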
A Structured Prediction Approach for Missing Value Imputation
Missing value imputation is an important practical problem. There is a large
body of work on it, but no existing work formulates the problem in a
structured output setting. Moreover, most applications have
constraints on the imputed data, for example on the distribution associated
with each variable. None of the existing imputation methods use these
constraints. In this paper we propose a structured output approach for missing
value imputation that also incorporates domain constraints. We focus on large
margin models, but it is easy to extend the ideas to probabilistic models. We
deal with the intractable inference step in learning via a piecewise training
technique that is simple, efficient, and effective. Comparison with existing
state-of-the-art and baseline imputation methods shows that our method gives
significantly improved performance on the Hamming loss measure.
Comment: 9 pages
Monocular Depth Estimation using Multi-Scale Continuous CRFs as Sequential Deep Networks
Depth cues have proved very useful in various computer vision and
robotic tasks. This paper addresses the problem of monocular depth estimation
from a single still image. Inspired by the effectiveness of recent works on
multi-scale convolutional neural networks (CNN), we propose a deep model which
fuses complementary information derived from multiple CNN side outputs.
Different from previous methods using concatenation or weighted average
schemes, the integration is obtained by means of continuous Conditional Random
Fields (CRFs). In particular, we propose two different variations, one based on
a cascade of multiple CRFs, the other on a unified graphical model. By
designing a novel CNN implementation of mean-field updates for continuous CRFs,
we show that both proposed models can be regarded as sequential deep networks
and that training can be performed end-to-end. Through an extensive
experimental evaluation, we demonstrate the effectiveness of the proposed
approach and establish new state of the art results for the monocular depth
estimation task on three publicly available datasets, i.e. NYUD-V2, Make3D and
KITTI.
Comment: arXiv admin note: substantial text overlap with arXiv:1704.0215
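As a rough, hypothetical sketch of the kind of mean-field update a continuous CRF performs, the fused estimate at each location can be pulled toward a weighted average of its neighbors' estimates plus the unary (CNN) prediction. This is our simplification for intuition, not the paper's actual CNN implementation of the updates:

```python
import numpy as np

def meanfield_refine(z, W, n_iters=10):
    # z: unary predictions (e.g. per-pixel CNN depth estimates).
    # W: nonnegative pairwise affinity matrix between locations.
    # Each iteration moves the estimate toward a blend of its unary
    # term and its affinity-weighted neighbors, a common fixed-point
    # form for mean-field inference in Gaussian/continuous CRFs.
    mu = z.copy()
    for _ in range(n_iters):
        mu = (z + W @ mu) / (1.0 + W.sum(axis=1))
    return mu
```

Each such iteration is a differentiable operation on mu, which is what allows a stack of updates to be unrolled as layers of a sequential deep network and trained end-to-end.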
Machine learning based hyperspectral image analysis: A survey
Hyperspectral sensors enable the study of the chemical properties of scene
materials remotely for the purpose of identification, detection, and chemical
composition analysis of objects in the environment. Hence, hyperspectral images
captured from earth-observing satellites and aircraft have become increasingly
important in agriculture, environmental monitoring, urban planning, mining, and
defense. Machine learning algorithms, owing to their outstanding predictive
power, have become a key tool for modern hyperspectral image analysis.
Therefore, a solid understanding of machine learning techniques has become
essential for remote sensing researchers and practitioners. This paper reviews and compares
recent machine learning-based hyperspectral image analysis methods published in
the literature. We organize the methods by image analysis task and by the type
of machine learning algorithm, and present a two-way mapping between the image
analysis tasks and the types of machine learning algorithms that can be applied
to them. The paper is comprehensive in coverage of both hyperspectral image
analysis tasks and machine learning algorithms. The image analysis tasks
considered are land cover classification, target detection, unmixing, and
physical parameter estimation. The machine learning algorithms covered are
Gaussian models, linear regression, logistic regression, support vector
machines, Gaussian mixture models, latent linear models, sparse linear models,
ensemble learning, directed graphical models,
undirected graphical models, clustering, Gaussian processes, Dirichlet
processes, and deep learning. We also discuss the open challenges in the field
of hyperspectral image analysis and explore possible future directions.
A Survey on Learning to Hash
Nearest neighbor search is the problem of finding the points in a database
whose distances to the query point are the smallest.
Learning to hash is one of the major solutions to this problem and has been
widely studied recently. In this paper, we present a comprehensive survey of
the learning to hash algorithms, categorize them according to how they preserve
similarities into four categories: pairwise similarity preserving, multiwise
similarity preserving, implicit similarity preserving, and quantization,
and discuss their relations. We treat quantization separately from pairwise
similarity preserving because its objective function is quite different,
although quantization, as we show, can be derived from preserving pairwise
similarities. In addition, we present the evaluation protocols and a general
performance analysis, and
point out that the quantization algorithms perform superiorly in terms of
search accuracy, search time cost, and space cost. Finally, we introduce a few
emerging topics.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI).
CrossCat: A Fully Bayesian Nonparametric Method for Analyzing Heterogeneous, High Dimensional Data
There is a widespread need for statistical methods that can analyze
high-dimensional datasets without imposing restrictive or opaque modeling
assumptions. This paper describes a domain-general data analysis method called
CrossCat. CrossCat infers multiple non-overlapping views of the data, each
consisting of a subset of the variables, and uses a separate nonparametric
mixture to model each view. CrossCat is based on approximately Bayesian
inference in a hierarchical, nonparametric model for data tables. This model
consists of a Dirichlet process mixture over the columns of a data table in
which each mixture component is itself an independent Dirichlet process mixture
over the rows; the inner mixture components are simple parametric models whose
form depends on the types of data in the table. CrossCat combines strengths of
mixture modeling and Bayesian network structure learning. Like mixture
modeling, CrossCat can model a broad class of distributions by positing latent
variables, and produces representations that can be efficiently conditioned and
sampled from for prediction. Like Bayesian networks, CrossCat represents the
dependencies and independencies between variables, and thus remains accurate
when there are multiple statistical signals. Inference is done via a scalable
Gibbs sampling scheme; this paper shows that it works well in practice. This
paper also includes empirical results on heterogeneous tabular data of up to 10
million cells, such as hospital cost and quality measures, voting records,
unemployment rates, gene expression measurements, and images of handwritten
digits. CrossCat infers structure that is consistent with accepted findings and
common-sense knowledge in multiple domains and yields predictive accuracy
competitive with generative, discriminative, and model-free alternatives.
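The outer Dirichlet process mixture over columns can be pictured with a Chinese-restaurant-process draw, which yields a random partition of a table's columns into views. This is a toy sketch under standard CRP assumptions, not CrossCat's actual Gibbs-sampling inference:

```python
import random

def crp_partition(n_items, alpha, seed=0):
    # Chinese restaurant process: item i joins an existing block with
    # probability proportional to the block's size, or starts a new
    # block with probability proportional to the concentration alpha.
    # In CrossCat's outer mixture, "items" are table columns and
    # "blocks" are views.
    rng = random.Random(seed)
    assignments = [0]
    counts = [1]  # counts[k] = number of items in block k
    for _ in range(1, n_items):
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new block (view)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments
```

Within each view drawn this way, CrossCat then applies an independent Dirichlet process mixture over the rows, giving the nested column-then-row structure the abstract describes.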