Active Learning with Statistical Models
For many types of machine learning algorithms, one can compute the
statistically 'optimal' way to select training data. In this paper, we review
how optimal data selection techniques have been used with feedforward neural
networks. We then show how the same principles may be used to select data for
two alternative, statistically-based learning architectures: mixtures of
Gaussians and locally weighted regression. While the techniques for neural
networks are computationally expensive and approximate, the techniques for
mixtures of Gaussians and locally weighted regression are both efficient and
accurate. Empirically, we observe that the optimality criterion sharply
decreases the number of training examples the learner needs in order to achieve
good performance.
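Below is a minimal sketch, in Python with NumPy, of the general variance-reduction flavour of such optimal data selection: a least-squares learner scores each unlabelled candidate by how much labelling it would shrink the estimator's variance. The pool, feature map and scoring rule are illustrative assumptions, not the paper's mixture-of-Gaussians or locally weighted regression derivations.

```python
# A minimal sketch of variance-reduction active learning for a linear
# (least-squares) learner, in the spirit of statistically "optimal" data
# selection. All data and the feature map below are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)

def expected_variance_reduction(X_train, x_candidate):
    """Score a candidate query by how much labelling it would reduce estimator variance.

    For least squares, adding x helps most when x^T (X^T X)^-1 x is large,
    i.e. when the candidate lies where the current design is least informative."""
    A_inv = np.linalg.inv(X_train.T @ X_train + 1e-6 * np.eye(X_train.shape[1]))
    return float(x_candidate @ A_inv @ x_candidate)

# Toy pool of unlabelled inputs with a quadratic feature map.
pool = rng.uniform(-3, 3, size=200)
features = np.stack([np.ones_like(pool), pool, pool**2], axis=1)

X_train = features[:5]                      # small initial labelled set
scores = [expected_variance_reduction(X_train, x) for x in features]
query_idx = int(np.argmax(scores))          # most informative point to label next
print("next query input:", pool[query_idx])
```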
Ranking relations using analogies in biological and information networks
Analogical reasoning depends fundamentally on the ability to learn and
generalize about relations between objects. We develop an approach to
relational learning which, given a set S of pairs of objects, measures how
well other pairs A:B fit in with the set S. Our work addresses the following
question: is the relation between objects A and B analogous to those relations
found in S? Such questions are
particularly relevant in information retrieval, where an investigator might
want to search for analogous pairs of objects that match the query set of
interest. There are many ways in which objects can be related, making the task
of measuring analogies very challenging. Our approach combines a similarity
measure on function spaces with Bayesian analysis to produce a ranking. It
requires data containing features of the objects of interest and a link matrix
specifying which relationships exist; no further attributes of such
relationships are necessary. We illustrate the potential of our method on text
analysis and information networks. An application on discovering functional
interactions between pairs of proteins is discussed in detail, where we show
that our approach can work in practice even if a small set of protein pairs is
provided.
Published at http://dx.doi.org/10.1214/09-AOAS321 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
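As a rough illustration of ranking candidate pairs against a query set of analogous pairs, the sketch below replaces the paper's function-space similarity measure and Bayesian analysis with a simple regularized classifier (scikit-learn) over pair features built from object attributes and a link matrix; every feature choice and name here is an assumption made for the example, not the paper's method.

```python
# A much-simplified stand-in for relational ranking: represent each pair (A, B)
# by features built from object attributes and a link matrix, fit a regularized
# classifier on the query pairs against background pairs, and rank candidates
# by predictive probability. Purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

n_objects, n_feats = 50, 10
obj_feats = rng.normal(size=(n_objects, n_feats))   # object attributes
links = rng.random((n_objects, n_objects)) < 0.1    # which pairs are related

def pair_features(a, b):
    # element-wise product of attributes + link indicator as a crude pair representation
    return np.concatenate([obj_feats[a] * obj_feats[b], [float(links[a, b])]])

query_pairs = [(0, 1), (2, 3), (4, 5)]               # "analogous" example pairs
background = [tuple(rng.integers(0, n_objects, 2)) for _ in range(50)]

X = np.array([pair_features(a, b) for a, b in query_pairs + background])
y = np.array([1] * len(query_pairs) + [0] * len(background))
clf = LogisticRegression(C=1.0).fit(X, y)

candidates = [(a, b) for a in range(n_objects) for b in range(n_objects) if a != b]
scores = clf.predict_proba(np.array([pair_features(a, b) for a, b in candidates]))[:, 1]
print("top analogous pairs:", [candidates[i] for i in np.argsort(-scores)[:5]])
```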
The supervised IBP: neighbourhood preserving infinite latent feature models
We propose a probabilistic model to infer supervised latent variables in the Hamming space from observed data. Our model allows simultaneous inference of the number of binary latent variables and their values. The latent variables preserve neighbourhood structure of the data in the sense that objects in the same semantic concept have similar latent values, and objects in different concepts have dissimilar latent values. We formulate the supervised infinite latent variable problem based on an intuitive principle of pulling objects together if they are of the same type, and pushing them apart if they are not. We then combine this principle with a flexible Indian Buffet Process prior on the latent variables. We show that the inferred supervised latent variables can be directly used to perform a nearest neighbour search for the purpose of retrieval. We introduce a new application of dynamically extending hash codes, and show how to effectively couple the structure of the hash codes with the continuously growing structure of the neighbourhood preserving infinite latent feature space.
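The sketch below illustrates only the pull-together/push-apart principle with a fixed-length linear code map and Hamming-distance retrieval; it omits the Indian Buffet Process prior and the inference of the number of latent variables, and all dimensions and hyperparameters are arbitrary assumptions.

```python
# A minimal sketch of the "pull together / push apart" principle: learn
# continuous codes with a contrastive objective, binarise them into hash
# codes, and retrieve by Hamming distance. Not the paper's IBP-based model.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 60, 8, 16                       # objects, input dim, code length
labels = rng.integers(0, 3, size=n)       # semantic concepts
X = rng.normal(size=(n, d)) + labels[:, None]   # class-dependent features

W = rng.normal(scale=0.1, size=(d, k))    # linear code map
lr, margin = 0.05, 4.0

for _ in range(300):
    i, j = rng.integers(0, n, 2)
    Z = X @ W
    diff = Z[i] - Z[j]
    grad = np.zeros_like(W)
    if labels[i] == labels[j]:
        # pull same-concept codes together
        grad = np.outer(X[i] - X[j], diff)
    elif diff @ diff < margin**2:
        # push different-concept codes apart up to a margin
        grad = -np.outer(X[i] - X[j], diff)
    W -= lr * grad

codes = (X @ W > 0).astype(int)           # binary codes in Hamming space
hamming = (codes != codes[0]).sum(axis=1)
print("nearest neighbours of object 0:", np.argsort(hamming)[:5])
```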
Linear dimensionality reduction: Survey, insights, and generalizations
Linear dimensionality reduction methods are a cornerstone of analyzing high
dimensional data, due to their simple geometric interpretations and typically
attractive computational properties. These methods capture many data features
of interest, such as covariance, dynamical structure, correlation between data
sets, input-output relationships, and margin between data classes. Methods have
been developed with a variety of names and motivations in many fields, and
perhaps as a result the connections between all these methods have not been
highlighted. Here we survey methods from this disparate literature as
optimization programs over matrix manifolds. We discuss principal component
analysis, factor analysis, linear multidimensional scaling, Fisher's linear
discriminant analysis, canonical correlations analysis, maximum autocorrelation
factors, slow feature analysis, sufficient dimensionality reduction,
undercomplete independent component analysis, linear regression, distance
metric learning, and more. This optimization framework gives insight to some
rarely discussed shortcomings of well-known methods, such as the suboptimality
of certain eigenvector solutions. Modern techniques for optimization over
matrix manifolds enable a generic linear dimensionality reduction solver, which
accepts as input data and an objective to be optimized, and returns, as output,
an optimal low-dimensional projection of the data. This simple optimization
framework further allows straightforward generalizations and novel variants of
classical methods, which we demonstrate here by creating an
orthogonal-projection canonical correlations analysis. More broadly, this
survey and generic solver suggest that linear dimensionality reduction can move
toward becoming a blackbox, objective-agnostic numerical technology.
JPC and ZG received funding from the UK Engineering and Physical Sciences Research Council (EPSRC EP/H019472/1). JPC received funding from a Sloan Research Fellowship, the Simons Foundation (SCGB#325171 and SCGB#325233), the Grossman Center at Columbia University, and the Gatsby Charitable Trust.
This is the author accepted manuscript. The final version is available from MIT Press via http://jmlr.org/papers/v16/cunningham15a.htm
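A compact way to see the "optimization program over matrix manifolds" view is PCA written as trace maximisation over matrices with orthonormal columns, solved by a crude projected-gradient loop with a QR retraction. This is a hand-rolled sketch rather than the generic solver described above; a real implementation would use a proper manifold optimiser.

```python
# PCA as maximising trace(M^T C M) over orthonormal M, via gradient ascent
# with a QR retraction back to the Stiefel manifold. Step size and iteration
# count are arbitrary choices for this toy example.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # toy data
X = X - X.mean(axis=0)
C = X.T @ X / len(X)                     # sample covariance
r = 3                                    # target dimensionality

M = np.linalg.qr(rng.normal(size=(10, r)))[0]   # random orthonormal start
for _ in range(200):
    grad = 2 * C @ M                      # gradient of trace(M^T C M)
    M, _ = np.linalg.qr(M + 0.1 * grad)   # retraction: re-orthonormalise

# Compare with the closed-form eigenvector solution.
eigvals, _ = np.linalg.eigh(C)
print("objective (manifold ascent):  ", np.trace(M.T @ C @ M))
print("objective (top eigenvectors): ", eigvals[-r:].sum())
```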
Automatic discovery of the statistical types of variables in a dataset
A common practice in statistics and machine learning is to assume that the statistical data types (e.g., ordinal, categorical or real-valued) of variables, and usually also the likelihood model, are known. However, as the availability of real-world data increases, this assumption becomes too restrictive. Data are often heterogeneous, complex, and improperly or incompletely documented. Surprisingly, despite their practical importance, there is still a lack of tools to automatically discover the statistical types of, as well as appropriate likelihood (noise) models for, the variables in a dataset. In this paper, we fill this gap by proposing a Bayesian method, which accurately discovers the statistical data types in both synthetic and real data.
Humboldt Research Fellowship for Postdoctoral Researchers, which funded this research during her stay at the Max Planck Institute for Software Systems.
ATI Grant EP/N510129/1
EPSRC Grant EP/N014162/1
Google
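The following sketch conveys only the gist of type discovery as likelihood-model comparison: for each column, score a small set of candidate noise models and keep the best. This per-column BIC heuristic is a stand-in assumed for illustration; the paper's method is a joint Bayesian latent feature model, not this procedure.

```python
# A deliberately simplified illustration: per-column BIC-style comparison
# between a Gaussian and a categorical likelihood. Not the paper's model.
import numpy as np
from scipy import stats

def gaussian_score(x):
    mu, sigma = x.mean(), x.std() + 1e-9
    ll = stats.norm.logpdf(x, mu, sigma).sum()
    return ll - 0.5 * 2 * np.log(len(x))                # 2 parameters

def categorical_score(x):
    vals, counts = np.unique(x, return_counts=True)
    probs = counts / counts.sum()
    ll = (counts * np.log(probs)).sum()
    return ll - 0.5 * (len(vals) - 1) * np.log(len(x))  # V-1 parameters

rng = np.random.default_rng(0)
columns = {
    "height": rng.normal(170, 10, size=500),                       # real-valued
    "rating": rng.choice(np.arange(1, 6), size=500,
                         p=[0.7, 0.1, 0.1, 0.05, 0.05]).astype(float),  # categorical
}
for name, col in columns.items():
    best = max([("real-valued", gaussian_score(col)),
                ("categorical", categorical_score(col))], key=lambda t: t[1])
    print(name, "->", best[0])
```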
Kronecker Graphs: An Approach to Modeling Networks
How can we model networks with a mathematically tractable model that allows
for rigorous analysis of network properties? Networks exhibit a long list of
surprising properties: heavy tails for the degree distribution; small
diameters; and densification and shrinking diameters over time. Most present
network models either fail to match several of the above properties, are
complicated to analyze mathematically, or both. In this paper we propose a
generative model for networks that is both mathematically tractable and can
generate networks that have the above mentioned properties. Our main idea is to
use the Kronecker product to generate graphs that we refer to as "Kronecker
graphs".
First, we prove that Kronecker graphs naturally obey common network
properties. We also provide empirical evidence showing that Kronecker graphs
can effectively model the structure of real networks.
We then present KronFit, a fast and scalable algorithm for fitting the
Kronecker graph generation model to large real networks. A naive approach to
fitting would take super-exponential time. In contrast, KronFit takes linear
time, by exploiting the structure of Kronecker matrix multiplication and by
using statistical simulation techniques.
Experiments on large real and synthetic networks show that KronFit finds
accurate parameters that indeed very well mimic the properties of target
networks. Once fitted, the model parameters can be used to gain insights about
the network structure, and the resulting synthetic graphs can be used for null
models, anonymization, extrapolations, and graph summarization.
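The core construction is easy to reproduce: Kronecker-power a small initiator matrix to obtain an edge-probability matrix and sample edges from it. The sketch below uses an arbitrary 2x2 initiator (not fitted KronFit parameters) and the naive quadratic sampling scheme, which is fine at this toy scale.

```python
# Sample a stochastic Kronecker graph from a 2x2 initiator matrix.
import numpy as np

rng = np.random.default_rng(0)

theta = np.array([[0.9, 0.5],
                  [0.5, 0.2]])          # initiator (edge-probability seed)
k = 8                                   # number of Kronecker powers -> 2^8 nodes

P = theta.copy()
for _ in range(k - 1):
    P = np.kron(P, theta)               # Kronecker power: P has shape (2^k, 2^k)

A = (rng.random(P.shape) < P).astype(np.uint8)   # sample the adjacency matrix

degrees = A.sum(axis=1)
print("nodes:", A.shape[0], "edges:", int(A.sum()))
print("max degree:", degrees.max(), "median degree:", int(np.median(degrees)))
```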
Latent Gaussian processes for distribution estimation of multivariate categorical data
Multivariate categorical data occur in many applications of machine learning.
One of the main difficulties with these vectors of categorical variables is
sparsity. The number of possible observations grows exponentially with vector
length, but dataset diversity might be poor in comparison. Recent models have
gained significant improvement in supervised tasks with this data. These models
embed observations in a continuous space to capture similarities between them.
Building on these ideas we propose a Bayesian model for the unsupervised task
of distribution estimation of multivariate categorical data. We model vectors
of categorical variables as generated from a non-linear transformation of a
continuous latent space. Non-linearity captures multi-modality in the
distribution. The continuous representation addresses sparsity. Our model ties
together many existing models, linking the linear categorical latent Gaussian
model, the Gaussian process latent variable model, and Gaussian process
classification. We derive inference for our model based on recent developments
in sampling based variational inference. We show empirically that the model
outperforms its linear and discrete counterparts in imputation tasks of sparse
data.
YG is supported by the Google European fellowship in Machine Learning.
This is the final version of the article. It first appeared from Microtome Publishing via http://jmlr.org/proceedings/papers/v37/gala15.htm
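As a generative sketch of this model family: draw a low-dimensional latent location per object, draw one latent function per category value from a GP over that space, and push the function values through a softmax to sample categories. The kernel, dimensions and single categorical variable below are assumptions for illustration, and the paper's sampling-based variational inference is not shown.

```python
# Sample multivariate categorical data from a GP-warped continuous latent space.
import numpy as np

rng = np.random.default_rng(0)
n, q, k = 100, 2, 4                       # objects, latent dim, categories

X = rng.normal(size=(n, q))               # continuous latent locations

def rbf_kernel(A, B, lengthscale=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

K = rbf_kernel(X, X) + 1e-6 * np.eye(n)
L = np.linalg.cholesky(K)
F = L @ rng.normal(size=(n, k))           # one GP-distributed function per category

probs = np.exp(F - F.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True) # softmax link
y = np.array([rng.choice(k, p=p) for p in probs])

print("sampled category counts:", np.bincount(y, minlength=k))
```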
Scalable variational Gaussian process classification
Gaussian process classification is a popular method with a number of
appealing properties. We show how to scale the model within a variational
inducing point framework, outperforming the state of the art on benchmark
datasets. Importantly, the variational formulation can be exploited to allow
classification in problems with millions of data points, as we demonstrate in
experiments.
JH was supported by an MRC fellowship, AM and ZG by EPSRC grant EP/I036575/1, and a Google Focussed Research award.
This is the final version of the article. It was first available from JMLR via http://jmlr.org/proceedings/papers/v38/hensman15.pd
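A minimal usage sketch of variational inducing-point GP classification is shown below using the open-source GPflow library's SVGP model, which implements this family of methods; the exact API may differ across GPflow versions, and this example is an assumption for illustration rather than the authors' code. Minibatching, which is what scales the bound to millions of points, is only hinted at via num_data.

```python
# Variational inducing-point GP classification with GPflow's SVGP (sketch).
import numpy as np
import gpflow

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(1000, 1))
Y = (np.sin(2 * X) + 0.3 * rng.normal(size=X.shape) > 0).astype(float)

Z = X[::50].copy()                        # 20 inducing inputs taken from the data

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Bernoulli(),
    inducing_variable=Z,
    num_data=len(X),                      # enables correct ELBO scaling under minibatching
)

# Full-batch optimisation for brevity; stochastic optimisation over minibatches
# is what makes the variational bound practical at very large scale.
opt = gpflow.optimizers.Scipy()
opt.minimize(model.training_loss_closure((X, Y)),
             model.trainable_variables,
             options=dict(maxiter=200))

mean, var = model.predict_y(X[:5])        # predictive class probabilities and variances
print(mean.numpy().ravel())
```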
Neural adaptive sequential Monte Carlo
Sequential Monte Carlo (SMC), or particle filtering, is a popular class of
methods for sampling from an intractable target distribution using a sequence
of simpler intermediate distributions. Like other importance sampling-based
methods, performance is critically dependent on the proposal distribution: a
bad proposal can lead to arbitrarily inaccurate estimates of the target
distribution. This paper presents a new method for automatically adapting the
proposal using an approximation of the Kullback-Leibler divergence between the
true posterior and the proposal distribution. The method is very flexible,
applicable to any parameterized proposal distribution and it supports online
and batch variants. We use the new framework to adapt powerful proposal
distributions with rich parameterizations based upon neural networks leading to
Neural Adaptive Sequential Monte Carlo (NASMC). Experiments indicate that NASMC
significantly improves inference in a non-linear state space model
outperforming adaptive proposal methods including the Extended Kalman and
Unscented Particle Filters. Experiments also indicate that improved inference
translates into improved parameter learning when NASMC is used as a subroutine
of Particle Marginal Metropolis Hastings. Finally we show that NASMC is able to
train a latent variable recurrent neural network (LV-RNN) achieving results
that compete with the state-of-the-art for polyphonic music modelling. NASMC
can be seen as bridging the gap between adaptive SMC methods and the recent
work in scalable, black-box variational inference.
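The toy sketch below conveys the proposal-adaptation idea on a 1-D linear-Gaussian state space model: run a particle filter and update a parameterised Gaussian proposal by ascending the posterior-weighted gradient of log q, the same inclusive-KL surrogate described above. The proposal is linear in (x_prev, y) rather than a neural network, purely to keep the example short; the model and all constants are assumptions.

```python
# Particle filter with a proposal adapted by posterior-weighted grad log q.
import numpy as np

rng = np.random.default_rng(0)

# Model: x_t = 0.9 x_{t-1} + N(0,1),  y_t = x_t + N(0, 0.5^2)
T, N = 200, 100
x_true = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.normal()
    y[t] = x_true[t] + 0.5 * rng.normal()

def log_normal(x, mu, sig):
    return -0.5 * ((x - mu) / sig) ** 2 - np.log(sig) - 0.5 * np.log(2 * np.pi)

theta = np.array([0.0, 0.0, 0.0])        # proposal mean: a*x_prev + b*y_t + c
sig_q, lr = 1.0, 0.05

particles = np.zeros(N)
for t in range(1, T):
    mu_q = theta[0] * particles + theta[1] * y[t] + theta[2]
    new = mu_q + sig_q * rng.normal(size=N)
    logw = (log_normal(y[t], new, 0.5)
            + log_normal(new, 0.9 * particles, 1.0)
            - log_normal(new, mu_q, sig_q))
    w = np.exp(logw - logw.max()); w /= w.sum()

    # Adaptation step: weighted gradient of log q w.r.t. proposal parameters.
    dmu = (new - mu_q) / sig_q**2
    grad = np.array([(w * dmu * particles).sum(),
                     (w * dmu * y[t]).sum(),
                     (w * dmu).sum()])
    theta += lr * grad

    particles = new[rng.choice(N, size=N, p=w)]   # multinomial resampling

print("adapted proposal params (a, b, c):", np.round(theta, 2))
```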
A birth-death process for feature allocation
We propose a Bayesian nonparametric prior over feature allocations for sequential data, the birth-death feature allocation process (BDFP). The BDFP models the evolution of the feature allocation of a set of N objects across a covariate (e.g. time) by creating and deleting features. A BDFP is exchangeable, projective, stationary and reversible, and its equilibrium distribution is given by the Indian buffet process (IBP). We show that the Beta process on an extended space is the de Finetti mixing distribution underlying the BDFP. Finally, we present the finite approximation of the BDFP, the Beta Event Process (BEP), that permits simplified inference. The utility of the BDFP as a prior is demonstrated on real-world dynamic genomics and social network data.
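Purely to visualise the kind of object such a prior ranges over, the sketch below simulates a generic feature allocation evolving by births and deaths across time steps; the rates and transition rules are arbitrary assumptions and do not reproduce the BDFP's actual kernel or its IBP equilibrium.

```python
# Generic birth-death evolution of a binary feature-allocation matrix.
import numpy as np

rng = np.random.default_rng(0)
n_objects, n_steps = 5, 4
birth_rate, death_prob = 1.0, 0.3

Z = [np.zeros((n_objects, 0), dtype=int)]          # feature allocation at step 0
for _ in range(n_steps):
    current = Z[-1].copy()
    # death: each active assignment switches off independently
    deaths = rng.random(current.shape) < death_prob
    current[deaths & (current == 1)] = 0
    # birth: a Poisson number of brand-new features, each owned by one random object
    n_new = rng.poisson(birth_rate)
    new_cols = np.zeros((n_objects, n_new), dtype=int)
    for j in range(n_new):
        new_cols[rng.integers(n_objects), j] = 1
    current = np.hstack([current, new_cols])
    # drop features no object uses any more
    current = current[:, current.sum(axis=0) > 0]
    Z.append(current)

for t, z in enumerate(Z):
    print(f"step {t}: {z.shape[1]} active features")
```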