K2-ABC: Approximate Bayesian Computation with Kernel Embeddings
Complicated generative models often result in a situation where computing the
likelihood of observed data is intractable, while simulating from the
conditional density given a parameter value is relatively easy. Approximate
Bayesian Computation (ABC) is a paradigm that enables simulation-based
posterior inference in such cases by measuring the similarity between simulated
and observed data in terms of a chosen set of summary statistics. However,
there is no general rule to construct sufficient summary statistics for complex
models. Insufficient summary statistics will "leak" information, which leads to
ABC algorithms yielding samples from an incorrect (partial) posterior. In this
paper, we propose a fully nonparametric ABC paradigm which circumvents the need
for manually selecting summary statistics. Our approach, K2-ABC, uses maximum
mean discrepancy (MMD) as a dissimilarity measure between the distributions
over observed and simulated data. MMD is easily estimated as the squared
difference between their empirical kernel embeddings. Experiments on a
simulated scenario and a real-world biological problem illustrate the
effectiveness of the proposed algorithm.
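The MMD estimate from empirical kernel embeddings that K2-ABC relies on can be sketched as follows. The RBF kernel and the bandwidth `gamma` are illustrative assumptions here, not the paper's exact configuration:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(x, y, gamma=1.0):
    # Biased estimator of squared MMD between two samples:
    # mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    kxx = rbf_kernel(x, x, gamma).mean()
    kyy = rbf_kernel(y, y, gamma).mean()
    kxy = rbf_kernel(x, y, gamma).mean()
    return kxx + kyy - 2.0 * kxy
```

In an ABC loop, `mmd2(observed, simulated)` would replace the usual distance between summary statistics, so no summary statistics need to be hand-picked.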
Information Preserving Component Analysis: Data Projections for Flow Cytometry Analysis
Flow cytometry is often used to characterize the malignant cells in leukemia
and lymphoma patients, traced to the level of the individual cell. Typically,
flow cytometric data analysis is performed through a series of 2-dimensional
projections onto the axes of the data set. Through the years, clinicians have
determined combinations of different fluorescent markers which generate
relatively known expression patterns for specific subtypes of leukemia and
lymphoma -- cancers of the hematopoietic system. By only viewing a series of
2-dimensional projections, the high-dimensional nature of the data is rarely
exploited. In this paper we present a means of determining a low-dimensional
projection which maintains the high-dimensional relationships (i.e.
information) between differing oncological data sets. By using machine learning
techniques, we allow clinicians to visualize data in a low dimension defined by
a linear combination of all of the available markers, rather than just 2 at a
time. This provides an aid in diagnosing similar forms of cancer, as well as a
means for variable selection in exploratory flow cytometric research. We refer
to our method as Information Preserving Component Analysis (IPCA).
Comment: 26 page
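IPCA itself optimizes an information-preservation criterion between data sets; as a generic stand-in for the idea of projecting onto axes that are linear combinations of all markers, a PCA-style linear projection (not the IPCA objective) can be sketched as:

```python
import numpy as np

def linear_projection(X, k=2):
    # PCA shown only as a stand-in for IPCA: each output axis is a
    # linear combination of all input markers, not just a pair of them.
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)          # marker covariance matrix
    vals, vecs = np.linalg.eigh(cov)        # eigendecomposition
    A = vecs[:, np.argsort(vals)[::-1][:k]] # top-k directions
    return Xc @ A, A                        # projected data and loadings
```

For flow cytometry, the columns of `A` play the role of the learned marker combinations a clinician would visualize in 2-D.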
Two-Stage Metric Learning
In this paper, we present a novel two-stage metric learning algorithm. We
first map each learning instance to a probability distribution by computing its
similarities to a set of fixed anchor points. Then, we define the distance in
the input data space as the Fisher information distance on the associated
statistical manifold. This induces in the input data space a new family of
distance metrics with unique properties. Unlike kernelized metric learning, we
do not require the similarity measure to be positive semi-definite. Moreover,
it can also be interpreted as a local metric learning algorithm with a
well-defined distance approximation. We evaluate its performance on a number of
datasets. It significantly outperforms other metric learning methods and SVM.
Comment: Accepted for publication in ICML 201
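The two stages can be sketched as follows. The Gaussian similarity to anchors and the closed-form Fisher-Rao distance between discrete distributions are a minimal illustration; `sigma` and the anchor set are assumed parameters, not the paper's choices:

```python
import numpy as np

def to_distribution(x, anchors, sigma=1.0):
    # Stage 1 (sketch): map instance x to a probability distribution via
    # normalized Gaussian similarities to a set of fixed anchor points.
    d2 = ((anchors - x) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum()

def fisher_distance(p, q):
    # Stage 2 (sketch): Fisher-Rao geodesic distance between two discrete
    # distributions on the simplex: 2 * arccos(sum_i sqrt(p_i * q_i)).
    bc = np.sqrt(p * q).sum()
    return 2.0 * np.arccos(np.clip(bc, -1.0, 1.0))
```

Note that the similarity used in stage 1 need not be positive semi-definite, which is the flexibility the abstract contrasts with kernelized metric learning.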