135,403 research outputs found
Learning unbiased features
A key element in transfer learning is representation learning; if
representations can be developed that expose the relevant factors underlying
the data, then new tasks and domains can be learned readily based on mappings
of these salient factors. We propose that an important aim for these
representations are to be unbiased. Different forms of representation learning
can be derived from alternative definitions of unwanted bias, e.g., bias to
particular tasks, domains, or irrelevant underlying data dimensions. One very
useful approach to estimating the amount of bias in a representation comes from
maximum mean discrepancy (MMD) [5], a measure of distance between probability
distributions. We are not the first to suggest that MMD can be a useful
criterion in developing representations that apply across multiple domains or
tasks [1]. However, in this paper we describe a number of novel applications of
this criterion that we have devised, all based on the idea of developing
unbiased representations. These formulations include: a standard domain
adaptation framework; a method of learning invariant representations; an
approach based on noise-insensitive autoencoders; and a novel form of
generative model.Comment: Published in NIPS 2014 Workshop on Transfer and Multitask Learning,
see http://nips.cc/Conferences/2014/Program/event.php?ID=428
GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization
Bioinformatics tools have been developed to interpret gene expression data at
the gene set level, and these gene set based analyses improve the biologists'
capability to discover functional relevance of their experiment design. While
elucidating gene set individually, inter gene sets association is rarely taken
into consideration. Deep learning, an emerging machine learning technique in
computational biology, can be used to generate an unbiased combination of gene
set, and to determine the biological relevance and analysis consistency of
these combining gene sets by leveraging large genomic data sets. In this study,
we proposed a gene superset autoencoder (GSAE), a multi-layer autoencoder model
with the incorporation of a priori defined gene sets that retain the crucial
biological features in the latent layer. We introduced the concept of the gene
superset, an unbiased combination of gene sets with weights trained by the
autoencoder, where each node in the latent layer is a superset. Trained with
genomic data from TCGA and evaluated with their accompanying clinical
parameters, we showed gene supersets' ability of discriminating tumor subtypes
and their prognostic capability. We further demonstrated the biological
relevance of the top component gene sets in the significant supersets. Using
autoencoder model and gene superset at its latent layer, we demonstrated that
gene supersets retain sufficient biological information with respect to tumor
subtypes and clinical prognostic significance. Superset also provides high
reproducibility on survival analysis and accurate prediction for cancer
subtypes.Comment: Presented in the International Conference on Intelligent Biology and
Medicine (ICIBM 2018) at Los Angeles, CA, USA and published in BMC Systems
Biology 2018, 12(Suppl 8):14
Unbiased Learning to Rank with Unbiased Propensity Estimation
Learning to rank with biased click data is a well-known challenge. A variety
of methods has been explored to debias click data for learning to rank such as
click models, result interleaving and, more recently, the unbiased
learning-to-rank framework based on inverse propensity weighting. Despite their
differences, most existing studies separate the estimation of click bias
(namely the \textit{propensity model}) from the learning of ranking algorithms.
To estimate click propensities, they either conduct online result
randomization, which can negatively affect the user experience, or offline
parameter estimation, which has special requirements for click data and is
optimized for objectives (e.g. click likelihood) that are not directly related
to the ranking performance of the system. In this work, we address those
problems by unifying the learning of propensity models and ranking models. We
find that the problem of estimating a propensity model from click data is a
dual problem of unbiased learning to rank. Based on this observation, we
propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker
and an \textit{unbiased propensity model}. DLA is an automatic unbiased
learning-to-rank framework as it directly learns unbiased ranking models from
biased click data without any preprocessing. It can adapt to the change of bias
distributions and is applicable to online learning. Our empirical experiments
with synthetic and real-world data show that the models trained with DLA
significantly outperformed the unbiased learning-to-rank algorithms based on
result randomization and the models trained with relevance signals extracted by
click models
- …