Statistical methods for multi-omic data integration
The thesis is focused on the development of new ways to integrate multiple ’omic datasets in the context of precision medicine. This type of analysis has the potential to help researchers deepen their understanding of the biological mechanisms underlying disease. However, integrative studies pose several challenges, owing to the typically widely differing characteristics of the ’omic layers in terms of the number of predictors, type of data, and level of noise.
In this work, we first tackle the problem of performing variable selection and building supervised models while integrating multiple ’omic datasets of different types. It has recently been shown that applying classical logistic regression with an elastic-net penalty to these datasets can lead to poor results. Therefore, we suggest a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately and a predictive model is subsequently built on the ensemble of the selected variables.
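The two-step approach can be sketched as follows. This is a minimal illustration with simulated data; scikit-learn's sparse (L1-penalised) logistic regression stands in for the per-layer selection step, and the layer sizes, penalty strength, and signal structure are all hypothetical, not taken from the thesis:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two hypothetical 'omic layers with very different dimensionalities.
X_layer1 = rng.normal(size=(100, 50))    # e.g. a protein layer
X_layer2 = rng.normal(size=(100, 500))   # e.g. a gene-expression layer
# Simulated binary outcome driven by one feature from each layer.
y = (X_layer1[:, 0] + X_layer2[:, 0] > 0).astype(int)

def select_features(X, y, C=0.5):
    """Step 1: sparse logistic regression on one layer; keep nonzero coefs."""
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    model.fit(X, y)
    return np.flatnonzero(model.coef_.ravel())

sel1 = select_features(X_layer1, y)
sel2 = select_features(X_layer2, y)

# Step 2: build a predictive model on the ensemble of selected variables.
X_combined = np.hstack([X_layer1[:, sel1], X_layer2[:, sel2]])
final_model = LogisticRegression(max_iter=1000).fit(X_combined, y)
```

Selecting per layer prevents the layer with the most predictors from dominating the penalised fit, which is the failure mode of running a single elastic-net model over the concatenated data.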
In the unsupervised setting, we first examine cluster of clusters analysis (COCA), an integrative clustering approach that combines information from multiple data sources. COCA has been widely applied in the context of tumour subtyping, but its properties have never been systematically explored, and its robustness to the inclusion of noisy datasets is unclear. We then propose a new statistical method for the unsupervised integration of multi-omic data, called kernel learning integrative clustering (KLIC). This approach is based on the idea of framing the challenge of combining clustering structures as a multiple kernel learning problem, in which each dataset provides a weighted contribution to the final clustering.
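The weighted-kernel idea behind KLIC can be illustrated with a small sketch. The data, RBF kernels, and fixed equal weights below are all assumptions for illustration; in KLIC itself the kernels encode clustering structure and the weights are estimated by a multiple kernel learning step:

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
# Two hypothetical datasets observed on the same 60 samples,
# both carrying the same two-group structure.
labels_true = np.repeat([0, 1], 30)
X1 = rng.normal(size=(60, 10)) + labels_true[:, None] * 3.0
X2 = rng.normal(size=(60, 20)) + labels_true[:, None] * 3.0

# One kernel (similarity) matrix per dataset.
K1 = rbf_kernel(X1, gamma=0.1)
K2 = rbf_kernel(X2, gamma=0.05)

# Each dataset contributes to the final clustering through its weight;
# equal weights here stand in for the learned ones.
weights = np.array([0.5, 0.5])
K = weights[0] * K1 + weights[1] * K2

# Cluster using the combined kernel as a precomputed affinity.
clust = SpectralClustering(n_clusters=2, affinity="precomputed",
                           random_state=0).fit(K)
ari = adjusted_rand_score(labels_true, clust.labels_)
```

A noisy dataset would ideally receive a small weight, which is exactly the robustness question raised about COCA above.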
Finally, we build upon the notion of the posterior similarity matrix (PSM) to suggest new approaches for summarising the output of MCMC algorithms for Bayesian mixture models. A key contribution of our work is the observation that PSMs can be used to define probabilistically motivated kernel matrices that capture the clustering structure present in the data. This observation enables us to employ a range of kernel methods to obtain summary clusterings and, when multiple PSMs are available, to use standard methods for combining kernels in order to perform integrative clustering. We also show that PSMs can be embedded within predictive kernel models in order to perform outcome-guided clustering.
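A PSM is simple to construct from MCMC output: entry (i, j) is the fraction of posterior draws in which items i and j were allocated to the same cluster. A minimal sketch, using simulated allocations rather than a real sampler (the group structure and 5% noise rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_draws = 8, 200
# Hypothetical MCMC output: one row of cluster labels per posterior draw.
# Items 0-3 and 4-7 are co-clustered in most draws, with occasional noise.
draws = np.tile(np.repeat([0, 1], 4), (n_draws, 1))
noisy = rng.random((n_draws, n_samples)) < 0.05
draws[noisy] = rng.integers(0, 2, size=noisy.sum())

# Posterior similarity matrix: PSM[i, j] = fraction of draws in which
# items i and j received the same cluster label.
psm = np.mean(draws[:, :, None] == draws[:, None, :], axis=0)
```

Because each draw contributes an indicator (co-clustering) matrix, the PSM is an average of valid kernel matrices and is therefore itself a valid kernel, which is what lets it slot into the kernel methods described above.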
Unified View Imputation and Feature Selection Learning for Incomplete Multi-view Data
Although multi-view unsupervised feature selection (MUFS) is an effective
technology for reducing dimensionality in machine learning, existing methods
cannot directly deal with incomplete multi-view data where some samples are
missing in certain views. Such methods must first impute the missing data with
predetermined values and then perform feature selection on the completed dataset.
Separating the imputation and feature selection processes fails to capitalize on
their potential synergy: local structural information gleaned during feature
selection could guide the imputation, which in turn would improve feature
selection performance. Additionally, previous methods focus only on leveraging
samples' local structure information, while ignoring the intrinsic locality of
the feature space. To tackle these problems, a novel MUFS method, called
UNified view Imputation and Feature selectIon lEaRning (UNIFIER), is proposed.
UNIFIER explores the local structure of multi-view data by adaptively learning
similarity-induced graphs from both the sample and feature spaces. Then,
UNIFIER dynamically recovers the missing views, guided by the sample and
feature similarity graphs during the feature selection procedure. Furthermore,
the half-quadratic minimization technique is used to automatically weight
different instances, alleviating the impact of outliers and unreliable restored
data. Comprehensive experimental results demonstrate that UNIFIER outperforms
other state-of-the-art methods.
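UNIFIER's joint objective is not reproduced here, but the two similarity-induced graphs it describes (over samples and over features) and the way a sample graph can guide imputation can be sketched. This is a deliberately simplified stand-in for the paper's alternating optimisation; the sizes, neighbour counts, and mean-of-neighbours fill rule are all assumptions:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 12))   # one complete view, for illustration

# Similarity-induced graphs from both spaces: rows give the sample
# graph, columns (X transposed) give the feature graph.
S_samples = kneighbors_graph(X, n_neighbors=5, mode="connectivity")
S_features = kneighbors_graph(X.T, n_neighbors=3, mode="connectivity")

# Graph-guided imputation in the spirit of the paper: fill a missing
# entry with the average of the sample's graph neighbours.
X_miss = X.copy()
X_miss[0, 0] = np.nan
neighbours = S_samples[0].indices      # the 5 nearest samples to row 0
X_miss[0, 0] = X[neighbours, 0].mean()
```

In UNIFIER the graphs, the imputed values, and the feature weights are updated jointly, with half-quadratic instance weights down-weighting outliers and unreliable restorations; here each piece is shown once, in isolation.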
Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal
retrieval to learn hash functions in the absence of paired training samples
through the cycle consistency loss. Our proposed approach employs an adversarial
training scheme to learn a pair of hash functions that enable translation between
modalities while preserving the underlying semantic relationship. To endow the
hash codes of each input-output pair with semantics, a cycle consistency loss is
further imposed on top of the adversarial training, strengthening the
correlations between inputs and their corresponding outputs. Our approach is
generative: it learns hash functions such that the resulting hash codes maximally
correlate each input-output correspondence, while also regenerating the inputs
so as to minimize information loss. The learning-to-hash embedding is thus performed
to jointly optimize the parameters of the hash functions across modalities as
well as the associated generative models. Extensive experiments on a variety of
large-scale cross-modal data sets demonstrate that our proposed method achieves
better retrieval results than state-of-the-art methods.
Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other authors.
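The cycle-consistency idea can be illustrated in isolation. Random linear maps stand in for the paper's generator networks, and sign-binarisation for its learned hash functions; no adversarial training is performed in this sketch, and all shapes are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy stand-ins for one modality's features and the two cross-modal
# translators (image -> text and text -> image).
x_img = rng.normal(size=(16, 32))
F = rng.normal(size=(32, 32)) * 0.1   # image -> text translator
G = rng.normal(size=(32, 32)) * 0.1   # text -> image translator

def hash_codes(z):
    """Binarise continuous embeddings into +/-1 hash codes."""
    return np.sign(z)

# Cycle consistency: translating to the other modality and back should
# reproduce the input, so the L1 reconstruction error is penalised.
x_translated = x_img @ F
x_back = x_translated @ G
cycle_loss = np.mean(np.abs(x_back - x_img))

codes = hash_codes(x_translated)
```

In training, `cycle_loss` would be added to the adversarial objective so that the hash codes remain informative enough to regenerate their inputs, which is what ties unpaired samples across modalities together.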