2,127 research outputs found

Late Fusion Multi-view Clustering via Global and Local Alignment Maximization

Authors: Liu Xinwang, Wang Siwei, Zhu En
Publication date: 01/08/2022

Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance. Although they demonstrate promising performance in various applications, most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering, which can lead to over-complicated optimization and intensive computational cost. In this paper, we propose late fusion MVC via alignment maximization to address these issues. To do so, we first reveal the theoretical connection between existing k-means clustering and the alignment between base partitions and the consensus one. Based on this observation, we propose a simple but effective multi-view algorithm termed LF-MVC-GAM. It optimally fuses multiple sources of information at the partition level from each individual view, and maximally aligns the consensus partition with these weighted base ones. Such an alignment helps integrate partition-level information and significantly reduces the computational complexity by simplifying the optimization procedure. We then design another variant, LF-MVC-LAM, to further improve the clustering performance by preserving the local intrinsic structure among multiple partition spaces. After that, we develop two three-step iterative algorithms to solve the resultant optimization problems with theoretically guaranteed convergence. Further, we provide a generalization error bound analysis of the proposed algorithms. Extensive experiments on eighteen multi-view benchmark datasets, ranging from small to large-scale data, demonstrate the effectiveness and efficiency of the proposed LF-MVC-GAM and LF-MVC-LAM. The code for the proposed algorithms is publicly available at https://github.com/wangsiwei2010/latefusionalignment
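The partition-level fusion described in this abstract can be illustrated with a small sketch. It is not the authors' implementation (see the repository linked above for that). It assumes each view supplies an orthonormal base partition matrix H_p, that the consensus H and per-view rotations W_p are constrained to be orthonormal, and that the view weights beta are non-negative with unit norm. The function names `orthonormal_from_svd` and `late_fusion_align` are hypothetical, and any regularization used in the paper is omitted.

```python
import numpy as np


def orthonormal_from_svd(A):
    """Maximize Tr(X^T A) over X with X^T X = I (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt


def late_fusion_align(base_partitions, n_iter=50, tol=1e-8):
    """Alternating maximization of Tr(H^T sum_p beta_p H_p W_p).

    base_partitions: list of (n, k) arrays, one per view (assumed orthonormal).
    Returns the consensus H, the rotations W_p, and the view weights beta.
    """
    m = len(base_partitions)
    k = base_partitions[0].shape[1]
    W = [np.eye(k) for _ in range(m)]
    beta = np.full(m, 1.0 / np.sqrt(m))  # equal weights with unit norm
    prev_obj = -np.inf
    for _ in range(n_iter):
        # Step 1: update the consensus partition with W and beta fixed.
        fused = sum(b * Hp @ Wp for b, Hp, Wp in zip(beta, base_partitions, W))
        H = orthonormal_from_svd(fused)
        # Step 2: update each rotation W_p with H fixed.
        W = [orthonormal_from_svd(Hp.T @ H) for Hp in base_partitions]
        # Step 3: update the weights; the alignment scores are non-negative
        # after step 2, so the unit-norm maximizer is the normalized scores.
        scores = np.array(
            [np.trace(H.T @ Hp @ Wp) for Hp, Wp in zip(base_partitions, W)]
        )
        beta = scores / np.linalg.norm(scores)
        obj = float(beta @ scores)
        if obj - prev_obj < tol:
            break
        prev_obj = obj
    return H, W, beta
```

In this sketch each of the three steps has a closed-form solution, so the objective never decreases between iterations. Cluster labels would then typically be obtained by running k-means on the rows of H, which is a further assumption here.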

arXiv.org e-Print Archive

Representation Learning: A Review and New Perspectives

Authors: Bengio Yoshua, Courville Aaron, Vincent Pascal
Publication date: 01/01/2014

The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.

arXiv.org e-Print Archive

CiteSeerX

PAMOGK: A pathway graph kernel based multi-omics clustering approach for discovering cancer patient subgroups

Authors: Akdemir Mustafa Furkan, Taştan Öznur, Tepeli Yasin İlkağan, Ünal Ali Burak
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/06/2020

Accurate classification of patients into homogeneous molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these different subtypes to cancer. However, the extensive molecular heterogeneity observed among cancer patients presents a challenge. The availability of multi-omic data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. In this work, we develop PAMOGK, which integrates multi-omics patient data and incorporates the existing knowledge on biological pathways. PAMOGK is well suited to deal with the sparsity of alterations in assessing patient similarities. We develop a novel graph kernel, which we denote the smoothed shortest path graph kernel, that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, PAMOGK uses multi-view kernel clustering. We apply PAMOGK to find subgroups of kidney renal clear cell carcinoma (KIRC) patients, which results in four clusters with significantly different survival times (p-value = 7.4e-10). The patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. When we compare PAMOGK to 8 other state-of-the-art existing multi-omics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition patients into groups with different survival distributions. PAMOGK enables extracting the relative importance of pathways and molecular data types. PAMOGK is available at github.com/tastanlab/pamog

Sabanci University Research Database