Late Fusion Multi-view Clustering via Global and Local Alignment Maximization
Multi-view clustering (MVC) optimally integrates complementary information
from different views to improve clustering performance. Although they
demonstrate promising performance in various applications, most existing
approaches directly fuse multiple pre-specified similarities to learn an optimal
similarity matrix for clustering, which can lead to over-complicated
optimization and high computational cost. In this paper, we propose late
fusion MVC via alignment maximization to address these issues. To do so, we
first reveal the theoretical connection between existing k-means clustering and
the alignment between base partitions and the consensus one. Based on this
observation, we propose a simple but effective multi-view algorithm termed
LF-MVC-GAM. It optimally fuses information from multiple sources at the
partition level of each individual view, and maximally aligns the consensus
partition with these weighted base ones. Such an alignment helps integrate
partition-level information and significantly reduces the computational
complexity by sufficiently simplifying the optimization procedure. We then
design another variant, LF-MVC-LAM, to further improve the clustering
performance by preserving the local intrinsic structure among multiple
partition spaces. After that, we develop two three-step iterative algorithms to
solve the resultant optimization problems with theoretically guaranteed
convergence. Further, we provide the generalization error bound analysis of the
proposed algorithms. Extensive experiments on eighteen multi-view benchmark
datasets, ranging from small to large scale, demonstrate the effectiveness and
efficiency of the proposed LF-MVC-GAM and LF-MVC-LAM. The
codes of the proposed algorithms are publicly available at
https://github.com/wangsiwei2010/latefusionalignment
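The partition-level fusion described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation (that is in the repository above). It assumes an objective of the form max Tr(H^T sum_p beta_p H_p W_p) with H^T H = I, W_p^T W_p = I, and beta a non-negative unit-norm weight vector. That form, the three-step update order, and the names `orth_procrustes` and `late_fusion_align` are illustrative assumptions that the abstract does not spell out.

```python
import numpy as np

def orth_procrustes(A):
    # argmax over Q with orthonormal columns of Tr(Q^T A) is U V^T,
    # where A = U S V^T is the thin SVD of A.
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

def late_fusion_align(base_partitions, n_iter=50, tol=1e-8):
    """Align a consensus partition H with weighted, rotated base partitions.

    base_partitions: list of (n, k) arrays with orthonormal columns,
    e.g. per-view spectral or kernel k-means embeddings (assumed input).
    """
    m = len(base_partitions)
    k = base_partitions[0].shape[1]
    beta = np.full(m, 1.0 / np.sqrt(m))      # unit-norm view weights
    W = [np.eye(k) for _ in range(m)]        # per-view k x k rotations
    prev = -np.inf
    for _ in range(n_iter):
        # Step 1: update the consensus partition H with beta and W fixed.
        X = sum(b * Hp @ Wp for b, Hp, Wp in zip(beta, base_partitions, W))
        H = orth_procrustes(X)
        # Step 2: update each rotation W_p with H fixed.
        W = [orth_procrustes(Hp.T @ H) for Hp in base_partitions]
        # Step 3: update the weights; the maximizer of beta . scores under
        # a unit-norm constraint is scores / ||scores||.
        scores = np.array([np.trace(H.T @ Hp @ Wp)
                           for Hp, Wp in zip(base_partitions, W)])
        scores = np.maximum(scores, 0.0)
        beta = scores / np.linalg.norm(scores)
        obj = float(beta @ scores)
        if obj - prev < tol:                 # objective is non-decreasing
            break
        prev = obj
    return H, beta, obj
```

Under these assumptions, each step is a closed-form maximizer of the objective with the other variables held fixed, so the objective never decreases. Running k-means on the rows of the returned H would give the final cluster labels.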
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning.
PAMOGK: A pathway graph kernel based multi-omics clustering approach for discovering cancer patient subgroups
Accurate classification of patients into homogeneous molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these different subtypes to cancer. However, the extensive molecular heterogeneity observed among cancer patients presents a challenge. The availability of multi-omic data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. In this work, we develop PAMOGK, which integrates multi-omics patient data and incorporates the existing knowledge on biological pathways. PAMOGK is well suited to deal with the sparsity of alterations in assessing patient similarities. We develop a novel graph kernel which we denote as smoothed shortest path graph kernel, which evaluates patient similarities based on a single molecular alteration type in the context of pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, PAMOGK uses multi-view kernel clustering. We apply PAMOGK to find subgroups of kidney renal clear cell carcinoma (KIRC) patients, which results in four clusters with significantly different survival times (p-value = 7.4e-10). The patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. When we compare PAMOGK to 8 other state-of-the-art existing multi-omics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition patients into groups with different survival distributions. PAMOGK enables extracting the relative importance of pathways and molecular data types. PAMOGK is available at github.com/tastanlab/pamog