
    Late Fusion Multi-view Clustering via Global and Local Alignment Maximization

    Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance. Although they demonstrate promising performance in various applications, most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering, which can lead to over-complicated optimization and intensive computational cost. In this paper, we propose late fusion MVC via alignment maximization to address these issues. To do so, we first reveal the theoretical connection between existing k-means clustering and the alignment between base partitions and the consensus one. Based on this observation, we propose a simple but effective multi-view algorithm termed LF-MVC-GAM. It optimally fuses multiple sources of information at the partition level from each individual view, and maximally aligns the consensus partition with these weighted base ones. Such an alignment helps integrate partition-level information and significantly reduces the computational complexity by simplifying the optimization procedure. We then design another variant, LF-MVC-LAM, to further improve the clustering performance by preserving the local intrinsic structure among multiple partition spaces. After that, we develop two three-step iterative algorithms to solve the resultant optimization problems with theoretically guaranteed convergence. Further, we provide a generalization error bound analysis of the proposed algorithms. Extensive experiments on eighteen multi-view benchmark datasets, ranging from small to large scale, demonstrate the effectiveness and efficiency of the proposed LF-MVC-GAM and LF-MVC-LAM. The code for the proposed algorithms is publicly available at https://github.com/wangsiwei2010/latefusionalignment

    Representation Learning: A Review and New Perspectives

    The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

    PAMOGK: A pathway graph kernel based multi-omics clustering approach for discovering cancer patient subgroups

    Accurate classification of patients into homogeneous molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these different subtypes to cancer. However, the extensive molecular heterogeneity observed among cancer patients presents a challenge. The availability of multi-omic data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. In this work, we develop PAMOGK, which integrates multi-omics patient data and incorporates the existing knowledge on biological pathways. PAMOGK is well suited to deal with the sparsity of alterations in assessing patient similarities. We develop a novel graph kernel which we denote as smoothed shortest path graph kernel, which evaluates patient similarities based on a single molecular alteration type in the context of pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, PAMOGK uses multi-view kernel clustering. We apply PAMOGK to find subgroups of kidney renal clear cell carcinoma (KIRC) patients, which results in four clusters with significantly different survival times (p-value = 7.4e-10). The patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. When we compare PAMOGK to 8 other state-of-the-art existing multi-omics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition patients into groups with different survival distributions. PAMOGK enables extracting the relative importance of pathways and molecular data types. PAMOGK is available at github.com/tastanlab/pamog