Microbial community pattern detection in human body habitats via ensemble clustering framework

Author: Chua Hon-Nian
Li Xiao-Li
Ning Kang
Ou-Yang Le
Su Xiaoquan
Yang Peng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

The human habitat is a host where microbial species evolve, function, and continue to evolve. Elucidating how microbial communities respond to human habitats is a fundamental and critical task, as establishing baselines of the human microbiome is essential for understanding its role in human disease and health. However, current studies usually overlook the complex and interconnected landscape of the human microbiome and restrict themselves to particular body habitats, using learning models built on a single specific criterion. As a result, these methods cannot effectively capture the real-world underlying microbial patterns. To obtain a comprehensive view, we propose a novel ensemble clustering framework to mine the structure of microbial community patterns in large-scale metagenomic data. Specifically, we first build a microbial similarity network by integrating 1920 metagenomic samples from three body habitats of healthy adults. Then a novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is proposed and applied to the network to detect clustering patterns. Extensive experiments are conducted to evaluate the effectiveness of our model in deriving microbial communities with respect to body habitat and host gender. From the clustering results, we observed that body habitats exhibit strongly bound but non-unique microbial structural patterns. Meanwhile, the human microbiome reveals different degrees of structural variation across body habitat and host gender. In summary, our ensemble clustering framework can efficiently explore integrated clustering results to accurately identify microbial communities, and provides a comprehensive view of a set of microbial communities. Such trends depict an integrated biography of microbial communities, which offers new insight towards uncovering a pathogenic model of the human microbiome. Comment: BMC Systems Biology 201
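To make the clustering step in the abstract above more concrete, here is a minimal sketch of plain symmetric NMF applied to a similarity matrix. It is not the authors' ensemble model: the damped multiplicative update (with `beta = 0.5`), the toy block-diagonal matrix `A`, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def symmetric_nmf(A, k, n_iter=500, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric nonnegative matrix A by H @ H.T with H >= 0.

    Uses a damped multiplicative update (an assumed, commonly used rule,
    not necessarily the one in the paper).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    H = rng.random((n, k)) + 0.1          # strictly positive start
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps       # eps avoids division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H

def frob_error(A, H):
    """Squared Frobenius reconstruction error ||A - H H^T||_F^2."""
    return float(np.linalg.norm(A - H @ H.T, "fro") ** 2)

# Hypothetical toy similarity network: two groups of three samples each.
A = np.kron(np.eye(2), np.ones((3, 3)))

H = symmetric_nmf(A, k=2)
labels = H.argmax(axis=1)                 # cluster = largest membership weight
print("reconstruction error:", frob_error(A, H))
print("cluster labels:", labels)
```

Each row of `H` can be read as soft cluster memberships for one sample; an ensemble approach would combine several such clusterings rather than rely on a single run.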

arXiv.org e-Print Archive

Crossref

PubMed Central

Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences

ScholarBank@NUS

Fuse: Multiple Network Alignment via Data Fusion

Author: Gligorijević V
Malod-Dognin N
Pržulj N
Publication venue: 'Oxford University Press (OUP)'
Publication date: 09/10/2015

Spiral - Imperial College Digital Repository

Evolutionary nonnegative matrix factorization for data compression

Author: Gong Liyun
Mu Tingting
Y. Goulermas John
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

This paper aims at improving non-negative matrix factorization (NMF) to facilitate data compression. An evolutionary updating strategy is proposed to solve the NMF problem iteratively based on three sets of updating rules including multiplicative, firefly and survival of the fittest rules. For data compression application, the quality of the factorized matrices can be evaluated by measurements such as sparsity, orthogonality and factorization error to assess compression quality in terms of storage space consumption, redundancy in data matrix and data approximation accuracy. Thus, the fitness score function that drives the evolving procedure is designed as a composite score that takes into account all these measurements. A hybrid initialization scheme is performed to improve the rate of convergence, allowing multiple initial candidates generated by different types of NMF initialization approaches. Effectiveness of the proposed method is demonstrated using Yale and ORL image datasets.
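As a rough illustration of the ingredients named in this abstract, the sketch below runs the standard Lee-Seung multiplicative updates and then computes three quality measurements. It covers only the multiplicative rule, not the firefly or survival-of-the-fittest rules, and the specific definitions of sparsity (Hoyer's measure) and orthogonality (distance of the normalized Gram matrix from the identity) are assumptions, not taken from the paper.

```python
import numpy as np

def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-12):
    """Factorize V ~= W @ H with W, H >= 0 using Lee-Seung updates."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    errors = [float(np.linalg.norm(V - W @ H, "fro"))]
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(float(np.linalg.norm(V - W @ H, "fro")))
    return W, H, errors

def hoyer_sparsity(M):
    """Hoyer sparsity in [0, 1]; 1 means a single nonzero entry (assumed measure)."""
    x = np.abs(M).ravel()
    n = x.size
    return float((np.sqrt(n) - x.sum() / np.sqrt((x ** 2).sum())) / (np.sqrt(n) - 1))

def orthogonality_gap(W):
    """Frobenius distance between the column-normalized Gram matrix and I."""
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    return float(np.linalg.norm(Wn.T @ Wn - np.eye(W.shape[1]), "fro"))

# Hypothetical nonnegative data matrix standing in for image data.
V = np.random.default_rng(1).random((20, 15))
W, H, errors = nmf_multiplicative(V, k=4)

print("factorization error:", errors[-1])
print("sparsity of H:", hoyer_sparsity(H))
print("orthogonality gap of W:", orthogonality_gap(W))
```

A composite fitness score of the kind the abstract describes could be formed by weighting these three numbers, although the weighting used by the authors is not given here.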

University of Lincoln Institutional Repository

Four algorithms to solve symmetric multi-type non-negative matrix tri-factorization problem

Author: Hrga Timotej
Hribar Rok
Papa Gregor
Petelin Gašper
Povh Janez
Pržulj Nataša
Vukašinović Vida
Publication date: 10/12/2020

In this paper, we consider the symmetric multi-type non-negative matrix tri-factorization problem (SNMTF), which attempts to factorize several symmetric non-negative matrices simultaneously. This can be considered a generalization of the classical non-negative matrix tri-factorization problem; it has a non-convex objective function, which is a multivariate sixth-degree polynomial, and a convex feasibility set. It is of special importance in data science, since it serves as a mathematical model for the fusion of different data sources in data clustering. We develop four methods to solve the SNMTF. They are based on four theoretical approaches known from the literature: the fixed point method (FPM), the block-coordinate descent with projected gradient (BCD), the gradient method with exact line search (GM-ELS) and the adaptive moment estimation method (ADAM). For each of these methods we offer a software implementation: for the first two methods we use Matlab, and for the latter two Python with the TensorFlow library. We test these methods on three data-sets: one synthetic data-set that we generated and two that represent real-life similarities between different objects. Extensive numerical results show that with sufficient computing time all four methods perform satisfactorily and ADAM most often yields the best mean square error (MSE). However, if the computation time is limited, FPM gives the best MSE because it shows the fastest convergence at the beginning. All data-sets and codes are publicly available on our GitLab profile.
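The following sketch shows what a projected-gradient approach to a tri-factorization of this kind can look like in the simplest setting. It handles a single symmetric matrix R ~= G S G^T rather than the full multi-type problem, uses a joint step with backtracking instead of any of the four methods from the paper, and assumes that both G and S are constrained to be nonnegative; all names are hypothetical.

```python
import numpy as np

def loss(R, G, S):
    """Squared Frobenius error ||R - G S G^T||_F^2."""
    return float(np.linalg.norm(R - G @ S @ G.T, "fro") ** 2)

def projected_gradient_step(R, G, S, step):
    """One gradient step on (G, S), then projection onto the nonnegative orthant."""
    E = R - G @ S @ G.T
    grad_G = -4.0 * E @ G @ S      # valid because R and S are symmetric
    grad_S = -2.0 * G.T @ E @ G
    return np.maximum(G - step * grad_G, 0.0), np.maximum(S - step * grad_S, 0.0)

def snmtf_single(R, k, n_iter=200, seed=0):
    """Backtracking projected gradient for R ~= G S G^T (assumed simplification)."""
    rng = np.random.default_rng(seed)
    G = rng.random((R.shape[0], k))
    S = rng.random((k, k))
    S = (S + S.T) / 2.0            # keep the middle factor symmetric
    step = 1e-2
    history = [loss(R, G, S)]
    for _ in range(n_iter):
        for _ in range(30):        # halve the step until the loss decreases
            G_new, S_new = projected_gradient_step(R, G, S, step)
            new_loss = loss(R, G_new, S_new)
            if new_loss < history[-1]:
                G, S = G_new, S_new
                history.append(new_loss)
                step *= 2.0        # be more ambitious on the next iteration
                break
            step *= 0.5
        else:
            break                  # no improving step found; stop early
    return G, S, history

# Hypothetical symmetric similarity matrix with two groups of objects.
R = np.kron(np.eye(2), np.ones((3, 3)))
G, S, history = snmtf_single(R, k=2)
print("initial loss:", history[0], "final loss:", history[-1])
```

Because a step is accepted only when it lowers the loss, the recorded history is strictly decreasing; the methods compared in the abstract differ mainly in how they choose the search direction and step size.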

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Repository of the University of Ljubljana

Scalable and interpretable product recommendations via overlapping co-clustering

Author: Dünner Celestine
Heckel Reinhard
Parnell Thomas
Vlachos Michail
Publication date: 01/04/2017

We consider the problem of generating interpretable recommendations by identifying overlapping co-clusters of clients and products, based only on positive or implicit feedback. Our approach is applicable to very large datasets because it exhibits almost linear complexity in the number of input examples and the number of co-clusters. We show, both on real industrial data and on publicly available datasets, that the recommendation accuracy of our algorithm is competitive with that of state-of-the-art matrix factorization techniques. In addition, our technique has the advantage of offering recommendations that are textually and visually interpretable. Finally, we examine how to implement our technique efficiently on Graphical Processing Units (GPUs). Comment: In IEEE International Conference on Data Engineering (ICDE) 201

arXiv.org e-Print Archive

Serveur académique lausannois