Microbial community pattern detection in human body habitats via ensemble clustering framework
The human habitat is a host in which microbial species evolve, function, and
continue to evolve. Elucidating how microbial communities respond to human
habitats is a fundamental and critical task, as establishing baselines of the human
microbiome is essential to understanding its role in human health and disease.
However, current studies usually overlook the complex and interconnected
landscape of the human microbiome and restrict their analysis to particular body
habitats, using learning models built on a specific criterion. As a result, these
methods cannot effectively capture the real-world underlying microbial patterns. To obtain a
comprehensive view, we propose a novel ensemble clustering framework to mine
the structure of microbial community patterns in large-scale metagenomic data.
In particular, we first build a microbial similarity network by integrating
1920 metagenomic samples from three body habitats of healthy adults. Then a
novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is
proposed and applied to the network to detect clustering patterns. Extensive
experiments are conducted to evaluate the effectiveness of our model in
deriving microbial communities with respect to body habitat and host gender. From
the clustering results, we observed that body habitats exhibit strongly bound but
non-unique microbial structural patterns. Meanwhile, the human microbiome reveals
different degrees of structural variation across body habitats and host gender. In
summary, our ensemble clustering framework can efficiently explore integrated
clustering results to accurately identify microbial communities and provide a
comprehensive view of a set of microbial communities. Such trends depict an
integrated biography of microbial communities, offering new insight
into uncovering pathogenic models of the human microbiome. Comment: BMC Systems Biology 201
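The abstract does not spell out the ensemble model, so the following is only an illustrative sketch of one common recipe, not the authors' method: run symmetric NMF (approximate a similarity matrix A by H Hᵀ with H ≥ 0) several times from different random starts, take each run's hard labels as the argmax over the columns of H, merge the runs into a co-association matrix, and cluster that consensus matrix with one final symmetric NMF run. The function names `symnmf` and `ensemble_symnmf`, the damped multiplicative update with factor 1/2, and the toy six-sample network are all assumptions made for illustration.

```python
import numpy as np


def symnmf(A, k, n_iter=500, seed=0, eps=1e-10):
    """Approximate a symmetric nonnegative A by H @ H.T with H >= 0.

    Uses the damped multiplicative update
    H <- H * (1/2 + 1/2 * (A H) / (H H^T H)).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Random positive start, scaled so H @ H.T is roughly on the scale of A.
    H = rng.random((n, k)) * np.sqrt(A.mean() / k)
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps  # eps avoids division by zero
        H *= 0.5 + 0.5 * numer / denom
    return H


def ensemble_symnmf(A, k, n_runs=10):
    """Cluster the nodes of similarity matrix A by combining several SymNMF runs.

    Each run (different random seed) gives hard labels via argmax over the
    columns of H; the runs are merged into a co-association matrix, which is
    then clustered by one final SymNMF run.
    """
    n = A.shape[0]
    coassoc = np.zeros((n, n))
    for seed in range(n_runs):
        labels = symnmf(A, k, seed=seed).argmax(axis=1)
        # 1.0 where nodes i and j share a label in this run, else 0.0.
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= n_runs
    return symnmf(coassoc, k, seed=n_runs).argmax(axis=1)


# Hypothetical toy similarity network: two groups of three samples,
# no similarity between the groups.
block = np.ones((3, 3))
zeros = np.zeros((3, 3))
A = np.block([[block, zeros], [zeros, block]])
print(ensemble_symnmf(A, k=2))
```

On a network this cleanly separated, every run should already agree, so the co-association matrix reproduces A; the ensemble step matters when individual runs disagree on noisier real data.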
Evolutionary nonnegative matrix factorization for data compression
This paper aims at improving non-negative matrix factorization (NMF) to facilitate data compression. An evolutionary updating strategy is proposed to solve the NMF problem iteratively based on three sets of updating rules: multiplicative, firefly, and survival-of-the-fittest rules. For data compression, the quality of the factorized matrices can be evaluated by measures such as sparsity, orthogonality, and factorization error, which assess compression quality in terms of storage space consumption, redundancy in the data matrix, and data approximation accuracy. Thus, the fitness score function that drives the evolving procedure is designed as a composite score that takes all these measures into account. A hybrid initialization scheme is used to improve the rate of convergence, allowing multiple initial candidates generated by different types of NMF initialization approaches. The effectiveness of the proposed method is demonstrated on the Yale and ORL image datasets.
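A much-simplified sketch of how a composite fitness score can drive an evolving population of NMF candidates. Assumptions to flag: the measure definitions (Hoyer sparsity, mean absolute cosine between columns of W for orthogonality, relative Frobenius error), the weights `(0.6, 0.2, 0.2)`, the two initialization types, and the mutation by a random factor in [0.9, 1.1] are all hypothetical choices; only the multiplicative rule (Lee-Seung) and a crude survival-of-the-fittest step are shown, and the firefly rule is omitted entirely.

```python
import numpy as np


def hoyer_sparsity(M):
    """Hoyer's sparsity: 1.0 for a single nonzero entry, 0.0 for a constant matrix."""
    x = np.abs(M).ravel()
    n = x.size
    l2 = np.sqrt((x ** 2).sum())
    if n < 2 or l2 == 0:
        return 0.0
    return (np.sqrt(n) - x.sum() / l2) / (np.sqrt(n) - 1)


def orthogonality_penalty(W, eps=1e-12):
    """Mean absolute cosine similarity between distinct columns of W (0 = orthogonal)."""
    k = W.shape[1]
    if k < 2:
        return 0.0
    Wn = W / (np.linalg.norm(W, axis=0) + eps)
    G = np.abs(Wn.T @ Wn)
    return (G.sum() - np.trace(G)) / (k * (k - 1))


def relative_error(X, W, H):
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


def fitness(X, W, H, weights=(0.6, 0.2, 0.2)):
    """Hypothetical composite score (higher is better); the weights are assumptions."""
    w_err, w_sp, w_orth = weights
    sparsity = 0.5 * (hoyer_sparsity(W) + hoyer_sparsity(H))
    return (w_err * (1 - relative_error(X, W, H))
            + w_sp * sparsity
            + w_orth * (1 - orthogonality_penalty(W)))


def multiplicative_step(X, W, H, eps=1e-10):
    """One Lee-Seung multiplicative update for the Frobenius-norm NMF objective."""
    H = H * (W.T @ X) / (W.T @ W @ H + eps)
    W = W * (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def evolve_nmf(X, k, n_candidates=6, n_generations=100, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    scale = np.sqrt(X.mean() / k)
    # Hybrid initialization: half plain uniform, half rescaled to the data's magnitude.
    pop = []
    for c in range(n_candidates):
        s = 1.0 if c % 2 == 0 else scale
        pop.append((s * rng.random((m, k)), s * rng.random((k, n))))
    for _ in range(n_generations):
        pop = [multiplicative_step(X, W, H) for W, H in pop]
        scores = [fitness(X, W, H) for W, H in pop]
        # Survival of the fittest: the worst candidate is replaced by a
        # randomly rescaled (still nonnegative) copy of the best one.
        best, worst = int(np.argmax(scores)), int(np.argmin(scores))
        W_b, H_b = pop[best]
        pop[worst] = (W_b * rng.uniform(0.9, 1.1, W_b.shape),
                      H_b * rng.uniform(0.9, 1.1, H_b.shape))
    scores = [fitness(X, W, H) for W, H in pop]
    return pop[int(np.argmax(scores))]


X = np.random.default_rng(1).random((8, 6))
W, H = evolve_nmf(X, k=2)
print(relative_error(X, W, H), fitness(X, W, H))
```

Weighting the error term most heavily keeps the search from trading away approximation accuracy for sparsity; a compression-oriented setup might shift weight toward sparsity instead.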
Four algorithms to solve symmetric multi-type non-negative matrix tri-factorization problem
In this paper, we consider the symmetric multi-type non-negative matrix
tri-factorization problem (SNMTF), which attempts to factorize several
symmetric non-negative matrices simultaneously. This can be considered as a
generalization of the classical non-negative matrix tri-factorization problem
and includes a non-convex objective function which is a multivariate sixth
degree polynomial and has a convex feasibility set. It is of particular importance
in data science, since it serves as a mathematical model for the fusion of
different data sources in data clustering.
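To make the "sixth degree polynomial" remark concrete, here is one formulation consistent with the description, stated as an assumption rather than the paper's exact model: each of m data sources supplies a symmetric nonnegative matrix R_i, all sources share a nonnegative factor G, and each source has its own symmetric nonnegative factor S_i (the paper's constraints on S_i, and any weighting of the sources, may differ). Each entry of G S_i Gᵀ is cubic in the unknowns, so its squared residual has degree six, while the nonnegativity constraints define a convex set.

```latex
\min_{G \ge 0,\; S_1,\dots,S_m \ge 0} \; f(G, S_1,\dots,S_m)
  = \sum_{i=1}^{m} \bigl\| R_i - G S_i G^{\top} \bigr\|_F^2,
\qquad R_i = R_i^{\top} \in \mathbb{R}_{+}^{n \times n},\;
G \in \mathbb{R}_{+}^{n \times k},\; S_i = S_i^{\top} \in \mathbb{R}_{+}^{k \times k}.

% With E_i = G S_i G^\top - R_i (symmetric when S_i and R_i are), the gradients are
\nabla_G f = 4 \sum_{i=1}^{m} E_i \, G \, S_i,
\qquad
\nabla_{S_i} f = 2\, G^{\top} E_i \, G .
```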
We develop four methods to solve the SNMTF. They are based on four
theoretical approaches known from the literature: the fixed point method (FPM),
the block-coordinate descent with projected gradient (BCD), the gradient method
with exact line search (GM-ELS) and the adaptive moment estimation method
(ADAM). For each of these methods we offer a software implementation: for the
first two methods we use Matlab, and for the latter two Python with the
TensorFlow library.
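As an illustration of the block-coordinate descent idea only (a generic NumPy sketch, not the authors' Matlab implementation), the code below assumes the model R_i ≈ G S_i Gᵀ with a shared G ≥ 0 and symmetric S_i ≥ 0, and minimizes the sum of squared Frobenius residuals. Each block (G, then every S_i) takes one projected-gradient step, using the gradients 4 Σ (G S_i Gᵀ − R_i) G S_i and 2 Gᵀ (G S_i Gᵀ − R_i) G, and a backtracking line search that accepts only a decreasing step. The function names, the step schedule, and the random test matrices are hypothetical.

```python
import numpy as np


def snmtf_objective(Rs, G, Ss):
    """Sum of squared Frobenius residuals ||R_i - G S_i G^T||_F^2."""
    return sum(np.linalg.norm(R - G @ S @ G.T) ** 2 for R, S in zip(Rs, Ss))


def projected_step(f, X, grad, t0=1.0, shrink=0.5, min_t=1e-12):
    """Backtracking projected-gradient step: accept the first step that lowers f."""
    f0 = f(X)
    t = t0
    while t > min_t:
        X_new = np.maximum(X - t * grad, 0.0)  # projection onto X >= 0
        if f(X_new) < f0:
            return X_new
        t *= shrink
    return X  # no decreasing step found; keep the current block


def snmtf_bcd(Rs, k, n_iter=100, seed=0):
    """Block-coordinate descent: update G, then each S_i, by projected gradient."""
    rng = np.random.default_rng(seed)
    n = Rs[0].shape[0]
    G = rng.random((n, k))
    Ss = []
    for _ in Rs:
        S = rng.random((k, k))
        Ss.append((S + S.T) / 2)  # keep S_i symmetric, matching symmetric R_i
    history = [snmtf_objective(Rs, G, Ss)]
    for _ in range(n_iter):
        # Gradient w.r.t. G, valid when every S_i and R_i is symmetric.
        grad_G = 4 * sum((G @ S @ G.T - R) @ G @ S for R, S in zip(Rs, Ss))
        G = projected_step(lambda X: snmtf_objective(Rs, X, Ss), G, grad_G)
        for i, R in enumerate(Rs):
            grad_S = 2 * G.T @ (G @ Ss[i] @ G.T - R) @ G

            def f_S(X, i=i):
                return snmtf_objective(Rs, G, Ss[:i] + [X] + Ss[i + 1:])

            Ss[i] = projected_step(f_S, Ss[i], grad_S)
        history.append(snmtf_objective(Rs, G, Ss))
    return G, Ss, history


rng = np.random.default_rng(42)
Rs = []
for _ in range(2):
    M = rng.random((8, 8))
    Rs.append((M + M.T) / 2)  # hypothetical symmetric similarity matrices
G, Ss, history = snmtf_bcd(Rs, k=3)
print(history[0], history[-1])
```

Because every block update is accepted only if it lowers the objective, `history` is non-increasing; dividing the final value by the total number of entries in the R_i gives one possible mean-square-error figure.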
We test these methods on three data-sets: one is a synthetic data-set we
generated, while the other two represent real-life similarities between different
objects.
Extensive numerical results show that with sufficient computing time all four
methods perform satisfactorily and ADAM most often yields the best mean square
error (MSE). However, if the computation time is limited, FPM gives
the best MSE because it shows the fastest convergence at the
beginning.
All data-sets and code are publicly available on our GitLab profile.
Scalable and interpretable product recommendations via overlapping co-clustering
We consider the problem of generating interpretable recommendations by
identifying overlapping co-clusters of clients and products, based only on
positive or implicit feedback. Our approach is applicable to very large
datasets because it exhibits almost linear complexity in the input examples and
the number of co-clusters. We show, both on real industrial data and on
publicly available datasets, that the recommendation accuracy of our algorithm
is competitive with that of state-of-the-art matrix factorization techniques. In
addition, our technique has the advantage of offering recommendations that are
textually and visually interpretable. Finally, we examine how to implement our
technique efficiently on Graphical Processing Units (GPUs). Comment: In IEEE International Conference on Data Engineering (ICDE) 201
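The abstract does not give the model, so the following is only a sketch of why overlapping co-clusters make recommendations explainable, under stated assumptions: each client and each product has a hypothetical nonnegative affiliation strength to every co-cluster (a client or product may belong to several), and the probability of positive feedback is modelled as 1 − exp(−⟨f_u, f_i⟩). Fitting the affiliations from feedback is omitted; the affiliation values and the `seen` matrix are invented for illustration, and `recommend` is a hypothetical helper.

```python
import numpy as np


def positive_feedback_prob(client_aff, product_aff):
    """P(positive feedback) = 1 - exp(-<f_u, f_i>) for nonnegative affiliations.

    Returns a (clients x products) matrix of probabilities.
    """
    return 1.0 - np.exp(-client_aff @ product_aff.T)


def recommend(client_aff, product_aff, seen, client, top_n=2):
    """Top-n unseen products for one client, each paired with its explaining co-cluster."""
    scores = positive_feedback_prob(client_aff[client:client + 1], product_aff)[0]
    scores[seen[client]] = -np.inf  # never re-recommend products with feedback
    ranked = [int(p) for p in np.argsort(-scores) if np.isfinite(scores[p])][:top_n]
    # Per-co-cluster contribution f_u[c] * f_i[c]; the largest one explains the pick.
    contrib = client_aff[client] * product_aff[ranked]
    return [(p, int(c)) for p, c in zip(ranked, contrib.argmax(axis=1))]


# Hypothetical affiliations: 3 clients and 4 products over 2 co-clusters.
client_aff = np.array([[1.5, 0.0], [0.0, 2.0], [0.8, 0.8]])
product_aff = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.2], [0.1, 1.0]])
seen = np.array([[True, False, False, False],
                 [False, False, True, False],
                 [False, False, False, False]])

for client in range(3):
    print(client, recommend(client_aff, product_aff, seen, client))
```

Each recommendation comes with the index of the co-cluster that contributes most to its score, which is the kind of "because you belong to this group of clients and products" explanation that makes overlapping co-clusters interpretable.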