Applying Cluster Ensemble to Adaptive Tree Structured Clustering
Adaptive tree structured clustering (ATSC) is our proposed divisive hierarchical clustering method that recursively divides a data set into two subsets using a self-organizing feature map (SOM). In each partition, the data set is quantized by the SOM and the quantized data are divided using agglomerative hierarchical clustering. ATSC can divide data sets of any size in feasible time. On the other hand, the clustering results of ATSC are as unstable as those of other divisive hierarchical clustering and partitioned clustering methods. In this paper, we apply a cluster ensemble to each data partition of ATSC in order to improve stability. A cluster ensemble is a framework for improving the stability of partitioned clustering. As a result of applying the cluster ensemble, ATSC yields unique clustering results that could not be obtained with previous hierarchical clustering methods. This is because a different class distance function is used in each division in ATSC.
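One common way to combine several unstable partitions of the same data is a co-association (evidence accumulation) consensus. The sketch below is only an illustration of that general idea, not the ensemble method used in the ATSC paper: the function names, the 0.5 threshold, and the connected-components consensus step are all assumptions made for this example.

```python
def co_association(labelings):
    """Fraction of base partitions in which each pair of points shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_clusters(labelings, threshold=0.5):
    """Group points that co-occur in more than `threshold` of the partitions.

    Hypothetical consensus rule: connected components of the graph whose
    edges are the pairs with co-association above the threshold.
    """
    co = co_association(labelings)
    n = len(co)
    assigned = [-1] * n
    next_id = 0
    for start in range(n):
        if assigned[start] != -1:
            continue
        assigned[start] = next_id
        stack = [start]
        while stack:  # depth-first search over strongly co-associated pairs
            i = stack.pop()
            for j in range(n):
                if assigned[j] == -1 and co[i][j] > threshold:
                    assigned[j] = next_id
                    stack.append(j)
        next_id += 1
    return assigned


# Three base two-way splits of six points; the label names differ between
# runs, and the third run disagrees about the point at index 2.
base_splits = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
consensus = consensus_clusters(base_splits)
```

Because the consensus works on pairwise co-occurrence rather than on label values, it is unaffected by the arbitrary naming of clusters in each base run.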
Microbial community pattern detection in human body habitats via ensemble clustering framework
The human habitat is a host where microbial species evolve, function, and
continue to evolve. Elucidating how microbial communities respond to human
habitats is a fundamental and critical task, as establishing baselines of human
microbiome is essential in understanding its role in human disease and health.
However, current studies usually overlook the complex and interconnected
landscape of the human microbiome and restrict their analysis to particular body
habitats, using learning models built on a single criterion. As a result, these
methods cannot effectively capture the underlying real-world microbial patterns. To obtain a
comprehensive view, we propose a novel ensemble clustering framework to mine
the structure of microbial community pattern on large-scale metagenomic data.
In particular, we first build a microbial similarity network by integrating
1920 metagenomic samples from three body habitats of healthy adults. Then a
novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is
proposed and applied to the network to detect clustering patterns. Extensive
experiments are conducted to evaluate the effectiveness of our model on
deriving microbial communities with respect to body habitat and host gender. From
the clustering results, we observed that body habitats exhibit strongly bound but
non-unique microbial structural patterns. Meanwhile, the human microbiome reveals
different degrees of structural variation over body habitat and host gender. In
summary, our ensemble clustering framework could efficiently explore integrated
clustering results to accurately identify microbial communities, and provide a
comprehensive view for a set of microbial communities. Such trends depict an
integrated biography of microbial communities, which offers new insight
towards uncovering pathogenic models of the human microbiome. Comment: BMC Systems Biology 201
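As a rough illustration of what a symmetric NMF does on a similarity network, the sketch below factorizes a small nonnegative symmetric matrix A as approximately H Hᵀ using a damped multiplicative update, then reads a cluster label off the largest entry in each row of H. This is plain symmetric NMF on toy data, not the ensemble model proposed in the abstract; the function names, the damping factor `beta`, the iteration count, and the toy matrix are all assumptions for this example.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def symmetric_nmf(A, k, iters=200, beta=0.5, seed=0, eps=1e-12):
    """Approximate a nonnegative symmetric matrix A by H @ H.T with H >= 0."""
    rng = random.Random(seed)
    n = len(A)
    H = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        AH = matmul(A, H)                # n x k numerator
        HtH = matmul(transpose(H), H)    # k x k
        HHtH = matmul(H, HtH)            # n x k denominator
        # Damped multiplicative update; it keeps every entry nonnegative.
        H = [
            [H[i][c] * (1 - beta + beta * AH[i][c] / (HHtH[i][c] + eps))
             for c in range(k)]
            for i in range(n)
        ]
    return H


def assign_clusters(H):
    """Label each row by the index of its largest factor entry."""
    return [max(range(len(row)), key=row.__getitem__) for row in H]


# Toy similarity network with two blocks of mutually similar samples.
A = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
H = symmetric_nmf(A, k=2)
labels = assign_clusters(H)
```

The result depends on the random initialization, so in practice several restarts are usually compared, which is one motivation for combining runs in an ensemble.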
Steganographer Identification
Conventional steganalysis detects the presence of steganography within single
objects. In the real world, we may face a more complex scenario in which one or
more of multiple users, called actors, are guilty of using steganography; this is
typically defined as the Steganographer Identification Problem (SIP). One might
use conventional steganalysis algorithms to separate stego objects from
cover objects and then identify the guilty actors. However, the guilty actors
may be lost due to a number of false alarms. To deal with the SIP, most
state-of-the-art methods use unsupervised-learning-based approaches. In their
solutions, each actor holds multiple digital objects, from which a set of
feature vectors can be extracted. The well-defined distances between these
feature sets are determined to measure the similarity between the corresponding
actors. By applying clustering or outlier detection, the most suspicious
actor(s) will be judged as the steganographer(s). Though the SIP needs further
study, existing methods identify the steganographer(s) well
when non-adaptive steganographic embedding is applied. In this chapter, we
will present foundational concepts and review advanced methodologies in SIP.
This chapter is self-contained and intended as a tutorial introducing the SIP
in the context of media steganography. Comment: A tutorial with 30 page
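The pipeline described in this abstract (one feature set per actor, a distance between feature sets, then outlier detection) can be sketched as follows. The distance used here, the Euclidean distance between the mean feature vectors of two actors, and the outlier score, the mean distance to all other actors, are simplifying assumptions chosen for brevity; the actor names and feature values are invented toy data, and the methods surveyed in the chapter may use different distances and detectors.

```python
import math


def centroid(features):
    """Mean feature vector of one actor's objects."""
    dim = len(features[0])
    return [sum(vec[d] for vec in features) / len(features) for d in range(dim)]


def actor_distance(features_a, features_b):
    """Distance between two actors: Euclidean distance between centroids."""
    return math.dist(centroid(features_a), centroid(features_b))


def most_suspicious(actors):
    """Return the actor with the largest mean distance to all other actors."""
    scores = {}
    for name, feats in actors.items():
        distances = [
            actor_distance(feats, other_feats)
            for other, other_feats in actors.items()
            if other != name
        ]
        scores[name] = sum(distances) / len(distances)
    return max(scores, key=scores.get), scores


# Hypothetical toy data: each actor holds two objects with 2-D features.
actors = {
    "alice": [[0.0, 0.0], [0.2, 0.0]],
    "bob": [[0.1, 0.1], [0.1, -0.1]],
    "carol": [[0.0, 0.2], [0.0, 0.0]],
    "dave": [[3.0, 3.0], [3.2, 2.8]],
}
suspect, scores = most_suspicious(actors)
```

Ranking actors rather than classifying single objects means one false alarm on an individual object does not by itself implicate an actor.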
Hyperbolic Geometry of Complex Networks
We develop a geometric framework to study the structure and function of
complex networks. We assume that hyperbolic geometry underlies these networks,
and we show that with this assumption, heterogeneous degree distributions and
strong clustering in complex networks emerge naturally as simple reflections of
the negative curvature and metric property of the underlying hyperbolic
geometry. Conversely, we show that if a network has some metric structure, and
if the network degree distribution is heterogeneous, then the network has an
effective hyperbolic geometry underneath. We then establish a mapping between
our geometric framework and statistical mechanics of complex networks. This
mapping interprets edges in a network as non-interacting fermions whose
energies are hyperbolic distances between nodes, while the auxiliary fields
coupled to edges are linear functions of these energies or distances. The
geometric network ensemble subsumes the standard configuration model and
classical random graphs as two limiting cases with degenerate geometric
structures. Finally, we show that targeted transport processes without global
topology knowledge, made possible by our geometric framework, are maximally
efficient, according to all efficiency measures, in networks with strongest
heterogeneity and clustering, and that this efficiency is remarkably robust
with respect to even catastrophic disturbances and damages to the network
structure.
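To make the "edges as fermions" picture concrete, the sketch below computes the hyperbolic distance between two nodes given in polar coordinates on a plane of curvature -1, and feeds it into a Fermi-Dirac style connection probability. The distance uses the hyperbolic law of cosines; the exact scaling inside the exponential, and the parameter names `R` (a disk radius acting as chemical potential) and `T` (temperature), are assumptions made for this illustration and are not given in the abstract.

```python
import math


def hyperbolic_distance(r1, theta1, r2, theta2):
    """Distance between polar points (r, theta) on a plane of curvature -1."""
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(theta1 - theta2))
    # Clamp to 1.0 to guard against rounding just below the acosh domain.
    return math.acosh(max(1.0, arg))


def connection_probability(x, R, T):
    """Fermi-Dirac form: the distance x acts as energy, R as chemical potential."""
    return 1.0 / (math.exp((x - R) / (2.0 * T)) + 1.0)


# Two nodes on the same ray are separated by the difference of their radii.
d_same_ray = hyperbolic_distance(2.0, 0.3, 1.0, 0.3)

# Two nodes on opposite rays are separated by the sum of their radii.
d_opposite = hyperbolic_distance(1.0, 0.0, 1.0, math.pi)

# At distance x == R the connection probability is exactly one half.
p_half = connection_probability(5.0, R=5.0, T=0.5)
```

Lower temperatures make the probability drop more sharply around x = R, so nearby nodes connect almost surely and distant ones almost never.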
Ensemble clustering for result diversification
This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods (such as LDA and K-means) and clusters obtained by using different types of data (such as document text and anchor text). Our two-layer ensemble run performed better than the LDA-based diversification and also better than a non-diversification run.