LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles
Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.
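For orientation, the conventional co-association approach that the abstract contrasts with link-based methods can be sketched as follows. This is a minimal illustration in Python, not the LinkCluE package or its MATLAB API. The function names `co_association` and `consensus_by_threshold`, the 0.5 threshold, and the use of connected components as the final consensus step are all assumptions made for the example.

```python
def co_association(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings that put points i and j in the same cluster."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_by_threshold(matrix, threshold=0.5):
    """Hypothetical consensus step: link pairs whose co-association exceeds
    the threshold and return the connected components as cluster labels."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three base clusterings of four points (cluster ids are arbitrary per run).
base = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
matrix = co_association(base)
consensus = consensus_by_threshold(matrix)
```

Because the matrix only records how often two points co-occur, it is indifferent to how each base clustering names its clusters. Link-based methods, as the abstract describes them, replace this pairwise count with a similarity derived from links between clusters.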
Stochastic Data Clustering
In 1961 Herbert Simon and Albert Ando published the theory behind the
long-term behavior of a dynamical system that can be described by a nearly
uncoupled matrix. Over the past fifty years this theory has been used in a
variety of contexts, including queueing theory, brain organization, and
ecology. In all these applications, the structure of the system is known and
the point of interest is the various stages the system passes through on its
way to some long-term equilibrium.
This paper looks at this problem from the other direction. That is, we
develop a technique for using the evolution of the system to tell us about its
initial structure, and we use this technique to develop a new algorithm for
data clustering. Comment: 23 pages
Robust Detection of Hierarchical Communities from Escherichia coli Gene Expression Data
Determining the functional structure of biological networks is a central goal
of systems biology. One approach is to analyze gene expression data to infer a
network of gene interactions on the basis of their correlated responses to
environmental and genetic perturbations. The inferred network can then be
analyzed to identify functional communities. However, commonly used algorithms
can yield unreliable results due to experimental noise, algorithmic
stochasticity, and the influence of arbitrarily chosen parameter values.
Furthermore, the results obtained typically provide only a simplistic view of
the network partitioned into disjoint communities and provide no information about
the relationship between communities. Here, we present methods to robustly
detect coregulated and functionally enriched gene communities and demonstrate
their application and validity for Escherichia coli gene expression data.
Applying a recently developed community detection algorithm to the network of
interactions identified with the context likelihood of relatedness (CLR)
method, we show that a hierarchy of network communities can be identified.
These communities significantly enrich for gene ontology (GO) terms, consistent
with them representing biologically meaningful groups. Further, analysis of the
most significantly enriched communities identified several candidate new
regulatory interactions. The robustness of our methods is demonstrated by
showing that a core set of functional communities is reliably found when
artificial noise, modeling experimental noise, is added to the data. We find
that noise mainly acts conservatively, increasing the relatedness required for
a network link to be reliably assigned and decreasing the size of the core
communities, rather than causing association of genes into new communities. Comment: Due to appear in PLoS Computational Biology. Supplementary Figure S1
was not uploaded but is available by contacting the author. 27 pages, 5
figures, 15 supplementary files
Link-Prediction Enhanced Consensus Clustering for Complex Networks
Many real networks that are inferred or collected from data are incomplete
due to missing edges. Missing edges can be inherent to the dataset (Facebook
friend links will never be complete) or the result of sampling (one may only
have access to a portion of the data). The consequence is that downstream
analyses that consume the network will often yield less accurate results than
if the edges were complete. Community detection algorithms, in particular,
often suffer when critical intra-community edges are missing. We propose a
novel consensus clustering algorithm to enhance community detection on
incomplete networks. Our framework utilizes existing community detection
algorithms that process networks imputed by our link prediction based
algorithm. The framework then merges their multiple outputs into a final
consensus output. On average our method boosts performance of existing
algorithms by 7% on artificial data and 17% on ego networks collected from
Facebook.
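The imputation step described above can be illustrated with a small sketch. The abstract does not say which link predictor the authors use, so this example swaps in the common-neighbors score, a standard baseline. The function name `impute_edges` and the `top_k` parameter are hypothetical. The imputed network would then be passed to existing community detection algorithms, whose outputs are merged into a consensus; that part is not shown here.

```python
from itertools import combinations


def impute_edges(edges, top_k):
    """Propose up to top_k missing edges, ranked by the number of neighbors
    the two endpoints share (common-neighbors score, an assumed baseline)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    scored = []
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # edge already observed
        score = len(adjacency[u] & adjacency[v])
        if score > 0:
            scored.append((score, u, v))

    # Highest score first; ties broken by node ids for determinism.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(u, v) for _, u, v in scored[:top_k]]


# A small network with two plausible missing edges.
observed = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
proposed = impute_edges(observed, top_k=2)
```

In this toy network the pairs (0, 3) and (1, 2) each share two neighbors, so they rank above pairs that share only one.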