9,760 research outputs found
ManiNetCluster: a novel manifold learning approach to reveal the functional links between gene networks.
BACKGROUND: The coordination of genomic functions is a critical and complex process across biological systems such as phenotypes or states (e.g., time, disease, organism, environmental perturbation). Understanding how the complexity of genomic function relates to these states remains a challenge. To address this, we have developed a novel computational method, ManiNetCluster, which simultaneously aligns and clusters gene networks (e.g., co-expression) to systematically reveal the links of genomic function between different conditions. Specifically, ManiNetCluster employs manifold learning to uncover and match local and non-linear structures among networks, and identifies cross-network functional links. RESULTS: We demonstrated that ManiNetCluster better aligns the orthologous genes from their developmental expression profiles across model organisms than state-of-the-art methods (p-value < 2.2×10^-16). This indicates the potential non-linear interactions of evolutionarily conserved genes across species in development. Furthermore, we applied ManiNetCluster to time series transcriptome data measured in the green alga Chlamydomonas reinhardtii to discover the genomic functions linking various metabolic processes between the light and dark periods of a diurnally cycling culture. We identified a number of genes putatively regulating processes across each lighting regime. CONCLUSIONS: ManiNetCluster provides a novel computational tool to uncover the genes linking various functions from different networks, providing new insight on how gene functions coordinate across different conditions. ManiNetCluster is publicly available as an R package at https://github.com/daifengwanglab/ManiNetCluster
Clustering functional data using wavelets
We present two methods for detecting patterns and clusters in high
dimensional time-dependent functional data. Our methods are based on
wavelet-based similarity measures, since wavelets are well suited for
identifying highly discriminant local time and scale features. The
multiresolution aspect of the wavelet transform provides a time-scale
decomposition of the signals, making it possible to visualize and cluster the
functional data into homogeneous groups. For each input function, the first
method uses the distribution of energy across the scales of its empirical
orthogonal wavelet transform to generate a manageable number of features that
can still be sufficient to make the signals well distinguishable. Our new similarity
measure combined with an efficient feature selection technique in the wavelet
domain is then used within more or less classical clustering algorithms to
effectively differentiate among high dimensional populations. The second method
uses dissimilarity measures between the whole time-scale representations and
is based on wavelet-coherence tools. The clustering is then performed using a
k-centroid algorithm starting from these dissimilarities. The practical performance
of these methods, which jointly design both the feature selection in the wavelet
domain and the classification distance, is demonstrated through simulations as
well as daily profiles of the French electricity power demand.
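The "distribution of energy across scales" used by the first method can be illustrated with a minimal Python sketch. It assumes an orthonormal Haar transform and signals whose length is a power of two; the abstract names neither the wavelet family nor these function names, so `haar_dwt_levels` and `relative_energy_by_scale` are hypothetical helpers, not the authors' implementation.

```python
import math


def haar_dwt_levels(signal):
    """Orthonormal Haar transform (an assumed wavelet choice).

    Returns (details, approx): one list of detail coefficients per scale,
    finest scale first, plus the final one-element approximation.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Differences capture variation at the current scale.
        details.append([(a - b) / root2 for a, b in pairs])
        # Sums carry the smoother part on to the next, coarser scale.
        approx = [(a + b) / root2 for a, b in pairs]
    return details, approx


def relative_energy_by_scale(signal):
    """Share of detail energy at each scale: one feature per scale."""
    details, _ = haar_dwt_levels(signal)
    energies = [sum(c * c for c in level) for level in details]
    total = sum(energies)
    if total == 0.0:
        return [0.0] * len(energies)
    return [e / total for e in energies]


# A fast oscillation concentrates energy at the finest scale,
# a slow step concentrates it at the coarsest scale.
fast = relative_energy_by_scale([1, -1, 1, -1, 1, -1, 1, -1])
slow = relative_energy_by_scale([1, 1, 1, 1, -1, -1, -1, -1])
```

A signal of length 2^J is reduced to J numbers, and these short feature vectors can then be passed to any standard clustering routine.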
Deep Time-Series Clustering: A Review
We present a comprehensive, detailed review of time-series data analysis, with emphasis on deep time-series clustering (DTSC), and a case study in the context of movement behavior clustering utilizing the deep clustering method. Specifically, we modified the DCAE architectures to suit time-series data at the time of our prior deep clustering work. Lately, several works have been carried out on deep clustering of time-series data. We also review these works and identify the state of the art, as well as present an outlook on this important field of DTSC from five important perspectives.
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed. Comment: 13 figures, 35 references
Variable-free exploration of stochastic models: a gene regulatory network example
Finding coarse-grained, low-dimensional descriptions is an important task in
the analysis of complex, stochastic models of gene regulatory networks. This
task involves (a) identifying observables that best describe the state of these
complex systems and (b) characterizing the dynamics of the observables. In a
previous paper [13], we assumed that good observables were known a priori, and
presented an equation-free approach to approximate coarse-grained quantities
(i.e., effective drift and diffusion coefficients) that characterize the
long-time behavior of the observables. Here we use diffusion maps [9] to
extract appropriate observables ("reduction coordinates") in an automated
fashion; these involve the leading eigenvectors of a weighted Laplacian on a
graph constructed from network simulation data. We present lifting and
restriction procedures for translating between physical variables and these
data-based observables. These procedures allow us to perform equation-free,
coarse-grained computations characterizing the long-term dynamics through the
design and processing of short bursts of stochastic simulation initialized at
appropriate values of the data-based observables. Comment: 26 pages, 9 figures
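The diffusion-map step described above (leading eigenvectors of a weighted graph Laplacian built from data) can be sketched in Python as follows. The Gaussian kernel, the bandwidth `epsilon`, the random-walk normalization, and the name `diffusion_coordinate` are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' code.

```python
import math
import random


def diffusion_coordinate(points, epsilon, iters=300, seed=0):
    """Return (psi, lam): the first nontrivial diffusion-map coordinate
    for each point and its eigenvalue, found by deflated power iteration."""
    n = len(points)
    # Gaussian kernel weights between all pairs of points (assumed kernel).
    K = [[math.exp(-sum((a - b) ** 2 for a, b in zip(p, q)) / epsilon)
          for q in points] for p in points]
    d = [sum(row) for row in K]
    # Symmetric matrix S = D^{-1/2} K D^{-1/2}; it shares eigenvalues with
    # the random-walk matrix P = D^{-1} K.
    S = [[K[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)]
         for i in range(n)]
    # The trivial top eigenvector of S (eigenvalue 1) is known in closed form.
    norm = math.sqrt(sum(d))
    top = [math.sqrt(x) / norm for x in d]

    def deflate(v):
        dot = sum(a * b for a, b in zip(v, top))
        return [a - dot * b for a, b in zip(v, top)]

    def normalize(v):
        length = math.sqrt(sum(x * x for x in v))
        return [x / length for x in v]

    rng = random.Random(seed)
    v = normalize(deflate([rng.uniform(-1.0, 1.0) for _ in range(n)]))
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = normalize(deflate(w))
    Sv = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam = sum(a * b for a, b in zip(v, Sv))
    # Map back to an eigenvector of P: psi = D^{-1/2} v.
    psi = [v[i] / math.sqrt(d[i]) for i in range(n)]
    return psi, lam


# Two well-separated groups of 1-D points: the coordinate changes sign
# between the groups, so it acts as a data-driven observable.
pts = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
psi, lam = diffusion_coordinate(pts, epsilon=4.0)
```

Here `psi` plays the role of a single "reduction coordinate". The lifting and restriction procedures mentioned in the abstract are not covered by this sketch.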