31,216 research outputs found
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Consensus clustering approach to group brain connectivity matrices
A novel approach rooted on the notion of consensus clustering, a strategy
developed for community detection in complex networks, is proposed to cope with
the heterogeneity that characterizes connectivity matrices in health and
disease. The method can be summarized as follows:
(i) define, for each node, a distance matrix for the set of subjects by
comparing the connectivity pattern of that node in all pairs of subjects; (ii)
cluster the distance matrix for each node; (iii) build the consensus network
from the corresponding partitions; (iv) extract groups of subjects by finding
the communities of the consensus network thus obtained.
Differently from the previous implementations of consensus clustering, we
thus propose to use the consensus strategy to combine the information arising
from the connectivity patterns of each node. The proposed approach may be seen
either as an exploratory technique or as an unsupervised pre-training step to
help the subsequent construction of a supervised classifier. Applications on a
toy model and two real data sets, show the effectiveness of the proposed
methodology, which represents heterogeneity of a set of subjects in terms of a
weighted network, the consensus matrix
Networks and the epidemiology of infectious disease
The science of networks has revolutionised research into the dynamics of interacting elements. It could be argued that epidemiology in particular has embraced the potential of network theory more than any other discipline. Here we review the growing body of research concerning the spread of infectious diseases on networks, focusing on the interplay between network theory and epidemiology. The review is split into four main sections, which examine: the types of network relevant to epidemiology; the multitude of ways these networks can be characterised; the statistical methods that can be applied to infer the epidemiological parameters on a realised network; and finally simulation and analytical methods to determine epidemic dynamics on a given network. Given the breadth of areas covered and the ever-expanding number of publications, a comprehensive review of all work is impossible. Instead, we provide a personalised overview into the areas of network epidemiology that have seen the greatest progress in recent years or have the greatest potential to provide novel insights. As such, considerable importance is placed on analytical approaches and statistical methods which are both rapidly expanding fields. Throughout this review we restrict our attention to epidemiological issues
Estimating sample-specific regulatory networks
Biological systems are driven by intricate interactions among the complex
array of molecules that comprise the cell. Many methods have been developed to
reconstruct network models of those interactions. These methods often draw on
large numbers of samples with measured gene expression profiles to infer
connections between genes (or gene products). The result is an aggregate
network model representing a single estimate for the likelihood of each
interaction, or "edge," in the network. While informative, aggregate models
fail to capture the heterogeneity that is represented in any population. Here
we propose a method to reverse engineer sample-specific networks from aggregate
network models. We demonstrate the accuracy and applicability of our approach
in several data sets, including simulated data, microarray expression data from
synchronized yeast cells, and RNA-seq data collected from human lymphoblastoid
cell lines. We show that these sample-specific networks can be used to study
changes in network topology across time and to characterize shifts in gene
regulation that may not be apparent in expression data. We believe the ability
to generate sample-specific networks will greatly facilitate the application of
network methods to the increasingly large, complex, and heterogeneous
multi-omic data sets that are currently being generated, and ultimately support
the emerging field of precision network medicine
A statistical network analysis of the HIV/AIDS epidemics in Cuba
The Cuban contact-tracing detection system set up in 1986 allowed the
reconstruction and analysis of the sexual network underlying the epidemic
(5,389 vertices and 4,073 edges, giant component of 2,386 nodes and 3,168
edges), shedding light onto the spread of HIV and the role of contact-tracing.
Clustering based on modularity optimization provides a better visualization and
understanding of the network, in combination with the study of covariates. The
graph has a globally low but heterogeneous density, with clusters of high
intraconnectivity but low interconnectivity. Though descriptive, our results
pave the way for incorporating structure when studying stochastic SIR epidemics
spreading on social networks
Partition Decoupling for Multi-gene Analysis of Gene Expression Profiling Data
We present the extention and application of a new unsupervised statistical
learning technique--the Partition Decoupling Method--to gene expression data.
Because it has the ability to reveal non-linear and non-convex geometries
present in the data, the PDM is an improvement over typical gene expression
analysis algorithms, permitting a multi-gene analysis that can reveal
phenotypic differences even when the individual genes do not exhibit
differential expression. Here, we apply the PDM to publicly-available gene
expression data sets, and demonstrate that we are able to identify cell types
and treatments with higher accuracy than is obtained through other approaches.
By applying it in a pathway-by-pathway fashion, we demonstrate how the PDM may
be used to find sets of mechanistically-related genes that discriminate
phenotypes.Comment: Revise
- …