15,476 research outputs found
Defining a robust biological prior from Pathway Analysis to drive Network Inference
Inferring genetic networks from gene expression data is one of the most
challenging work in the post-genomic era, partly due to the vast space of
possible networks and the relatively small amount of data available. In this
field, Gaussian Graphical Model (GGM) provides a convenient framework for the
discovery of biological networks. In this paper, we propose an original
approach for inferring gene regulation networks using a robust biological prior
on their structure in order to limit the set of candidate networks.
Pathways, that represent biological knowledge on the regulatory networks,
will be used as an informative prior knowledge to drive Network Inference. This
approach is based on the selection of a relevant set of genes, called the
"molecular signature", associated with a condition of interest (for instance,
the genes involved in disease development). In this context, differential
expression analysis is a well established strategy. However outcome signatures
are often not consistent and show little overlap between studies. Thus, we will
dedicate the first part of our work to the improvement of the standard process
of biomarker identification to guarantee the robustness and reproducibility of
the molecular signature.
Our approach enables to compare the networks inferred between two conditions
of interest (for instance case and control networks) and help along the
biological interpretation of results. Thus it allows to identify differential
regulations that occur in these conditions. We illustrate the proposed approach
by applying our method to a study of breast cancer's response to treatment
Recommended from our members
Bayesian Inference for Genomic Data Analysis
High-throughput genomic data contain gazillion of information that are influenced by the complex biological processes in the cell. As such, appropriate mathematical modeling frameworks are required to understand the data and the data generating processes. This dissertation focuses on the formulation of mathematical models and the description of appropriate computational algorithms to obtain insights from genomic data.
Specifically, characterization of intra-tumor heterogeneity is studied. Based on the total number of allele copies at the genomic locations in the tumor subclones, the problem is viewed from two perspectives: the presence or absence of copy-neutrality assumption. With the presence of copy-neutrality, it is assumed that the genome contains mutational variability and the three possible genotypes may be present at each genomic location. As such, the genotypes of all the genomic locations in the tumor subclones are modeled by a ternary matrix. In the second case, in addition to mutational variability, it is assumed that the genomic locations may be affected by structural variabilities such as copy number variation (CNV). Thus, the genotypes are modeled with a pair of (Q + 1)-ary matrices. Using the categorical Indian buffet process (cIBP), state-space modeling framework is employed in describing the two processes and the sequential Monte Carlo (SMC) methods for dynamic models are applied to perform inference on important model parameters.
Moreover, the problem of estimating gene regulatory network (GRN) from measurement with missing values is presented. Specifically, gene expression time series data may contain missing values for entire expression values of a single point or some set of consecutive time points. However, complete data is often needed to make inference on the underlying GRN. Using the missing measurement, a dynamic stochastic model is used to describe the evolution of gene expression and point-based Gaussian approximation (PBGA) filters with one-step or two-step missing measurements are applied for the inference. Finally, the problem of deconvolving gene expression data from complex heterogeneous biological samples is examined, where the observed data are a mixture of different cell types. A statistical description of the problem is used and the SMC method for static models is applied to estimate the cell-type specific expressions and the cell type proportions in the heterogeneous samples
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
Recommended from our members
Understand Biology Using Single Cell RNA-Sequencing
This dissertation summarizes the development of experimental and analytical tools for single cell RNA sequencing (scRNA-Seq), including 1) scPLATE-Seq, a FACS- and plate-based scRNASeq platform, which is accurate, robust, fully automated and cost-efficient; 2) metaVIPER, an algorithm for transcriptional regulator activity inference based on scRNA-Seq profiles; and 3) iterClust, a statistical framework for iterative clustering analysis, especially suitable for dissecting hierarchy of heterogeneity among single cells. Further this dissertation summarizes biological questions answered by combining these tools, including 1) understanding inter- and intra-tumor heterogeneity of human glioblastoma; 2) elucidating regulators of β-cell de-differentiation in type-2 diabetes; and 3) developing novel therapeutics targeting cell-state regulators of breast cancer stem cells
Interactome comparison of human embryonic stem cell lines with the inner cell mass and trophectoderm
Networks of interacting co-regulated genes distinguish the inner cell mass (ICM) from the
differentiated trophectoderm (TE) in the preimplantation blastocyst, in a species specific manner. In mouse the ground state pluripotency of the ICM appears to be maintained in murine embryonic stem cells (ESCs) derived from the ICM. This is not the case for human ESCs. In order to gain insight into this phenomenon, we have used quantitative network analysis to identify how similar human (h)ESCs are to the human ICM. Using the hESC lines MAN1, HUES3 and HUES7 we have shown that all have only a limited overlap with ICM specific gene expression, but that this overlap is enriched for network
properties that correspond to key aspects of function including transcription factor activity and the hierarchy of network modules. These analyses provide an important framework which highlights the developmental origins of hESCs
Influence of Statistical Estimators of Mutual Information and Data Heterogeneity on the Inference of Gene Regulatory Networks
The inference of gene regulatory networks from gene expression data is a difficult problem because the performance of the inference algorithms depends on a multitude of different factors. In this paper we study two of these. First, we investigate the influence of discrete mutual information (MI) estimators on the global and local network inference performance of the C3NET algorithm. More precisely, we study different MI estimators (Empirical, Miller-Madow, Shrink and Schürmann-Grassberger) in combination with discretization methods (equal frequency, equal width and global equal width discretization). We observe the best global and local inference performance of C3NET for the Miller-Madow estimator with an equal width discretization. Second, our numerical analysis can be considered as a systems approach because we simulate gene expression data from an underlying gene regulatory network, instead of making a distributional assumption to sample thereof. We demonstrate that despite the popularity of the latter approach, which is the traditional way of studying MI estimators, this is in fact not supported by simulated and biological expression data because of their heterogeneity. Hence, our study provides guidance for an efficient design of a simulation study in the context of network inference, supporting a systems approach
Data based identification and prediction of nonlinear and complex dynamical systems
We thank Dr. R. Yang (formerly at ASU), Dr. R.-Q. Su (formerly at ASU), and Mr. Zhesi Shen for their contributions to a number of original papers on which this Review is partly based. This work was supported by ARO under Grant No. W911NF-14-1-0504. W.-X. Wang was also supported by NSFC under Grants No. 61573064 and No. 61074116, as well as by the Fundamental Research Funds for the Central Universities, Beijing Nova Programme.Peer reviewedPostprin
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals associated to quantitative phenotypic
traits, including disease states, are usually found in non-coding genomic
regions. These genetic variants are often also associated to differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and softwares to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.Comment: minor revision with typos corrected; review article; 24 pages, 2
figure
- …