A Differential Network Approach to Exploring Differences between Biological States: An Application to Prediabetes
Background: Variations in the pattern of molecular associations are observed during disease development. The comprehensive analysis of molecular association patterns and their changes in relation to different physiological conditions can yield insight into the biological basis of disease-specific phenotype variation. Methodology: Here, we introduce a formal statistical method for the differential analysis of molecular associations via network representation. We illustrate our approach with extensive data on lipoprotein subclasses measured by NMR spectroscopy in 4,406 individuals with normal fasting glucose, and 531 subjects with impaired fasting glucose (prediabetes). We estimate the pair-wise association between measures using shrinkage estimates of partial correlations and build the differential network based on this measure of association. We explore the topological properties of the inferred network to gain insight into important metabolic differences between individuals with normal fasting glucose and prediabetes. Conclusions/Significance: Differential networks provide new insights characterizing differences in biological states. Based on conventional statistical methods, few differences in concentration levels of lipoprotein subclasses were found between individuals with normal fasting glucose and individuals with prediabetes. By performing the differential analysis of networks, several characteristic changes in lipoprotein metabolism known to be related to diabetic dyslipidemias were identified. The results demonstrate the applicability of the new approach to identify key molecular changes inaccessible to standard approaches.
Simple connectome inference from partial correlation statistics in calcium imaging
In this work, we propose a simple yet effective solution to the problem of
connectome inference in calcium imaging data. The proposed algorithm consists
of two steps. First, processing the raw signals to detect neural peak
activities. Second, inferring the degree of association between neurons from
partial correlation statistics. This paper summarises the methodology that led
us to win the Connectomics Challenge, proposes a simplified version of our
method, and finally compares our results with those of other inference
methods.
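The second step above relies on partial correlation, which measures the association between two signals after removing what a third signal explains in both. The following is a minimal sketch of the first-order case only, not the challenge-winning pipeline: the function names `pearson` and `partial_corr` are hypothetical, the signals are assumed to be plain lists of equal length, and conditioning on a single third signal is a simplification.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z.

    Assumption: only one conditioning signal z is used; the general
    case conditions on all remaining signals at once.
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Two signals driven by a common input z look strongly correlated,
# but the association vanishes once z is accounted for.
z = [1, 1, -1, -1]
x = [4, 2, -2, -4]   # 3*z plus an independent component
y = [4, 2, -4, -2]   # 3*z plus a different independent component
print(pearson(x, y), partial_corr(x, y, z))
```

In this toy case the plain correlation between `x` and `y` is high only because both follow `z`, which is the kind of indirect link that partial correlation is meant to discount.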
Arabidopsis thaliana computationally-generated next-state gene interaction models
The construction of gene interaction models must be a fully collaborative and
intentional effort. All aspects of the research, such as growing the plants, extracting the measurements,
refining the measured data, developing the statistical framework, and forming and
applying the algorithmic techniques, must lend themselves to repeatable and sound practices.
This paper holistically focuses on the process of producing gene interaction models based on
transcript abundance data from Arabidopsis thaliana after stimulation by a plant hormone.
Application of new probabilistic graphical models in the genetic regulatory networks studies
This paper introduces two new probabilistic graphical models for
reconstruction of genetic regulatory networks using DNA microarray data. One is
an Independence Graph (IG) model with either a forward or a backward search
algorithm and the other one is a Gaussian Network (GN) model with a novel
greedy search method. The performances of both models were evaluated on four
MAPK pathways in yeast and three simulated data sets. Generally, an IG model
provides a sparse graph but a GN model produces a dense graph where more
information about gene-gene interactions is preserved. Additionally, we found
two key limitations in the prediction of genetic regulatory networks using DNA
microarray data: first, the sample size may be insufficient; second, the
complexity of network structures may not be captured without additional
data at the protein level. These limitations are present in all prediction
methods that use only DNA microarray data. Comment: 38 pages, 3 figures
Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge
A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories, one requiring prediction based on only genotype data, another on only gene expression data, and the third on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets. National Science Foundation (U.S.). Graduate Research Fellowship Program; National Defense Science and Engineering Graduate Fellowship
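To illustrate how strong regularization can leave only a handful of predictors, here is a minimal sketch of L1-penalized (lasso) regression fitted by coordinate descent. The abstract does not say which penalty was used, so the choice of lasso is an assumption, as are the function names `soft_threshold` and `lasso_fit`, the penalty value, and the requirement that features and response are already centered (no intercept is fitted).

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything within lam of zero becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by coordinate descent.

    X is a list of rows, y a list of responses. Assumes centered data and
    that no feature column is all zeros.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                partial_pred = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial_pred)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / scale
    return w


# Only the first feature drives the response; the penalty zeroes the others.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2, -2, 2, -2]
print(lasso_fit(X, y, lam=0.1))
```

In practice the penalty `lam` would be chosen by cross-validation, as the abstract describes; the fixed value here is only for demonstration.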
Identifying a Transcription Factor’s Regulatory Targets from its Binding Targets
ChIP-chip data, which show binding of transcription factors (TFs) to promoter regions in vivo, are widely used by biologists to identify the regulatory targets of TFs. However, the binding of a TF to a gene does not necessarily imply regulation. Thus, it is important to develop computational methods which can extract a TF’s regulatory targets from its binding targets. We developed a method, called REgulatory Targets Extraction Algorithm (RETEA), which uses partial correlation analysis on gene expression data to extract a TF’s regulatory targets from its binding targets inferred from ChIP-chip data. We applied RETEA to yeast cell cycle microarray data and identified the plausible regulatory targets of eleven known cell cycle TFs. We validated our predictions by checking the enrichments for cell cycle-regulated genes, common cellular processes and common molecular functions. Finally, we showed that RETEA performs better than three published methods (MA-Network, TRIA and Garten et al.’s method).
Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions
Improvements in sequencing technologies and reduced experimental costs have
resulted in a vast number of studies generating high-throughput data. Although
the number of methods to analyze these "omics" data has also increased,
computational complexity and lack of documentation hinder researchers from
analyzing their high-throughput data to its true potential. In this chapter we
detail our data-driven, transkingdom network (TransNet) analysis protocol to
integrate and interrogate multi-omics data. This systems biology approach has
allowed us to successfully identify important causal relationships between
different taxonomic kingdoms (e.g. mammals and microbes) using diverse types of
data.
From Knockouts to Networks: Establishing Direct Cause-Effect Relationships through Graph Analysis
Background: Reverse-engineering gene networks from expression profiles is a difficult problem for which a multitude of techniques have been developed over the last decade. The yearly organized DREAM challenges allow for a fair evaluation and unbiased comparison of these methods. Results: We propose an inference algorithm that combines confidence matrices, computed as the standard scores from single-gene knockout data, with the down-ranking of feed-forward edges. Substantial improvements on the predictions can be obtained after the execution of this second step. Conclusions: Our algorithm was awarded the best overall performance at the DREAM4 In Silico 100-gene network subchallenge, proving to be effective in inferring medium-size gene regulatory networks. This success demonstrates once again the decisive importance of gene expression data obtained after systematic gene perturbations and highlights the usefulness of graph analysis to increase the reliability of inference.
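The two steps named in this abstract can be sketched roughly as follows. This is an illustrative simplification, not the authors' algorithm: the function names, the exclusion of each gene's own knockout from its column statistics, the threshold, and the multiplicative penalty are all assumptions made for the example.

```python
import statistics


def knockout_confidence(expr):
    """Standard-score confidence matrix from single-gene knockout data.

    expr[i][j] is the expression of gene j when gene i is knocked out.
    conf[i][j] is the absolute z-score of that value within column j,
    read as confidence in an edge i -> j. Assumption: gene j's own
    knockout is left out of the column statistics. Needs >= 3 genes.
    """
    n = len(expr)
    conf = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column = [expr[i][j] for i in range(n) if i != j]
        mean = statistics.mean(column)
        sd = statistics.stdev(column)
        for i in range(n):
            if i != j and sd > 0:
                conf[i][j] = abs(expr[i][j] - mean) / sd
    return conf


def downrank_feed_forward(conf, threshold=1.0, penalty=0.5):
    """Reduce the score of i -> j when a confident path i -> k -> j exists.

    Hypothetical heuristic: such an edge may be an indirect effect, so
    its confidence is multiplied by `penalty`.
    """
    n = len(conf)
    ranked = [row[:] for row in conf]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(n):
                if k in (i, j):
                    continue
                if conf[i][k] > threshold and conf[k][j] > threshold:
                    ranked[i][j] = conf[i][j] * penalty
                    break
    return ranked


# Toy chain 0 -> 1 -> 2 with three unrelated genes (made-up values).
expr = [
    [0.0, 0.2, 0.3, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.3, 1.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 0.0],
]
conf = knockout_confidence(expr)
ranked = downrank_feed_forward(conf)
print(conf[0][2], ranked[0][2])
```

In the toy data, knocking out gene 0 also changes gene 2, but only through gene 1, so the direct edge from 0 to 2 is down-ranked while the edges along the chain keep their scores.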