Thermodynamic efficiency of information and heat flow
A basic task of information processing is information transfer (flow). Here
we study a pair of Brownian particles, each coupled to a thermal bath at
temperatures $T_1$ and $T_2$, respectively. The information flow in such a
system is defined via the time-shifted mutual information. The information flow
vanishes at equilibrium, and its efficiency is defined as the ratio of the flow
to the total entropy production in the system. For a stationary state the
information flows from higher to lower temperatures, and its efficiency is
bounded from above by a factor imposed by the second law, which quantifies the
thermodynamic cost of information flow in the present class of systems. This
bound can be reached in the
adiabatic situation, where the particles have widely different characteristic
times. The efficiency of heat flow (defined as the heat flow over the total
amount of dissipated heat) is limited from above by the same factor. There is a
complementarity between heat flow and information flow: the setup which is most
efficient for the former is the least efficient for the latter, and vice versa.
The above bound on the efficiency can be transiently overcome in certain
non-stationary situations, but the efficiency is still bounded from above. We
also study another measure of information processing, the transfer entropy,
proposed in the literature. Though this measure does not require any
thermodynamic cost, the information flow and the transfer entropy are shown to
be intimately related for stationary states.
Comment: 19 pages, 1 figure
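As a rough numerical illustration of the time-shifted mutual information that defines the information flow, here is a minimal sketch assuming two linearly coupled overdamped particles with unit friction, a Gaussian estimate of the mutual information, and illustrative parameter values; none of the constants or the coupling form are taken from the paper:

import numpy as np

# Illustrative sketch (not from the paper): two linearly coupled overdamped
# Brownian particles at bath temperatures T1 and T2, simulated with
# Euler-Maruyama; the time-shifted mutual information I(x(t); y(t+tau)) is
# estimated with the Gaussian formula I = -0.5 * ln(1 - rho^2).
rng = np.random.default_rng(0)
T1, T2 = 2.0, 1.0            # assumed bath temperatures
k, c = 1.0, 0.5              # assumed trap stiffness and coupling strength
dt, n_steps = 1e-3, 100_000

x = np.zeros(n_steps)
y = np.zeros(n_steps)
for i in range(1, n_steps):
    x[i] = x[i-1] + (-k * x[i-1] + c * (y[i-1] - x[i-1])) * dt \
        + np.sqrt(2 * T1 * dt) * rng.standard_normal()
    y[i] = y[i-1] + (-k * y[i-1] + c * (x[i-1] - y[i-1])) * dt \
        + np.sqrt(2 * T2 * dt) * rng.standard_normal()

def shifted_mi(x, y, lag):
    """Gaussian estimate of I(x(t); y(t + lag)) in nats, for lag >= 0 steps."""
    rho = np.corrcoef(x[:len(x) - lag or None], y[lag:])[0, 1]
    return -0.5 * np.log(1.0 - rho ** 2)

lag = 100   # 0.1 time units
flow_proxy = (shifted_mi(x, y, lag) - shifted_mi(x, y, 0)) / (lag * dt)
print(f"I(x_t; y_t)     = {shifted_mi(x, y, 0):.4f} nats")
print(f"I(x_t; y_t+tau) = {shifted_mi(x, y, lag):.4f} nats")
print(f"finite-difference flow proxy = {flow_proxy:.4f} nats per unit time")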
Gauge transformations and symmetries of integrable systems
We analyze several integrable systems in zero-curvature form within the
framework of an invariant gauge theory. In the Drinfeld-Sokolov gauge
we derive a two-parameter family of nonlinear evolution equations which
includes the Korteweg-de Vries (KdV) and Harry Dym equations as special cases.
We find residual gauge transformations which lead to infinitesimal symmetries of
this family of equations. For the KdV and Harry Dym equations we find an infinite
hierarchy of such symmetry transformations, and we investigate their relation
with local conservation laws, constants of the motion and the bi-Hamiltonian
structure of the equations. Applying successive gauge transformations of Miura
type, we obtain a sequence of gauge-equivalent integrable systems, among them
the modified KdV and Calogero-KdV equations.
Comment: 18 pages, no figures; journal version
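For reference, the best-known example of a Miura-type map, written here in standard textbook conventions rather than the paper's notation, relates the modified KdV and KdV equations:

% standard textbook conventions, not the paper's notation
\begin{align}
  u_t - 6\,u\,u_x + u_{xxx} &= 0 \qquad \text{(KdV)}\\
  v_t - 6\,v^2 v_x + v_{xxx} &= 0 \qquad \text{(mKdV)}\\
  u &= v^2 + v_x \qquad \text{(Miura map)}
\end{align}
Substituting the Miura map into the KdV equation gives the factorization
\begin{equation}
  u_t - 6\,u\,u_x + u_{xxx} = \left(2v + \partial_x\right)\left(v_t - 6\,v^2 v_x + v_{xxx}\right),
\end{equation}
so every solution of mKdV is carried to a solution of KdV.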
Eigengene networks for studying the relationships between co-expression modules
Background. There is evidence that genes and their protein products are organized into functional modules according to cellular processes and pathways. Gene co-expression networks have been used to describe the relationships between gene transcripts. Ample literature exists on how to detect biologically meaningful modules in networks, but there is a need for methods that allow one to study the relationships between modules.
Results. We show that network methods can also be used to describe the relationships between co-expression modules and present the following methodology. First, we describe several methods for detecting modules that are shared by two or more networks (referred to as consensus modules). We represent the gene expression profiles of each module by an eigengene. Second, we propose a method for constructing an eigengene network, where the edges are undirected but maintain information on the sign of the co-expression relationship. Third, we propose methods for differential eigengene network analysis that allow one to assess the preservation of network properties across different data sets. We illustrate the value of eigengene networks in studying the relationships between consensus modules in human and chimpanzee brains; the relationships between consensus modules in brain, muscle, liver, and adipose mouse tissues; and the relationships between male-female mouse consensus modules and clinical traits. In some applications, we find that module eigengenes can be organized into higher-level clusters, which we refer to as meta-modules.
Conclusion. Eigengene networks can be effective and biologically meaningful tools for studying the relationships between modules of a gene co-expression network. The proposed methods may reveal a higher-order organization of the transcriptome. R software tutorials, the data, and supplementary material can be found at the following webpage: http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/EigengeneNetwork.
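To make the "eigengene" and "eigengene network" notions concrete, here is a minimal sketch following the usual definition of a module eigengene as the first principal component of the module's expression matrix; the module assignments and data below are simulated, not taken from the paper:

import numpy as np

# Hypothetical sketch of module eigengenes and an eigengene network.
rng = np.random.default_rng(1)
n_samples, n_modules, genes_per_module = 40, 5, 30

def module_eigengene(expr):
    """First principal component of a (samples x genes) expression matrix."""
    centered = expr - expr.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigengene = u[:, 0]
    # orient the eigengene so it correlates positively with average expression
    if np.corrcoef(eigengene, centered.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

# simulate modules: each module = shared latent signal + gene-level noise
eigengenes = []
for _ in range(n_modules):
    signal = rng.standard_normal(n_samples)
    expr = signal[:, None] + 0.5 * rng.standard_normal((n_samples, genes_per_module))
    eigengenes.append(module_eigengene(expr))
E = np.column_stack(eigengenes)                # samples x modules

# eigengene network: signed correlations between module eigengenes
eigengene_corr = np.corrcoef(E, rowvar=False)  # modules x modules
print(np.round(eigengene_corr, 2))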
Biological Process Linkage Networks
BACKGROUND. The traditional approach to studying complex biological networks is based on the identification of interactions between internal components of signaling or metabolic pathways. By comparison, little is known about interactions between higher-order biological systems, such as biological pathways and processes. We propose a methodology for gleaning patterns of interactions between biological processes by analyzing protein-protein interactions, transcriptional co-expression and genetic interactions. At the heart of the methodology are the concept of Linked Processes and the resultant network of biological processes, the Process Linkage Network (PLN). RESULTS. We construct, catalogue, and analyze different types of PLNs derived from different data sources and different species. When applied to the Gene Ontology, many of the resulting links connect processes that are distant from each other in the hierarchy, even though the connection makes eminent sense biologically. Some others, however, carry an element of surprise and may reflect mechanisms that are unique to the organism under investigation. In this respect our method complements the link structure between processes inherent in the Gene Ontology, which by its very nature is species-independent. As a practical application of the linkage of processes, we demonstrate that it can be effectively used in protein function prediction, having the power to increase both the coverage and the accuracy of predictions when carefully integrated into prediction methods. CONCLUSIONS. Our approach constitutes a promising new direction towards understanding the higher levels of organization of the cell as a system, which should help current efforts to re-engineer ontologies and improve our ability to predict which proteins are involved in specific biological processes.
Funding: Lynn and William Frankel Center for Computer Science; the Paul Ivanier Center for Robotics Research and Production; National Science Foundation (ITR-048715); National Human Genome Research Institute (1R33HG002850-01A1, R01 HG003367-01A1); National Institutes of Health (U54 LM008748).
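One simple way to score a candidate link between two processes, sketched here with a hypergeometric test on the number of protein-protein interactions spanning the two processes' gene sets, is shown below; this is an illustrative statistic on toy data, not the paper's exact PLN score:

from scipy.stats import hypergeom

def process_link_pvalue(genes_a, genes_b, interactions, all_genes):
    """Hypergeometric p-value for enrichment of interactions spanning two
    processes. genes_a/genes_b: gene sets annotated to each process;
    interactions: set of frozenset gene pairs; all_genes: background set.
    Illustrative only, not the PLN score from the paper."""
    a, b = set(genes_a) & set(all_genes), set(genes_b) & set(all_genes)
    n_background_pairs = len(all_genes) * (len(all_genes) - 1) // 2
    n_edges = len(interactions)                       # interacting pairs overall
    cross_pairs = {frozenset((x, y)) for x in a for y in b if x != y}
    n_cross = len(cross_pairs)                        # pairs spanning the two sets
    k = len(cross_pairs & interactions)               # observed spanning edges
    # P(X >= k) when drawing n_cross pairs from n_background_pairs,
    # of which n_edges are interacting
    return hypergeom.sf(k - 1, n_background_pairs, n_edges, n_cross)

# toy example with hypothetical gene names
all_genes = {f"g{i}" for i in range(1, 21)}
ppi = {frozenset(p) for p in [("g1", "g11"), ("g2", "g12"), ("g3", "g13"),
                              ("g4", "g14"), ("g5", "g15"), ("g6", "g7")]}
print(process_link_pvalue({"g1", "g2", "g3", "g4", "g5"},
                          {"g11", "g12", "g13", "g14", "g15"}, ppi, all_genes))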
Detection of regulator genes and eQTLs in gene networks
Genetic differences between individuals that are associated with quantitative
phenotypic traits, including disease states, are usually found in non-coding
genomic regions. These genetic variants are often also associated with differences in
expression levels of nearby genes (they are "expression quantitative trait
loci" or eQTLs for short) and presumably play a gene regulatory role, affecting
the status of molecular networks of interacting genes, proteins and
metabolites. Computational systems biology approaches to reconstruct causal
gene networks from large-scale omics data have therefore become essential to
understand the structure of networks controlled by eQTLs together with other
regulatory genes, and to generate detailed hypotheses about the molecular
mechanisms that lead from genotype to phenotype. Here we review the main
analytical methods and software tools to identify eQTLs and their associated genes,
to reconstruct co-expression networks and modules, to reconstruct causal
Bayesian gene and module networks, and to validate predicted networks in
silico.
Comment: minor revision with typos corrected; review article; 24 pages, 2 figures
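The most basic eQTL test reviewed in such work is a per-variant linear regression of expression on genotype dosage; here is a minimal sketch on simulated data, not tied to any specific software covered by the review:

import numpy as np
from scipy import stats

# Minimal eQTL association sketch: regress expression on genotype dosage
# (0/1/2 copies of the alternate allele). All data are simulated.
rng = np.random.default_rng(42)
n_individuals, n_snps = 200, 50

genotypes = rng.integers(0, 3, size=(n_individuals, n_snps)).astype(float)
# make SNP 7 a true eQTL for the simulated gene
expression = 0.8 * genotypes[:, 7] + rng.standard_normal(n_individuals)

results = []
for snp in range(n_snps):
    slope, intercept, r, p_value, stderr = stats.linregress(genotypes[:, snp],
                                                            expression)
    results.append((snp, slope, p_value))

# Bonferroni-corrected hits
hits = [(snp, slope, p) for snp, slope, p in results if p < 0.05 / n_snps]
print(hits)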
An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++
Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates, and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject's replicate sample set, reducing the dataset size and incurring a loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), a forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size, which led to poorer classification and variable selection accuracy. Perhaps most importantly, our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data. Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux, as well as a user manual (Supplementary File S2), are available for download at http://sourceforge.org/projects/rfpp/ under the GNU public license.
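RF++ itself is a C++ implementation; purely to illustrate the subject-level bootstrapping idea, the following Python sketch resamples whole subjects rather than individual replicates when growing each tree and aggregates predictions into subject-level calls by majority vote. All parameters and data are illustrative, not RF++'s actual defaults:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative subject-level bootstrap forest (not RF++ itself): each tree is
# trained on a bootstrap sample of SUBJECTS, so all replicates of a subject
# enter (or leave) a tree together, respecting the cluster structure.
rng = np.random.default_rng(0)

def fit_subject_bootstrap_forest(X, y, subjects, n_trees=100):
    unique_subjects = np.unique(subjects)
    trees = []
    for _ in range(n_trees):
        boot_subjects = rng.choice(unique_subjects, size=len(unique_subjects),
                                   replace=True)
        idx = np.concatenate([np.flatnonzero(subjects == s) for s in boot_subjects])
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(1 << 31)))
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_subject_level(trees, X, subjects):
    """Majority vote over trees, then over each subject's replicates."""
    votes = np.mean([t.predict(X) for t in trees], axis=0)  # per-replicate vote fraction
    return {s: int(votes[subjects == s].mean() > 0.5) for s in np.unique(subjects)}

# toy data: 30 subjects x 4 replicates, 20 features, subject-level labels
subjects = np.repeat(np.arange(30), 4)
labels = np.repeat(rng.integers(0, 2, 30), 4)
X = rng.standard_normal((len(subjects), 20)) + labels[:, None] * 0.7

forest = fit_subject_bootstrap_forest(X, labels, subjects)
print(predict_subject_level(forest, X, subjects))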
Identifying high-impact variants and genes in exomes of Ashkenazi Jewish inflammatory bowel disease patients
Inflammatory bowel disease (IBD) is a group of chronic inflammatory conditions of the digestive tract whose genetic etiology is still poorly understood. The incidence of IBD is particularly high among Ashkenazi Jews. Here, we identify 8 novel and plausible IBD-causing genes from the exomes of 4453 genetically identified Ashkenazi Jewish individuals (1734 IBD cases and 2719 controls). Various biological pathway analyses are performed, along with bulk and single-cell RNA sequencing, to demonstrate the likely physiological relatedness of the novel genes to IBD. Importantly, we demonstrate that the rare, high-impact genetic architecture of Ashkenazi Jewish adult IBD displays significant overlap with very-early-onset IBD genetics. Moreover, by performing biobank phenome-wide analyses, we find that IBD genes have pleiotropic effects that involve other immune responses. Finally, we show that polygenic risk score analyses based on genome-wide high-impact variants have high power to predict IBD susceptibility.
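The polygenic risk score analysis mentioned at the end reduces, in its simplest form, to a weighted sum of allele dosages; here is a generic sketch with simulated dosages and effect sizes, not the study's variants or weights:

import numpy as np

# Generic polygenic risk score (PRS): weighted sum of allele dosages, with
# weights taken from per-variant effect sizes. Everything here is simulated.
rng = np.random.default_rng(7)
n_individuals, n_variants = 1000, 500

dosages = rng.integers(0, 3, size=(n_individuals, n_variants)).astype(float)
effect_sizes = rng.normal(0.0, 0.05, size=n_variants)   # e.g. log odds ratios

prs = dosages @ effect_sizes
# standardize so scores are comparable across individuals
prs_z = (prs - prs.mean()) / prs.std()

# flag the top decile as "high genetic risk" for downstream comparison
high_risk = prs_z > np.quantile(prs_z, 0.9)
print(f"mean PRS (z): {prs_z.mean():.3f}, high-risk individuals: {high_risk.sum()}")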
Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources
An important problem in molecular biology is to build a complete understanding of transcriptional regulatory processes in the cell. We have developed a flexible, probabilistic framework to predict TF binding from multiple data sources that differs from the standard hypothesis testing (scanning) methods in several ways. Our probabilistic modeling framework estimates the probability of binding and, thus, naturally reflects our degree of belief in binding. Probabilistic modeling also allows for easy and systematic integration of our binding predictions into other probabilistic modeling methods, such as expression-based gene network inference. The method answers the question of whether the whole analyzed promoter has a binding site, but can also be extended to estimate the binding probability at each nucleotide position. Further, we introduce an extension to model combinatorial regulation by several TFs. Most importantly, the proposed methods can make principled probabilistic inference from multiple evidence sources, such as multiple statistical models (motifs) of the TFs, evolutionary conservation, regulatory potential, CpG islands, nucleosome positioning, DNase hypersensitive sites, ChIP-chip binding segments and other (prior) sequence-based biological knowledge. We developed both a likelihood and a Bayesian method, where the latter is implemented with a Markov chain Monte Carlo algorithm. Results on a carefully constructed test set from the mouse genome demonstrate that principled data fusion can significantly improve the performance of TF binding prediction methods. We also applied the probabilistic modeling framework to all promoters in the mouse genome, and the results indicate a sparse connectivity between transcriptional regulators and their target promoters. To facilitate analysis of other sequences and additional data, we have developed an on-line web tool, ProbTF, which implements our probabilistic TF binding prediction method using multiple data sources. The test data set, web tool, source code and supplementary data are available at: http://www.probtf.org.
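A toy version of combining multiple evidence sources, using a naive-Bayes-style sum of log-likelihood ratios, is sketched below; this is a generic illustration, not ProbTF's actual model or parameter values:

import numpy as np

# Toy evidence fusion for TF binding: combine per-source likelihood ratios in
# log space under an independence assumption (generic sketch, not ProbTF).
def binding_posterior(log_likelihood_ratios, prior=0.01):
    """Posterior P(bound | evidence) from per-source log LR(bound vs. not bound)."""
    log_odds = np.log(prior / (1 - prior)) + np.sum(log_likelihood_ratios)
    return 1.0 / (1.0 + np.exp(-log_odds))

# hypothetical evidence for one promoter: motif score, conservation, DNase
evidence = {
    "motif_match":  np.log(30.0),   # strong motif hit: likelihood ratio of 30
    "conservation": np.log(4.0),    # moderately conserved region
    "dnase_site":   np.log(2.5),    # open chromatin
}
print(f"P(bound) = {binding_posterior(list(evidence.values())):.3f}")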
Prioritization of gene regulatory interactions from large-scale modules in yeast
Background. The identification of groups of co-regulated genes and their transcription factors, called transcriptional modules, has been a focus of many studies about biological systems. While methods have been developed to derive numerous modules from genome-wide data, individual links between regulatory proteins and target genes still need experimental verification. In this work, we aim to prioritize regulator-target links within transcriptional modules based on three types of large-scale data sources.
Results. Starting with putative transcriptional modules from ChIP-chip data, we first derive modules in which target genes show both expression and function coherence. The most reliable regulatory links between transcription factors and target genes are established by identifying the intersection of target genes in coherent modules for each enriched functional category. Using a combination of genome-wide yeast data in normal growth conditions and two different reference datasets, we show that our method predicts regulatory interactions with significantly higher predictive power than ChIP-chip binding data alone. A comparison with results from other studies highlights that our approach provides a reliable and complementary set of regulatory interactions. Based on our results, we can also identify functionally interacting target genes, for instance a group of co-regulated proteins related to cell wall synthesis. Furthermore, we report novel conserved binding sites of a glycoprotein-encoding gene, CIS3, regulated by the Swi6-Swi4 and Ndd1-Fkh2-Mcm1 complexes.
Conclusion. We provide a simple method to prioritize individual TF-gene interactions from large-scale transcriptional modules. In comparison with other published works, we predict a complementary set of regulatory interactions which yields a similar or higher prediction accuracy at the expense of sensitivity. Therefore, our method can serve as an alternative approach to prioritization for further experimental studies.
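The core prioritization step, keeping a TF-target link only when the target lies both in the TF's ChIP-chip module and in a coherent gene set for an enriched functional category, can be sketched as plain set intersections; the gene, TF, and category names below are placeholders, not data from the paper:

# Sketch of link prioritization by set intersection (placeholder names):
# a TF-target link is kept when the target gene is in the TF's ChIP-chip
# module AND in the coherent gene set of an enriched functional category.
chip_modules = {
    "TF_A": {"gene1", "gene2", "gene3", "gene4"},
    "TF_B": {"gene3", "gene5", "gene6"},
}
coherent_sets = {  # per enriched category: co-expressed genes with shared function
    "cell_wall_synthesis": {"gene2", "gene3", "gene6", "gene7"},
    "cell_cycle":          {"gene1", "gene5"},
}

prioritized_links = {
    (tf, gene, category)
    for tf, targets in chip_modules.items()
    for category, coherent in coherent_sets.items()
    for gene in targets & coherent
}
for link in sorted(prioritized_links):
    print(link)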