Methods for Determining the Statistical Significance of Enrichment or Depletion of Gene Ontology Classifications under Weighted Membership
High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an “interesting” set of genes – say, genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speedups in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover “gold standard” annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
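As a rough illustration of the kind of dynamic programming this abstract describes, the sketch below computes an exact upper-tail p-value for a sum of integer gene weights. It assumes a null model in which the gene set is a uniformly random subset of fixed size. The function names `subset_sum_counts` and `enrichment_p_value` are hypothetical. This is not the authors' algorithm and includes none of their optimizations.

```python
from fractions import Fraction
from math import comb


def subset_sum_counts(weights, n):
    """Return counts where counts[k][s] is the number of size-k subsets
    of `weights` whose integer weights sum to s."""
    counts = [dict() for _ in range(n + 1)]
    counts[0][0] = 1  # the empty subset has sum 0
    for w in weights:
        # Walk k downward so each gene is added to a subset at most once.
        for k in range(n - 1, -1, -1):
            for s, c in counts[k].items():
                counts[k + 1][s + w] = counts[k + 1].get(s + w, 0) + c
    return counts


def enrichment_p_value(weights, n, observed):
    """Exact P(S >= observed), where S is the weight sum of a uniformly
    random size-n subset (assumed null model, not taken from the paper)."""
    counts = subset_sum_counts(weights, n)
    total = comb(len(weights), n)  # number of equally likely subsets
    hits = sum(c for s, c in counts[n].items() if s >= observed)
    return Fraction(hits, total)
```

With 0/1 weights the statistic is a plain membership count, and the result reduces to the usual hypergeometric tail; for example, `enrichment_p_value([1, 1, 1, 0, 0, 0, 0, 0, 0, 0], 4, 2)` gives the probability that at least 2 of 4 sampled genes fall among the 3 annotated ones.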
Implementing Arithmetic and Other Analytic Operations By Transcriptional Regulation
The transcriptional regulatory machinery of a gene can be viewed as a computational device, with transcription factor concentrations as inputs and expression level as the output. This view raises the question: what kinds of computations are possible? We show that different parameterizations of a simple chemical kinetic model of transcriptional regulation are able to approximate all four standard arithmetic operations: addition, subtraction, multiplication, and division, as well as various equality and inequality operations. This contrasts with other studies that emphasize logical or digital notions of computation in biological networks. We analyze the accuracy and precision of these approximations, showing that they depend on different sets of parameters, and are thus independently tunable. We demonstrate that networks of these “arithmetic” genes can be combined to accomplish yet more complicated computations by designing and simulating a network that detects statistically significant elevations in a time-varying signal. We also consider the much more general problem of approximating analytic functions, showing that this can be achieved by allowing multiple transcription factor binding sites on the promoter. These observations are important for the interpretation of naturally occurring networks and imply new possibilities for the design of synthetic networks.
Voter Model Perturbations and Reaction Diffusion Equations
We consider particle systems that are perturbations of the voter model and
show that when space and time are rescaled the system converges to a solution
of a reaction diffusion equation in dimensions d ≥ 3. Combining this result
with properties of the PDE, some methods arising from a low density
super-Brownian limit theorem, and a block construction, we give general, and
often asymptotically sharp, conditions for the existence of non-trivial
stationary distributions, and for extinction of one type. As applications, we
describe the phase diagrams of three systems when the parameters are close to
the voter model: (i) a stochastic spatial Lotka-Volterra model of Neuhauser and
Pacala, (ii) a model of the evolution of cooperation of Ohtsuki, Hauert,
Lieberman, and Nowak, and (iii) a continuous time version of the non-linear
voter model of Molofsky, Durrett, Dushoff, Griffeath, and Levin. The first
application confirms a conjecture of Cox and Perkins and the second confirms a
conjecture of Ohtsuki et al in the context of certain infinite graphs. An
important feature of our general results is that they do not require the
process to be attractive. Comment: 106 pages, 7 figures
A General Model of Codon Bias Due to GC Mutational Bias
Background - In spite of extensive research on the effect of mutation and selection on codon usage, a general model of codon usage bias due to mutational bias has been lacking. Because most amino acids allow synonymous GC content changing substitutions in the third codon position, the overall GC bias of a genome or genomic region is highly correlated with GC3, a measure of third position GC content. For individual amino acids as well, G/C-ending codon usage generally increases with increasing GC bias and decreases with increasing AT bias. Arginine and leucine, amino acids that allow GC-changing synonymous substitutions in the first and third codon positions, have codons which may be expected to show different usage patterns. // Principal Findings - In analyzing codon usage bias in hundreds of prokaryotic and plant genomes and in human genes, we find that two G-ending codons, AGG (arginine) and TTG (leucine), unlike all other G/C-ending codons, show overall usage that decreases with increasing GC bias, contrary to the usual expectation that G/C-ending codon usage should increase with increasing genomic GC bias. Moreover, the usage of some codons appears nonlinear, even nonmonotone, as a function of GC bias. To explain these observations, we propose a continuous-time Markov chain model of GC-biased synonymous substitution. This model correctly predicts the qualitative usage patterns of all codons, including nonlinear codon usage in isoleucine, arginine and leucine. The model accounts for 72%, 64% and 52% of the observed variability of codon usage in prokaryotes, plants and human respectively. When codons are grouped based on common GC content, 87%, 80% and 68% of the variation in usage is explained for prokaryotes, plants and human respectively.
// Conclusions - The model clarifies the sometimes-counterintuitive effects that GC mutational bias can have on codon usage, quantifies the influence of GC mutational bias and provides a natural null model relative to which other influences on codon bias may be measured.
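To make the idea of a continuous-time Markov chain over synonymous codons concrete, here is a toy sketch. It is not the model fitted in the paper. It assumes a single hypothetical bias parameter `g` in (0, 1): a single-nucleotide synonymous change that introduces G or C occurs at rate `g`, and one that introduces A or T at rate `1 - g`. The names `rate_matrix`, `stationary`, and `codon_usage` are illustrative, and the stationary distribution is found by simple power iteration on the uniformized chain.

```python
GC = set("GC")
ARGININE = ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]  # standard genetic code


def rate_matrix(codons, g):
    """Generator matrix Q over synonymous codons. Only codons differing at
    exactly one position are connected (hypothetical parameterization)."""
    n = len(codons)
    Q = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(codons):
        for j, b in enumerate(codons):
            diffs = [k for k in range(3) if a[k] != b[k]]
            if len(diffs) == 1:
                # Rate depends on whether the new nucleotide is G/C or A/T.
                Q[i][j] = g if b[diffs[0]] in GC else 1.0 - g
        Q[i][i] = -sum(Q[i])  # rows of a generator sum to zero
    return Q


def stationary(Q, steps=20000):
    """Approximate pi with pi Q = 0 by iterating pi <- pi (I + Q / lam)."""
    n = len(Q)
    lam = 2.0 * max(-Q[i][i] for i in range(n))  # uniformization constant
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [pi[j] + sum(pi[i] * Q[i][j] for i in range(n)) / lam
              for j in range(n)]
    return pi


def codon_usage(codons, g):
    """Equilibrium usage of each codon within its synonymous family."""
    return dict(zip(codons, stationary(rate_matrix(codons, g))))
```

Under these toy assumptions, calling `codon_usage(ARGININE, g)` for several values of `g` shows that the equilibrium share of a G-ending codon such as AGG need not increase monotonically with `g`, because it competes with synonymous codons that carry more G/C positions.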
Reverse Engineering the Gap Gene Network of Drosophila melanogaster
A fundamental problem in functional genomics is to determine the structure and dynamics of genetic networks based on expression data. We describe a new strategy for solving this problem and apply it to recently published data on early Drosophila melanogaster development. Our method is orders of magnitude faster than current fitting methods and allows us to fit different types of rules for expressing regulatory relationships. Specifically, we use our approach to fit models using a smooth nonlinear formalism for modeling gene regulation (gene circuits) as well as models using logical rules based on activation and repression thresholds for transcription factors. Our technique also allows us to infer regulatory relationships de novo or to test network structures suggested by the literature. We fit a series of models to test several outstanding questions about gap gene regulation, including regulation of and by hunchback and the role of autoactivation. Based on our modeling results and validation against the experimental literature, we propose a revised network structure for the gap gene system. Interestingly, some relationships in standard textbook models of gap gene regulation appear to be unnecessary for or even inconsistent with the details of gap gene expression during wild-type development.