Inferring transcriptional compensation interactions in yeast via stepwise structure equation modeling
Background: With the abundant information produced by microarray technology, various approaches have been proposed to infer transcriptional regulatory networks. However, few approaches have studied subtle and indirect interactions such as genetic compensation, the existence of which is widely recognized although its mechanism has yet to be clarified. Furthermore, when inferring gene networks most models include only observed variables, whereas latent factors, such as proteins and mRNA degradation that are not measured by microarrays, do participate in networks in reality.
Results: Motivated by inferring transcriptional compensation (TC) interactions in yeast, a stepwise structural equation modeling algorithm (SSEM) is developed. In addition to observed variables, SSEM also incorporates hidden variables to capture interactions (or regulations) from latent factors. Simulated gene networks are used to determine with which of six possible model selection criteria (MSC) SSEM works best. SSEM with Bayesian information criterion (BIC) results in the highest true positive rates, the largest percentage of correctly predicted interactions from all existing interactions, and the highest true negative (non-existing interactions) rates. Next, we apply SSEM using real microarray data to infer TC interactions among (1) small groups of genes that are synthetic sick or lethal (SSL) to SGS1, and (2) a group of SSL pairs of 51 yeast genes involved in DNA synthesis and repair that are of interest. For (1), SSEM with BIC is shown to outperform three Bayesian network algorithms and a multivariate autoregressive model, checked against the results of qRT-PCR experiments. The predictions for (2) are shown to coincide with several known pathways of Sgs1 and its partners that are involved in DNA replication, recombination and repair. In addition, experimentally testable interactions of Rad27 are predicted.
Conclusion: SSEM is a useful tool for inferring genetic networks, and the results reinforce the possibility of predicting pathways of protein complexes via genetic interactions.
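For readers unfamiliar with the criterion named in the abstract above, the following is a minimal sketch of selecting between candidate models by BIC, using the standard definition BIC = k ln(n) - 2 ln(L). It is not the SSEM algorithm: the function names, the two candidate models, and their log-likelihood values are hypothetical and chosen only for illustration.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Standard Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_by_bic(candidates: dict, n_obs: int) -> str:
    """Return the name of the candidate with the smallest BIC.

    `candidates` maps a model name to a tuple of
    (maximized log-likelihood, number of free parameters).
    """
    return min(
        candidates,
        key=lambda name: bic(candidates[name][0], candidates[name][1], n_obs),
    )


# Hypothetical candidates (illustrative numbers, not taken from the paper):
# the denser model fits slightly better but pays a larger complexity penalty.
candidates = {
    "sparse": (-120.0, 3),
    "dense": (-118.5, 9),
}
best = select_by_bic(candidates, n_obs=50)
```

The penalty term k ln(n) grows with the number of parameters, which is why a criterion of this kind tends to favor sparser networks when the gain in fit is small.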
The Discrete Infinite Logistic Normal Distribution
We present the discrete infinite logistic normal distribution (DILN), a
Bayesian nonparametric prior for mixed membership models. DILN is a
generalization of the hierarchical Dirichlet process (HDP) that models
correlation structure between the weights of the atoms at the group level. We
derive a representation of DILN as a normalized collection of gamma-distributed
random variables, and study its statistical properties. We consider
applications to topic modeling and derive a variational inference algorithm for
approximate posterior inference. We study the empirical performance of the DILN
topic model on four corpora, comparing performance with the HDP and the
correlated topic model (CTM). To deal with large-scale data sets, we also
develop an online inference algorithm for DILN and compare with online HDP and
online LDA on the Nature magazine, which contains approximately 350,000
articles.
Comment: This paper will appear in Bayesian Analysis. A shorter version of
this paper appeared at AISTATS 2011, Fort Lauderdale, FL, US
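The "normalized collection of gamma-distributed random variables" mentioned in the abstract above has a familiar finite-dimensional analogue: normalizing independent Gamma(a_k, 1) draws gives a Dirichlet(a_1, ..., a_K) weight vector. The sketch below illustrates only that textbook fact. It is not the DILN construction itself, and the function name and shape values are assumptions made for the example.

```python
import random


def normalized_gamma_weights(shapes, rng):
    """Draw one weight vector by normalizing independent Gamma(shape, scale=1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in shapes]
    total = sum(draws)
    return [z / total for z in draws]


rng = random.Random(0)        # fixed seed so the sketch is reproducible
shapes = [1.0, 2.0, 3.0]      # illustrative shape parameters
weights = normalized_gamma_weights(shapes, rng)
```

Each call returns positive weights that sum to one, and the expected weight of component k is a_k divided by the sum of all shapes.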
An innovative approach for testing bioinformatics programs using metamorphic testing
Background: Recent advances in experimental and computational technologies have fueled the development of many sophisticated bioinformatics programs. The correctness of such programs is crucial, as incorrectly computed results may lead to wrong biological conclusions or misguide downstream experimentation. Common software testing procedures involve executing the target program with a set of test inputs and then verifying the correctness of the test outputs. However, due to the complexity of many bioinformatics programs, it is often difficult to verify the correctness of the test outputs. Therefore our ability to perform systematic software testing is greatly hindered.
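The general idea of metamorphic testing, named in the title above, is to check relations between the outputs of related inputs instead of checking each output against a known correct answer. The sketch below assumes a toy program under test (`gc_content`) and two relations chosen for this example; neither the program nor the relations come from the paper.

```python
import random


def gc_content(seq: str) -> float:
    """Hypothetical program under test: fraction of G and C bases in a DNA string."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def check_metamorphic_relations(program, trials: int = 100, seed: int = 0) -> bool:
    """Return True if `program` satisfies both relations on random inputs.

    MR1: the output is unchanged when the input is reverse-complemented.
    MR2: the output is unchanged when the bases of the input are permuted.
    No expected output value is needed for either check.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        seq = "".join(rng.choice("ACGT") for _ in range(rng.randint(1, 50)))
        source_output = program(seq)
        if abs(program(reverse_complement(seq)) - source_output) > 1e-12:
            return False
        bases = list(seq)
        rng.shuffle(bases)
        if abs(program("".join(bases)) - source_output) > 1e-12:
            return False
    return True


ok = check_metamorphic_relations(gc_content)
```

A faulty variant that counts only G bases would violate MR1 on most random inputs, so the fault is exposed without knowing the correct GC fraction of any sequence.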
Statistical modeling of RNA structure profiling experiments enables parsimonious reconstruction of structure landscapes.
RNA plays key regulatory roles in diverse cellular processes, where its functionality often derives from folding into and converting between structures. Many RNAs further rely on co-existence of alternative structures, which govern their response to cellular signals. However, characterizing heterogeneous landscapes is difficult, both experimentally and computationally. Recently, structure profiling experiments have emerged as powerful and affordable structure characterization methods, which improve computational structure prediction. To date, efforts have centered on predicting one optimal structure, with much less progress made on multiple-structure prediction. Here, we report a probabilistic modeling approach that predicts a parsimonious set of co-existing structures and estimates their abundances from structure profiling data. We demonstrate robust landscape reconstruction and quantitative insights into structural dynamics by analyzing numerous data sets. This work establishes a framework for data-directed characterization of structure landscapes to aid experimentalists in performing structure-function studies
Evolutionary games on graphs
Game theory is one of the key paradigms behind many scientific disciplines
from biology to behavioral sciences to economics. In its evolutionary form and
especially when the interacting agents are linked in a specific social network
the underlying solution concepts and methods are very similar to those applied
in non-equilibrium statistical physics. This review gives a tutorial-type
overview of the field for physicists. The first three sections introduce the
necessary background in classical and evolutionary game theory from the basic
definitions to the most important results. The fourth section surveys the
topological complications implied by non-mean-field-type social network
structures in general. The last three sections discuss in detail the dynamic
behavior of three prominent classes of models: the Prisoner's Dilemma, the
Rock-Scissors-Paper game, and Competing Associations. The major theme of the
review is in what sense and how the graph structure of interactions can modify
and enrich the picture of long term behavioral patterns emerging in
evolutionary games.
Comment: Review, final version, 133 pages, 65 figures
Practical Approaches to Biological Network Discovery
This dissertation addresses a current outstanding problem in the field of systems biology, which is to identify the structure of a transcriptional network from high-throughput experimental data. Understanding the connectivity of a transcriptional network is an important piece of the puzzle that relates the genotype of an organism to its phenotypes. An overwhelming number of computational approaches have been proposed to perform integrative analyses on large collections of high-throughput gene expression datasets to infer the structure of transcriptional networks. I put forth a methodology by which these tools can be evaluated and compared against one another to better understand their strengths and weaknesses. Next I undertake the task of utilizing high-throughput datasets to learn new and interesting network biology in the pathogenic fungus Cryptococcus neoformans. Finally I propose a novel computational method for mapping out transcriptional networks that unifies two orthogonal strategies for network inference. I apply this method to map out the transcriptional network of Saccharomyces cerevisiae and demonstrate how network inference results can complement chromatin immunoprecipitation (ChIP) experiments, which directly probe the binding events of transcriptional regulators. Collectively, my contributions improve both the accessibility and practicality of network inference methods.
Minimal models of evolution: germline fitness effects of cancer mutations and stochastic tunneling under strong recombination
At a time when data on the genetic make-up of organisms is available in abundance, the theory of evolution is of immediate importance to answer key questions of biology: How can one explain the variation seen in the DNA of different organisms and species? What are the effects of changes in the DNA on the function of cells? What are the driving mechanisms of diseases with a genetic component such as cancer? Minimal mathematical models of evolution provide a basis for the interpretation of DNA data. The explanations they offer are concrete and testable, their assumptions and limitations explicit. The application and further development of minimal evolution models is the main theme of this work. In the first part, the functional effects of mutations found in cancer cells are analyzed from the perspective of germline evolution. This is the process that produced the DNA of organisms as we see it today. Mutations have an effect on the fitness of healthy cells. This impact can be estimated from the variation seen in the sequences of protein domains. It is found that this evolutionarily informed conservation score has utility to identify cancer driver genes, especially if they are tumor suppressor genes. The relevance of this fitness scale for cancer mutations is demonstrated on a data set of mutations in protein kinase genes. This analysis is followed by an application of Hidden Markov Models (HMMs) to the detection of signals of positive selection in cancer mutation data. Cancer as an evolutionary process of cells is markedly different from the process of germline evolution. Cancer-specific selection can be seen in genes whose activity or lack thereof is essential for the progress of cancer. These cancer genes exhibit an increased rate of amino acid changing mutations, beyond the level expected by chance. The identification of these genes is a statistical task for which HMMs are shown to be most suitable. Finally, an extended mathematical model of evolution is analyzed which describes the adaptation of a sexually reproducing population to a global fitness maximum via compensatory mutations. In a two-locus/two-allele model, the compound effects of mutation, selection, genetic drift, recombination and sign epistasis lead to the interesting situation of adaptation via the crossing of a fitness valley in genotype space. This bottleneck can be overcome by rare large fluctuations in the allele frequencies overcoming the effect of recombinatorial reshuffling. The relevant time scales are derived for a parameter regime that includes large recombination
Sentinel: A Hyper-Heuristic for the Generation of Mutant Reduction Strategies
Mutation testing is an effective approach to evaluate and strengthen software
test suites, but its adoption is currently limited by the mutants' execution
computational cost. Several strategies have been proposed to reduce this cost
(a.k.a. mutation cost reduction strategies), however none of them has proven to
be effective for all scenarios since they often need an ad-hoc manual selection
and configuration depending on the software under test (SUT). In this paper, we
propose a novel multi-objective evolutionary hyper-heuristic approach, dubbed
Sentinel, to automate the generation of optimal cost reduction strategies for
every new SUT. We evaluate Sentinel by carrying out a thorough empirical study
involving 40 releases of 10 open-source real-world software systems and both
baseline and state-of-the-art strategies as a benchmark. We execute a total of
4,800 experiments, and evaluate their results with both quality indicators and
statistical significance tests, following the most recent best practice in the
literature. The results show that strategies generated by Sentinel outperform
the baseline strategies in 95% of the cases, always with large effect sizes.
They also obtain statistically significantly better results than
state-of-the-art strategies in 88% of the cases, with large effect sizes for
95% of them. Also, our study reveals that the mutation strategies generated by
Sentinel for a given software version can be used without any loss in quality
for subsequently developed versions in 95% of the cases. These results show
that Sentinel is able to automatically generate mutation strategies that reduce
mutation testing cost without affecting its testing effectiveness (i.e.
mutation score), thus taking off from the tester's shoulders the burden of
manually selecting and configuring strategies for each SUT.
Comment: in IEEE Transactions on Software Engineering
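To make the term "mutation score" in the abstract above concrete, here is a minimal sketch under simple assumptions: the program under test, the four hand-written mutants, and the two tests are all hypothetical, and the score is taken as killed mutants divided by all mutants (some definitions exclude equivalent mutants from the denominator). It does not reproduce Sentinel or any cost reduction strategy.

```python
def max_of(a, b):
    """Hypothetical program under test."""
    return a if a > b else b


# Hand-written mutants standing in for tool-generated ones.
mutants = [
    lambda a, b: a,                    # always returns the first argument
    lambda a, b: b,                    # always returns the second argument
    lambda a, b: a if a < b else b,    # relational operator flipped
    lambda a, b: a if a >= b else b,   # equivalent mutant: behaves like the original
]

# Each test takes an implementation and returns True when it passes.
test_suite = [
    lambda f: f(2, 1) == 2,
    lambda f: f(1, 2) == 2,
]


def is_killed(mutant, tests) -> bool:
    """A mutant is killed when at least one test fails on it."""
    return any(not test(mutant) for test in tests)


def mutation_score(candidates, tests) -> float:
    """Fraction of mutants killed by the test suite."""
    killed = sum(is_killed(m, tests) for m in candidates)
    return killed / len(candidates)


score = mutation_score(mutants, test_suite)
```

Every mutant has to be run against the test suite, which is the execution cost the abstract refers to. The fourth mutant can never be killed because it behaves identically to the original on all inputs.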