Multiple testing for SNP-SNP interactions
Most genetic diseases are complex, i.e., associated with combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction patterns given as expressions linked by logical operators. Methods for multiple testing in high-dimensional settings can be applied when many SNPs are considered simultaneously. However, another, less well-known multiple testing problem arises within a fixed subset of SNPs when the logic expression is chosen optimally. In this article, we propose a general asymptotic approach for deriving the distribution of the maximally selected chi-square statistic in various situations. We show how this result can be used for testing logic expressions, in particular SNP-SNP interaction patterns, while controlling for multiple comparisons. Simulations show that our method provides multiple testing adjustment when the logic expression is chosen so as to maximize the statistic. Its benefit is demonstrated through an application to a real dataset from a large population-based study considering allergy and asthma in KORA. An implementation of our method is available from the Comprehensive R Archive Network (CRAN) as R package 'SNPmaxsel'.
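To make the selection problem concrete, the sketch below computes a maximally selected chi-square statistic for two SNPs by scoring every logic expression of two binary SNP indicators against case/control status. It assumes each SNP has already been coded as a 0/1 indicator (for example, carrier of the minor allele). The adjustment for choosing the best expression uses a permutation test, which is a swapped-in, simulation-based technique and not the asymptotic approach proposed in the article; all function names are hypothetical and are not the SNPmaxsel API.

```python
import itertools
import random

def chi_square_2x2(x, y):
    """Pearson chi-square (no continuity correction) for two 0/1 lists."""
    n = len(x)
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin (e.g. a constant expression) carries no signal
    return n * (a * d - b * c) ** 2 / denom

# All 16 Boolean functions of two binary inputs, each stored as its truth table
# over the input pairs (0,0), (0,1), (1,0), (1,1).
TRUTH_TABLES = list(itertools.product([0, 1], repeat=4))

def apply_expression(table, s1, s2):
    """Evaluate one logic expression for every individual."""
    return [table[2 * u + v] for u, v in zip(s1, s2)]

def max_chi_square(s1, s2, status):
    """Maximally selected chi-square over all logic expressions of two SNPs."""
    return max(chi_square_2x2(apply_expression(t, s1, s2), status)
               for t in TRUTH_TABLES)

def permutation_p_value(s1, s2, status, n_perm=999, seed=0):
    """Permutation-adjusted p-value for the maximally selected statistic."""
    rng = random.Random(seed)
    observed = max_chi_square(s1, s2, status)
    labels = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any SNP-status association
        if max_chi_square(s1, s2, labels) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)

# Hypothetical toy data: disease status is exactly "SNP1 AND SNP2".
s1 = [0, 0, 1, 1] * 10
s2 = [0, 1, 0, 1] * 10
status = [u & v for u, v in zip(s1, s2)]
stat, p = permutation_p_value(s1, s2, status, n_perm=199)
print(stat, p)
```

Because the maximum over all expressions is recomputed for every permuted dataset, the reference distribution already reflects the optimistic selection of the best expression, which is exactly the multiple testing issue the abstract describes.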
The genetic basis for adaptation of model-designed syntrophic co-cultures
Understanding the fundamental characteristics of microbial communities could have far-reaching implications for human health and applied biotechnology. Despite this, much is still unknown regarding the genetic basis and evolutionary strategies underlying the formation of viable synthetic communities. By pairing auxotrophic mutants in co-culture, it has been demonstrated that viable nascent E. coli communities can be established in which the mutant strains are metabolically coupled. A novel algorithm, OptAux, was constructed to design 61 unique multi-knockout E. coli auxotrophic strains that require significant metabolite uptake to grow. These predicted knockouts included a diverse set of novel non-specific auxotrophs that result from inhibition of major biosynthetic subsystems. Three OptAux-predicted non-specific auxotrophic strains, with diverse metabolic deficiencies, were co-cultured with an L-histidine auxotroph and optimized via adaptive laboratory evolution (ALE). Time-course sequencing revealed the genetic changes employed by each strain to achieve higher community growth rates and provided insight into mechanisms for adapting to the syntrophic niche. A community model of metabolism and gene expression was used to predict the relative community composition and fundamental characteristics of the evolved communities. This work presents new insight into the genetic strategies underlying viable nascent community formation and a computational method to elucidate metabolic changes that enable the creation of cooperative communities.
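The idea of metabolically coupled auxotrophs can be illustrated with a deliberately simplified, set-based check: a pair is treated as potentially syntrophic when each strain's unmet requirements are covered by what its partner releases plus what the medium supplies. This is a hypothetical toy, not OptAux (which the abstract describes as an algorithm for designing knockout strains) and not the community model of metabolism and gene expression; the strain names, metabolite labels, and secretion sets are invented for illustration.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Strain:
    name: str
    requires: frozenset  # metabolites the strain cannot synthesize (assumed)
    secretes: frozenset  # metabolites the strain is assumed to release

def is_syntrophic_pair(x, y, medium=frozenset()):
    """True if each strain's requirements are covered by its partner plus the medium."""
    return x.requires <= (y.secretes | medium) and y.requires <= (x.secretes | medium)

def viable_pairs(strains, medium=frozenset()):
    """All unordered pairs that pass the set-based coupling check."""
    return [(x.name, y.name) for x, y in combinations(strains, 2)
            if is_syntrophic_pair(x, y, medium)]

# Hypothetical strains; "M1" and "M2" stand for arbitrary metabolites.
his_aux = Strain("his_aux", frozenset({"histidine"}), frozenset({"M1"}))
m1_aux = Strain("m1_aux", frozenset({"M1"}), frozenset({"histidine"}))
m2_aux = Strain("m2_aux", frozenset({"M2"}), frozenset({"histidine"}))

print(viable_pairs([his_aux, m1_aux, m2_aux]))
print(viable_pairs([his_aux, m1_aux, m2_aux], medium=frozenset({"M2"})))
```

A real design method would replace these fixed sets with fluxes predicted by a metabolic model, since whether a strain actually releases enough of a metabolite to support its partner is a quantitative question.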
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics that does not require prior statistical assumptions about the evolutionary process. The method is based on algorithms that generate mathematical representations directly from DNA/RNA or protein sequences and then output numerical (scalar or vector) and visual characteristics (graphs). The binary-encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to obtain an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
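The step from a distance matrix to a tree can be sketched with UPGMA. The sketch does not implement the Iterative Canonical Form: as a stand-in, it assumes a simple 2-bit code per nucleotide (A=00, C=01, G=10, T=11) and uses the proportion of mismatched bits between aligned sequences as the distance. The encoding, the distance, and all function names are assumptions for illustration only.

```python
from itertools import combinations

# Assumed 2-bit code per nucleotide; BOOL-AN's own binary encoding may differ.
ENCODING = {"A": "00", "C": "01", "G": "10", "T": "11"}

def encode(seq):
    """Concatenate the 2-bit codes of an (ungapped) nucleotide sequence."""
    return "".join(ENCODING[base] for base in seq.upper())

def bit_distance(a, b):
    """Proportion of differing bits between two aligned, equal-length sequences."""
    ea, eb = encode(a), encode(b)
    if len(ea) != len(eb):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(ea, eb)) / len(ea)

def distance_matrix(seqs):
    """Pairwise distances keyed by frozenset({name1, name2})."""
    return {frozenset((a, b)): bit_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}

def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns an ultrametric tree in Newick format."""
    # Each cluster label (a Newick string) maps to (size, height of its root).
    clusters = {name: (1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        i, j = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((i, j))] / 2
        si, hi = clusters.pop(i)
        sj, hj = clusters.pop(j)
        new = f"({i}:{height - hi:.4f},{j}:{height - hj:.4f})"
        # Distance from the merged cluster is the size-weighted mean of its parts.
        for k in clusters:
            d[frozenset((new, k))] = (si * d[frozenset((i, k))]
                                      + sj * d[frozenset((j, k))]) / (si + sj)
        clusters[new] = (si + sj, height)
    return next(iter(clusters)) + ";"

seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TGCAACGT"}
tree = upgma(list(seqs), distance_matrix(seqs))
print(tree)
```

UPGMA assumes a constant rate of change (an ultrametric tree); neighbor-joining, the other option the abstract mentions, relaxes that assumption and would be the better choice when rates vary across lineages.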
Constraints on Patterns of Abundance and Aggregation in Biological Systems
Understanding the mechanisms that structure biological systems is a primary goal of biology. My research shows that biological structure is constrained in important ways by general variables such as the number of base pairs in a genome and the number of individuals and species in a community. I used a combination of macroecology, bioinformatics, statistics, mathematics, and advanced computing to pursue this research, and as a result published several peer-reviewed scientific manuscripts and open-source software. I was funded through a combination of fellowships and scholarships awarded by the Utah State University School of Graduate Studies, College of Science, and Department of Biology, as well as teaching assistantships awarded through the Department of Biology at Utah State University, and research assistantships funded through a CAREER grant from the U.S. National Science Foundation (DEB-0953694) awarded to my advisor, Dr. Ethan White. With the help of my advisor, I also obtained a computing grant from Amazon Web Services in the amount of $123,500.
Using more than 9,000 communities of plants, animals, fungi, and microorganisms, I demonstrated that the forms of empirical species abundance distributions (SADs) are constrained by total abundance and species richness. Using more than 300 microbial genomes, I demonstrated that nucleotide aggregation is constrained by genome length and differs between coding and noncoding regions of DNA. General state variables of genomes and ecological communities (i.e., genome length, total abundance, and species richness) constrain simple structural properties of each system.
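One way to make the constraint on SADs concrete, assuming an SAD with S species and N individuals is represented as an integer partition of N into exactly S positive parts (abundances listed in decreasing order), is to count or enumerate the distinct shapes that N and S permit. This framing and the function names are assumptions introduced for illustration; the abstract does not state how the constraint was quantified.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_partitions(n, s):
    """Number of ways to write n as a sum of exactly s positive integers (order ignored)."""
    if n == 0 and s == 0:
        return 1
    if n <= 0 or s <= 0:
        return 0
    # Either some part equals 1 (drop it), or every part is >= 2 (subtract 1 from each).
    return count_partitions(n - 1, s - 1) + count_partitions(n - s, s)

def partitions(n, s, max_part=None):
    """Yield partitions of n into exactly s parts, each in non-increasing order."""
    if max_part is None:
        max_part = n
    if s == 0:
        if n == 0:
            yield ()
        return
    # The largest part leaves room for s - 1 further parts of size >= 1.
    for first in range(min(n - s + 1, max_part), 0, -1):
        for rest in partitions(n - first, s - 1, first):
            yield (first,) + rest

print(count_partitions(10, 3))
print(list(partitions(5, 2)))
```

Even for modest N and S the number of admissible shapes is finite and fixed, which is one way to see how total abundance and richness alone restrict the forms an SAD can take.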
Efficient Two-Stage Group Testing Algorithms for Genetic Screening
This paper investigates efficient two-stage group testing algorithms that are particularly suited for rapid, less expensive DNA library screening and other large-scale biological group testing efforts. The main focus is on novel combinatorial constructions that minimize the number of individual tests at the second stage of a two-stage disjunctive testing procedure. Building on recent work by Levenshtein (2003) and Tonchev (2008), several new infinite classes of such combinatorial designs are presented.
Comment: 14 pages; to appear in "Algorithmica". Part of this work has been presented at the ICALP 2011 Group Testing Workshop; arXiv:1106.368
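The two-stage disjunctive procedure can be sketched as follows: stage 1 runs pooled tests whose outcome is positive exactly when a pool contains at least one positive item; every item that appears in some negative pool is cleared, and only the remaining candidates are tested individually at stage 2. The 3-by-3 row/column pooling design below is a hypothetical example, not one of the paper's combinatorial constructions, and the function names are assumptions.

```python
def stage_one(pools, positives):
    """Pooled test outcomes: a pool is positive iff it contains a positive item."""
    return [bool(pool & positives) for pool in pools]

def stage_two_candidates(n_items, pools, outcomes):
    """Items that occur in no negative pool; only these are tested individually."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared |= pool
    return [item for item in range(n_items) if item not in cleared]

def two_stage_test(n_items, pools, positives):
    """Return (items found positive, number of stage-1 tests, number of stage-2 tests)."""
    outcomes = stage_one(pools, positives)
    candidates = stage_two_candidates(n_items, pools, outcomes)
    # Stage 2: each candidate is tested on its own, which reveals its true status.
    found = {item for item in candidates if item in positives}
    return found, len(pools), len(candidates)

# Hypothetical design: 9 items on a 3-by-3 grid, pooled by row and by column.
rows = [{3 * r + c for c in range(3)} for r in range(3)]
cols = [{3 * r + c for r in range(3)} for c in range(3)]
pools = rows + cols
print(two_stage_test(9, pools, {4}))
print(two_stage_test(9, pools, {0, 4}))
```

With one positive the grid leaves a single stage-2 candidate, but with two positives it leaves four, because items at the intersections of positive rows and columns cannot be cleared. Designs that keep this candidate set small are what the combinatorial constructions aim for.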
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis.
Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
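Population genetics, one of the topics listed, offers a compact illustration of EM: estimating ABO allele frequencies from phenotype counts, where the genotypes behind the A and B phenotypes (AA vs. AO, BB vs. BO) are unobserved. This classic textbook example is chosen here for illustration; it is not claimed to be one of the survey's own worked examples, and the function name, starting values, and tolerance are assumptions.

```python
def abo_em(n_a, n_b, n_ab, n_o, tol=1e-10, max_iter=1000):
    """EM estimates of allele frequencies (p, q, r) for alleles A, B, O."""
    n = n_a + n_b + n_ab + n_o
    p = q = r = 1 / 3  # assumed starting values
    for _ in range(max_iter):
        # E-step: split A and B phenotype counts into expected genotype counts.
        n_aa = n_a * p * p / (p * p + 2 * p * r)
        n_ao = n_a - n_aa
        n_bb = n_b * q * q / (q * q + 2 * q * r)
        n_bo = n_b - n_bb
        # M-step: allele frequencies from expected allele counts (2n alleles total).
        p_new = (2 * n_aa + n_ao + n_ab) / (2 * n)
        q_new = (2 * n_bb + n_bo + n_ab) / (2 * n)
        r_new = (n_ao + n_bo + 2 * n_o) / (2 * n)
        converged = max(abs(p_new - p), abs(q_new - q), abs(r_new - r)) < tol
        p, q, r = p_new, q_new, r_new
        if converged:
            break
    return p, q, r

# Hypothetical counts matching p = 0.3, q = 0.1, r = 0.6 under Hardy-Weinberg.
print(abo_em(4500, 1300, 600, 3600))
```

Each iteration fills in the missing genotype information with its conditional expectation and then applies simple allele counting, so the frequencies always sum to one and the likelihood never decreases.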