    Multiple testing for SNP-SNP interactions

    Most genetic diseases are complex, i.e. associated with combinations of SNPs rather than individual SNPs. In the last few years, this topic has often been addressed in terms of SNP-SNP interaction patterns given as expressions linked by logical operators. Methods for multiple testing in high-dimensional settings can be applied when many SNPs are considered simultaneously. However, another, less well-known multiple testing problem arises within a fixed subset of SNPs when the logic expression is chosen optimally. In this article, we propose a general asymptotic approach for deriving the distribution of the maximally selected chi-square statistic in various situations. We show how this result can be used for testing logic expressions, in particular SNP-SNP interaction patterns, while controlling for multiple comparisons. Simulations show that our method provides multiple testing adjustment when the logic expression is chosen so as to maximize the statistic. Its benefit is demonstrated through an application to a real dataset from a large population-based study considering allergy and asthma in KORA. An implementation of our method is available from the Comprehensive R Archive Network (CRAN) as the R package 'SNPmaxsel'.
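
The selection problem described in this abstract can be illustrated with a minimal sketch. It is not the asymptotic method of the article or the SNPmaxsel implementation: it swaps in a plain permutation test, and the five candidate expressions, the function names, and the binary coding of the two SNPs are all assumptions made for illustration. The statistic is maximized over the candidate logic expressions, and the same maximization is repeated on shuffled disease labels so that the p-value accounts for the selection.

```python
import random


def chi_square_2x2(x, y):
    """Pearson chi-square statistic for two binary vectors (0.0 if a margin is empty)."""
    n = len(x)
    a = sum(1 for xi, yi in zip(x, y) if xi and yi)
    b = sum(1 for xi, yi in zip(x, y) if xi and not yi)
    c = sum(1 for xi, yi in zip(x, y) if not xi and yi)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical candidate logic expressions on two binary SNP indicators.
EXPRESSIONS = {
    "A": lambda a, b: a,
    "B": lambda a, b: b,
    "A and B": lambda a, b: a and b,
    "A or B": lambda a, b: a or b,
    "A xor B": lambda a, b: a != b,
}


def max_selected_chi_square(snp_a, snp_b, disease):
    """Return the expression with the largest chi-square statistic and that statistic."""
    best_name, best_stat = None, -1.0
    for name, expr in EXPRESSIONS.items():
        pattern = [int(bool(expr(a, b))) for a, b in zip(snp_a, snp_b)]
        stat = chi_square_2x2(pattern, disease)
        if stat > best_stat:
            best_name, best_stat = name, stat
    return best_name, best_stat


def permutation_p_value(snp_a, snp_b, disease, n_perm=500, seed=0):
    """Permutation p-value for the maximally selected statistic.

    The maximization over expressions is redone for every shuffled label
    vector, which is what adjusts for choosing the best expression.
    """
    rng = random.Random(seed)
    _, observed = max_selected_chi_square(snp_a, snp_b, disease)
    shuffled = list(disease)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, stat = max_selected_chi_square(snp_a, snp_b, shuffled)
        if stat >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data in which disease status is exactly "A and B".
    snp_a = [1, 1, 0, 0] * 10
    snp_b = [1, 0, 1, 0] * 10
    disease = [1, 0, 0, 0] * 10
    print(max_selected_chi_square(snp_a, snp_b, disease))
    print(permutation_p_value(snp_a, snp_b, disease))
```

A permutation test is slower than an asymptotic distribution but makes the logic of the adjustment easy to see: an unadjusted p-value for the winning expression alone would ignore the other four expressions that were tried.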

    The genetic basis for adaptation of model-designed syntrophic co-cultures.

    Understanding the fundamental characteristics of microbial communities could have far-reaching implications for human health and applied biotechnology. Despite this, much is still unknown regarding the genetic basis and evolutionary strategies underlying the formation of viable synthetic communities. By pairing auxotrophic mutants in co-culture, it has been demonstrated that viable nascent E. coli communities can be established where the mutant strains are metabolically coupled. A novel algorithm, OptAux, was constructed to design 61 unique multi-knockout E. coli auxotrophic strains that require significant metabolite uptake to grow. These predicted knockouts included a diverse set of novel non-specific auxotrophs that result from inhibition of major biosynthetic subsystems. Three OptAux-predicted non-specific auxotrophic strains, with diverse metabolic deficiencies, were co-cultured with an L-histidine auxotroph and optimized via adaptive laboratory evolution (ALE). Time-course sequencing revealed the genetic changes employed by each strain to achieve higher community growth rates and provided insight into mechanisms for adapting to the syntrophic niche. A community model of metabolism and gene expression was utilized to predict the relative community composition and fundamental characteristics of the evolved communities. This work presents new insight into the genetic strategies underlying viable nascent community formation and a cutting-edge computational method to elucidate metabolic changes that empower the creation of cooperative communities.

    BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

    A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.

    Constraints on Patterns of Abundance and Aggregation in Biological Systems

    Understanding the mechanisms that structure biological systems is a primary goal of biology. My research shows that biological structure is constrained in important ways by general variables such as the number of base pairs in a genome and the number of individuals and species in a community. I used a combination of macroecology, bioinformatics, statistics, mathematics, and advanced computing to pursue my research and published several peer-reviewed scientific manuscripts and open-source software as a result. I was funded through a combination of fellowships and scholarships awarded by the Utah State University School of Graduate Studies, College of Science, and Department of Biology, as well as teaching assistantships awarded through the Department of Biology at Utah State University, and research assistantships funded through a CAREER grant from the U.S. National Science Foundation (DEB-0953694) awarded to my advisor, Dr. Ethan White. With the help of my advisor, I also obtained a computing grant from Amazon Web Services in the amount of $7,500. Altogether, funding for my research and education totaled approximately $123,500. Using over 9000 communities of plants, animals, fungi, and microorganisms, I demonstrated that the forms of empirical species abundance distributions (SADs) are constrained by total abundance and species richness. Using over 300 microbial genomes, I demonstrated that nucleotide aggregation is constrained by genome length and differs between regions of coding and noncoding DNA. General state variables of genomes and ecological communities (i.e. genome length, total abundance and species richness) constrain simple structural properties of each system.

    Efficient Two-Stage Group Testing Algorithms for Genetic Screening

    Efficient two-stage group testing algorithms that are particularly suited for rapid and less-expensive DNA library screening and other large scale biological group testing efforts are investigated in this paper. The main focus is on novel combinatorial constructions in order to minimize the number of individual tests at the second stage of a two-stage disjunctive testing procedure. Building on recent work by Levenshtein (2003) and Tonchev (2008), several new infinite classes of such combinatorial designs are presented. Comment: 14 pages; to appear in "Algorithmica". Part of this work has been presented at the ICALP 2011 Group Testing Workshop; arXiv:1106.368
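
A two-stage disjunctive testing procedure of the kind this abstract refers to can be sketched as follows. This is only an illustration under stated assumptions: the row-and-column grid pooling and the names `grid_pools` and `two_stage_screen` are hypothetical and are not the combinatorial constructions of the paper. In stage one, each pool tests positive if it contains at least one positive item; every item that appears in a negative pool is cleared. In stage two, the remaining candidates are tested individually.

```python
def grid_pools(rows, cols):
    """Hypothetical pooling design: one pool per row and one per column of a grid."""
    row_pools = [[r * cols + c for c in range(cols)] for r in range(rows)]
    col_pools = [[r * cols + c for r in range(rows)] for c in range(cols)]
    return row_pools + col_pools


def two_stage_screen(n_items, pools, is_positive):
    """Run a two-stage disjunctive screen.

    Returns the confirmed positive items and the total number of tests
    (pool tests in stage one plus individual tests in stage two).
    """
    # Stage 1: a pool is positive if any of its members is positive.
    pool_results = [any(is_positive(i) for i in pool) for pool in pools]

    # Items that occur in at least one negative pool cannot be positive.
    cleared = set()
    for pool, result in zip(pools, pool_results):
        if not result:
            cleared.update(pool)
    candidates = [i for i in range(n_items) if i not in cleared]

    # Stage 2: test each remaining candidate individually.
    positives = [i for i in candidates if is_positive(i)]
    return positives, len(pools) + len(candidates)


if __name__ == "__main__":
    truth = {5, 10}
    pools = grid_pools(4, 4)
    print(two_stage_screen(16, pools, lambda i: i in truth))  # ([5, 10], 12)
```

In this toy run, 12 tests replace the 16 needed to test every item separately. The number of second-stage tests depends on how many negative items survive stage one, which is the quantity a good pooling design tries to keep small.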

    The EM Algorithm and the Rise of Computational Biology

    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis. Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
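
For readers unfamiliar with the EM algorithm mentioned in this abstract, a minimal sketch on a standard toy problem may help. It is not taken from the article: the two-coin mixture, the function name `em_two_coins`, the starting values, and the equal prior on the two coins are all assumptions chosen for illustration. Each trial records the number of heads in a fixed number of tosses of one of two coins whose identity is hidden. The E-step computes how likely each coin is to have produced each trial, and the M-step re-estimates each coin's bias from the resulting expected counts.

```python
def em_two_coins(heads, flips, theta_a=0.6, theta_b=0.5, n_iter=50):
    """Estimate the head probabilities of two coins from unlabeled trials.

    heads[i] is the number of heads observed in `flips` tosses of trial i.
    Which coin was used in each trial is unknown (equal prior assumed).
    """
    for _ in range(n_iter):
        heads_a = tails_a = heads_b = tails_b = 0.0
        for h in heads:
            t = flips - h
            # E-step: posterior weight of each coin for this trial.
            # The binomial coefficient cancels in the ratio, so it is omitted.
            like_a = theta_a ** h * (1 - theta_a) ** t
            like_b = theta_b ** h * (1 - theta_b) ** t
            weight_a = like_a / (like_a + like_b)
            weight_b = 1.0 - weight_a
            # Accumulate expected head and tail counts per coin.
            heads_a += weight_a * h
            tails_a += weight_a * t
            heads_b += weight_b * h
            tails_b += weight_b * t
        # M-step: maximum-likelihood update from the expected counts.
        theta_a = heads_a / (heads_a + tails_a)
        theta_b = heads_b / (heads_b + tails_b)
    return theta_a, theta_b


if __name__ == "__main__":
    # Five trials of ten tosses each; coin labels are not observed.
    print(em_two_coins([5, 9, 8, 4, 7], flips=10))
```

The same alternation between computing expected hidden assignments and re-estimating parameters underlies the applications the abstract lists, with the hidden variables being, for example, motif positions or allele origins rather than coin identities.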