17,110 research outputs found

    Regulatory motif discovery using a population clustering evolutionary algorithm

    This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences
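As a minimal illustration of the position frequency matrix models mentioned in this abstract, the sketch below counts bases column by column across a set of aligned binding sites. The function name `position_frequency_matrix` and the example sites are hypothetical and are not taken from the paper; the evolutionary search itself is not shown.

```python
def position_frequency_matrix(sites, alphabet="ACGT"):
    """Count how often each base occurs at each column of aligned sites.

    Assumes all sites have equal length and contain only bases from
    `alphabet`; any other character raises KeyError.
    """
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    # One row of counts per base, one column per motif position.
    matrix = {base: [0] * width for base in alphabet}
    for site in sites:
        for pos, base in enumerate(site.upper()):
            matrix[base][pos] += 1
    return matrix


# Hypothetical aligned sites, for illustration only.
sites = ["TATAAA", "TATATA", "TATAAG", "CATAAA"]
pfm = position_frequency_matrix(sites)
for base in "ACGT":
    print(base, pfm[base])
```

Each column of the matrix sums to the number of sites, so dividing by that number would give per-position base frequencies.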

    Bioinformatics tools for analysing viral genomic data

    The field of viral genomics and bioinformatics is experiencing a strong resurgence due to high-throughput sequencing (HTS) technology, which enables the rapid and cost-effective sequencing and subsequent assembly of large numbers of viral genomes. In addition, the unprecedented power of HTS technologies has enabled the analysis of intra-host viral diversity and quasispecies dynamics in relation to important biological questions on viral transmission, vaccine resistance and host jumping. HTS also enables the rapid identification of both known and potentially new viruses from field and clinical samples, thus adding new tools to the fields of viral discovery and metagenomics. Bioinformatics has been central to the rise of HTS applications because new algorithms and software tools are continually needed to process and analyse the large, complex datasets generated in this rapidly evolving area. In this paper, the authors give a brief overview of the main bioinformatics tools available for viral genomic research, with a particular emphasis on HTS technologies and their main applications. They summarise the major steps in various HTS analyses, starting with quality control of raw reads and encompassing activities ranging from consensus and de novo genome assembly to variant calling and metagenomics, as well as RNA sequencing

    A rapid and scalable method for multilocus species delimitation using Bayesian model comparison and rooted triplets

    Multilocus sequence data provide far greater power to resolve species limits than the single-locus data typically used for broad surveys of clades. However, current statistical methods based on a multispecies coalescent framework are computationally demanding, because of the number of possible delimitations that must be compared and the time-consuming likelihood calculations. New methods are therefore needed to open up the power of multilocus approaches to larger systematic surveys. Here, we present a rapid and scalable method that introduces two innovations. First, the method reduces the complexity of likelihood calculations by decomposing the tree into rooted triplets. The distribution of topologies for a triplet across multiple loci follows a uniform trinomial distribution when the 3 individuals belong to the same species, but a skewed distribution, with a form specified by the multispecies coalescent, if they belong to separate species. A Bayesian model comparison framework was developed, and the best delimitation is found by comparing the product of posterior probabilities of all triplets. The second innovation is a new dynamic programming algorithm for finding the optimum delimitation from all those compatible with a guide tree by successively analyzing subtrees defined by each node. This algorithm removes the need for the heuristic searches used by current methods, guarantees that the best solution is found, and could potentially be used in other systematic applications. We assessed the performance of the method with simulated, published and newly generated data. Analyses of simulated data demonstrate that the combined method has favourable statistical properties and scalability with increasing sample sizes. Analyses of empirical data from both eukaryotes and prokaryotes demonstrate its potential for delimiting species in real cases.
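The contrast between the uniform and skewed triplet distributions can be sketched numerically. The example below uses the standard multispecies coalescent result for a rooted triplet: with an internal branch of length t in coalescent units, the concordant topology has probability 1 - (2/3)e^(-t) and each discordant topology has probability (1/3)e^(-t). It is a simplified likelihood comparison at a fixed, assumed t, not the Bayesian model comparison the authors describe; the function names and the topology counts are hypothetical.

```python
import math


def triplet_topology_probs(t):
    """Topology probabilities for a rooted triplet from separate species.

    t is the internal branch length in coalescent units (an assumed input).
    Returns (concordant, discordant_1, discordant_2).
    """
    discord = math.exp(-t) / 3.0
    return (1.0 - 2.0 * discord, discord, discord)


def log_likelihood(counts, probs):
    """Multinomial log-likelihood of topology counts, omitting the
    combinatorial constant, which is identical for both models."""
    return sum(n * math.log(p) for n, p in zip(counts, probs))


# Hypothetical counts of the three topologies across 100 loci.
counts = (70, 16, 14)
same_species = log_likelihood(counts, (1 / 3, 1 / 3, 1 / 3))
separate_species = log_likelihood(counts, triplet_topology_probs(1.0))
print(same_species, separate_species)
```

With these illustrative counts the skewed model fits better than the uniform one, which is the kind of signal a triplet-based delimitation would aggregate over many triplets.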

    A Bayesian approach for inferring the dynamics of partially observed endemic infectious diseases from space-time-genetic data

    We describe a statistical framework for reconstructing the sequence of transmission events between observed cases of an endemic infectious disease using genetic, temporal and spatial information. Previous approaches to reconstructing transmission trees have assumed all infections in the study area originated from a single introduction and that a large fraction of cases were observed. There are as yet no approaches appropriate for endemic situations in which a disease is already well established in a host population and in which there may be multiple origins of infection, or that can enumerate unobserved infections missing from the sample. Our proposed framework addresses these shortcomings, enabling reconstruction of partially observed transmission trees and estimating the number of cases missing from the sample. Analyses of simulated datasets show the method to be accurate in identifying direct transmissions, while introductions and transmissions via one or more unsampled intermediate cases could be identified at high to moderate levels of case detection. When applied to partial genome sequences of rabies virus sampled from an endemic region of South Africa, our method reveals several distinct transmission cycles with little contact between them, and direct transmission over long distances suggesting significant anthropogenic influence in the movement of infected dogs

    Intragenic homogenization and multiple copies of prey-wrapping silk genes in Argiope garden spiders

    Background: Spider silks are spectacular examples of phenotypic diversity arising from adaptive molecular evolution. An individual spider can produce an array of specialized silks, with the majority of constituent silk proteins encoded by members of the spidroin gene family. Spidroins are dominated by tandem repeats flanked by short, non-repetitive N- and C-terminal coding regions. The remarkable mechanical properties of spider silks have been largely attributed to the repeat sequences. However, the molecular evolutionary processes acting on spidroin terminal and repetitive regions remain unclear due to a paucity of complete gene sequences and sampling of genetic variation among individuals. To better understand spider silk evolution, we characterize a complete aciniform spidroin gene from an Argiope orb-weaving spider and survey aciniform gene fragments from congeneric individuals. Results: We present the complete aciniform spidroin (AcSp1) gene from the silver garden spider Argiope argentata (Aar_AcSp1), and document multiple AcSp1 loci in individual genomes of A. argentata and the congeneric A. trifasciata and A. aurantia. We find that Aar_AcSp1 repeats have >98% pairwise nucleotide identity. By comparing AcSp1 repeat amino acid sequences between Argiope species and with other genera, we identify regions of conservation over vast amounts of evolutionary time. Through a PCR survey of individual A. argentata, A. trifasciata, and A. aurantia genomes, we ascertain that AcSp1 repeats show limited variation between species whereas terminal regions are more divergent. We also find that average dN/dS across codons in the N-terminal, repetitive, and C-terminal encoding regions indicates purifying selection that is strongest in the N-terminal region. Conclusions: Using the complete A. argentata AcSp1 gene and spidroin genetic variation between individuals, this study clarifies some of the molecular evolutionary processes underlying the spectacular mechanical attributes of aciniform silk. It is likely that intragenic concerted evolution and functional constraints on A. argentata AcSp1 repeats result in extreme repeat homogeneity. The maintenance of multiple AcSp1 encoding loci in Argiope genomes supports the hypothesis that Argiope spiders require rapid and efficient protein production to support their prolific use of aciniform silk for prey-wrapping and web-decorating. In addition, multiple gene copies may represent the early stages of spidroin diversification.

    Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin

    One selection pressure shaping sequence evolution is the requirement that a protein fold with sufficient stability to perform its biological functions. We present a conceptual framework that explains how this requirement causes the probability that a particular amino acid mutation is fixed during evolution to depend on its effect on protein stability. We mathematically formalize this framework to develop a Bayesian approach for inferring the stability effects of individual mutations from homologous protein sequences of known phylogeny. This approach is able to predict published experimentally measured mutational stability effects (ΔΔG values) with an accuracy that exceeds both a state-of-the-art physicochemical modeling program and the sequence-based consensus approach. As a further test, we use our phylogenetic inference approach to predict stabilizing mutations to influenza hemagglutinin. We introduce these mutations into a temperature-sensitive influenza virus with a defect in its hemagglutinin gene and experimentally demonstrate that some of the mutations allow the virus to grow at higher temperatures. Our work therefore describes a powerful new approach for predicting stabilizing mutations that can be successfully applied even to large, complex proteins such as hemagglutinin. This approach also makes a mathematical link between phylogenetics and experimentally measurable protein properties, potentially paving the way for more accurate analyses of molecular evolution

    Alignment and analysis of noncoding DNA sequences in Drosophila
