53 research outputs found

    Meta-Alignment with Crumble and Prune: Partitioning very large alignment problems for performance and parallelization

    Background: Continuing research into the global multiple sequence alignment problem has resulted in more sophisticated and principled alignment methods. Unfortunately, these new algorithms often require large amounts of time and memory, making it nearly impossible to run them on large datasets. As a solution, we present two general methods, Crumble and Prune, for breaking a phylogenetic alignment problem into smaller, more tractable sub-problems. We call Crumble and Prune meta-alignment methods because they use existing alignment algorithms and can be used with many current alignment programs. Crumble breaks long alignment problems into shorter sub-problems. Prune divides the phylogenetic tree into a collection of smaller trees to reduce the number of sequences in each alignment problem. These methods are orthogonal: they can be applied together to provide better scaling in both sequence length and sequence depth. Both methods partition the problem such that many of the sub-problems can be solved independently. The results are then combined to form a solution to the full alignment problem. Results: Crumble and Prune each provide a significant performance improvement with little loss of accuracy, and in some cases accuracy improved. Crumble and Prune were tested on real and simulated data. Furthermore, we have implemented a system called Job-tree that allows hierarchical sub-problems to be solved in parallel on a compute cluster, significantly shortening the run-time. Conclusions: These methods enabled us to solve gigabase alignment problems. They could enable a new generation of biologically realistic alignment algorithms to be applied to real-world, large-scale alignment problems.
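
The abstract does not say how Prune chooses where to cut the tree. As a minimal sketch of the general idea only, assuming a rooted tree stored as a parent-to-children dictionary (a hypothetical representation), the function below splits the tree top-down into clades of at most `max_leaves` sequences. Each clade could then be handed to an existing aligner as an independent sub-problem. This is not the authors' algorithm, and merging the sub-alignments back together is not shown.

```python
from typing import Dict, List

# Hypothetical representation: internal node -> list of children; leaves have no entry.
Tree = Dict[str, List[str]]


def leaves_under(tree: Tree, node: str) -> List[str]:
    """Return the leaf names below `node`, in left-to-right order."""
    children = tree.get(node, [])
    if not children:
        return [node]
    out: List[str] = []
    for child in children:
        out.extend(leaves_under(tree, child))
    return out


def partition_clades(tree: Tree, root: str, max_leaves: int) -> List[List[str]]:
    """Split the tree top-down into clades holding at most `max_leaves` leaves each."""
    if max_leaves < 1:
        raise ValueError("max_leaves must be >= 1")
    leaves = leaves_under(tree, root)
    if len(leaves) <= max_leaves:
        return [leaves]  # small enough: treat this clade as one alignment sub-problem
    parts: List[List[str]] = []
    for child in tree[root]:  # root must be internal here, since a leaf has only 1 leaf
        parts.extend(partition_clades(tree, child, max_leaves))
    return parts
```

For the tree `{"root": ["A", "n1"], "n1": ["B", "n2"], "n2": ["C", "D"]}` with `max_leaves=2`, the sketch yields the clades `["A"]`, `["B"]` and `["C", "D"]`. Cutting whole clades keeps closely related sequences together, which is one plausible reason to partition along the tree rather than arbitrarily.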

    Batrachochytrium dendrobatidis Shows High Genetic Diversity and Ecological Niche Specificity among Haplotypes in the Maya Mountains of Belize

    The amphibian pathogen Batrachochytrium dendrobatidis (Bd) has been implicated in amphibian declines around the globe. Although it has been found in most countries in Central America, its presence has never been assessed in Belize. We set out to determine the range, prevalence, and diversity of Bd using quantitative PCR (qPCR) and sequencing of a portion of the 5.8S and ITS1-2 regions. Swabs were collected from 524 amphibians of at least 26 species in the protected areas of the Maya Mountains of Belize. We sequenced a subset of 72 samples that had tested positive for Bd by qPCR at least once; 30 of these were verified as Bd. Eight unique Bd haplotypes were identified in the Maya Mountains, five of which were previously undescribed. We identified unique ecological niches for the two most broadly distributed haplotypes. Combined with other studies showing that different strains differ in virulence, the diversity in the 5.8S-ITS1-2 region found in this study suggests that there may be substantial differences among populations or haplotypes. Future work should focus on whether haplotypes at other genomic regions, and possibly pathogenicity, can be associated with haplotypes at this locus, as well as on integrating molecular tools with other ecological tools to elucidate the ecology and pathogenicity of Bd.

    The Deadly Chytrid Fungus: A Story of an Emerging Pathogen

    [Extract] Emerging infectious diseases present a great challenge for the health of both humans and wildlife. The increasing prevalence of drug-resistant fungal pathogens in humans [1] and recent outbreaks of novel fungal pathogens in wildlife populations [2] underscore the need to better understand the origins and mechanisms of fungal pathogenicity. One of the most dramatic examples of fungal impacts on vertebrate populations is the effect of the amphibian disease chytridiomycosis, caused by the chytrid fungus Batrachochytrium dendrobatidis (Bd).
Amphibians around the world are experiencing unprecedented population losses and local extinctions [3]. While there are multiple causes of amphibian declines, many catastrophic die-offs are attributed to Bd [4],[5]. The chytrid pathogen has been documented in hundreds of amphibian species, and reports of Bd's impact on additional species and in additional geographic regions are accumulating at an alarming rate (e.g., see http://www.spatialepidemiology.net/bd). Bd is a microbial, aquatic fungus with distinct life stages. The motile stage, called a zoospore, swims using a flagellum and initiates the colonization of frog skin. Within the host epidermal cells, a zoospore forms a spherical thallus, which matures and produces new zoospores by dividing asexually, renewing the cycle of infection when zoospores are released to the skin surface (Figure 1). Bd is considered an emerging pathogen, discovered and described only a decade ago [6],[7]. Despite intensive ecological study of Bd over the last decade, a number of unanswered questions remain. Here we summarize what has been recently learned about this lethal pathogen.

    A Comparison of Phylogenetic Network Methods Using Computer Simulation

    Background: We present a series of simulation studies that explore the relative performance of several phylogenetic network approaches (statistical parsimony, split decomposition, union of maximum parsimony trees, neighbor-net, simulated history recombination upper bound, median-joining, reduced median joining and minimum spanning network) compared to standard tree approaches (neighbor-joining and maximum parsimony) in the presence and absence of recombination. Principal Findings: In the absence of recombination, all methods recovered the correct topology and branch lengths nearly all of the time when the substitution rate was low, except for minimum spanning networks, which did considerably worse. At a higher substitution rate, maximum parsimony and union of maximum parsimony trees were the most accurate. With recombination, the ability to infer the correct topology was halved for all methods, and no method could accurately estimate branch lengths. Conclusions: Our results highlight the need for more accurate phylogenetic network methods and the importance of detecting and accounting for recombination in phylogenetic studies. Furthermore, we provide useful information for choosing a network algorithm and a framework in which to evaluate improvements to existing methods and novel…
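
Of the network methods compared, the minimum spanning network is the simplest to sketch. Roughly, it connects haplotypes by combining all equally short minimum spanning trees of their pairwise distance graph. The hypothetical helper below computes just one such tree with Prim's algorithm over Hamming distances between aligned sequences of equal length. It is a simplification of a full minimum spanning network, not the implementation used in the study.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(seqs: List[str]) -> List[Tuple[int, int, int]]:
    """Prim's algorithm on the complete Hamming-distance graph; returns (i, j, distance) edges."""
    n = len(seqs)
    if n == 0:
        return []
    in_tree = [False] * n
    best: List[float] = [float("inf")] * n  # cheapest known edge linking each node to the tree
    parent = [-1] * n
    best[0] = 0
    edges: List[Tuple[int, int, int]] = []
    for _ in range(n):
        # add the cheapest node that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, int(best[u])))
        # relax edges from u to every node still outside the tree
        for v in range(n):
            if not in_tree[v]:
                d = hamming(seqs[u], seqs[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges
```

For the made-up haplotypes `["AAAA", "AAAT", "AATT", "TTTT"]`, the sketch links them in a chain with edge lengths 1, 1 and 2. When several edges tie, a single tree like this one drops the alternative connections that a minimum spanning network would keep.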

    Characterization of killer immunoglobulin-like receptor genetics and comprehensive genotyping by pyrosequencing in rhesus macaques

    Background: Human killer immunoglobulin-like receptors (KIRs) play a critical role in governing the immune response to neoplastic and infectious disease. Rhesus macaques serve as important animal models for many human diseases in which KIRs are implicated; however, the study of KIR activity in this model is hindered by incomplete characterization of KIR genetics. Results: Here we present a characterization of KIR genetics in rhesus macaques (Macaca mulatta). We conducted a survey of KIRs in this species, identifying 47 novel full-length KIR sequences. Using this expanded sequence library to build upon previous work, we present evidence supporting the existence of 22 Mamu-KIR genes, providing a framework within which to describe macaque KIRs. We also developed a novel pyrosequencing-based technique for KIR genotyping. This method provides both a comprehensive KIR genotype and frequency-based estimates of transcript level, with implications for the study of KIRs in all species. Conclusions: The results of this study significantly improve our understanding of macaque KIR genetic organization and diversity, with implications for the study of many human diseases that use macaques as a model. The ability to obtain comprehensive KIR genotypes is of basic importance for the study of KIRs, and the method can easily be adapted to other species. Together, these findings both advance the field of macaque KIRs and facilitate future research into the role of KIRs in human disease.
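
The abstract does not describe how pyrosequencing reads are turned into genotypes. Assuming reads have already been assigned to reference KIR sequences, a minimal sketch of reporting both a genotype and frequency-based transcript estimates might look like the following. The sequence names and the 1% noise cutoff (`min_fraction`) are hypothetical, not values from the study.

```python
from typing import Dict


def genotype_from_counts(read_counts: Dict[str, int], min_fraction: float = 0.01) -> Dict[str, float]:
    """Call which KIR sequences are present and estimate their relative transcript frequencies.

    `read_counts` maps a (hypothetical) reference sequence name to the number of reads
    assigned to it. Sequences below `min_fraction` of all reads are treated as noise.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    # genotype: sequences supported by enough reads
    called = {name: n for name, n in read_counts.items() if n / total >= min_fraction}
    kept = sum(called.values())
    # frequency estimates: renormalize over called sequences so they sum to 1
    return {name: n / kept for name, n in called.items()}
```

With counts of 500, 300, 200 and 5 reads for four hypothetical sequences, the last one falls below the cutoff and the remaining three are reported at frequencies 0.5, 0.3 and 0.2.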

    A Detailed History of Intron-rich Eukaryotic Ancestors Inferred from a Global Survey of 100 Complete Genomes

    Protein-coding genes in eukaryotes are interrupted by introns, but intron densities differ widely between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6–7 introns per kilobase of coding sequence, whereas most other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model fitted by Markov chain Monte Carlo (MCMC) on 245 orthologous genes from 99 genomes representing the three of the five eukaryotic supergroups for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony. The MCMC and ML inferences agree closely, whereas Dollo parsimony introduces a noticeable bias, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches, including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.
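
Dollo parsimony, one of the three reconstruction methods compared, allows each character (here, one intron position) to be gained only once but lost any number of times. A minimal sketch, assuming a rooted tree stored as a hypothetical parent-to-children dictionary: the single gain is placed at the last common ancestor of the species that carry the intron, and inside that clade a node keeps the intron exactly when at least one descendant species still has it. The MCMC and ML methods from the study are not shown.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical representation: internal node -> list of children; leaves have no entry.
Tree = Dict[str, List[str]]


def dollo_presence(tree: Tree, root: str, present_leaves: Set[str]) -> Set[str]:
    """Nodes where one character is reconstructed as present under Dollo parsimony."""
    has_present: Dict[str, bool] = {}

    def mark(node: str) -> bool:
        # post-order: does any leaf below `node` carry the character?
        children = tree.get(node, [])
        if not children:
            has_present[node] = node in present_leaves
        else:
            results = [mark(child) for child in children]  # visit every child
            has_present[node] = any(results)
        return has_present[node]

    if not mark(root):
        return set()  # absent everywhere: never gained
    # single gain at the last common ancestor: descend while only one child clade has it
    lca = root
    while True:
        hits = [c for c in tree.get(lca, []) if has_present[c]]
        if len(hits) != 1:
            break
        lca = hits[0]
    # below the gain, losses sit at the top of every clade that lacks the character
    present: Set[str] = set()
    stack = [lca]
    while stack:
        node = stack.pop()
        if has_present[node]:
            present.add(node)
            stack.extend(tree.get(node, []))
    return present


def ancestral_counts(tree: Tree, root: str, introns: Dict[str, Set[str]]) -> Counter:
    """Number of introns reconstructed as present at each node (a proxy for intron density)."""
    counts: Counter = Counter()
    for carriers in introns.values():
        counts.update(dollo_presence(tree, root, carriers))
    return counts
```

On a four-species toy tree `{"root": ["animals", "plants"], "animals": ["human", "fly"], "plants": ["rice", "moss"]}`, an intron found in `human` and `rice` is placed at `root`, while one found only in `human` and `fly` is gained at `animals`. Because each gain is placed as late as the data allow, an ancestor is credited with an intron only when its descendants require it.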

    Positive Selection in East Asians for an EDAR Allele that Enhances NF-κB Activation

    Genome-wide scans for positive selection in humans provide a promising approach to establish links between genetic variants and adaptive phenotypes. Using this approach, lists of hundreds of candidate genomic regions for positive selection have been assembled. These candidate regions are expected to contain variants that contribute to adaptive phenotypes, but few of these regions have been associated with phenotypic effects. Here we present evidence that a derived nonsynonymous substitution (370A) in EDAR, a gene involved in ectodermal development, was driven to high frequency in East Asia by positive selection more than 10,000 years ago. With an in vitro transfection assay, we demonstrate that 370A enhances NF-κB activity. Our results suggest that 370A is a positively selected functional genetic variant that underlies an adaptive human phenotype.

    C-type lectin-like domains in Fugu rubripes

    BACKGROUND: Members of the C-type lectin domain (CTLD) superfamily are metazoan proteins functionally important in glycoprotein metabolism, mechanisms of multicellular integration and immunity. Three previously reported genome-level studies on human, C. elegans and D. melanogaster demonstrated almost complete divergence between invertebrate and mammalian families of CTLD-containing proteins (CTLDcps). RESULTS: We have performed an analysis of CTLD family composition in Fugu rubripes using the draft genome sequence. The results show that all but two groups of CTLDcps identified in mammals are also found in fish, and that most of the groups have the same members as in mammals. We failed to detect representatives of CTLD groups V (NK cell receptors) and VII (lithostathine), while the DC-SIGN subgroup of group II is overrepresented in Fugu. Several new CTLD-containing genes, highly conserved between Fugu and human, were discovered using the Fugu genome sequence as a reference, including a CSPG family member and an SCP-domain-containing soluble protein. A distinct group of soluble dual-CTLD proteins has been identified, which may be the first reported CTLDcp group shared by invertebrates and vertebrates. We show that CTLDcp-encoding genes are selectively duplicated in Fugu, in a manner that suggests an ancient large-scale duplication event. We have verified 32 gene structures and predicted 63 new ones; our annotations are available through a distributed annotation system (DAS) server, and their sequences are provided as additional files with this paper. CONCLUSIONS: The vertebrate CTLDcp family was essentially formed early in vertebrate evolution and is completely different from the invertebrate families. Comparison of fish and mammalian genomes revealed three groups of CTLDcps and several new members of the known groups that are highly conserved between fish and mammals but were not identified in the study using only mammalian genomes.
Despite limitations of the draft sequence, the Fugu rubripes genome is a powerful instrument for gene discovery and vertebrate evolutionary analysis. The composition of the CTLDcp superfamily in fish and mammals suggests that large-scale duplication events played an important role in the evolution of vertebrates.

    Analysis of the Basidiomycete Coprinopsis cinerea Reveals Conservation of the Core Meiotic Expression Program over Half a Billion Years of Evolution

    Coprinopsis cinerea (also known as Coprinus cinereus) is a multicellular basidiomycete mushroom particularly suited to the study of meiosis due to its synchronous meiotic development and prolonged prophase. We examined the 15-hour meiotic transcriptional program of C. cinerea, encompassing time points from before haploid nuclear fusion through tetrad formation, using a 70-mer oligonucleotide microarray. As in other organisms, a large proportion (∼20%) of genes are differentially regulated during this developmental process, with successive waves of transcription apparent in nine transcriptional clusters, including one enriched for meiotic functions. C. cinerea and the fungi Saccharomyces cerevisiae and Schizosaccharomyces pombe diverged ∼500–900 million years ago, permitting a comparison of transcriptional programs across a broad evolutionary time scale. Previous studies of S. cerevisiae and S. pombe compared genes that were induced upon entry into meiosis; inclusion of C. cinerea data indicates that meiotic genes are more conserved in their patterns of induction across species than genes not known to be meiotic. In addition, we found that meiotic genes are significantly more conserved in their transcript profiles than genes not known to be meiotic, which indicates a remarkable conservation of the meiotic process across evolutionarily distant organisms. Overall, meiotic function genes are more conserved in both induction and transcript profile than genes not known to be meiotic. However, of 50 meiotic function genes that were co-induced in all three species, 41 had well-correlated transcript profiles in at least two of the three species, but only a single gene (rad50) exhibited coordinated induction and well-correlated transcript profiles in all three species, indicating that co-induction does not necessarily predict correlated expression or vice versa. Differences may reflect differences in meiotic mechanisms or new roles for paralogs.
Similarities in induction, transcript profiles, or both should contribute to gene discovery for orthologs without currently characterized meiotic roles.
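
The abstract distinguishes co-induction from correlated transcript profiles. The hypothetical sketch below makes that distinction concrete, assuming each profile is a list of log2 expression ratios over the same time points: a gene counts as induced if it rises at least `min_log2` above its first time point, and two profiles count as well-correlated if their Pearson correlation reaches `threshold`. Both cutoffs are illustrative assumptions, not values from the study.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles sampled at the same time points."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length profiles with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def is_induced(profile: Sequence[float], min_log2: float = 1.0) -> bool:
    """Hypothetical rule: induced if any later time point rises min_log2 above the first."""
    return max(profile[1:]) - profile[0] >= min_log2


def well_correlated(x: Sequence[float], y: Sequence[float], threshold: float = 0.7) -> bool:
    """Hypothetical rule: profiles agree if their Pearson correlation reaches `threshold`."""
    return pearson(x, y) >= threshold
```

With the made-up profiles `[0, 2, 0.5, 0, 0]` and `[0, 0, 0.5, 2, 0]`, both genes count as induced, but they peak at different times and their correlation is -1/3, so co-induction alone does not imply correlated expression.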

    Whole Genome Resequencing Reveals Natural Target Site Preferences of Transposable Elements in Drosophila melanogaster

    Transposable elements are mobile DNA sequences that integrate into host genomes using diverse mechanisms with varying degrees of target site specificity. While the target site preferences of some engineered transposable elements are well studied, the natural target preferences of most transposable elements are poorly characterized. Using population genomic resequencing data from 166 strains of Drosophila melanogaster, we identified over 8,000 new insertion sites not present in the reference genome sequence, which we used to decode the natural target preferences of 22 families of transposable elements in this species. We found that terminal inverted repeat transposon and long terminal repeat retrotransposon families exhibit clade-specific target site duplications and target site sequence motifs. Additionally, we found that the sequence motifs at transposable element target sites are always palindromes that extend beyond the target site duplication. Our results demonstrate the utility of population genomics data for high-throughput inference of transposable element targeting preferences in the wild and establish general rules for terminal inverted repeat transposon and long terminal repeat retrotransposon target site selection in eukaryotic genomes.
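
For DNA, a palindrome usually means a sequence equal to its own reverse complement (for example `GAATTC`). A minimal sketch, assuming motifs are given either as plain sequences or as hypothetical per-position base-frequency tables: a frequency motif is treated as palindromic when each position matches the complement of the mirrored position within a tolerance `tol`, which is an illustrative parameter rather than one from the study.

```python
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
BASE_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case-insensitive input, uppercase output)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence reads the same on both strands."""
    return seq.upper() == reverse_complement(seq)


def motif_is_palindromic(pfm: List[Dict[str, float]], tol: float = 0.05) -> bool:
    """Frequency motif check: position i must mirror the complement of position L-1-i."""
    length = len(pfm)
    for i in range(length):
        j = length - 1 - i
        for base in "ACGT":
            if abs(pfm[i][base] - pfm[j][BASE_PAIR[base]]) > tol:
                return False
    return True


def one_hot(seq: str) -> List[Dict[str, float]]:
    """Turn a plain sequence into a frequency table with one certain base per position."""
    return [{b: (1.0 if b == s else 0.0) for b in "ACGT"} for s in seq.upper()]
```

Under this sketch, `is_palindromic("GAATTC")` holds while `is_palindromic("AAGT")` does not, and `motif_is_palindromic(one_hot("GATC"))` agrees with the plain-sequence check. Comparing mirrored columns rather than single sequences lets a degenerate motif count as palindromic even when no individual site is.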