171 research outputs found

    FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

    Background: Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. Results: We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and through a natural log approximation that is chosen specifically to leverage a deeply pipelined custom architecture. Conclusions: Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference as shown by the growing body of literature in this field. FPGAs in particular are well-suited for this task because of their low power consumption as compared to many-core processors and Graphics Processor Units (GPUs).

    A Bioinformatics Classifier and Database for Heme-Copper Oxygen Reductases

    Background: Heme-copper oxygen reductases (HCOs) are the last enzymatic complexes of most aerobic respiratory chains, reducing dioxygen to water and translocating up to four protons across the inner mitochondrial membrane (eukaryotes) or cytoplasmic membrane (prokaryotes). The number of completely sequenced genomes is expanding exponentially, and concomitantly, the number and taxonomic distribution of HCO sequences. These enzymes were initially classified into three different types, but this classification has recently been challenged. Methodology: We reanalyzed the classification scheme and developed a new bioinformatics classifier for the HCOs and nitric oxide reductases (NORs), which we benchmark against a manually derived gold standard sequence set. It is able to classify any given sequence of subunit I from HCOs and NORs with a global recall and precision both of 99.8%. We use this tool to classify this protein family in 552 completely sequenced genomes. Conclusions: We concluded that the new and broader data set supports three functional and evolutionary groups of HCOs. Homology between NORs and HCOs is shown, and the closest relationship of NORs with C-type HCOs is demonstrated. We established and made available a classification web tool and an integrated heme-copper oxygen reductase and NOR protein database (www.evocell.org/hco).

    The evolutionary history of the catenin gene family during metazoan evolution

    Background: Catenin is a gene family composed of three subfamilies: p120, beta and alpha. Beta and p120 are homologous subfamilies based on sequence and structural comparisons, and are members of the armadillo repeat protein superfamily. Alpha does not appear to be homologous to either beta or p120 based on the lack of sequence and structural similarity, and the alpha subfamily belongs to the vinculin superfamily. Catenins link the transmembrane protein cadherin to the cytoskeleton and thus function in cell-cell adhesion. To date, only the beta subfamily has been evolutionarily analyzed and experimentally studied for its functions in signaling pathways, development and human diseases such as cancer. We present a detailed evolutionary study of the whole catenin family to provide a better understanding of how this family has evolved in metazoans, and by extension, the evolution of cell-cell adhesion. Results: All three catenin subfamilies have been detected in the metazoans used in the present study by searching public databases and applying species-specific BLAST searches. The beta and p120 subfamilies form two monophyletic clades under Bayesian phylogenetic inference. Phylogenetic analyses also reveal an array of duplication events throughout metazoan history. Furthermore, numerous annotation issues for the catenin family have been detected by our computational analyses. Conclusions: Delta2/ARVCF catenin in the p120 subfamily, beta catenin in the beta subfamily, and alpha2 catenin in the alpha subfamily are present in all metazoans analyzed. This implies that the last common ancestor of metazoans had these three catenin subfamilies. However, not all members within each subfamily were detected in all metazoan species. Each subfamily has undergone duplications at different levels (species-specific, subphylum-specific or phylum-specific) and to different extents (in the case of the number of homologs). Extensive annotation problems have been resolved in each of the three catenin subfamilies. This resolution provides a more coherent description of catenin evolution.

    The Molecular Evolution of the p120-Catenin Subfamily and Its Functional Associations

    p120-catenin (p120) is the prototypical member of a subclass of armadillo-related proteins that includes δ-catenin/NPRAP, ARVCF, p0071, and the more distantly related plakophilins 1–3. In vertebrates, p120 is essential in regulating surface expression and stability of all classical cadherins, and directly interacts with Kaiso, a BTB/ZF family transcription factor. To clarify functional relationships between these proteins and how they relate to the classical cadherins, we have examined the proteomes of 14 diverse vertebrate and metazoan species. The data reveal a single ancient δ-catenin-like p120 family member present in the earliest metazoans and conserved throughout metazoan evolution. This single p120 family protein is present in all protostomes, and in certain early-branching chordate lineages. Phylogenetic analyses suggest that gene duplication and functional diversification into “p120-like” and “δ-catenin-like” proteins occurred in the urochordate-vertebrate ancestor. Additional gene duplications during early vertebrate evolution gave rise to the seven vertebrate p120 family members. Kaiso family members (i.e., Kaiso, ZBTB38 and ZBTB4) are found only in vertebrates, their origin following that of the p120-like gene lineage and coinciding with the evolution of vertebrate-specific mechanisms of epigenetic gene regulation by CpG island methylation. The p120 protein family evolved from a common δ-catenin-like ancestor present in all metazoans. Through several rounds of gene duplication and diversification, however, p120 evolved in vertebrates into an essential, ubiquitously expressed protein, whereas loss of the more selectively expressed δ-catenin, p0071 and ARVCF is tolerated in most species. Together with phylogenetic studies of the vertebrate cadherins, our data suggest that the p120-like and δ-catenin-like genes co-evolved separately with non-neural (E- and P-cadherin) and neural (N- and R-cadherin) cadherin lineages, respectively. The expansion of p120 relative to δ-catenin during vertebrate evolution may reflect the pivotal and largely disproportionate role of the non-neural cadherins with respect to evolution of the wide range of somatic morphology present in vertebrates today.

    Comparative Analysis of Gene Content Evolution in Phytoplasmas and Mycoplasmas

    Phytoplasmas and mycoplasmas are two groups of important pathogens in the bacterial class Mollicutes. Because of their economic and clinical importance, these obligate pathogens have attracted much research attention. However, difficulties involved in the empirical study of these bacteria, particularly the fact that phytoplasmas have not yet been successfully cultivated outside of their hosts despite decades of attempts, have greatly hampered research progress. With the rapid advancements in genome sequencing, comparative genome analysis provides a new approach to facilitate our understanding of these bacteria. In this study, our main focus is to investigate the evolution of gene content in phytoplasmas, mycoplasmas, and their common ancestor. By using a phylogenetic framework for comparative analysis of 12 complete genome sequences, we characterized the putative gains and losses of genes in these obligate parasites. Our results demonstrated that the degradation of metabolic capacities in these bacteria occurred predominantly in the common ancestor of Mollicutes, prior to the evolutionary split of phytoplasmas and mycoplasmas. Furthermore, we identified a list of genes that were acquired by the common ancestor of phytoplasmas and are conserved across all strains with complete genome sequences available. These genes include several putative effectors for the interactions with hosts and may be good candidates for future functional characterization.

    Environmental DNA sequencing primers for eutardigrades and bdelloid rotifers

    Background: The time it takes to isolate individuals from environmental samples and then extract DNA from each individual is one of the problems with generating molecular data from meiofauna such as eutardigrades and bdelloid rotifers. The lack of consistent morphological information and the extreme abundance of these classes make morphological identification of rare, or even common, cryptic taxa a large and unwieldy task. This limits the ability to perform large-scale surveys of the diversity of these organisms. Here we demonstrate a culture-independent molecular survey approach that enables the generation of large amounts of eutardigrade and bdelloid rotifer sequence data directly from soil. Our PCR primers, specific to the 18S small-subunit rRNA gene, were developed for both eutardigrades and bdelloid rotifers. Results: The developed primers successfully amplified DNA of their target organisms from various soil DNA extracts. This was confirmed by both BLAST similarity searches and phylogenetic analyses. Tardigrades showed much better phylogenetic resolution than bdelloids. Both groups of organisms exhibited varying levels of endemism. Conclusion: The development of clade-specific primers for characterizing eutardigrades and bdelloid rotifers from environmental samples should greatly increase our ability to characterize the composition of these taxa in environmental samples. Environmental sequencing as shown here differs from other molecular survey methods in that there is no need to pre-isolate the organisms of interest from soil in order to amplify their DNA. The DNA sequences obtained from methods that do not require culturing can be identified post hoc and placed phylogenetically as additional closely related sequences are obtained from morphologically identified conspecifics. Our non-cultured environmental sequence-based approach will be able to provide rapid and large-scale screening of the presence, absence and diversity of Bdelloidea and Eutardigrada in a variety of soils.

    Complete Genome Viral Phylogenies Suggests the Concerted Evolution of Regulatory Cores and Accessory Satellites

    We consider the concerted evolution of viral genomes in four families of DNA viruses. Given the high rate of horizontal gene transfer among viruses and their hosts, it is an open question as to how representative particular genes are of the evolutionary history of the complete genome. To address the concerted evolution of viral genes, we compared genomic evolution across four distinct, extant viral families. For all four viral families we constructed DNA-dependent DNA polymerase-based (DdDp) phylogenies and, in addition, whole-genome sequence phylogenies, as quantitative descriptions of inter-genome relationships. We found that the history of the polymerase gene was highly predictive of the history of the genome as a whole, which we explain in terms of repeated co-divergence events of the core DdDp gene accompanied by a number of satellite, accessory genetic loci. We also found that the rate of gene gain in baculoviruses and poxviruses proceeds significantly more quickly than the rate of gene loss and that there is convergent acquisition of satellite functions promoting contextual adaptation when distinct viral families infect related hosts. The congruence of the genome and polymerase trees suggests that a large set of viral genes, including polymerase, derive from a phylogenetically conserved core of genes of host origin, secondarily reinforced by gene acquisition from common hosts or co-infecting viruses within the host. A single viral genome can be thought of as a mutualistic network, with the core genes acting as an effective host and the satellite genes as effective symbionts. Larger virus genomes show a greater departure from linkage equilibrium between core and satellite functions.

    Basal Jawed Vertebrate Phylogenomics Using Transcriptomic Data from Solexa Sequencing

    The traditionally accepted relationships among basal jawed vertebrates have been challenged by some molecular phylogenetic analyses based on mitochondrial sequences. Those studies split extant gnathostomes into two monophyletic groups: tetrapods and a piscine branch, including Chondrichthyes, Actinopterygii and sarcopterygian fishes. Lungfish and bichir are found in a basal position on the piscine branch. Based on transcriptomes that we generated for an armored bichir (Polypterus delhezi) and an African lungfish (Protopterus sp.), together with expressed sequences and whole genome sequences available from public databases, we obtained 111 genes to reconstruct the phylogenetic tree of basal jawed vertebrates and estimated their times of divergence. Our phylogenomic study supports the traditional relationship. We found that gnathostomes are divided into Chondrichthyes and Osteichthyes, both with 100% support values (posterior probabilities and bootstrap values). Chimaeras were found to have a basal position among cartilaginous fishes with a 100% support value. Osteichthyes were divided into Actinopterygii and Sarcopterygii with a 100% support value. Lungfish and tetrapods form a monophyletic group with 100% posterior probability. Bichir and two teleost species form a monophyletic group with a 100% support value. The previous tree, based on mitochondrial data, was significantly rejected by an approximately unbiased test (AU test, p = 0). The time of divergence between lungfish and tetrapods was estimated to be 391.8 Ma and the divergence of bichir from pufferfish and medaka was estimated to be 330.6 Ma. These estimates closely match the fossil record. In conclusion, our phylogenomic study successfully resolved the relationship of basal jawed vertebrates based on transcriptomes, ESTs and whole genome sequences.

    A Computational Study of Elongation Factor G (EFG) Duplicated Genes: Diverged Nature Underlying the Innovation on the Same Structural Template

    BACKGROUND: Elongation factor G (EFG) is a core translational protein that catalyzes the elongation and recycling phases of translation. A more complex picture of EFG's evolution and function than previously accepted is emerging from analyses of heterogeneous EFG family members. Whereas gene duplication is postulated to be a prominent factor creating functional novelty, the striking divergence between EFG paralogs can be interpreted in terms of innovation in gene function. METHODOLOGY/PRINCIPAL FINDINGS: We present a computational study of the EFG protein family to cover the role of gene duplication in the evolution of protein function. Using phylogenetic methods, genome context conservation and insertion/deletion (indel) analysis, we demonstrate that the EFG gene copies form four subfamilies: EFG I, spdEFG1, spdEFG2, and EFG II. These ancient gene families differ in their indispensability, degree of divergence and number of indels. We show the distribution of EFG subfamilies and describe evidence for lateral gene transfer and recent duplications. Extended studies of the EFG II subfamily concern its diverged nature. Remarkably, EFG II appears to be a widely distributed and much-diversified subfamily whose subdivisions correlate with phylum or class borders. The EFG II subfamily-specific characteristics are low conservation of the GTPase domain and of domains II and III; absence of the trGTPase-specific G2 consensus motif "RGITI"; and twelve conserved positions common to the whole subfamily. The EFG II-specific functional changes could be related to changes in the properties of nucleotide binding and hydrolysis and strengthened ionic interactions between EFG II and the ribosome, particularly between parts of the decoding site and loop I of domain IV. CONCLUSIONS/SIGNIFICANCE: Our work, for the first time, comprehensively identifies and describes EFG subfamilies and improves our understanding of the function and evolution of EFG duplicated genes.