481 research outputs found

    SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS web server is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/

    Ortholog identification in the presence of domain architecture rearrangement

    Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets sharing a common domain architecture or which share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area

    Practical considerations for plant phylogenomics

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/143756/1/aps31038_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/143756/2/aps31038.pd

    New compound sets identified from high throughput phenotypic screening against three kinetoplastid parasites: an open resource

    Using whole-cell phenotypic assays, the GlaxoSmithKline high-throughput screening (HTS) diversity set of 1.8 million compounds was screened against the three kinetoplastids most relevant to human disease, i.e. Leishmania donovani, Trypanosoma cruzi and Trypanosoma brucei. Secondary confirmatory and orthogonal intracellular anti-parasiticidal assays were conducted, and the potential for non-specific cytotoxicity determined. Hit compounds were chemically clustered and triaged for desirable physicochemical properties. The hypothetical biological target space covered by these diversity sets was investigated through bioinformatics methodologies. Consequently, three anti-kinetoplastid chemical boxes of ~200 compounds each were assembled. Functional analyses of these compounds suggest a wide array of potential modes of action against kinetoplastid kinases, proteases and cytochromes as well as potential host–pathogen targets. This is the first published parallel high throughput screening of a pharma compound collection against kinetoplastids. The compound sets are provided as an open resource for future lead discovery programs, and to address important research questions. The support and funding of Tres Cantos Open Lab Foundation is gratefully acknowledged. Peer reviewed

    Co-diversification of an intestinal Mycoplasma and its salmonid host

    Understanding the evolutionary relationships between a host and its intestinal resident bacteria can transform how we understand adaptive phenotypic traits. The interplay between hosts and their resident bacteria inevitably affects the intestinal environment and, thereby, the living conditions of both the host and the microbiota. This co-existence therefore likely influences the fitness of both bacteria and host. Whether this co-existence leads to evolutionary co-diversification in animals is largely unexplored, mainly due to the complexity of the environment and microbial communities and the often low host selection. We present the gut metagenome from wild Atlantic salmon (Salmo salar), a new wild organism model with an intestinal microbiota of low complexity and a well-described population structure, making it well-suited for investigating co-evolution. Our data reveal a strong host selection of a core gut microbiota dominated by a single Mycoplasma species. We found a clear co-diversification between the population structure of Atlantic salmon and nucleotide variability of the intestinal Mycoplasma populations, conforming to expectations from co-evolution between host and resident bacteria. Our results show that the stable microbiota of Atlantic salmon has evolved with its salmonid host populations while potentially providing adaptive traits to the salmon host populations, including defence mechanisms, biosynthesis of essential amino acids, and metabolism of B vitamins. We highlight Atlantic salmon as a novel model for studying co-evolution between vertebrate hosts and their resident bacteria.

    Automatic and manual functional annotation in a distributed web service environment

    While the number of genomic sequences becoming available is increasing exponentially, most genes are not functionally well characterized. Finding out more about the function of a gene and about functional relationships between genes will be the next big bottleneck in the post-genomic era. On the one hand, improved pipelines and tools are needed in this context, because running experiments for all predicted genes is not feasible. On the other hand, manual curation of the automatic predictions is necessary to judge the reliability of the automatic annotation and to get a more comprehensive view of the function of each individual gene. Automatic functional annotation often applies homology-based function transfer from functionally characterized genes, using methods like Blast. However, this approach has many drawbacks and makes systematic errors because it does not account for speciation and duplication events. Phylogenomics has been shown to improve the functional prediction accuracy by taking the evolutionary history of genes in a phylogenetic tree context into account. In this thesis, the manual process from the assembly of the DNA sequence to the functional characterization of genes and the identification and comparison of shared syntenic regions, including the identification of candidate genes for pathogen resistance in potato chromosome V, is explained and its problems are discussed. To improve the automatic functional annotation in genome projects, a phylogenomic pipeline, which includes SIFTER, one of the best phylogenomic tools in this area, is introduced, improved and tested in the Medicago truncatula, Sorghum bicolor and Solanum lycopersicum genome projects. To obtain new candidate genes for the development of new drugs and crop protection products, non-plant specific genes, such as the transferrin family, which is not yet known in plants, are extracted from the M. truncatula and S. bicolor genomes and further investigated.
For further improvement of the annotation, a new phylogenomic approach is developed. This approach makes use of annotated functional attributes to calculate the functional mutation rate between genes and groups of genes in a phylogenetic tree and to determine whether the function of a gene can be transferred. The new approach is integrated into the SIFTER tool and tested on the blue-light photoreceptor/photolyase family and on a test set of manually curated Arabidopsis thaliana genes. Using both test sets, the prediction accuracy could be significantly improved and a more comprehensive view of the gene function could be obtained. Because no tool is yet able to annotate all functions of a gene with 100% accuracy, I introduce a system for manual functional annotation, called AFAWE. AFAWE runs different web services for the functional annotation and displays the results and intermediate results in a comprehensive web interface that facilitates comparison. It can be used for any organism and any kind of gene. The inputs are the amino acid sequence and the corresponding organism. Because of its flexible structure, new web services and workflows can be easily integrated. Besides Blast searches against different databases and protein domain prediction tools, AFAWE also includes the phylogenomic pipeline. Different filters help to identify trustworthy results from each analysis. Furthermore, a detailed manual annotation can be assigned to each protein, which will be used to update the functional annotation in public databases like MIPSPlantsDB

    Evolutionary Developmental Leaf Morphology of the Plant Family Araceae

    Studying the evolutionary developmental morphology of leaves using next-generation phylogenetics, a candidate gene approach and comparative developmental studies in the plant family Araceae is the overarching theme of the dissertation. The plant family Araceae is an ancient lineage from the Early Cretaceous and belongs to the monocotyledons. Members of Araceae display striking variation in leaf development; such variation contradicts traditional models of monocot leaf development. Additionally, dissected leaves, which are rare in monocots, seem to have evolved independently multiple times in Araceae by various developmental mechanisms. Despite extensive efforts to elucidate the evolutionary history of Araceae, phylogenetic ambiguity in the backbone of the tree has precluded answering questions about the early evolution of the family. To depict the sequence of morphological and developmental modifications to leaf ontogeny over time, it is essential to have a strongly supported hypothesis of the evolutionary relationships among species in the family. To resolve the remaining questions in the deep phylogeny of Araceae a phylogenomic analysis was carried out using next-generation sequencing technology and reference-based assembly of chloroplast and mitochondrial genomes for 37 genera representing 42 of the 44 major clades in the family. Chloroplast sequences produced strongly supported phylogenies in contrast to mitochondrial sequences, which produced poorly supported trees although smaller clades were recovered. The plastid phylogeny obtained from this study is the first for Araceae with a strongly supported backbone and was used for subsequent studies of evolutionary developmental leaf morphology in the family. 
Studies of the genetic basis of dissected leaf morphology via blastozone fractionation in plants outside monocots have almost always implicated the action of class I KNOX (KNOX1) genes, with one exception: in peas, a homolog of the floral meristem gene FLO/LFY is implicated. However, studies of dissected leaf development in monocots, and an examination of the developmental genetics for those monocots that putatively share the blastozone fractionation mechanism, are lacking. Two genera in Araceae, Anthurium and Amorphophallus, were studied and confirmed to produce lobes and leaflets through blastozone fractionation. To test whether KNOX1 genes are involved in leaf dissection in these genera, immunolocalizations using both full-length and C-terminus anti-KN1 antibodies were performed on histological sections of developing dissected leaves. KNOX1 protein expression detected by the full-length anti-KN1 antibody and by the C-terminus anti-KN1 antibody was absent and present in developing dissected leaves, respectively. To resolve these conflicting results, an RT-PCR assay was designed to test for the presence of KNOX1 mRNA transcripts during leaf development in Anthurium. Results of the RT-PCR assay support the KNOX1 protein expression pattern seen in immunolocalizations using the C-terminus anti-KN1 antibody. This suggests that monocots share the same genetic mechanism for dissected leaf development with other angiosperms. Historical models of leaf development posit that structural similarities between monocot and dicot leaves are the result of convergence, although this hypothesis has been contested. Araceae displays both dicot and monocot leaf characters. Previous researchers have remarked on the departure of leaf development in Araceae from traditional models of monocot leaf development.
To test the hypothesis of a developmentally independent origin of dicot-like leaf characters in monocots, leaf primordium diversity was evaluated in 30 genera of Araceae, along with 36 taxa spanning the angiosperm phylogeny. Leaf primordia were scored for 14 developmental, morphological and anatomical leaf characters. Ancestral character state reconstruction was carried out using the phylogeny obtained from Chapter One, embedded in two contrasting phylogenetic hypotheses of angiosperm evolution. Taxa were plotted in morphospace constructed using the morphological matrix to test whether dicot and monocot leaves occupy similar or different parts of the morphospace. The results of ancestral character state reconstruction and morphospace plotting suggest that at the developmental morphological level, aroid and dicot leaves are homologous. However, at the molecular genetic level, a review of the literature suggests that statements of homology between monocot and dicot leaves must be tested within a framework of the hierarchically organized gene regulatory networks regulating leaf development. The leaves of Araceae have historically been considered “odd” within monocots. However, the incredible morphological and developmental diversity of leaves in Araceae has provided a powerful study system with which to investigate the unifying aspects of leaf development across angiosperms

    Developing and applying supertree methods in Phylogenomics and Macroevolution

    Supertrees can be used to combine partially overlapping trees and generate more inclusive phylogenies. It has been proposed that a Maximum Likelihood (ML) supertree method (SM) could be developed using an exponential probability distribution to model errors in the input trees (given a proposed supertree). When the tree-to-tree distances used in the ML computation are symmetric differences, the ML SM has been shown to be equivalent to a Majority-Rule consensus SM, and hence, exactly like the latter, it has the desirable property of being a median tree (with reference to the set of input trees). The ability to estimate the likelihood of supertrees allows Bayesian (MCMC) approaches to be implemented, which have the advantage of allowing the support for the clades in a supertree to be properly estimated. I present here the L.U.St software package; it contains the first implementation of an ML SM and allows, for the first time, statistical tests on supertrees. I also characterized the first implementation of the Bayesian (MCMC) SM. Both the ML and the Bayesian (MCMC) SMs have been tested for and found to be immune to biases. The Bayesian (MCMC) SM is applied to the reanalyses of a variety of datasets (i.e. the datasets for the Metazoa and the Carnivora), and I have also recovered the first Bayesian supertree-based phylogeny of the Eubacteria and the Archaebacteria. These new SMs are discussed with reference to other well-known SMs like Matrix Representation with Parsimony. Both the ML and Bayesian SMs offer multiple attractive advantages over current alternatives
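The exponential error model mentioned in this abstract can be illustrated with a small sketch. This is not the L.U.St implementation. It assumes rooted trees encoded as sets of clades, a hypothetical rate parameter `beta`, and a likelihood left unnormalized; the function names and the toy trees are invented for illustration.

```python
# Illustrative sketch only: score a candidate supertree under an exponential
# error model, log P(T_i | S) = -beta * d(T_i, S) + const, where d is the
# symmetric difference between clade sets. The constant is ignored here.

def restrict(clades, taxa):
    """Prune each clade to `taxa` and drop trivial clades (size <= 1 or all taxa)."""
    pruned = {frozenset(clade & taxa) for clade in clades}
    return {clade for clade in pruned if 1 < len(clade) < len(taxa)}

def symmetric_difference(input_clades, input_taxa, supertree_clades):
    """Count clades found in exactly one of the two trees, after pruning
    the supertree to the taxa present in the input tree."""
    a = restrict(input_clades, input_taxa)
    b = restrict(supertree_clades, input_taxa)
    return len(a ^ b)

def log_likelihood(supertree_clades, input_trees, beta=1.0):
    """Unnormalized log-likelihood of a supertree given (clades, taxa) input trees."""
    total = sum(symmetric_difference(clades, taxa, supertree_clades)
                for clades, taxa in input_trees)
    return -beta * total

# Hypothetical toy data: two partially overlapping input trees.
inputs = [
    # ((A,B),(C,D)) on taxa A-D
    ({frozenset("AB"), frozenset("CD")}, frozenset("ABCD")),
    # (C,(D,E)) on taxa C-E
    ({frozenset("DE")}, frozenset("CDE")),
]

# Candidate supertrees on taxa A-E, as sets of non-trivial clades.
s1 = {frozenset("AB"), frozenset("DE"), frozenset("CDE")}  # ((A,B),(C,(D,E)))
s2 = {frozenset("AC"), frozenset("DE"), frozenset("BDE")}  # ((A,C),(B,(D,E)))

best = max([s1, s2], key=lambda s: log_likelihood(s, inputs))
```

In this toy case the first candidate agrees with both input trees and therefore scores higher than the second. A real ML search would explore tree space rather than compare two hand-picked candidates.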
