25 research outputs found

    Beginner's guide on the use of PAML to detect positive selection

    The CODEML program in the PAML package has been widely used to analyze protein-coding gene sequences to estimate the synonymous and nonsynonymous rates (dS and dN) and to detect positive Darwinian selection driving protein evolution. For users not familiar with molecular evolutionary analysis, the program is known to have a steep learning curve. Here, we provide a step-by-step protocol to illustrate the commonly used tests available in the program, including the branch models, the site models, and the branch-site models, which can be used to detect positive selection driving adaptive protein evolution affecting particular lineages of the species phylogeny, affecting a subset of amino acid residues in the protein, and affecting a subset of sites along prespecified lineages, respectively. A data set of the myxovirus (Mx) genes from ten mammal and two bird species is used as an example. We discuss a new feature in CODEML that allows users to perform positive selection tests for multiple genes for the same set of taxa, as is common in modern genome-sequencing projects. The PAML package is distributed at https://github.com/abacus-gene/paml under the GNU license, with support provided at its discussion site (https://groups.google.com/g/pamlsoftware). Data files used in this protocol are available at https://github.com/abacus-gene/paml-tutorial
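
    The branch, site, and branch-site tests mentioned above are likelihood ratio tests between nested models. As a minimal sketch, assuming the usual chi-square approximation, the comparison could be computed as below. The log-likelihood values, the function names, and the choice of the M1a versus M2a site-model pair (2 degrees of freedom) are illustrative assumptions, not results from the Mx data set; in practice the lnL values would be read from the CODEML output files.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (closed forms for df = 1 or 2)."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its chi-square p-value."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a null (M1a) and an alternative (M2a) site model.
lnl_m1a = -1000.0
lnl_m2a = -995.0

stat, p_value = likelihood_ratio_test(lnl_m1a, lnl_m2a, df=2)
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")
```

    A small p-value favours the model that allows a class of sites with dN/dS > 1. The closed forms keep the sketch free of external dependencies; a statistics library would be needed for other degrees of freedom.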

    VSpipe, an Integrated Resource for Virtual Screening and Hit Selection: Applications to Protein Tyrosine Phosphatase Inhibition

    The use of computational tools for virtual screening provides a cost-efficient approach to select starting points for drug development. We have developed VSpipe, a user-friendly semi-automated pipeline for structure-based virtual screening. VSpipe uses the existing tools AutoDock and OpenBabel, together with software developed in-house, to create an end-to-end virtual screening workflow ranging from the preparation of receptor and ligands to the visualisation of results. VSpipe is efficient and flexible, allowing users to make choices at different steps, and it can be run in both local and cluster mode. We have validated VSpipe using the human protein tyrosine phosphatase PTP1B as a case study. Using a combination of blind and targeted docking, VSpipe identified both new and known functional ligand binding sites. Assessment of different binding clusters using the ligand efficiency plots created by VSpipe defined a drug-like chemical space for development of PTP1B inhibitors, with potential applications to other PTPs. In this study, we show that VSpipe can be deployed to identify and compare different modes of inhibition, thus guiding the selection of initial hits for drug discovery

    Identification of Functional and Druggable Sites in Aspergillus fumigatus Essential Phosphatases by Virtual Screening

    Fungal diseases are a serious health burden worldwide, with drug resistance compromising the efficacy of the limited arsenal of antifungals available. New drugs with novel mechanisms of action are desperately needed to overcome current challenges. Screening of the Aspergillus fumigatus genome identified 35 phosphatases, four of which were previously reported as essential for viability. In addition, we validated another three essential phosphatases. Phosphatases control critical events in fungi, from cell wall integrity to the cell cycle, and are therefore attractive targets for drug development. We used VSpipe v1.0, a virtual screening pipeline, to evaluate the druggability of the seven essential phosphatases and identify starting points for drug discovery. Targeted virtual screening and evaluation of the ligand efficiency plots created by VSpipe enabled us to define the most favourable chemical space for drug development and suggested different modes of inhibition for each phosphatase. Interestingly, the identified ligand binding sites match functional sites (active site and protein interaction sites) reported for other yeast and human homologues. Thus, the VSpipe virtual screening approach identified both druggable and functional sites in these essential phosphatases for further experimental validation and antifungal drug development

    Dire wolves were the last of an ancient New World canid lineage

    Dire wolves are considered to be one of the most common and widespread large carnivores in Pleistocene America [1], yet relatively little is known about their evolution or extinction. Here, to reconstruct the evolutionary history of dire wolves, we sequenced five genomes from sub-fossil remains dating from 13,000 to more than 50,000 years ago. Our results indicate that although they were similar morphologically to the extant grey wolf, dire wolves were a highly divergent lineage that split from living canids around 5.7 million years ago. In contrast to numerous examples of hybridization across Canidae [2,3], there is no evidence for gene flow between dire wolves and either North American grey wolves or coyotes. This suggests that dire wolves evolved in isolation from the Pleistocene ancestors of these species. Our results also support an early New World origin of dire wolves, while the ancestors of grey wolves, coyotes and dholes evolved in Eurasia and colonized North America only relatively recently

    BACTpipe: Characterization of bacterial isolates based on whole-genome sequence data

    Technological advances have led to faster and more cost-effective sequencing platforms, making it quicker and more affordable to generate genomic sequence data. Bacterial genomes can be studied with two main methods, whole-genome sequencing and metagenomic shotgun sequencing, of which the first has been the more widely used in recent years. As a consequence of these advances, a vast amount of data is currently available and the need for bioinformatics tools to analyse and interpret it efficiently has increased dramatically. At present, many tools are available for each step of bacterial genome characterization: (1) pre-processing, (2) de novo assembly, (3) annotation, and (4) taxonomic and functional comparisons. It is therefore difficult to decide which tools to use, and the analysis slows down when switching from one tool to another. To tackle this, the pipeline BACTpipe was developed. It chains together bioinformatics tools, selected on the basis of previous testing, with additional scripts so that the whole bacterial analysis runs at once. The most relevant outputs generated by BACTpipe are the annotated de novo assembled genomes, the Newick file containing the phylogenetic relationships between species, and the gene presence-absence matrix, which users can then filter according to their interests. After testing BACTpipe with a set of bacterial whole-genome sequence data, 60 genes out of the 18195 found across the Lactobacillus species analysed were classified as core genes, i.e. genes shared among all these species. Housekeeping genes or genes involved in the replication, transcription, or translation processes were identified [...]
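
    The core-gene classification described above amounts to filtering a gene presence-absence matrix for genes present in every genome. The sketch below illustrates that filter on a tiny made-up matrix; the gene names, genome labels, in-memory layout, and the `core_genes` helper are assumptions for illustration and do not reflect BACTpipe's actual output format or code.

```python
def core_genes(presence: dict, genomes: list) -> list:
    """Return the genes that are present (value 1) in every listed genome."""
    return sorted(
        gene
        for gene, row in presence.items()
        if all(row.get(genome, 0) for genome in genomes)
    )


# Hypothetical presence-absence matrix: gene -> {genome: 1 if present, 0 if absent}.
genomes = ["isolate_A", "isolate_B", "isolate_C"]
presence = {
    "dnaA": {"isolate_A": 1, "isolate_B": 1, "isolate_C": 1},
    "rpoB": {"isolate_A": 1, "isolate_B": 1, "isolate_C": 1},
    "lacZ": {"isolate_A": 1, "isolate_B": 0, "isolate_C": 1},
    "bla":  {"isolate_A": 0, "isolate_B": 0, "isolate_C": 1},
}

core = core_genes(presence, genomes)
print(f"{len(core)} core genes out of {len(presence)}: {core}")
```

    The same predicate can be relaxed (for example, present in at least 95% of genomes) when users filter the matrix according to their own interests.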

    Supplementary data

    This zip file contains (i) the raw data from the carnivoran data set, (ii) the alignment used with morphology-only and molecule-only partitions, (iii) the control file used for the divergence times estimation with the combined data set, (iv) the tree files used to calculate Bayes factors and estimate divergence times, and (v) a README.md file with the details about the data. In addition, each data file also contains a description of its format at the end

    Data from: Bayesian estimation of species divergence times using correlated quantitative characters

    Discrete morphological data have been widely used to study species evolution, but the use of quantitative (or continuous) morphological characters is less common. Here, we implement a Bayesian method to estimate species divergence times using quantitative characters. Quantitative character evolution is modelled using Brownian diffusion with character correlation and character variation within populations. Through simulations, we demonstrate that ignoring the population variation (or population “noise”) and the correlation among characters leads to biased estimates of divergence times and rate, especially if the correlation and population noise are high. We apply our new method to the analysis of quantitative characters (cranium landmarks) and molecular data from carnivoran mammals. Our results show that time estimates are affected by whether the correlations and population noise are accounted for or ignored in the analysis. The estimates are also affected by the type of data analysed, with analyses of morphological characters only, molecular data only, or a combination of both; showing noticeable differences among the time estimates. Rate variation of morphological characters among the carnivoran species appears to be very high, with Bayesian model selection indicating that the independent-rates model fits the morphological data better than the autocorrelated-rates model. We suggest that using morphological continuous characters, together with molecular data, can bring a new perspective to the study of species evolution. Our new model is implemented in the MCMCtree computer program for Bayesian inference of divergence times
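
    To make the model above more concrete, the following sketch simulates two quantitative characters evolving by Brownian diffusion with a correlation between them, and then recovers that correlation from the simulated increments. It is only an illustration under stated assumptions: the function names, the rate, the step count, and the value of rho are hypothetical, population noise is left out, and nothing here reproduces the MCMCtree implementation.

```python
import math
import random


def simulate_correlated_bm(n_steps, rate, rho, dt=1.0, seed=0):
    """Simulate two Brownian characters whose increments have correlation rho."""
    rng = random.Random(seed)
    sd = math.sqrt(rate * dt)  # standard deviation of one increment
    x = y = 0.0
    increments = []
    for _ in range(n_steps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        dx = sd * z1
        # Mix the two independent draws so that corr(dx, dy) = rho.
        dy = sd * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        x += dx
        y += dy
        increments.append((dx, dy))
    return (x, y), increments


def sample_correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(p[0] for p in pairs) / n
    mean_y = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in pairs)
    sxx = sum((p[0] - mean_x) ** 2 for p in pairs)
    syy = sum((p[1] - mean_y) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical settings: high correlation between the two characters.
final_state, steps = simulate_correlated_bm(n_steps=20000, rate=0.5, rho=0.9)
estimated_rho = sample_correlation(steps)
print(f"final character values: {final_state}")
print(f"estimated correlation: {estimated_rho:.3f}")
```

    Treating the two characters as independent when they are generated this way overstates the amount of independent information in the data, which is the intuition behind the bias reported in the simulations.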

    Supplementary Tables

    Supplementary tables S1-S7 report the performance measures for the estimated nodes and rate for the simulated data sets. Table S1 contains the performance measures when assessing the effect of sample size, Table S2 for the effect of fossil age, Table S3 for the effect of low population noise (c = 0.25), Table S4 for the effect of high population noise (c = 0.5), Table S5 for the effect of low correlation (rho = 0.5), and Table S6 for the effect of high correlation (rho = 0.9). Tables S7A-E contain the performance measures for different correlation values ranging from 0 to 0.5 and from 0.5 to 0.9. These measures were only used to better explore the results in Fig. 7C and 7C', which could not be properly understood when only plotting the estimated parameters when rho = 0, 0.5, and 0.9

    Supplementary Figures

    Supplementary figures S1-S5 with the corresponding captions

    Data from: An evaluation of different partitioning strategies for Bayesian estimation of species divergence times

    The explosive growth of molecular sequence data has made it possible to estimate species divergence times under relaxed-clock models using genome-scale datasets with many gene loci. In order both to improve model realism and to best extract information about relative divergence times in the sequence data, it is important to account for the heterogeneity in the evolutionary process across genes or genomic regions. Partitioning is a commonly used approach to achieve those goals. We group sites that have similar evolutionary characteristics into the same partition and those with different characteristics into different partitions, and then use different models or different values of model parameters for different partitions to account for the among-partition heterogeneity. However, how to partition data in practical phylogenetic analysis, and in particular in relaxed-clock dating analysis, is more art than science. Here, we use computer simulation and real data analysis to study the impact of the partition scheme on divergence time estimation. The partition schemes had relatively minor effects on the accuracy of posterior time estimates when the prior assumptions were correct and the clock was not seriously violated, but showed large differences when the clock was seriously violated, when the fossil calibrations were in conflict or incorrect, or when the rate prior was mis-specified. Concatenation produced the widest posterior intervals with the least precision. Use of many partitions increased the precision, as predicted by the infinite-sites theory, but the posterior intervals might fail to include the true ages because of the conflicting fossil calibrations or mis-specified rate priors. We analyzed a dataset of 78 plastid genes from 15 plant species with serious clock violation and showed that time estimates differed significantly among partition schemes, irrespective of the rate drift model used. Multiple and precise fossil calibrations reduced the differences among partition schemes and were important to improving the precision of divergence time estimates. While the use of many partitions is an important approach to reducing the uncertainty in posterior time estimates, we do not recommend its general use for the present, given the limitations of current models of rate drift for partitioned data and the challenges of interpreting the fossil evidence to construct accurate and informative calibrations