28 research outputs found
Shortest triplet clustering: reconstructing large phylogenies using representative sets
BACKGROUND: Understanding the evolutionary relationships among species based on their genetic information is one of the primary objectives in phylogenetic analysis. Reconstructing phylogenies for large data sets is still a challenging task in Bioinformatics. RESULTS: We propose a new distance-based clustering method, the shortest triplet clustering algorithm (STC), to reconstruct phylogenies. The main idea is the introduction of a natural definition of so-called k-representative sets. Based on k-representative sets, shortest triplets are reconstructed and serve as building blocks for the STC algorithm to agglomerate sequences for tree reconstruction in O(n(2)) time for n sequences. Simulations show that STC gives better topological accuracy than other tested methods that also build a first starting tree. STC appears as a very good method to start the tree reconstruction. However, all tested methods give similar results if balanced nearest neighbor interchange (BNNI) is applied as a post-processing step. BNNI leads to an improvement in all instances. The program is available at . CONCLUSION: The results demonstrate that the new approach efficiently reconstructs phylogenies for large data sets. We found that BNNI boosts the topological accuracy of all methods including STC, therefore, one should use BNNI as a post-processing step to get better topological accuracy
Improved mitochondrial amino acid substitution models for metazoan evolutionary studies
Abstract Background Amino acid substitution models play an essential role in inferring phylogenies from mitochondrial protein data. However, only few empirical models have been estimated from restricted mitochondrial protein data of a hundred species. The existing models are unlikely to represent appropriately the amino acid substitutions from hundred thousands metazoan mitochondrial protein sequences. Results We selected 125,935 mitochondrial protein sequences from 34,448 species in the metazoan kingdom to estimate new amino acid substitution models targeting metazoa, vertebrates and invertebrate groups. The new models help to find significantly better likelihood phylogenies in comparison with the existing models. We noted remarkable distances from phylogenies with the existing models to the maximum likelihood phylogenies that indicate a considerable number of incorrect bipartitions in phylogenies with the existing models. Finally, we used the new models and mitochondrial protein data to certify that Testudines, Aves, and Crocodylia form one separated clade within amniotes. Conclusions We introduced new mitochondrial amino acid substitution models for metazoan mitochondrial proteins. The new models outperform the existing models in inferring phylogenies from metazoan mitochondrial protein data. We strongly recommend researchers to use the new models in analysing metazoan mitochondrial protein data
FLU, an amino acid substitution model for influenza proteins
Abstract Background The amino acid substitution model is the core component of many protein analysis systems such as sequence similarity search, sequence alignment, and phylogenetic inference. Although several general amino acid substitution models have been estimated from large and diverse protein databases, they remain inappropriate for analyzing specific species, e.g., viruses. Emerging epidemics of influenza viruses raise the need for comprehensive studies of these dangerous viruses. We propose an influenza-specific amino acid substitution model to enhance the understanding of the evolution of influenza viruses. Results A maximum likelihood approach was applied to estimate an amino acid substitution model (FLU) from ~113, 000 influenza protein sequences, consisting of ~20 million residues. FLU outperforms 14 widely used models in constructing maximum likelihood phylogenetic trees for the majority of influenza protein alignments. On average, FLU gains ~42 log likelihood points with an alignment of 300 sites. Moreover, topologies of trees constructed using FLU and other models are frequently different. FLU does indeed have an impact on likelihood improvement as well as tree topologies. It was implemented in PhyML and can be downloaded from ftp://ftp.sanger.ac.uk/pub/1000genomes/lsq/FLU or included in PhyML 3.0 server at http://www.atgc-montpellier.fr/phyml/. Conclusions FLU should be useful for any influenza protein analysis system which requires an accurate description of amino acid substitutions.</p
AMRomics: a scalable workflow to analyze large microbial genome collections
Whole genome analysis for microbial genomics is critical to studying and monitoring antimicrobial resistance strains. The exponential growth of microbial sequencing data necessitates a fast and scalable computational pipeline to generate the desired outputs in a timely and cost-effective manner. Recent methods have been implemented to integrate individual genomes into large collections of specific bacterial populations and are widely employed for systematic genomic surveillance. However, they do not scale well when the population expands and turnaround time remains the main issue for this type of analysis. Here, we introduce AMRomics, an optimized microbial genomics pipeline that can work efficiently with big datasets. We use different bacterial data collections to compare AMRomics against competitive tools and show that our pipeline can generate similar results of interest but with better performance. The software is open source and is publicly available at https://github.com/amromics/amromics under an MIT license
De novo copy number variations in candidate genomic regions in patients of severe autism spectrum disorder in Vietnam
Autism spectrum disorder (ASD) is a developmental disorder with a prevalence of around 1% children worldwide and characterized by patient behaviour (communication, social interaction, and personal development). Data on the efficacy of diagnostic tests using copy number variations (CNVs) in candidate genes in ASD is currently around 10% but it is overrepresented by patients of Caucasian background. We report here that the diagnostic success of de novo candidate CNVs in Vietnamese ASD patients is around 6%. We recruited one hundred trios (both parents and a child) where the child was clinically diagnosed with ASD while the parents were not affected. We performed genetic screening to exclude RETT syndrome and Fragile X syndrome and performed genome-wide DNA microarray (aCGH) on all probands and their parents to analyse for de novo CNVs. We detected 1708 non-redundant CNVs in 100 patients and 118 (7%) of them were de novo. Using the filter for known CNVs from the Simons Foundation Autism Research Initiative (SFARI) database, we identified six CNVs (one gain and five loss CNVs) in six patients (3 males and 3 females). Notably, 3 of our patients had a deletion involving the SHANK3 gene–which is the highest compared to previous reports. This is the first report of candidate CNVs in ASD patients from Vietnam and provides the framework for building a CNV based test as the first tier screening for clinical management
Socializing One Health: an innovative strategy to investigate social and behavioral risks of emerging viral threats
In an effort to strengthen global capacity to prevent, detect, and control infectious diseases in animals and people, the United States Agency for International Development’s (USAID) Emerging Pandemic Threats (EPT) PREDICT project funded development of regional, national, and local One Health capacities for early disease detection, rapid response, disease control, and risk reduction. From the outset, the EPT approach was inclusive of social science research methods designed to understand the contexts and behaviors of communities living and working at human-animal-environment interfaces considered high-risk for virus emergence. Using qualitative and quantitative approaches, PREDICT behavioral research aimed to identify and assess a range of socio-cultural behaviors that could be influential in zoonotic disease emergence, amplification, and transmission. This broad approach to behavioral risk characterization enabled us to identify and characterize human activities that could be linked to the transmission dynamics of new and emerging viruses. This paper provides a discussion of implementation of a social science approach within a zoonotic surveillance framework. We conducted in-depth ethnographic interviews and focus groups to better understand the individual- and community-level knowledge, attitudes, and practices that potentially put participants at risk for zoonotic disease transmission from the animals they live and work with, across 6 interface domains. When we asked highly-exposed individuals (ie. bushmeat hunters, wildlife or guano farmers) about the risk they perceived in their occupational activities, most did not perceive it to be risky, whether because it was normalized by years (or generations) of doing such an activity, or due to lack of information about potential risks. Integrating the social sciences allows investigations of the specific human activities that are hypothesized to drive disease emergence, amplification, and transmission, in order to better substantiate behavioral disease drivers, along with the social dimensions of infection and transmission dynamics. Understanding these dynamics is critical to achieving health security--the protection from threats to health-- which requires investments in both collective and individual health security. Involving behavioral sciences into zoonotic disease surveillance allowed us to push toward fuller community integration and engagement and toward dialogue and implementation of recommendations for disease prevention and improved health security