PhylOTU: a high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data.
Microbial diversity is typically characterized by clustering small-subunit ribosomal RNA (SSU-rRNA) sequences into operational taxonomic units (OTUs). Targeted sequencing of environmental SSU-rRNA markers via PCR may fail to detect OTUs due to biases in priming and amplification. Analysis of shotgun-sequenced environmental DNA, known as metagenomics, avoids amplification bias but generates fragmentary, non-overlapping sequence reads that cannot be clustered by existing OTU-finding methods. To circumvent these limitations, we developed PhylOTU, a computational workflow that identifies OTUs from metagenomic SSU-rRNA sequence data through the use of phylogenetic principles and probabilistic sequence profiles. Using simulated metagenomic data, we quantified the accuracy with which PhylOTU clusters reads into OTUs. Comparisons of PCR and shotgun-sequenced SSU-rRNA markers derived from the global open ocean revealed that while PCR libraries identify more OTUs per sequenced residue, metagenomic libraries recover a greater taxonomic diversity of OTUs. In addition, we discovered novel species, genera and families in the metagenomic libraries, including OTUs from phyla missed by analysis of PCR sequences. Taken together, these results suggest that PhylOTU enables characterization of part of the biosphere currently hidden from PCR-based surveys of diversity.
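As an illustration of the general idea of grouping reads into OTUs by a distance cutoff, here is a minimal Python sketch. It is not PhylOTU's implementation: the abstract does not state the linkage rule or cutoff, so single linkage, the function name `cluster_otus`, and the pairwise distance values are all assumptions made for the example. The distances stand in for tree-derived distances between reads, which is what allows non-overlapping reads to be compared at all.

```python
from itertools import combinations


def cluster_otus(names, dist, cutoff):
    """Group reads into OTUs by single-linkage clustering.

    names  : list of read identifiers
    dist   : dict mapping frozenset({a, b}) -> pairwise distance (hypothetical values)
    cutoff : reads joined by a chain of distances <= cutoff share an OTU
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= cutoff:
            parent[find(a)] = find(b)  # merge the two clusters

    otus = {}
    for n in names:
        otus.setdefault(find(n), []).append(n)
    return sorted(sorted(members) for members in otus.values())


# Hypothetical distances between four reads.
reads = ["r1", "r2", "r3", "r4"]
distances = {
    frozenset(("r1", "r2")): 0.01,
    frozenset(("r1", "r3")): 0.20,
    frozenset(("r1", "r4")): 0.25,
    frozenset(("r2", "r3")): 0.22,
    frozenset(("r2", "r4")): 0.24,
    frozenset(("r3", "r4")): 0.02,
}
print(cluster_otus(reads, distances, cutoff=0.03))
```

With these made-up distances, r1 and r2 fall into one OTU and r3 and r4 into another; loosening the cutoff merges clusters, which is why the choice of cutoff governs the number of OTUs reported.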
Trends of the major porin gene (ompF) evolution
OmpF is one of the major general porins of Enterobacteriaceae and belongs to the first line of bacterial defense and interaction with both biotic and abiotic environments. Porins are surface exposed, and their structures strongly reflect a history of multiple interactions with environmental challenges. Unfortunately, little is known about the diversity of porin genes in Enterobacteriaceae, and in the genus Yersinia especially. We analyzed the sequences of the ompF gene from 73 Yersinia strains covering 14 known species. The phylogenetic analysis placed most of the Yersinia strains in the same lineages as those assigned by the 16S rDNA-gyrB tree. Very high congruence in the tree topologies was observed for Y. enterocolitica, Y. kristensenii and Y. ruckeri, indicating that intragenic recombination in these species had no effect on the ompF gene. A significant level of intra- and interspecies recombination was found for Y. aleksiciae, Y. intermedia and Y. mollaretii. Our analysis shows that the ompF gene of Yersinia has evolved with a nonrandom mutation rate under purifying selection. However, several surface loops in the OmpF porin contain positively selected sites, which very likely reflect adaptive diversification of Yersinia to their ecological niches. To our knowledge, this is the first investigation of porin gene diversity covering a whole genus of the family Enterobacteriaceae. This study demonstrates that recombination and positive selection both contribute to the evolution of ompF, but the relative contribution of these evolutionary forces differs among Yersinia species.
Evolution and Global Transmission of a Multidrug-Resistant, Community-Associated Methicillin-Resistant Staphylococcus aureus Lineage from the Indian Subcontinent.
The evolution and global transmission of antimicrobial resistance have been well documented for Gram-negative bacteria and health care-associated epidemic pathogens, often emerging from regions with heavy antimicrobial use. However, the degree to which similar processes occur with Gram-positive bacteria in the community setting is less well understood. In this study, we traced the recent origins and global spread of a multidrug-resistant, community-associated Staphylococcus aureus lineage from the Indian subcontinent, the Bengal Bay clone (ST772). We generated whole-genome sequence data of 340 isolates from 14 countries, including the first isolates from Bangladesh and India, to reconstruct the evolutionary history and genomic epidemiology of the lineage. Our data show that the clone emerged on the Indian subcontinent in the early 1960s and disseminated rapidly in the 1990s. Short-term outbreaks in community and health care settings occurred following intercontinental transmission, typically associated with travel and family contacts on the subcontinent, but ongoing endemic transmission was uncommon. Acquisition of a multidrug resistance integrated plasmid was instrumental in the emergence of a single dominant and globally disseminated clade in the early 1990s. Phenotypic data on biofilm, growth, and toxicity point to antimicrobial resistance as the driving force in the evolution of ST772. The Bengal Bay clone therefore combines the multidrug resistance of traditional health care-associated clones with the epidemiological transmission of community-associated methicillin-resistant S. aureus (MRSA). Our study demonstrates the importance of whole-genome sequencing for tracking the evolution of emerging and resistant pathogens. 
It provides a critical framework for ongoing surveillance of the clone on the Indian subcontinent and elsewhere.
IMPORTANCE The Bengal Bay clone (ST772) is a community-associated and multidrug-resistant Staphylococcus aureus lineage first isolated from Bangladesh and India in 2004. In this study, we showed that the Bengal Bay clone emerged from a virulent progenitor circulating on the Indian subcontinent. Its subsequent global transmission was associated with travel or family contact in the region. ST772 progressively acquired specific resistance elements at limited cost to its fitness and continues to be exported globally, resulting in small-scale community and health care outbreaks. The Bengal Bay clone therefore combines the virulence potential and epidemiology of community-associated clones with the multidrug resistance of health care-associated S. aureus lineages. This study demonstrates the importance of whole-genome sequencing for the surveillance of highly antibiotic-resistant pathogens, which may emerge in the community setting of regions with poor antibiotic stewardship and rapidly spread into hospitals and communities across the world.
Reconstruction and classification of unknown DNA sequences (Reconstrução e classificação de sequências de ADN desconhecidas)
Continuous advances in DNA sequencing technologies and metagenomics techniques call for reliable reconstruction and accurate classification methodologies, both to increase the diversity of the natural repository and to contribute to the description and organization of organisms. However, after sequencing and de novo assembly, one of the most complex challenges comes from DNA sequences that do not match or resemble any known biological sequence in the literature. Three main reasons contribute to this exception: the organism's sequence is highly divergent from the known organisms in the literature, an irregularity was introduced in the reconstruction process, or a new organism has been sequenced. The inability to classify these unknown sequences efficiently increases uncertainty about the sample's composition and wastes an opportunity to discover new species, since such sequences are often discarded.
In this context, the main objective of this thesis is the development and validation of a tool that provides an efficient computational solution to these three challenges, based on an ensemble of experts, namely compression-based predictors, the distribution of sequence content, and normalized sequence lengths. The method uses both DNA and amino acid sequences and provides efficient classification beyond standard referential comparisons. Unusually, it classifies DNA sequences without resorting directly to reference genomes, relying instead on features that the biological sequences of a species share. Specifically, it uses only features extracted individually from each genome, without sequence comparisons.
RFSC was then created as a machine learning classification pipeline that relies on an ensemble of experts to provide efficient classification in metagenomic contexts. This pipeline was tested on synthetic and real data, in both cases achieving precise and accurate results that, at the time this thesis was developed, had not been reported in the state of the art. Specifically, it
has achieved an accuracy of approximately 97% in the domain/type classification. In addition, the pipeline is fully automatic and allows reference-free reconstruction of genomes from FASTQ reads, with the added guarantee of secure storage of sensitive information.
Master's in Computer and Telematics Engineering
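The thesis abstract above describes classifying sequences from per-sequence features (compression-based predictors, sequence content, normalized length) rather than from alignment against references. The Python sketch below shows what such alignment-free features can look like. It is not the RFSC pipeline: general-purpose `zlib` compression stands in for the specialized compression-based predictors, and the function name `sequence_features`, the feature names, and the `max_len` normalization constant are all assumptions for illustration.

```python
import zlib


def sequence_features(seq, max_len=10_000_000):
    """Compute simple alignment-free features for one DNA sequence.

    max_len is a hypothetical normalization constant, not a value from the thesis.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")

    # Compressed bytes per base: lower values indicate more repetitive sequence.
    # zlib is a stand-in here for a dedicated DNA compressor.
    compressed = zlib.compress(seq.encode("ascii"), 9)

    return {
        "compression_ratio": len(compressed) / n,
        "gc_content": (seq.count("G") + seq.count("C")) / n,
        "norm_length": min(n / max_len, 1.0),
    }


features = sequence_features("ACGT" * 1000)
print(features)
```

A feature vector like this could then be passed to any standard classifier; each vector is computed from a single sequence, so no pairwise sequence comparison is needed at prediction time.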
Severe infections emerge from commensal bacteria by adaptive evolution
Bacteria responsible for the greatest global mortality colonize the human microbiota far more frequently than they cause severe infections. Whether mutation and selection among commensal bacteria are associated with infection is unknown. We investigated de novo mutation in 1163 Staphylococcus aureus genomes from 105 infected patients with nose colonization. We report that 72% of infections emerged from the nose, with infecting and nose-colonizing bacteria showing parallel adaptive differences. We found 2.8-to-3.6-fold adaptive enrichments of protein-altering variants in genes responding to rsp, which regulates surface antigens and toxin production; agr, which regulates quorum-sensing, toxin production and abscess formation; and host-derived antimicrobial peptides. Adaptive mutations in pathogenesis-associated genes were 3.1-fold enriched in infecting but not nose-colonizing bacteria. None of these signatures were observed in healthy carriers or at the species level, suggesting infection-associated, short-term, within-host selection pressures. Our results show that signatures of spontaneous adaptive evolution are specifically associated with infection, raising new possibilities for diagnosis and treatment.