13 research outputs found

    Exploring functional annotation through genomic and metagenomic data mining

    Functional profiling of genomes and metagenomes, as well as data mining for novel proteins, all rely on computational methods for functional annotation of protein sequences. Standard methods assign protein function based on detected homology to reference sequences, but often leave behind a significant fraction of hypothetical sequences ("dark matter") that cannot be annotated. To maximize our ability to extract new biological insights from newly sequenced genomes, it is critical to understand the advantages and limitations of homology-based annotation, and explore alternative methods for inferring function. In this thesis, I performed a comprehensive exploration of computational protein annotation, with a focus on bacterial genomes and metagenomes. First, I applied homology-based methods to functionally annotate and analyze original datasets including newly sequenced Streptomyces strains, a wastewater metagenome, and microbial communities involved in vertebrate decomposition. These studies identified genes and functions of interest including cellulases, antibiotic resistance genes, and virulence factors. I then explored the limits of homology-based annotation by measuring annotation coverage, the fraction of annotated proteins in a proteome, across ~27,000 organisms in the microbial tree of life. This study demonstrated a wide range in annotation coverage across bacteria, from 2-86%. In addition, it revealed multiple factors including taxonomy, genome size, and research bias, as heavy influences on the degree to which proteomes could be annotated. To gain biological insights into hypothetical proteins of unknown function, I analyzed 4,049 domains of unknown function (DUFs) from Pfam. Using phylogenomic, taxonomic and metagenomic information, I detected statistical associations between domains and biological traits. 
Association-based methods uncovered environment, lineage, and/or pathogen associations in just under half of all DUFs and highlighted new families such as DUF4765 as intriguing virulence factor candidates. Finally, I constructed a database of "ORFan" metagenomic sequences that cannot be annotated using standard approaches, and inferred functions for tens of thousands of these sequences using profile-profile comparison approaches. Motif analysis and genomic context validated these predictions, enabling the discovery of hundreds of novel candidate metalloproteases. Protein "dark matter", which includes a large pool of unannotated coding sequences, is a rich resource for finding new proteins and functions of interest, and I include suggestions on how to prioritize these sequences for future study. A combination of homology-based and alternative annotation methods will be most effective for broad functional profiling of genomes and metagenomes, and can push the boundaries of functional interpretation of sequence data.
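
The abstract above defines annotation coverage as the fraction of annotated proteins in a proteome. A minimal Python sketch of that calculation follows. The function name, the input format (protein ID mapped to an assigned function, or None when unannotated), and the toy proteome are hypothetical illustrations, not taken from the thesis.

```python
from typing import Mapping, Optional


def annotation_coverage(annotations: Mapping[str, Optional[str]]) -> float:
    """Return the fraction of proteins that carry a functional annotation."""
    if not annotations:
        raise ValueError("proteome is empty")
    # A protein counts as annotated when its label is a non-empty string.
    annotated = sum(1 for label in annotations.values() if label)
    return annotated / len(annotations)


# Hypothetical toy proteome: protein ID -> assigned function (None = unannotated).
toy_proteome = {
    "prot_1": "cellulase",
    "prot_2": None,
    "prot_3": "beta-lactamase",
    "prot_4": None,
    "prot_5": "metalloprotease",
}

coverage = annotation_coverage(toy_proteome)
print(f"Annotation coverage: {coverage:.0%}")
```

In practice the labels would come from a homology search against reference databases; the sketch only shows the final ratio.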

    Gene expression and in situ protein profiling of candidate SARS-CoV-2 receptors in human airway epithelial cells and lung tissue

    In December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged, causing the coronavirus disease 2019 (COVID-19) pandemic. SARS-CoV, the agent responsible for the 2003 SARS outbreak, utilises angiotensin-converting enzyme 2 (ACE2) and transmembrane serine protease 2 (TMPRSS2) host molecules for viral entry. ACE2 and TMPRSS2 have recently been implicated in SARS-CoV-2 viral infection. Additional host molecules including ADAM17, cathepsin L, CD147 and GRP78 may also function as receptors for SARS-CoV-2. To determine the expression and in situ localisation of candidate SARS-CoV-2 receptors in the respiratory mucosa, we analysed gene expression datasets from airway epithelial cells of 515 healthy subjects, gene promoter activity analysis using the FANTOM5 dataset containing 120 distinct sample types, single cell RNA sequencing (scRNAseq) of 10 healthy subjects, proteomic datasets, immunoblots on multiple airway epithelial cell types, and immunohistochemistry on 98 human lung samples. We demonstrate absent to low ACE2 promoter activity in a variety of lung epithelial cell samples and low ACE2 gene expression in both microarray and scRNAseq datasets of epithelial cell populations. Consistent with gene expression, rare ACE2 protein expression was observed in the airway epithelium and alveoli of human lung, confirmed with proteomics. We present confirmatory evidence for the presence of TMPRSS2, CD147 and GRP78 protein in vitro in airway epithelial cells and confirm broad in situ protein expression of CD147 and GRP78 in the respiratory mucosa. Collectively, our data suggest the presence of a mechanism dynamically regulating ACE2 expression in human lung, perhaps in periods of SARS-CoV-2 infection, and also suggest that alternative receptors for SARS-CoV-2 exist to facilitate initial host cell infection.

    AnnoTree: visualization and exploration of a functionally annotated microbial tree of life

    Bacterial genomics has revolutionized our understanding of the microbial tree of life; however, mapping and visualizing the distribution of functional traits across bacteria remains a challenge. Here, we introduce AnnoTree, an interactive, functionally annotated bacterial tree of life that integrates taxonomic, phylogenetic and functional annotation data from over 27 000 bacterial and 1500 archaeal genomes. AnnoTree enables visualization of millions of precomputed genome annotations across the bacterial and archaeal phylogenies, thereby allowing users to explore gene distributions as well as patterns of gene gain and loss in prokaryotes. Using AnnoTree, we examined the phylogenomic distributions of 28 311 gene/protein families, and measured their phylogenetic conservation, patchiness, and lineage-specificity within bacteria. Our analyses revealed widespread phylogenetic patchiness among bacterial gene families, reflecting the dynamic evolution of prokaryotic genomes. Genes involved in phage infection/defense, mobile elements, and antibiotic resistance dominated the list of most patchy traits, as well as numerous intriguing metabolic enzymes that appear to have undergone frequent horizontal transfer. We anticipate that AnnoTree will be a valuable resource for exploring prokaryotic gene histories, and will act as a catalyst for biological and evolutionary hypothesis generation. AnnoTree is freely available at http://annotree.uwaterloo.ca

    Additional file 1: Figure S1. of MetAnnotate: function-specific taxonomic profiling and comparison of metagenomes

    MetAnnotate estimates microbial community abundance with high accuracy, demonstrated by estimated class-level taxonomic composition of the Simulated High Complexity Metagenome (simHC) dataset based on the taxonomic markers in Fig. 3a. The community abundance prediction made by MG-RAST (default parameters, LCA option) and three other methods are included for comparison. The Spearman correlations between known and estimated taxonomic abundance are: r = 0.82 (MetAnnotate); r = 0.75 (MG-RAST); r = 0.70 (MiniKraken); r = 0.65 (FCP NB-BL); r = 0.68 (FCP Epsilon-NB). (PDF 44 kb)

    Genomic classification and antimicrobial resistance profiling of Streptococcus pneumoniae and Haemophilus influenzae isolates associated with paediatric otitis media and upper respiratory infection

    Acute otitis media (AOM) is the most common childhood bacterial infectious disease requiring antimicrobial therapy. Most cases of AOM are caused by translocation of Streptococcus pneumoniae or Haemophilus influenzae from the nasopharynx to the middle ear during an upper respiratory tract infection (URI). Ongoing genomic surveillance of these pathogens is important for vaccine design and tracking of emerging variants, as well as for monitoring patterns of antibiotic resistance to inform treatment strategies and stewardship. In this work, we examined the ability of a genomics-based workflow to determine microbiological and clinically relevant information from cultured bacterial isolates obtained from patients with AOM or an URI. We performed whole genome sequencing (WGS) and analysis of 148 bacterial isolates cultured from the nasopharynx (N = 124, 94 AOM and 30 URI) and ear (N = 24, all AOM) of 101 children aged 6–35 months presenting with AOM or an URI. We then performed WGS-based sequence typing and antimicrobial resistance profiling of each strain and compared results to those obtained from traditional microbiological phenotyping. WGS of clinical isolates resulted in 71 S. pneumoniae genomes and 76 H. influenzae genomes. Multilocus sequence typing (MLST) identified 33 sequence types for S. pneumoniae and 19 predicted serotypes including the most frequent serotypes 35B and 3. Genome analysis predicted 30% of S. pneumoniae isolates to have complete or intermediate penicillin resistance. AMR predictions for S. pneumoniae isolates had strong agreement with clinical susceptibility testing results for beta-lactam and non beta-lactam antibiotics, with a mean sensitivity of 93% (86–100%) and a mean specificity of 98% (94–100%). MLST identified 29 H. influenzae sequence types. Genome analysis identified beta-lactamase genes in 30% of H. influenzae strains, which was 100% in agreement with clinical beta-lactamase testing.
We also identified a divergent, highly antibiotic-resistant strain of S. pneumoniae and found that its closest sequenced strains were also isolated from nasopharyngeal samples, collected over 15 years ago. Ultimately, our work provides the groundwork for clinical WGS-based workflows to aid in detection and analysis of H. influenzae and S. pneumoniae isolates.
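
The abstract above reports agreement between genome-based AMR predictions and clinical susceptibility testing as sensitivity and specificity. A minimal Python sketch of how such figures could be computed from paired calls follows. The function name and the six-isolate toy data are hypothetical and do not reproduce the study's data or pipeline.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    predicted: Sequence[bool], observed: Sequence[bool]
) -> Tuple[float, float]:
    """Compare predicted resistance calls with phenotypic results.

    True means "resistant". Sensitivity = TP / (TP + FN);
    specificity = TN / (TN + FP).
    """
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one resistant and one susceptible isolate")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical calls for six isolates and one antibiotic (not study data).
wgs_prediction = [True, True, False, False, True, False]
clinical_result = [True, True, True, False, False, False]

sens, spec = sensitivity_specificity(wgs_prediction, clinical_result)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A mean across antibiotics, as quoted in the abstract, would average these per-antibiotic values.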

    Ancient Clostridium DNA and variants of tetanus neurotoxins associated with human archaeological remains

    The analysis of microbial genomes from human archaeological samples offers a historic snapshot of ancient pathogens and provides insights into the origins of modern infectious diseases. Here, we analyze metagenomic datasets from 38 human archaeological samples and identify bacterial genomic sequences related to modern-day Clostridium tetani, which produces the tetanus neurotoxin (TeNT) and causes the disease tetanus. These genomic assemblies had varying levels of completeness, and a subset of them displayed hallmarks of ancient DNA damage. Phylogenetic analyses revealed known C. tetani clades as well as potentially new Clostridium lineages closely related to C. tetani. The genomic assemblies encode 13 TeNT variants with unique substitution profiles, including a subgroup of TeNT variants found exclusively in ancient samples from South America. We experimentally tested a TeNT variant selected from an ancient Chilean mummy sample and found that it induced tetanus muscle paralysis in mice, with potency comparable to modern TeNT. Thus, our ancient DNA analysis identifies DNA from neurotoxigenic C. tetani in archaeological human samples, and a novel variant of TeNT that can cause disease in mammals.
