98 research outputs found

    Bacterial genotyping by 16S rRNA mass cataloging

    BACKGROUND: It has recently been demonstrated that organism identifications can be recovered from mass spectra using various methods, including base-specific fragmentation of nucleic acids. Because mass spectrometry is extremely rapid and widely available, such techniques offer significant advantages in some applications. A key element in favor of mass spectrometric analysis of RNA fragmentation patterns is that a reference database for analysis of the results can be generated from sequence information. In contrast to hybridization approaches, the genetic affinity of any unknown isolate can in principle be determined within the context of all previously sequenced 16S rRNAs without prior knowledge of what the organism is. Unlike the original RNase T1 cataloging method, however, mass spectrometric analysis of digestion products cannot distinguish products with the same base composition. Hence, it is possible that organisms that are not closely related (having different underlying sequences) might be falsely identified by mass spectral coincidence. We present a convenient spectral coincidence function for expressing the degree of similarity (or distance) between any two mass spectra. Trees constructed using this function are consistent with those produced by direct comparison of primary sequences, demonstrating that the inherent degeneracy in mass spectrometric analysis of RNA fragments does not preclude correct organism identification. RESULTS: Neighbor-joining trees for important bacterial pathogens were generated using distances based on mass spectrometric observables and the spectral coincidence function. These trees demonstrate that most pathogens will be readily distinguished using mass spectrometric analyses of RNA digestion products. A more detailed, genus-level analysis of pathogens and near relatives was also performed, and assignments of genetic affinity were found to be consistent with those obtained by direct sequence comparisons. Finally, typical values of the coincidence between organisms were examined with regard to phylogenetic level and sequence variability. CONCLUSION: Cluster analysis based on comparison of mass spectrometric observables using the spectral coincidence function is an extremely useful tool for determining the genetic affinity of an unknown bacterium. Additionally, fragmentation patterns can determine within hours whether an unknown isolate is potentially a known pathogen among thousands of possible organisms and, if so, which one.
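    The degeneracy described in this abstract can be illustrated with a small sketch. The abstract does not give the authors' spectral coincidence function, so the code below substitutes a plain Jaccard index as a stand-in. It also uses sorted base composition as a proxy for fragment mass, on the assumption that fragments with identical composition are indistinguishable by mass. The function names are hypothetical. The only biochemical fact relied on is that RNase T1 cleaves after G residues.

```python
def t1_fragments(rna: str) -> list[str]:
    """Split an RNA string after every G, mimicking RNase T1 specificity."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base == "G":
            fragments.append(current)
            current = ""
    if current:  # trailing fragment that does not end in G
        fragments.append(current)
    return fragments


def composition_catalog(rna: str) -> set[str]:
    """Set of fragment base compositions (order of bases is discarded)."""
    return {"".join(sorted(fragment)) for fragment in t1_fragments(rna)}


def coincidence(rna_a: str, rna_b: str) -> float:
    """Jaccard index of two composition catalogs (a stand-in, not the paper's function)."""
    cat_a, cat_b = composition_catalog(rna_a), composition_catalog(rna_b)
    if not cat_a and not cat_b:
        return 1.0
    return len(cat_a & cat_b) / len(cat_a | cat_b)


# Different sequences, same fragment compositions: full coincidence.
print(coincidence("AUGCG", "UAGCG"))
# One fragment differs in composition: partial coincidence.
print(coincidence("AUGCG", "AAGCG"))
```

    In this toy model, AUGCG and UAGCG differ in sequence but yield the same composition catalog. This is the kind of false coincidence the abstract raises as a concern.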

    Computational Design of Novel Non-Ribosomal Peptides

    Non-ribosomal peptide synthetases (NRPSs) are modular enzymatic machines that catalyze the ribosome-independent production of structurally complex small peptides, many of which have important clinical applications as antibiotics, antifungals, and anti-cancer agents. Several groups have tried to expand natural product diversity by intermixing different NRPS modules to create synthetic peptides. This approach has not been as successful as anticipated, suggesting that these modules are not fully interchangeable. Here, we explored whether inter-modular linkers (IMLs) affect the ability of NRPS modules to communicate during the synthesis of non-ribosomal peptides (NRPs). We developed a parser to extract 39,804 IMLs from both well-annotated and putative NRPS biosynthetic gene clusters (BGCs) in 39,232 bacterial genomes and established the first IML database. We analyzed these IMLs and identified a striking relationship between IMLs and the amino acid substrates of their adjacent modules. More than 92% of the identified IMLs connect modules that activate a particular pair of substrates, suggesting that significant specificity is embedded within these sequences. We therefore propose that incorporating the correct IML is critical when attempting combinatorial biosynthesis of novel NRPS. In addition to the IML database and IML-Parser, we have developed the NRP Discovery Pipeline, a set of bioinformatics and cheminformatics tools intended to facilitate early discovery of novel NRPs. The pipeline comprises five modules: (1) NRP comprehensive combinatorial biosynthesis: a tool that helps generate virtual libraries of NRPs. (2) NRP sequence-based predictor: a classifier based only on peptide sequences that helps triage peptides with no antibacterial activity. (3) Pep2struc: a tool that helps convert peptide sequences to their 2D structures for both linear and constrained peptides. (4) NRP structure-based predictor: a second classifier, based on peptide structures, that filters out peptides predicted to be inactive. (5) NRPS Designer: a tool that helps reprogram the bacterial genome by editing its NRP BGC to synthesize the peptide of interest. The IML database and the NRPS-Parser have been made available on the web at https://nrps-linker.unc.edu. The entire source code of the projects discussed in this dissertation is hosted in a GitHub repository (https://github.com/SWFarag). Doctor of Philosophy

    Statistical analysis tools for metabolic and genomic bacterial data

    This thesis introduces statistical analysis methods for two types of bacterial data: metabolic data produced by phenotype microarray technology, and genomic data produced by sequencing technologies. Because both technologies produce vast amounts of data and have their own special features, there is a need for bioinformatics tools that adequately process and analyze the information produced. As in all biomolecular data analyses, the interplay between biological components poses an additional challenge to method development. A specific complication for the metabolic data is the scarcity of replicates, owing to the high cost of performing the experiments. For the sequence data, genome-wide analysis tools are desired, since such methods have not yet been widely developed for bacteria, even though they exist for eukaryotic genetics. The thesis briefly reviews the current methods and introduces new approaches that tackle the above-mentioned problems. This thesis develops new statistical analysis methods for phenotype microarray data and gene sequence data; the former describes the metabolic activity of cells, and the latter reveals the cell's genetic code. Statistical methods are needed when the information produced by these measurement techniques is to be used, for example, for medical purposes such as developing new treatments. Modern molecular-level measurement instruments characteristically produce a large number of observations. In addition, each method has its own special features that must be taken into account when interpreting the data. For example, when analyzing phenotype microarray data, the multidimensional nature of the data must be considered: a single experiment can examine thousands of phenotypes over time. The development of statistical methods and reliable statistical testing are further hampered by small numbers of replicates and limited data availability, which in turn stems from phenotype microarray technology still being a relatively unknown, little-used method that is perceived as difficult to interpret. When analyzing gene sequences, the special features of the organism under study must be taken into account, since different organisms differ in their genetic properties. In humans, the link between genetic traits and many diseases, such as cancers, has been studied using, for example, genome-wide association analysis methods. In this thesis we present a genome-wide method developed for analyzing bacterial gene sequences, which can be used, for example, to map the genetic factors that affect antibiotic resistance in bacteria.

    Long-term study of changes in microbiota in a cystic fibrosis patient

    Application of culture-independent techniques has revealed the presence of more types of bacteria than were previously thought, which led to the current description of cystic fibrosis (CF) as a polymicrobial disease. We know this polymicrobial community changes over time and during exacerbation events, and that interactions with non-pathogenic taxa can influence pathogen gene expression. The polymicrobial nature of infection may explain why in vitro responses and susceptibility of bacteria such as Pseudomonas aeruginosa to antibiotics do not always correlate with in vivo outcomes. The ability of bacteria to adapt to the CF lung complicates long-term treatment strategies. Much is known about the involvement of P. aeruginosa in lung colonization and deterioration, and the genetic adaptations it undergoes over time. Less is known about the adaptations that enable another CF pathogen, Burkholderia multivorans, to become resistant to antibiotics and persist in the lung environment. We identified a B. multivorans strain that acquired resistance in vivo to an antibiotic and became the dominant strain within a period of four days. Expectorated sputum samples are the gold standard for identifying the pathogens present in the CF lungs. Sputum in CF is primarily composed of free DNA from host immune cells and bacterial cells, which is markedly different from the normal mucus that lines the lung epithelia. This composition, along with the dehydrated nature of sputum, increases viscosity and heterogeneity of bacterial distribution. Culture-independent assays that examine bacterial diversity and abundance in sputum rely on bacterial DNA extracted from aliquots, which may not be representative of the whole sample. Sputum is typically homogenized through chemical means prior to DNA extraction, but we have shown that adding a mechanical homogenization step makes the bacterial distribution within a sputum sample significantly more even. Acute bacterial infections are the major cause of pulmonary exacerbations (PEs) in CF. PEs are connected to increased mortality and may result in permanent impairment of lung function. Attempts at developing tools to predict an oncoming PE have met with limited success due to the heterogeneity of patient characteristics. We analyzed bacterial DNA from 130 sputum samples collected weekly for three years to identify changes in bacterial diversity and abundance by combining frequent patient sampling, next generation sequencing, and quantitative PCR (qPCR). Approximately 81,000,000 sequences representing 150 taxa were identified. Changes in microbial diversity and abundance did not correlate with antibiotic treatment for a PE. A gradual increase in abundance of all bacteria, Pseudomonas, and Burkholderia was shown over the sampling period, along with a gradual decline in lung function. Ours is the first study to demonstrate a stable microbial diversity coupled with a gradual change in abundance of all bacteria, Pseudomonas, and Burkholderia over a multi-year period. Regardless of the specific goal, it is clear that understanding CF infections requires knowledge of more than the dominant pathogen. The data described in this dissertation demonstrate the importance of repeated, longitudinal sampling for studying microbial communities in human subjects, where some variation in microbial community composition can occur even between sequential samples from a single clinically stable patient.

    Creation, evaluation, and use of PSI, a program for identifying protein-phenotype relationships and comparing protein content in groups of organisms

    Recent advances in DNA sequencing technology have enabled entire genomes to be sequenced quickly and accurately, resulting in an exponential increase in the number of organisms whose genome sequences have been elucidated. While the genome sequence of a given organism represents an important starting point in understanding its physiology, the functions of the protein products of many genes are still unknown; as such, computational methods for studying protein function are becoming increasingly important. In addition, this wealth of genomic information has created an unprecedented opportunity to compare the protein content of different organisms; among other applications, this can enable us to improve taxonomic classifications, to develop more accurate diagnostic tests for identifying particular bacteria, and to better understand protein content relationships in both closely related and distantly related organisms. This thesis describes the design, evaluation, and use of a program called Proteome Subtraction and Intersection (PSI) that uses an idea called genome subtraction for discovering protein-phenotype relationships and for characterizing differences in protein content in groups of organisms. PSI takes as input a set of proteomes, as well as a partitioning of that set into a subset of "included" proteomes and a subset of "excluded" proteomes. Using reciprocal BLAST hits, PSI finds orthologous relationships among all the proteins in the proteomes from the original set, and then finds groups of orthologous proteins containing at least one orthologue from each of the proteomes in the "included" subset, and none from any of the proteomes in the "excluded" subset. PSI is first applied to finding protein-phenotype relationships. By identifying proteins that are present in all sequenced isolates of the genus Lactobacillus, but not in the related bacterium Pediococcus pentosaceus, proteins are discovered that are likely to be responsible for the difference in cell shape between the lactobacilli and P. pentosaceus. In addition, proteins are identified that may be responsible for resistance to the antibiotic gatifloxacin in some lactic acid bacteria. This thesis also explores the use of PSI for comparing protein content in groups of organisms. Based on the idea of genome subtraction, a novel metric is proposed for comparing the difference in protein content between two organisms. This metric is then used to create a phylogenetic tree for a large set of bacteria, which to the author's knowledge represents the largest phylogenetic tree created to date using protein content. In addition, PSI is used to find the proteomic cohesiveness of isolates of several bacterial species in order to support or refute their current taxonomic classifications. Overall, PSI is a versatile tool with many interesting applications, and should become increasingly valuable as additional genomic information becomes available.
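    The selection step described in this abstract (keep orthologue groups represented in every "included" proteome and in no "excluded" proteome) can be sketched as a set filter. This is a minimal illustration, not PSI's actual code. It assumes the orthologue groups have already been computed (for example from reciprocal BLAST hits, which are not shown). The function name, group identifiers, and proteome labels are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def select_groups(
    groups: Dict[str, Set[str]],
    included: Iterable[str],
    excluded: Iterable[str],
) -> List[str]:
    """Return ids of orthologue groups present in all included and no excluded proteomes.

    `groups` maps a group id to the set of proteomes that contribute
    at least one protein to that group (assumed precomputed).
    """
    inc, exc = set(included), set(excluded)
    return sorted(
        group_id
        for group_id, proteomes in groups.items()
        if inc <= proteomes and not (exc & proteomes)
    )


# Hypothetical toy data: three orthologue groups over three proteomes.
groups = {
    "og1": {"L_casei", "L_brevis"},
    "og2": {"L_casei", "L_brevis", "P_pentosaceus"},
    "og3": {"L_casei"},
}
print(select_groups(groups, ["L_casei", "L_brevis"], ["P_pentosaceus"]))
```

    Only og1 survives in this toy input: og2 is also found in the excluded proteome, and og3 is missing from one of the included proteomes.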

    Using next generation sequencing approaches to define the population biology of the neglected cystic fibrosis lung pathogen Burkholderia multivorans

    Burkholderia multivorans is the most frequently isolated Burkholderia cepacia complex species recovered from cystic fibrosis (CF) lung infection. However, its pathogenesis and species population biology remain elusive. Understanding the factors that allow B. multivorans to adapt to the CF lung microenvironment is important for predicting its pathogenesis and disease outcome. B. multivorans population biology was explored using pan-genome analysis, average nucleotide identity, and phylogenomic analysis (n = 283). The population split into two major genomic lineages, designated 1 and 2, and four B. multivorans model strains were selected to represent them: the soil strain ATCC 17616 (lineage 2a), BCC1272 (lineage 2a), BCC0033 (lineage 2b), and BCC0084 (lineage 1). The latter three CF strains were fully genome sequenced to add to the readily available reference genome ATCC 17616. Using gene presence-absence analysis, unique B. multivorans lineage-specific genes were identified. This enabled diagnostic PCR design, with genes ghrB_1 and glnM_2 selected as the lineage 1 and lineage 2 targets, respectively. The PCRs showed 100% lineage-specificity against 48 B. multivorans strains. Phenotypic analysis was performed on a subset of 49 B. multivorans strains, evaluating their morphology, growth kinetics, motility, biofilm formation, and exopolysaccharide production. The B. multivorans phenotype was variable amongst the strains, with no link to genomic lineage. Phenotypic comparison was also performed when B. multivorans was mixed with a secondary CF pathogen. The suppression of P. aeruginosa LESB58 protease production when cultured with B. multivorans was identified as an interesting interaction based on an unknown mechanism. Three of the B. multivorans model strains (BCC0033, BCC0084, and ATCC 17616) were also evaluated in a murine respiratory infection model, and all showed good persistence over 5 days. Overall, this work has built a foundation of knowledge on the B. multivorans phenotype and genotype, enabling associations between lineage, therapeutics testing, and clinical outcome to be studied.

    DEVELOPMENT OF A WEBTOOL FOR INTERACTOME-SEQUENCING DATA ANALYSIS AND IDENTIFICATION OF H. PYLORI EPITOPES RESPONSIBLE FOR HOST IMMUNO-RESPONSE MODULATION.

    To elucidate the molecular mechanisms involved in the persistence/latency of H. pylori infection or in its progression towards serious diseases, it is necessary to analyse the host-pathogen interaction in vivo. The circulating antibody repertoire represents an important source of diagnostic information, serving as a biomarker to provide a "disease signature". The aim of this work is the identification of H. pylori epitopes responsible for host immuno-response modulation through a discovery-driven approach that couples "phage display" and deep sequencing (interactome-sequencing), and through the development of a specific webtool for interactome-sequencing data analysis. We used this approach to identify novel antigens by screening gDNA libraries created from the pathogen's genome directly with sera from infected patients. Two genomic phage display libraries, from H. pylori strains 26695 and B128, were constructed using a β-lactamase ORF selection vector. Genomic DNA was sonicated and the fragments were cloned into the filtering vector; after transformation, libraries of 1 × 10^6 clones were obtained and sequenced using Illumina technology. More than 93% of H. pylori CDSs were represented in the phage genomic libraries, which are therefore representative of the whole H. pylori antigenic ORFeome. A webtool for interactome-sequencing data analysis was developed and used to identify the H. pylori antigens/epitopes that could be considered specific for infection progression towards three different pathological outcomes. Putative antigens were selected from the libraries using sera from patients affected by: i) gastric adenocarcinoma; ii) autoimmune gastritis; iii) MALT lymphoma. The results, obtained with the newly developed interactome-sequencing pipeline, show that the diversity of the libraries is significantly reduced after selection. Furthermore, individual rankings for each infection condition were compared, highlighting a pattern of putative antigens shared by all the conditions and some that can distinguish the different stages of infection. One of these new antigens, which seems to be specific for infection progression towards more serious diseases, has been successfully validated by ELISA on a large number of patient sera. Other, more specific antigens identified by our approach and by the new data analysis pipeline described here are being validated.