22 research outputs found

    Meningococcus genome informatics platform: a system for analyzing multilocus sequence typing data

    The Meningococcus Genome Informatics Platform (MGIP) is a suite of computational tools for the analysis of multilocus sequence typing (MLST) data, available at http://mgip.biology.gatech.edu. MLST is used to generate allelic profiles to characterize strains of Neisseria meningitidis, a major cause of bacterial meningitis worldwide. N. meningitidis strains are characterized with MLST as specific sequence types (ST) and clonal complexes (CC) based on the DNA sequences at defined loci. These data are vital to molecular epidemiology studies of N. meningitidis, including outbreak investigations and population biology. MGIP analyzes DNA sequence trace files, returns individual allele calls and characterizes the STs and CCs. MGIP represents a substantial advance over existing software in several respects: (i) ease of use: MGIP is user-friendly, intuitive and thoroughly documented; (ii) flexibility: because MGIP is a website, it is compatible with any computer with an internet connection, can be used from any geographic location, and requires no installation; (iii) speed: MGIP takes just over one minute to process a set of 96 trace files; and (iv) expandability: MGIP has the potential to expand to more loci than those used in MLST and even to other bacterial species.

    Genomic fluidity: an integrative view of gene diversity within microbial populations

    Background: The dual concepts of pan and core genomes have been widely adopted as means to assess the distribution of gene families within microbial species and genera. The core genome is the set of genes shared by a group of organisms; the pan genome is the set of all genes seen in any of these organisms. A variety of methods have provided drastically different estimates of the sizes of pan and core genomes from sequenced representatives of the same groups of bacteria. Results: We use a combination of mathematical, statistical and computational methods to show that current predictions of pan and core genome sizes may have no correspondence to true values. Pan and core genome size estimates are problematic because they depend on the estimation of the occurrence of rare genes and genomes, respectively, which are difficult to estimate precisely because they are rare. Instead, we introduce and evaluate a robust metric, genomic fluidity, to categorize the gene-level similarity among groups of sequenced isolates. Genomic fluidity is a measure of the dissimilarity of genomes evaluated at the gene level. Conclusions: The genomic fluidity of a population can be estimated accurately given a small number of sequenced genomes. Further, the genomic fluidity of groups of organisms can be compared robustly despite variation in algorithms used to identify genes and their homologs. As such, we recommend that genomic fluidity be used in place of pan and core genome size estimates when assessing gene diversity within genomes of a species or a group of closely related organisms.
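As a rough illustration of these definitions, the sketch below computes a core genome as a set intersection, a pan genome as a set union, and a pairwise fluidity score. The abstract gives no formula, so the fluidity definition used here (gene families found in only one genome of a pair, divided by the pair's total gene family count, averaged over all pairs) is an assumption. The function names and the toy gene sets are also hypothetical.

```python
from itertools import combinations


def core_genome(genomes):
    """Gene families present in every genome (set intersection)."""
    return set.intersection(*genomes.values())


def pan_genome(genomes):
    """Gene families present in at least one genome (set union)."""
    return set.union(*genomes.values())


def genomic_fluidity(genomes):
    """Average pairwise gene-level dissimilarity (assumed definition)."""
    pairs = list(combinations(genomes.values(), 2))
    if not pairs:
        raise ValueError("need at least two genomes")
    ratios = []
    for a, b in pairs:
        unique = len(a ^ b)      # families found in only one genome of the pair
        total = len(a) + len(b)  # families in the pair, counted per genome
        ratios.append(unique / total)
    return sum(ratios) / len(ratios)


# Toy, made-up gene family sets (not real data)
toy = {
    "isolate_1": {"g1", "g2", "g3", "g4"},
    "isolate_2": {"g1", "g2", "g3", "g5"},
    "isolate_3": {"g1", "g2", "g6", "g7"},
}

print(sorted(core_genome(toy)))
print(len(pan_genome(toy)))
print(round(genomic_fluidity(toy), 3))
```

Under this definition the score is computed from pairs only, which is one way to see why it might be less sensitive than core or pan genome size to rare genes and rarely sampled genomes.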

    A computational genomics pipeline for prokaryotic sequencing projects

    Motivation: New sequencing technologies have accelerated research on prokaryotic genomes and have made genome sequencing operations outside major genome sequencing centers routine. However, no off-the-shelf solution exists for the combined assembly, gene prediction, genome annotation and data presentation necessary to interpret sequencing data. The resulting requirement to invest significant resources into custom informatics support for genome sequencing projects remains a major impediment to the accessibility of high-throughput sequence data.

    Ebola virus epidemiology, transmission, and evolution during seven months in Sierra Leone

    The 2013–2015 Ebola virus disease (EVD) epidemic is caused by the Makona variant of Ebola virus (EBOV). Early in the epidemic, genome sequencing provided insights into virus evolution and transmission and offered important information for outbreak response. Here, we analyze sequences from 232 patients sampled over 7 months in Sierra Leone, along with 86 previously released genomes from earlier in the epidemic. We confirm sustained human-to-human transmission within Sierra Leone and find no evidence for import or export of EBOV across national borders after its initial introduction. Using high-depth replicate sequencing, we observe both host-to-host transmission and recurrent emergence of intrahost genetic variants. We trace the increasing impact of purifying selection in suppressing the accumulation of nonsynonymous mutations over time. Finally, we note changes in the mucin-like domain of EBOV glycoprotein that merit further investigation. These findings clarify the movement of EBOV within the region and describe viral evolution during prolonged human-to-human transmission.

    Algorithm development for next generation sequencing-based metagenome analysis

    We present research on the design, development and application of algorithms for DNA sequence analysis, with a focus on environmental DNA (metagenomes). We present an overview and primer on algorithm development for bioinformatics of metagenomes; work on frameshift detection in DNA sequencing data; work on a computational pipeline for the assembly, feature prediction, annotation and analysis of bacterial genomes; work on unsupervised phylogenetic clustering of metagenomic fragments using Markov Chain Monte Carlo methods; and work on estimation of bacterial genome plasticity and diversity, with potential improvements to the measures of core and pan-genomes.
    PhD. Committee Chair: Weitz, Joshua; Committee Co-Chair: Jordan, I. King; Committee Member: Bader, David; Committee Member: Bergman, Nicholas; Committee Member: Chernoff, Yur

    Multiple whole-genome alignments without a reference organism

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families—perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml

    Conservation patterns in different functional sequence categories of divergent Drosophila species

    We have explored the distributions of fully conserved ungapped blocks in genome-wide pairwise alignments of recently completed genomes of Drosophila species: D. yakuba, D. ananassae, D. pseudoobscura, D. virilis and D. mojavensis. Based on these distributions we have found that nearly every functional sequence category possesses its own distinctive conservation pattern, sometimes independent of the overall sequence conservation level. In the coding and regulatory regions, the ungapped blocks were longer than in introns, UTRs and non-functional sequences. At the same time, the blocks in the coding regions carried a 3N+2 length signature characteristic of synonymous substitutions at third codon positions. Larger block sizes in transcription regulatory regions can be explained by the presence of conserved arrays of binding sites for transcription factors. We have also shown that the longest ungapped blocks, or 'ultraconserved' sequences, are associated with specific gene groups, including those encoding ion channels and components of the cytoskeleton. We discuss how restrained conservation patterns may help in mapping functional sequence categories and improving genome annotation.
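To make the 3N+2 signature concrete, the sketch below extracts the lengths of fully conserved ungapped blocks from a pairwise alignment. It is a minimal illustration, not the authors' pipeline: the function names and the toy sequences are hypothetical. When mismatches fall only on third codon positions, every block bounded by two mismatches spans two conserved bases plus a whole number of intact codons, giving a length of 3N+2.

```python
from collections import Counter


def conserved_block_lengths(seq_a, seq_b, gap="-"):
    """Lengths of maximal runs of identical, gap-free alignment columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    lengths, run = [], 0
    for x, y in zip(seq_a, seq_b):
        if x == y and x != gap:
            run += 1             # column is conserved and ungapped
        else:
            if run:
                lengths.append(run)
            run = 0              # a mismatch or gap ends the current block
    if run:
        lengths.append(run)
    return lengths


def length_mod3_histogram(lengths):
    """Count block lengths by remainder modulo 3."""
    return Counter(n % 3 for n in lengths)


# Toy, made-up coding alignment: differences only at third codon positions
seq_a = "ATGGCTAAAGGTTTCGAA"   # ATG GCT AAA GGT TTC GAA
seq_b = "ATGGCCAAAGGTTTTGAA"   # ATG GCC AAA GGT TTT GAA

blocks = conserved_block_lengths(seq_a, seq_b)
print(blocks)
print(length_mod3_histogram(blocks))
```

In this toy case the block between the two mismatches has length 8 (3·2+2). The final block is cut short by the end of the alignment, so edge blocks need not follow the pattern.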

    Effect of contact phenomena on the electrical conductivity of reduced lithium niobate

    Lithium niobate is a ferroelectric material with a wide range of applications in optical and acoustic engineering. Annealing lithium niobate crystals in an oxygen-free environment leads to black coloration and a concomitant increase in electrical conductivity due to chemical reduction. There are plentiful published data on the electrophysical properties of reduced lithium niobate crystals, though contact phenomena occurring during electrical conductivity measurement, as well as the interaction between the electrode material and the test specimens, are almost disregarded. The effect of chromium and indium tin oxide electrodes on measurements of electrophysical parameters at room temperature has been investigated for lithium niobate specimens reduced at 1100 °C. Significant nonlinearities in the current-voltage characteristics of the specimens below 5 V were found to distort the specific resistivity readings for lithium niobate, so measurements must be made at higher voltages. Impedance spectroscopy studies have shown that the measurement results are largely affected by capacitances, including those probably induced near the contacts. The experimental results are described adequately by a model implying the presence of near-contact capacitances in parallel with the specimen's own capacitance. A possible mechanism for the induction of these capacitances is described, and a hypothesis is proposed that a high density of electron states at the electrode/specimen interface can trap carriers, with the concentration of trapped carriers growing as annealing duration increases.