85 research outputs found

    Circlator: automated circularization of genome assemblies using long sequencing reads

    The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/

    Comparison of classical multi-locus sequence typing software for next-generation sequencing data

    Multi-locus sequence typing (MLST) is a widely used method for categorizing bacteria. Increasingly, MLST is being performed using next-generation sequencing (NGS) data by reference laboratories and for clinical diagnostics. Many software applications have been developed to calculate sequence types from NGS data; however, there has been no comprehensive review to date on these methods. We have compared eight of these applications against real and simulated data, and present results on: (1) the accuracy of each method against traditional typing methods, (2) the performance on real outbreak datasets, (3) the impact of contamination and varying depth of coverage, and (4) the computational resource requirements

    ARIBA: rapid antimicrobial resistance genotyping directly from sequencing reads.

    Antimicrobial resistance (AMR) is one of the major threats to human and animal health worldwide, yet few high-throughput tools exist to analyse and predict the resistance of a bacterial isolate from sequencing data. Here we present a new tool, ARIBA, that identifies AMR-associated genes and single nucleotide polymorphisms directly from short reads, and generates detailed and customizable output. The accuracy and advantages of ARIBA over other tools are demonstrated on three datasets from Gram-positive and Gram-negative bacteria, with ARIBA outperforming existing methods

    Robust high-throughput prokaryote de novo assembly and improvement pipeline for Illumina data.

    The rapidly reducing cost of bacterial genome sequencing has led to its routine use in large-scale microbial analysis. Though mapping approaches can be used to find differences relative to the reference, many bacteria are subject to constant evolutionary pressures resulting in events such as the loss and gain of mobile genetic elements, horizontal gene transfer through recombination and genomic rearrangements. De novo assembly is the reconstruction of the underlying genome sequence, an essential step to understanding bacterial genome diversity. Here we present a high-throughput bacterial assembly and improvement pipeline that has been used to generate nearly 20 000 annotated draft genome assemblies in public databases. We demonstrate its performance on a public data set of 9404 genomes. We find all the genes used in multi-locus sequence typing schema present in 99.6 % of assembled genomes. When tested on low-, neutral- and high-GC organisms, more than 94 % of genes were present and completely intact. The pipeline has been proven to be scalable and robust with a wide variety of datasets without requiring human intervention. All of the software is available on GitHub under the GNU GPL open source license

    PlasmidTron: assembling the cause of phenotypes and genotypes from NGS data.

    Increasingly rich metadata are now being linked to samples that have been whole-genome sequenced. However, much of this information is ignored. This is because linking this metadata to genes, or regions of the genome, usually relies on knowing the gene sequence(s) responsible for the particular trait being measured and looking for its presence or absence in that genome. Examples of this would be the spread of antimicrobial resistance genes carried on mobile genetic elements (MGEs). However, although it is possible to routinely identify the resistance gene, identifying the unknown MGE upon which it is carried can be much more difficult if the starting point is short-read whole-genome sequence data. The reason for this is that MGEs are often full of repeats and so assemble poorly, leading to fragmented consensus sequences. Since mobile DNA, which can carry many clinically and ecologically important genes, has a different evolutionary history from the host, its distribution across the host population will, by definition, be independent of the host phylogeny. It is possible to use this phenomenon in a genome-wide association study to identify both the genes associated with the specific trait and also the DNA linked to that gene, for example the flanking sequence of the plasmid vector on which it is encoded, which follows the same patterns of distribution as the marker gene/sequence itself. We present PlasmidTron, which utilizes the phenotypic data normally available in bacterial population studies, such as antibiograms, virulence factors, or geographical information, to identify traits that are likely to be present on DNA that can randomly reassort across defined bacterial populations. It is also possible to use this methodology to associate unknown genes/sequences (e.g. plasmid backbones) with a specific molecular signature or marker (e.g. resistance gene presence or absence) using PlasmidTron. 
PlasmidTron uses a k-mer-based approach to identify reads associated with a phylogenetically unlinked phenotype. These reads are then assembled de novo to produce contigs, an approach that is fast and scales to large datasets. PlasmidTron is written in Python 3 and is available under the open source licence GNU GPL3 from https://github.com/sanger-pathogens/plasmidtron

    The Unexpectedly Bright Comet C/2012 F6 (Lemmon) Unveiled at Near-Infrared Wavelengths

    We acquired near-infrared spectra of the Oort cloud comet C/2012 F6 (Lemmon) at three different heliocentric distances (R_h) during the comet's 2013 perihelion passage, providing a comprehensive measure of the outgassing behavior of parent volatiles and cosmogonic indicators. Our observations were performed pre-perihelion at R_h = 1.2 AU with CRIRES (on 2013 February 2 and 4), and post-perihelion at R_h = 0.75 AU with CSHELL (on March 31 and April 1) and R_h = 1.74 AU with NIRSPEC (on June 20). We detected 10 volatile species (H2O, OH* prompt emission, C2H6, CH3OH, H2CO, HCN, CO, CH4, NH3, and NH2), and obtained upper limits for two others (C2H2 and HDO). One-dimensional spatial profiles displayed different distributions for some volatiles, confirming either the existence of polar and apolar ices, or of chemically distinct active vents in the nucleus. The ortho-para ratio for water was 3.31 ± 0.33 (weighted mean of CRIRES and NIRSPEC results), implying a spin temperature >37 K at the 95% confidence limit. Our (3σ) upper limit for HDO corresponds to D/H < 2.45 × 10^-3 (i.e., <16 Vienna Standard Mean Ocean Water, VSMOW). At R_h = 1.2 AU (CRIRES), the production rate for water was Q(H2O) = (1.9 ± 0.1) × 10^29 s^-1 and its rotational temperature was T_rot ~ 69 K. At R_h = 0.75 AU (CSHELL), we measured Q(H2O) = (4.6 ± 0.6) × 10^29 s^-1 and T_rot = 80 K on March 31, and (6.6 ± 0.9) × 10^29 s^-1 and T_rot = 100 K on April 1. At R_h = 1.74 AU (NIRSPEC), we obtained Q(H2O) = (1.1 ± 0.1) × 10^29 s^-1 and T_rot ~ 50 K. The measured volatile abundance ratios classify comet C/2012 F6 as rather depleted in C2H6 and CH3OH, while HCN, CH4, and CO displayed abundances close to their median values found among comets. H2CO was the only volatile showing a relative enhancement. The relative paucity of C2H6 and CH3OH (with respect to H2O) suggests formation within warm regions of the nebula. 
However, the normal abundance of HCN and hypervolatiles CH4 and CO, and the enhancement of H2CO, may indicate a possible heterogeneous nucleus of comet C/2012 F6 (Lemmon), possibly as a result of radial mixing within the protoplanetary disk
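The correspondence between the HDO upper limit and the "<16 VSMOW" figure quoted in this abstract can be checked with a short calculation. The sketch below is illustrative only: it assumes the commonly quoted reference value D/H = 1.5576 × 10^-4 for VSMOW, which is not stated in the abstract, and the variable names are hypothetical.

```python
# Assumption (not from the abstract): reference D/H ratio of Vienna Standard
# Mean Ocean Water, commonly quoted as 155.76 ppm.
VSMOW_D_H = 1.5576e-4

# Upper limit on D/H reported in the abstract for comet C/2012 F6 (Lemmon).
hdo_upper_limit_d_h = 2.45e-3

# Express the upper limit as a multiple of the VSMOW value.
ratio_to_vsmow = hdo_upper_limit_d_h / VSMOW_D_H

print(f"D/H upper limit is about {ratio_to_vsmow:.1f} times VSMOW")
```

Under that assumed reference value the ratio falls just below 16, in line with the rounded limit given in the abstract.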

    Phylogenetic Analysis of Klebsiella pneumoniae from Hospitalized Children, Pakistan.

    Klebsiella pneumoniae shows increasing emergence of multidrug-resistant lineages, including strains resistant to all available antimicrobial drugs. We conducted whole-genome sequencing of 178 highly drug-resistant isolates from a tertiary hospital in Lahore, Pakistan. Phylogenetic analyses to place these isolates into global context demonstrate the expansion of multiple independent lineages, including K. quasipneumoniae. This work was supported by National Health and Medical Research Council program grants (0606788 to R.A.S. and T.L.; 1092262 to R.A.S., G.D., and T.L.); the Wellcome Trust (206194); and the Higher Education Commission of Pakistan and The Children’s Hospital & The Institute of Child Health, Lahore, Pakistan. H.E. was supported by a scholarship from Higher Education Commission Pakistan under the International Research Support Initiative Program

    Predicting the immediate impact of national lockdown on neovascular age-related macular degeneration and associated visual morbidity: an INSIGHT Health Data Research Hub for Eye Health report

    OBJECTIVE: Predicting the impact of neovascular age-related macular degeneration (nAMD) service disruption on visual outcomes following national lockdown in the UK to contain SARS-CoV-2. METHODS AND ANALYSIS: This retrospective cohort study includes deidentified data from 2229 UK patients from the INSIGHT Health Data Research digital hub. We forecasted the number of treatment-naïve nAMD patients requiring anti-vascular endothelial growth factor (anti-VEGF) initiation during UK lockdown (16 March 2020 through 31 July 2020) at Moorfields Eye Hospital (MEH) and University Hospitals Birmingham (UHB). Best-measured visual acuity (VA) changes without anti-VEGF therapy were predicted using post hoc analysis of Minimally Classic/Occult Trial of the Anti-VEGF Antibody Ranibizumab in the Treatment of Neovascular AMD trial sham-control arm data (n=238). RESULTS: At our centres, 376 patients were predicted to require anti-VEGF initiation during lockdown (MEH: 325; UHB: 51). Without treatment, mean VA was projected to decline after 12 months. The proportion of eyes in the MEH cohort predicted to maintain the key positive visual outcome of ≥70 ETDRS letters (Snellen equivalent 6/12) fell from 25.5% at baseline to 5.8% at 12 months (UHB: 9.8%-7.8%). Similarly, eyes with VA <25 ETDRS letters (6/96) were predicted to increase from 4.3% to 14.2% at MEH (UHB: 5.9%-7.8%) after 12 months without treatment. CONCLUSIONS: Here, we demonstrate how combining data from a recently founded national digital health data repository with historical industry-funded clinical trial data can enhance predictive modelling in nAMD. The demonstrated detrimental effects of prolonged treatment delay should incentivise healthcare providers to support nAMD patients accessing care in safe environments. TRIAL REGISTRATION NUMBER: NCT00056836

    Main-Belt Comet P/2012 T1 (PANSTARRS)

    We present initial results from observations and numerical analyses aimed at characterizing main-belt comet P/2012 T1 (PANSTARRS). Optical monitoring observations were made between October 2012 and February 2013 using the University of Hawaii 2.2 m telescope, the Keck I telescope, the Baade and Clay Magellan telescopes, Faulkes Telescope South, the Perkins Telescope at Lowell Observatory, and the Southern Astrophysical Research (SOAR) telescope. The object's intrinsic brightness approximately doubles from the time of its discovery in early October until mid-November and then decreases by ~60% between late December and early February, similar to photometric behavior exhibited by several other main-belt comets and unlike that exhibited by disrupted asteroid (596) Scheila. We also used Keck to conduct spectroscopic searches for CN emission as well as absorption at 0.7 microns that could indicate the presence of hydrated minerals, finding an upper limit CN production rate of QCN<1.5x10^23 mol/s, from which we infer a water production rate of QH2O<5x10^25 mol/s, and no evidence of the presence of hydrated minerals. Numerical simulations indicate that P/2012 T1 is largely dynamically stable for >100 Myr and is unlikely to be a recently implanted interloper from the outer solar system, while a search for potential asteroid family associations reveals that it is dynamically linked to the ~155 Myr-old Lixiaohua asteroid family. Comment: 15 pages, 4 figures, accepted for publication in ApJ Letters

    PlasmidTron: assembling the cause of phenotypes from NGS data

    When defining bacterial populations through whole genome sequencing (WGS) the samples often have detailed associated metadata that relate to disease severity, antimicrobial resistance, or even rare biochemical traits. When comparing these bacterial populations, it is apparent that some of these phenotypes do not follow the phylogeny of the host, i.e. they are genetically unlinked to the evolutionary history of the host bacterium. One possible explanation for this phenomenon is that the genes are moving independently between hosts and are likely associated with mobile genetic elements (MGE). However, identifying the element that is associated with these traits can be complex if the starting point is short read WGS data. With the increased use of next generation WGS in routine diagnostics, surveillance and epidemiology a vast amount of short read data is available and these types of associations are relatively unexplored. One way to address this would be to perform assembly de novo of the whole genome read data, including its MGEs. However, MGEs are often full of repeats and can lead to fragmented consensus sequences. Deciding which sequence is part of the chromosome, and which is part of an MGE, can be ambiguous. We present PlasmidTron, which utilises the phenotypic data normally available in bacterial population studies, such as antibiograms, virulence factors, or geographic information, to identify sequences that are likely to represent MGEs linked to the phenotype. Given a set of reads, categorised into cases (showing the phenotype) and controls (phylogenetically related but phenotypically negative), PlasmidTron can be used to assemble de novo reads from each sample linked by a phenotype. A k-mer based analysis is performed to identify reads associated with a phylogenetically unlinked phenotype. These reads are then assembled de novo to produce contigs. 
By utilising k-mers and only assembling a fraction of the raw reads, the method is fast and scalable to large datasets. This approach has been tested on plasmids, because of their contribution to important pathogen associated traits, such as AMR, hence the name, but there is no reason why this approach cannot be utilized for any MGE that can move independently through a bacterial population. PlasmidTron is written in Python 3 and available under the open source licence GNU GPL3 from https://github.com/sanger-pathogens/plasmidtron. DATA SUMMARY: Source code for PlasmidTron is available from GitHub under the open source licence GNU GPL 3 (url - https://goo.gl/ot6rT5). Simulated raw reads files have been deposited in Figshare (url - https://doi.org/10.6084/m9.figshare.5406355.vl). Salmonella enterica serovar Weltevreden strain VNS10259 is available from GenBank; accession number GCA_001409135. Salmonella enterica serovar Typhi strain BL60006 is available from GenBank; accession number GCA_900185485. Accession numbers for all of the Illumina datasets used in this paper are listed in the supplementary tables. I/We confirm all supporting data, code and protocols have been provided within the article or through supplementary data files. IMPACT STATEMENT: PlasmidTron utilises the phenotypic data normally available in bacterial population studies, such as antibiograms, virulence factors, or geographic information, to identify sequences that are likely to represent MGEs linked to the phenotype.
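The case/control k-mer filtering described in this abstract can be illustrated with a toy example. This is a minimal sketch of the general idea only, not PlasmidTron's actual implementation: the function names, the tiny k value, the `min_hits` threshold, and the example reads are all hypothetical, and a real workflow would read FASTQ files and pass the selected reads to an assembler.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_case_reads(case_reads, control_reads, k=5, min_hits=1):
    """Keep case reads holding at least min_hits k-mers absent from all controls."""
    # Pool every k-mer observed in the phenotype-negative control reads.
    control_kmers = set()
    for read in control_reads:
        control_kmers |= kmers(read, k)

    selected = []
    for read in case_reads:
        # k-mers found in this case read but in no control read.
        case_only = kmers(read, k) - control_kmers
        if len(case_only) >= min_hits:
            selected.append(read)
    return selected


# Hypothetical toy reads: the first case read is shared with the controls,
# the second carries sequence seen only in the case sample.
case_reads = ["ACGTACGTAA", "TTTTGGGGCC"]
control_reads = ["ACGTACGTAA", "ACGTACGTAC"]

selected = select_case_reads(case_reads, control_reads, k=5)
print(selected)
```

Only the reads that survive this filter would go on to de novo assembly, which is why assembling a fraction of the raw reads keeps the approach fast.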