Benchmarking of long-read assemblers for prokaryote whole genome sequencing.
Background: Data sets from long-read sequencing platforms (Oxford Nanopore Technologies and Pacific Biosciences) allow for most prokaryote genomes to be completely assembled - one contig per chromosome or plasmid. However, the high per-read error rate of long-read sequencing necessitates different approaches to assembly than those used for short-read sequencing. Multiple assembly tools (assemblers) exist, which use a variety of algorithms for long-read assembly. Methods: We used 500 simulated read sets and 120 real read sets to assess the performance of eight long-read assemblers (Canu, Flye, Miniasm/Minipolish, NECAT, NextDenovo/NextPolish, Raven, Redbean and Shasta) across a wide variety of genomes and read parameters. Assemblies were assessed on their structural accuracy/completeness, sequence identity, contig circularisation and computational resources used. Results: Canu v2.1 produced reliable assemblies and was good with plasmids, but it performed poorly with circularisation and had the longest runtimes of all assemblers tested. Flye v2.8 was also reliable and made the smallest sequence errors, though it used the most RAM. Miniasm/Minipolish v0.3/v0.1.3 was the most likely to produce clean contig circularisation. NECAT v20200803 was reliable and good at circularisation but tended to make larger sequence errors. NextDenovo/NextPolish v2.3.1/v1.3.1 was reliable with chromosome assembly but bad with plasmid assembly. Raven v1.3.0 was reliable for chromosome assembly, though it did not perform well on small plasmids and had circularisation issues. Redbean v2.5 and Shasta v0.7.0 were computationally efficient but more likely to produce incomplete assemblies. Conclusions: Of the assemblers tested, Flye, Miniasm/Minipolish, NextDenovo/NextPolish and Raven performed best overall. However, no single tool performed well on all metrics, highlighting the need for continued development on long-read assembly algorithms
ModuleFinder and CoReg: alternative tools for linking gene expression modules with promoter sequences motifs to uncover gene regulation mechanisms in plants
BACKGROUND: Uncovering the key sequence elements in gene promoters that regulate the expression of plant genomes is a huge task that will require a series of complementary methods for prediction, substantial innovations in experimental validation and a much greater understanding of the role of combinatorial control in the regulation of plant gene expression. RESULTS: To add to this larger process and to provide alternatives to existing prediction methods, we have developed several tools in the statistical package R. ModuleFinder identifies sets of genes and treatments that we have found to form valuable sets for analysis of the mechanisms underlying gene co-expression. CoReg then links the hierarchical clustering of these co-expressed sets with frequency tables of promoter elements. These promoter elements can be drawn from known elements or all possible combinations of nucleotides in an element of various lengths. These sets of promoter elements represent putative cis-acting regulatory elements common to sets of co-expressed genes and can be prioritised for experimental testing. We have used these new tools to analyze the response of transcripts for nuclear genes encoding mitochondrial proteins in Arabidopsis to a range of chemical stresses. ModuleFinder provided a subset of co-expressed gene modules that are more logically related to biological functions than did subsets derived from traditional hierarchical clustering techniques. Importantly ModuleFinder linked responses in transcripts for electron transport chain components, carbon metabolism enzymes and solute transporter proteins. CoReg identified several promoter motifs that helped to explain the patterns of expression observed. CONCLUSION: ModuleFinder identifies sets of genes and treatments that form useful sets for analysis of the mechanisms behind co-expression. CoReg links the clustering tree of expression-based relationships in these sets with frequency tables of promoter elements. 
These sets of promoter elements represent putative cis-acting regulatory elements for sets of genes, and can then be tested experimentally. We consider these tools, both built on an open source software product, to provide valuable, alternative tools for the prioritisation of promoter elements for experimental analysis.
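As a rough illustration of the "frequency tables of promoter elements" idea described above, the following minimal sketch counts every possible nucleotide word of a given length in a set of promoter sequences. It is written in Python rather than R, it is not the CoReg implementation, and the function name `kmer_frequency_table` and the example gene names and sequences are hypothetical.

```python
from itertools import product


def kmer_frequency_table(promoters, k):
    """Count occurrences of every possible k-mer in each promoter sequence.

    promoters: dict mapping a gene identifier to its promoter sequence.
    Returns a dict of dicts: table[gene][kmer] -> count.
    Assumption: overlapping matches are counted, forward strand only.
    """
    # Enumerate all 4**k possible words over the DNA alphabet.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    table = {}
    for gene, seq in promoters.items():
        seq = seq.upper()
        counts = dict.fromkeys(kmers, 0)
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in counts:  # skip windows containing N or other symbols
                counts[word] += 1
        table[gene] = counts
    return table


# Hypothetical promoter sequences, for illustration only.
promoters = {"geneA": "ACGTACGT", "geneB": "TTTTACGN"}
table = kmer_frequency_table(promoters, 4)
```

A table of this shape could then be set alongside a clustering of co-expressed genes; how CoReg itself performs that link is not specified here.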
A Type 2 A/C2 plasmid carrying the aacC4 apramycin resistance gene and the erm(42) erythromycin resistance gene recovered from two Salmonella enterica serovars
Objective: To determine the relationships between RepA/C2 plasmids carrying several antibiotic resistance genes found in isolates of Salmonella enterica serovars Ohio and Senftenberg from pigs. Methods: Illumina HiSeq was used to sequence seven S. enterica isolates. BLAST searches identified relevant A/C2 plasmid contigs, and contigs were assembled using PCR. Results: Two serovar Ohio isolates were ST329 and the five Senftenberg isolates were ST210. The A/C2 plasmids recovered from the seven isolates belong to Type 2 and contain two resistance islands. Their backbones were closely related, differing by five or fewer single nucleotide polymorphisms. The sul2-containing resistance island ARI-B is 19.9 kb and also contains the kanamycin and neomycin resistance gene aphA1, the tetracycline resistance gene tetA(D), and an erythromycin resistance gene, erm(42), not previously seen in A/C2 plasmids. A second 30.3 kb resistance island, RI-119, is in a unique location in the A/C2 backbone 8.2 kb downstream of rhs. RI-119 contained genes conferring resistance to apramycin, netilmicin, tobramycin (aacC4), hygromycin (hph), sulphonamides (sul1) and spectinomycin and streptomycin (aadA2). In one of the seven plasmids, this resistance region contained two IS26-mediated deletions. A discrete 5.7 kb segment containing the aacC4 and hph genes and bounded by IS26 on one side and the IR of Tn5393 on the other was identified. Conclusions: The presence of almost identical A/C2 plasmids in two serovars indicates a common origin. Type 2 A/C2 plasmids continue to evolve via addition of new resistance regions such as RI-119 and evolution of existing ones
FastSpar: rapid and scalable correlation estimation for compositional data.
SUMMARY: A common goal of microbiome studies is the elucidation of community composition and member interactions using counts of taxonomic units extracted from sequence data. Inference of interaction networks from sparse and compositional data requires specialized statistical approaches. A popular solution is SparCC; however, its performance limits the calculation of interaction networks for very high-dimensional datasets. Here we introduce FastSpar, an efficient and parallelizable implementation of the SparCC algorithm which rapidly infers correlation networks and calculates P-values using an unbiased estimator. We further demonstrate that FastSpar reduces network inference wall time by 2-3 orders of magnitude compared to SparCC. AVAILABILITY AND IMPLEMENTATION: FastSpar source code, precompiled binaries and platform packages are freely available on GitHub: github.com/scwatts/FastSpar. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
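To make concrete why compositional data need specialized treatment, the sketch below computes pairwise log-ratio variances, a quantity that log-ratio methods for compositional data typically build on. It is a minimal illustration assuming a log-ratio-variance formulation, not the SparCC or FastSpar implementation; the function name `log_ratio_variances`, the `pseudocount` parameter, and the example counts are hypothetical.

```python
import math
from statistics import variance


def log_ratio_variances(samples, pseudocount=0.0):
    """Return t[i][j] = variance across samples of log(x_i / x_j).

    samples: list of per-sample lists of counts or abundances, with the
    same component order in every sample.
    Assumption: zeros must be handled by the caller (for example with a
    pseudocount); with pseudocount=0.0 a zero count raises ValueError.
    """
    n = len(samples[0])
    logs = [[math.log(x + pseudocount) for x in s] for s in samples]
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # log(x_i / x_j) = log(x_i) - log(x_j), evaluated per sample
            v = variance([row[i] - row[j] for row in logs])
            t[i][j] = t[j][i] = v
    return t


# Hypothetical counts: three samples, three taxa.
counts = [[10, 20, 70], [30, 30, 40], [5, 45, 50]]
# The same samples rescaled to proportions (each row sums to 1).
proportions = [[x / sum(row) for x in row] for row in counts]

t_counts = log_ratio_variances(counts)
t_props = log_ratio_variances(proportions)
```

Because each sample's scaling factor cancels inside the ratio, `t_counts` and `t_props` agree up to floating-point error. That invariance is what makes log ratios a natural starting point when only relative abundances are observed.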
hicap: In Silico Serotyping of the Haemophilus influenzae Capsule Locus.
Haemophilus influenzae exclusively colonizes the human nasopharynx and can cause a variety of respiratory infections as well as invasive diseases, including meningitis and sepsis. A key virulence determinant of H. influenzae is the polysaccharide capsule, of which six serotypes are known, each encoded by a distinct variation of the capsule biosynthesis locus (cap-a to cap-f). H. influenzae type b (Hib) was historically responsible for the majority of invasive H. influenzae disease, and its prevalence has been markedly reduced in countries that have implemented vaccination programs targeting this serotype. In the postvaccine era, nontypeable H. influenzae emerged as the most dominant group causing disease, but in recent years a resurgence of encapsulated H. influenzae strains has also been observed, most notably serotype a. Given the increasing incidence of encapsulated strains and the high frequency of Hib in countries without vaccination programs, there is growing interest in genomic epidemiology of H. influenzae. Here we present hicap, a software tool for rapid in silico serotype prediction from H. influenzae genome sequences. hicap is written using Python3 and is freely available at https://github.com/scwatts/hicap under the GNU General Public License v3 (GPL3). To demonstrate the utility of hicap, we used it to investigate the cap locus diversity and distribution in 691 high-quality H. influenzae genomes from GenBank. These analyses identified cap loci in 95 genomes and confirmed the general association of each serotype with a unique clonal lineage, and they also identified occasional recombination between lineages that gave rise to hybrid cap loci (2% of encapsulated strains).
The impact of genomics on precision public health: beyond the pandemic.
Precision public health has been defined in many ways. It can be viewed as an emerging multidisciplinary field that uses genomics, big data, and machine learning/artificial intelligence to predict health risks and outcomes and to improve health at the population level. Just like precision medicine seeks to provide the right intervention to the right patient at the right time, the aim of precision public health is to provide the right intervention to the right population at the right time, with the goal of improving health for all. Genomic technologies have been at the leading edge of applications in clinical medicine and have the potential to revolutionize public health. We are pleased to introduce this special issue of Genome Medicine on the impact of genomics on precision public health, which highlights the utility of genomic tools in public health research and practice in the fight against communicable and noncommunicable diseases. This is particularly timely, given the battle against the COVID-19 pandemic, which has necessitated the application of genomic approaches to track the origin, transmission and evolution of the SARS-CoV-2 virus globally, as well as to understand differential host susceptibility, response, severity, and outcomes. Beyond genomics, granular data from population surveillance approaches are being used to target public health interventions. In addition, big data, digital technologies, and mobile health applications have been instrumental in defining the natural history of COVID-19 and identifying prognostic factors through machine learning and artificial intelligence
Five Years of GenoTyphi: Updates to the Global Salmonella Typhi Genotyping Framework.
In 2016, a whole-genome sequence (WGS)-based genotyping framework (GenoTyphi) was developed and provided a phylogenetically informative nomenclature for lineages of Salmonella Typhi, the etiological agent of typhoid fever. Subsequent surveillance studies have revealed additional epidemiologically important subpopulations, which require the definition of new genotypes and extension of associated software to facilitate the detection of antimicrobial resistance (AMR) mutations. Analysis of 4632 WGS provides an updated overview of the global S. Typhi population structure and genotyping framework, revealing the widespread nature of haplotype 58 ([H58] 4.3.1) genotypes and the diverse range of genotypes carrying AMR mutations.
Klebsiella pneumoniae Population Genomics and Antimicrobial-Resistant Clones.
Antimicrobial-resistant Klebsiella pneumoniae (Kp) has emerged as a major global public health problem. While resistance can occur across a broad range of Kp clones, a small number have become globally distributed and commonly cause outbreaks in hospital settings. Here we describe recent comparative genomics investigations that have shed light on Kp population structure and the evolution of antimicrobial-resistant clones. These studies provide the basic framework within which genomic epidemiology and evolution can be understood, but have merely scratched the surface of what can and should be explored. We assert that further large-scale comparative and functional genomics studies are urgently needed to better understand the biology of this clinically important bacterium