The Francis Crick Institute

REAPR: a universal tool for genome assembly evaluation.

Author: Berriman Matthew
Hunt Martin
Kikuchi Taisei
Newbold Chris
Otto Thomas D
Sanders Mandy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Methods to reliably assess the accuracy of genome sequence data are lacking. Currently completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/

BamView: visualizing and interpretation of next-generation sequencing read alignments.

Author: Berriman Matthew
Carver Tim
Harris Simon R.
McQuillan Jacqueline A.
Otto Thomas D.
Parkhill Julian
Publication venue: 'Oxford University Press (OUP)'
Publication date: 16/01/2012

So-called next-generation sequencing (NGS) has provided the ability to sequence on a massive scale at low cost, enabling biologists to perform powerful experiments and gain insight into biological processes. BamView has been developed to visualize and analyse sequence reads from NGS platforms that have been aligned to a reference sequence. It is a desktop application for browsing the aligned or mapped reads [Ruffalo, M, LaFramboise, T, Koyutürk, M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 2011;27:2790-6] at different levels of magnification, from nucleotide level, where the base qualities can be seen, to genome or chromosome level, where overall coverage is shown. To enable in-depth investigation of NGS data, various views are provided that can be configured to highlight interesting aspects of the data. Multiple read alignment files can be overlaid to compare results from different experiments, and filters can be applied to facilitate the interpretation of the aligned reads. BamView can be used as a standalone application or as an integrated part of the Artemis genome browser. It allows the user to study NGS data in the context of the sequence and annotation of the reference genome. Single nucleotide polymorphism (SNP) density and candidate SNP sites can be highlighted and investigated, and read-pair information can be used to discover large structural insertions and deletions. The application also performs simple analyses of the read mapping, including reporting the read counts and reads per kilobase per million mapped reads (RPKM) for genes selected by the user.
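
The RPKM value mentioned in the abstract above follows a standard definition: the read count for a gene, divided by the gene length in kilobases and by the total number of mapped reads in millions. The sketch below illustrates only that arithmetic. The function name `rpkm` and its parameters are hypothetical and are not taken from BamView, whose own implementation may differ in detail (for example, in which reads it counts as mapped).

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads.

    Hypothetical helper for illustration; not part of BamView.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) is the same as
    # multiplying the read count by 1e9 and dividing by both raw values.
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
    print(rpkm(500, 2_000, 10_000_000))
```

Normalizing by both gene length and library size is what allows values to be compared between genes of different lengths and between experiments of different sequencing depth.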

Circlator: automated circularization of genome assemblies using long sequencing reads

Author: Harris Simon R.
Hunt Martin
Keane Jacqueline A.
Otto Thomas D.
Parkhill Julian
Silva Nishadi De
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/

CiteSeerX

arXiv.org e-Print Archive

Partial nonlinear reciprocity breaking through ultrafast dynamics in a random photonic medium

Author: E. Akkermans
E. D. Palik
Otto L. Muskens
Paul Venn
Thomas Wellens
Timmo van der Beek
Publication venue: 'American Physical Society (APS)'
Publication date: 02/05/2012

We demonstrate that ultrafast nonlinear dynamics gives rise to reciprocity breaking in a random photonic medium. Reciprocity breaking is observed via the suppression of coherent backscattering, a manifestation of weak localization of light. The effect is observed in a pump-probe configuration where the pump induces an ultrafast step-change of the refractive index during the dwell time of the probe light in the material. The dynamical suppression of coherent backscattering is reproduced well by a multiple scattering Monte Carlo simulation. Ultrafast reciprocity breaking provides a distinct mechanism in nonlinear optical media which opens up avenues for the active manipulation of mesoscopic transport, random lasers, and photon localization. Comment: 5 pages, 4 figures.

Southampton (e-Prints Soton)

Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps

Author: Isheng J Tsai
Matthew Berriman
Thomas D Otto
Publication venue: Springer Nature
Publication date: 01/01/2010

Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data are frequently highly fragmented, with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data.

A comprehensive evaluation of assembly scaffolding tools

Author: Chris Newbold
Martin Hunt
Matthew Berriman
Thomas D Otto
Publication venue: Springer Nature
Publication date: 01/01/2014

Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics. Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data. Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

RATT: Rapid Annotation Transfer Tool

Author: Berriman Matthew
Degrave Wim S.
Dillon Gary P.
Otto Thomas D.
Publication venue: Oxford University Press
Publication date: 01/01/2011

Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net

Quantitative insertion-site sequencing (QIseq) for high throughput phenotyping of transposon mutants

Author: Adams John H.
Bronner Iraad F.
Jiang Rays H. Y.
Otto Thomas D.
Quail Michael A.
Rayner Julian C.
Udenze Kenneth
Wang Chengqi
Zhang Min
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 10/05/2016

Genetic screening using random transposon insertions has been a powerful tool for uncovering biology in prokaryotes, where whole-genome saturating screens have been performed in multiple organisms. In eukaryotes, such screens have proven more problematic, in part because of the lack of a sensitive and robust system for identifying transposon insertion sites. Here we describe quantitative insertion-site sequencing, or QIseq, which uses custom library preparation and Illumina sequencing technology and is able to identify insertion sites from both the 5' and 3' ends of the transposon, providing an inbuilt level of validation. The approach was developed using piggyBac mutants in the human malaria parasite Plasmodium falciparum but should be applicable to many other eukaryotic genomes. QIseq proved accurate, confirming known sites in >100 mutants, and sensitive, identifying and monitoring sites over a >10,000-fold dynamic range of sequence counts. Applying QIseq to uncloned parasites shortly after transfections revealed multiple insertions in mixed populations and suggests that >4000 independent mutants could be generated from relatively modest scales of transfection, providing a clear pathway to genome-scale screens in P. falciparum. QIseq was also used to monitor the growth of pools of previously cloned mutants and reproducibly differentiated between deleterious and neutral mutations in competitive growth. Among the mutants with fitness defects was a mutant with a piggyBac insertion immediately upstream of the kelch protein K13 gene associated with artemisinin resistance, implying that mutants in this gene may have competitive fitness costs. QIseq has the potential to enable the scale-up of piggyBac-mediated genetics across multiple eukaryotic systems.

Progression of the canonical reference malaria parasite genome from 2002–2019

Author: Berriman Matthew
Böhme Ulrike
Newbold Chris I.
Otto Thomas D.
Sanders Mandy
Publication venue: 'F1000 Research Ltd'
Publication date: 29/03/2019

Here we describe the ways in which the sequence and annotation of the Plasmodium falciparum reference genome have changed since its publication in 2002. Because P. falciparum is the malaria species responsible for the most deaths worldwide, the richness of annotation and accuracy of the sequence are important resources for the P. falciparum research community as well as the basis for interpreting the genomes of subsequently sequenced species. At the time of publication in 2002, over 60% of predicted genes had unknown functions. As of March 2019, this number has decreased significantly to 33%. The reduction is due to the inclusion of genes that were subsequently characterised experimentally and genes with significant similarity to others with known functions. In addition, the structural annotation of genes has been significantly refined; 27% of gene structures have been changed since 2002, comprising changes in exon-intron boundaries, addition or deletion of exons and the addition or deletion of genes. The sequence has also undergone significant improvements. In addition to the correction of a large number of single-base and insertion or deletion errors, a major misassembly between the subtelomeres of chromosomes 7 and 8 has been corrected. As the number of sequenced isolates continues to grow rapidly, a single reference genome will not be an adequate basis for interpreting intra-species sequence diversity. We therefore describe in this publication a population reference genome of P. falciparum, called Pfref1. This reference will enable the community to map to regions that are not present in the current assembly. P. falciparum 3D7 will continue to be maintained, with ongoing curation ensuring continual improvements in annotation quality.