
The University of Manchester - Institutional Repository

King's Research Portal

St George's Online Research Archive

University of Melbourne Institutional Repository

ICR142 Benchmarker: evaluating, optimising and benchmarking variant calling performance using the ICR142 NGS validation series.

Author: A Rimmer
D Koboldt
D Li
E Holt
E Ruark
H Fang
K Stals
K Ushey
M DePristo
M Munz
N Rahman
P Danecek
R Poplin
S Roy
S Sandmann
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2018

Evaluating, optimising and benchmarking of next generation sequencing (NGS) variant calling performance are essential requirements for clinical, commercial and academic NGS pipelines. Such assessments should be performed in a consistent, transparent and reproducible fashion, using independently, orthogonally generated data. Here we present ICR142 Benchmarker, a tool to generate outputs for assessing germline base substitution and indel calling performance using the ICR142 NGS validation series, a dataset of Illumina platform-based exome sequence data from 142 samples together with Sanger sequence data at 704 sites. ICR142 Benchmarker provides summary and detailed information on the sensitivity, specificity and false detection rates of variant callers. ICR142 Benchmarker also automatically generates a single page report highlighting key performance metrics and how performance compares to widely-used open-source tools. We used ICR142 Benchmarker with VCF files outputted by GATK, OpEx and DeepVariant to create a benchmark for variant calling performance. This evaluation revealed pipeline-specific differences and shared challenges in variant calling, for example in detecting indels in short repeating sequence motifs. We next used ICR142 Benchmarker to perform regression testing with DeepVariant versions 0.5.2 and 0.6.1. This showed that v0.6.1 improves variant calling performance, but there was evidence of minor changes in indel calling behaviour that may benefit from attention. The data also allowed us to evaluate filters to optimise DeepVariant calling, and we recommend using 30 as the QUAL threshold for base substitution calls when using DeepVariant v0.6.1. Finally, we used ICR142 Benchmarker with VCF files from two commercial variant calling providers to facilitate optimisation of their in-house pipelines and to provide transparent benchmarking of their performance. 
ICR142 Benchmarker consistently and transparently analyses variant calling performance based on the ICR142 NGS validation series, using the standard VCF input and outputting informative metrics to enable user understanding of pipeline performance. ICR142 Benchmarker is freely available at https://github.com/RahmanTeamDevelopment/ICR142_Benchmarker/releases. This article is freely available online from the publisher's site via Open Access.
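The three metrics named in this abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions of sensitivity, specificity and false detection rate, which may differ in detail from how ICR142 Benchmarker computes them. The names `Counts`, `metrics` and `counts_at_threshold` are hypothetical and are not part of the tool, and the treatment of a QUAL cut-off is a simplifying assumption that considers only sites where a call was made.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Confusion counts from comparing calls with orthogonally validated sites."""
    tp: int  # variant present at the site and reported
    fp: int  # no variant at the site, but one was reported
    tn: int  # no variant at the site and none reported
    fn: int  # variant present at the site, but not reported


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def metrics(c):
    """Standard definitions (an assumption, not taken from the tool's source)."""
    return {
        "sensitivity": safe_div(c.tp, c.tp + c.fn),
        "specificity": safe_div(c.tn, c.tn + c.fp),
        "fdr": safe_div(c.fp, c.tp + c.fp),
    }


def counts_at_threshold(calls, min_qual):
    """Recount after keeping only calls with QUAL >= min_qual.

    `calls` is an iterable of (qual, is_real_variant) pairs for called sites.
    A real variant that is filtered out becomes a false negative; a false
    call that is filtered out becomes a true negative.
    """
    tp = fp = tn = fn = 0
    for qual, is_real in calls:
        kept = qual >= min_qual
        if kept and is_real:
            tp += 1
        elif kept:
            fp += 1
        elif is_real:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, tn, fn)
```

Calling `metrics(counts_at_threshold(calls, 30))` for a range of thresholds is one simple way to see the sensitivity and specificity trade-off that a QUAL filter introduces.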

RD&E Research Repository

Directory of Open Access Journals

OpEx - a validated, automated pipeline optimised for clinical exome sequence analysis.

Author: Clarke M
Elliott A
Lunter G
Münz M
Rahman N
Ramsay E
Renwick A
Ruark E
Seal S
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

We present an easy-to-use, open-source Optimised Exome analysis tool, OpEx (http://icr.ac.uk/opex), that accurately detects small-scale variation, including indels, to clinical standards. We evaluated OpEx performance with an experimentally validated dataset (the ICR142 NGS validation series), a large 1000 exome dataset (the ICR1000 UK exome series), and a clinical proband-parent trio dataset. The performance of OpEx for high-quality base substitutions and short indels in both small and large datasets is excellent, with overall sensitivity of 95%, specificity of 97% and a low false detection rate (FDR) of 3%. Depending on the individual performance requirements, the OpEx output allows one to optimise the inevitable trade-offs between sensitivity and specificity. For example, in the clinical setting one could permit a higher FDR and lower specificity to maximise sensitivity. In contexts where experimental validation is not possible, minimising the FDR and improving specificity may be a preferable trade-off for slightly lower sensitivity. OpEx is simple to install and use; the whole pipeline is run from a single command. OpEx is therefore well suited to the increasing number of research and clinical laboratories undertaking exome sequencing, particularly those without in-house dedicated bioinformatics expertise.

Oxford University Research Archive

CoverView: a sequence quality evaluation tool for next generation sequencing data.

Author: Mahamdallie S
Münz M
Poyastro-Pearson E
Rahman N
Rimmer A
Ruark E
Seal S
Strydom A
Yost S
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2018

Quality assurance and quality control are essential for robust next generation sequencing (NGS). Here we present CoverView, a fast, flexible, user-friendly quality evaluation tool for NGS data. CoverView processes mapped sequencing reads and user-specified regions to report depth of coverage, base and mapping quality metrics with increasing levels of detail from a chromosome-level summary to per-base profiles. CoverView can flag regions that do not fulfil user-specified quality requirements, allowing suboptimal data to be systematically and automatically presented for review. It also provides an interactive graphical user interface (GUI) that can be opened in a web browser and allows intuitive exploration of results. We have integrated CoverView into our accredited clinical cancer predisposition gene testing laboratory that uses the TruSight Cancer Panel (TSCP). CoverView has been invaluable for optimisation and quality control of our testing pipeline, providing transparent, consistent quality metric information and automatic flagging of regions that fall below quality thresholds. We demonstrate this utility with TSCP data from the Genome in a Bottle reference sample, which CoverView analysed in 13 seconds. CoverView uses data routinely generated by NGS pipelines, reads standard input formats, and rapidly creates easy-to-parse output text (.txt) files that are customised by a simple configuration file. CoverView can therefore be easily integrated into any NGS pipeline. CoverView and detailed documentation for its use are freely available at github.com/RahmanTeamDevelopment/CoverView/releases and www.icr.ac.uk/CoverView

Directory of Open Access Journals

The Quality Sequencing Minimum (QSM): providing comprehensive, consistent, transparent next generation sequencing data quality assurance.

Author: Mahamdallie S
Münz M
Poyastro-Pearson E
Rahman N
Renwick A
Ruark E
Seal S
Strydom A
Yost S
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2018

Next generation sequencing (NGS) is routinely used in clinical genetic testing. Quality management of NGS testing is essential to ensure performance is consistently and rigorously evaluated. Three primary metrics are used in NGS quality evaluation: depth of coverage, base quality and mapping quality. To provide consistency and transparency in the utilisation of these metrics we present the Quality Sequencing Minimum (QSM). The QSM defines the minimum quality requirement a laboratory has selected for depth of coverage (C), base quality (B) and mapping quality (M) and can be applied per base, exon, gene or other genomic region, as appropriate. The QSM format is CX_BY(P Y)_MZ(P Z). X is the parameter threshold for C, Y the parameter threshold for B, P Y the percentage of reads that must reach Y, Z the parameter threshold for M, P Z the percentage of reads that must reach Z. The data underlying the QSM is in the BAM file, so a QSM can be easily and automatically calculated in any NGS pipeline. We used the QSM to optimise cancer predisposition gene testing using the TruSight Cancer Panel (TSCP). We set the QSM as C50_B10(85)_M20(95). Test regions falling below the QSM were automatically flagged for review, with 100/1471 test regions QSM-flagged in multiple individuals. Supplementing these regions with 132 additional probes improved performance in 85/100. We also used the QSM to optimise testing of genes with pseudogenes such as PTEN and PMS2. In TSCP data from 960 individuals the median number of regions that passed QSM per sample was 1429 (97%). Importantly, the QSM can be used at an individual report level to provide succinct, comprehensive quality assurance information about individual test performance. We believe many laboratories would find the QSM useful. Furthermore, widespread adoption of the QSM would facilitate consistent, transparent reporting of genetic test performance by different laboratories
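The QSM format described in this abstract, CX_BY(P Y)_MZ(P Z), can be parsed and applied with a short Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `QSM`, `parse_qsm` and `passes_qsm` are hypothetical, "reach" is interpreted as greater than or equal to, and the quality values are assumed to have already been extracted from the BAM file for the base or region being tested.

```python
import re
from typing import NamedTuple, Sequence


class QSM(NamedTuple):
    min_depth: int        # X: threshold for depth of coverage (C)
    min_base_qual: int    # Y: threshold for base quality (B)
    base_qual_pct: float  # P Y: percentage of reads that must reach Y
    min_map_qual: int     # Z: threshold for mapping quality (M)
    map_qual_pct: float   # P Z: percentage of reads that must reach Z


_QSM_RE = re.compile(
    r"C(\d+)_B(\d+)\((\d+(?:\.\d+)?)\)_M(\d+)\((\d+(?:\.\d+)?)\)"
)


def parse_qsm(text: str) -> QSM:
    """Parse a string such as 'C50_B10(85)_M20(95)' into its five parameters."""
    match = _QSM_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid QSM string: {text!r}")
    x, y, py, z, pz = match.groups()
    return QSM(int(x), int(y), float(py), int(z), float(pz))


def pct_at_least(values: Sequence[int], threshold: int) -> float:
    """Percentage of values that are >= threshold (0.0 for an empty input)."""
    if not values:
        return 0.0
    return 100.0 * sum(v >= threshold for v in values) / len(values)


def passes_qsm(qsm: QSM, depth: int,
               base_quals: Sequence[int], map_quals: Sequence[int]) -> bool:
    """True if depth, base quality and mapping quality all meet the QSM."""
    return (
        depth >= qsm.min_depth
        and pct_at_least(base_quals, qsm.min_base_qual) >= qsm.base_qual_pct
        and pct_at_least(map_quals, qsm.min_map_qual) >= qsm.map_qual_pct
    )
```

A region for which `passes_qsm` returns False would be the kind of region that the abstract describes as QSM-flagged for review.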

Directory of Open Access Journals

The ICR96 exon CNV validation series: a resource for orthogonal assessment of exon CNV calling in NGS data.

Author: Elliott A
Mahamdallie S
Rahman N
Ramsay E
Renwick A
Ruark E
Seal S
Strydom A
Uddin I
Wylie H
Yost S
Publication venue: 'F1000 Research Ltd'
Publication date: 01/01/2017

Detection of deletions and duplications of whole exons (exon CNVs) is a key requirement of genetic testing. Accurate detection of this variant type has proved very challenging in targeted next-generation sequencing (NGS) data, particularly if only a single exon is involved. Many different NGS exon CNV calling methods have been developed over the last five years. Such methods are usually evaluated using simulated and/or in-house data due to a lack of publicly-available datasets with orthogonally generated results. This hinders tool comparisons, transparency and reproducibility. To provide a community resource for assessment of exon CNV calling methods in targeted NGS data, we here present the ICR96 exon CNV validation series. The dataset includes high-quality sequencing data from a targeted NGS assay (the TruSight Cancer Panel) together with Multiplex Ligation-dependent Probe Amplification (MLPA) results for 96 independent samples. 66 samples contain at least one validated exon CNV and 30 samples have validated negative results for exon CNVs in 26 genes. The dataset includes 46 exon CNVs in BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM or PTEN, giving excellent representation of the cancer predisposition genes most frequently tested in clinical practice. Moreover, the validated exon CNVs include 25 single exon CNVs, the most difficult type of exon CNV to detect. The FASTQ files for the ICR96 exon CNV validation series can be accessed through the European-Genome phenome Archive (EGA) under the accession number EGAS00001002428

CSN and CAVA: variant annotation tools for rapid, robust next-generation sequencing analysis in the clinical setting.

Author: Clarke M
Cloke V
Lunter G
Mahamdallie S
Münz M
Rahman N
Ramsay E
Renwick A
Ruark E
Seal S
Strydom A
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2015

Next-generation sequencing (NGS) offers unprecedented opportunities to expand clinical genomics. It also presents challenges with respect to integration with data from other sequencing methods and historical data. Provision of consistent, clinically applicable variant annotation of NGS data has proved difficult, particularly of indels, an important variant class in clinical genomics. Annotation in relation to a reference genome sequence, the DNA strand of coding transcripts and potential alternative variant representations has not been well addressed. Here we present tools that address these challenges to provide rapid, standardized, clinically appropriate annotation of NGS data in line with existing clinical standards. We developed a clinical sequencing nomenclature (CSN), a fixed variant annotation consistent with the principles of the Human Genome Variation Society (HGVS) guidelines, optimized for automated variant annotation of NGS data. To deliver high-throughput CSN annotation we created CAVA (Clinical Annotation of VAriants), a fast, lightweight tool designed for easy incorporation into NGS pipelines. CAVA allows transcript specification, appropriately accommodates the strand of a gene transcript and flags variants with alternative annotations to facilitate clinical interpretation and comparison with other datasets. We evaluated CAVA in exome data and a clinical BRCA1/BRCA2 gene testing pipeline. CAVA generated CSN calls for 10,313,034 variants in the ExAC database in 13.44 hours, and annotated the ICR1000 exome series in 6.5 hours. Evaluation of 731 different indels from a single individual revealed 92% had alternative representations in left aligned and right aligned data. Annotation of left aligned data, as performed by many annotation tools, would thus give clinically discrepant annotation for the 339 (46%) indels in genes transcribed from the forward DNA strand. By contrast, CAVA provides the correct clinical annotation for all indels. CAVA also flagged the 370 indels with alternative representations of a different functional class, which may profoundly influence clinical interpretation. CAVA annotation of 50 BRCA1/BRCA2 gene mutations from a clinical pipeline gave 100% concordance with Sanger data; only 8/25 BRCA2 mutations were correctly clinically annotated by other tools. CAVA is a freely available tool that provides rapid, robust, high-throughput clinical annotation of NGS data, using a standardized clinical sequencing nomenclature.
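The point about alternative indel representations can be made concrete with a small Python sketch. It shows how one deletion inside a repeat can be written at several positions that all yield the same sequence, so that left aligned and right aligned data report different coordinates for the same variant. This is a simplified illustration and not CAVA's algorithm: the function names and the toy reference sequence are hypothetical, coordinates are 0-based on a plain string, and only deletions are handled.

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return ref with ref[start:start + length] removed."""
    return ref[:start] + ref[start + length:]


def left_align(ref: str, start: int, length: int) -> int:
    """Shift a deletion as far left as possible without changing the result.

    Moving one base left is equivalent when the base entering the deleted
    window on the left equals the base leaving it on the right.
    """
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


def right_align(ref: str, start: int, length: int) -> int:
    """Shift a deletion as far right as possible without changing the result."""
    while start + length < len(ref) and ref[start] == ref[start + length]:
        start += 1
    return start


def has_alternative_representation(ref: str, start: int, length: int) -> bool:
    """True if the left aligned and right aligned positions differ."""
    return left_align(ref, start, length) != right_align(ref, start, length)


# Toy example: a two-base deletion inside an "AC" repeat.
REF = "GGACACACTT"
LEFT = left_align(REF, 4, 2)
RIGHT = right_align(REF, 4, 2)
SAME_RESULT = apply_deletion(REF, LEFT, 2) == apply_deletion(REF, RIGHT, 2)
```

For a gene on the forward strand the right aligned position is the most 3' one in the transcript, which is the position the HGVS guidelines ask for; for a gene on the reverse strand it is the left aligned position. That is why a tool which always left aligns can give the discrepant annotations described above.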

Springer - Publisher Connector

Rare germline variants in DNA repair genes and the angiogenesis pathway predispose prostate cancer patients to develop metastatic disease

Author: A Liberzon
A McKenna
AA Al Olama
AG Vinuesa de
AH Ramos
AJ Vickers
B Sharma
C Cybulski
C Deisenroth
CC Pritchard
Christopher A. Haiman
Clara Cieza-Borrella
CM Ewing
CM Gay
D Leongamornlert
D Li
Daniel A. Leongamornlert
David V. Conti
DC Koboldt
E Castro
E Ruark
Edward J. Saunders
FR Schumacher
G Jun
Ian Whitmore
J Karar
J Li
J Niewiarowska
JB Hjelmborg
JI Jun
JL Beebe-Dimmer
K Yumoto
Koveela Govindasami
L Finney
LA Mucci
M Kircher
M Lek
M Martin
M Mongiat
MA Quintana
Mark N. Brook
Martina Mijuskovic
P Dell’Oglio
PC Sham
R Bhati
R Lozano
R Na
RA Eeles
RD Wood
Rosalind A. Eeles
S Carbon
S Lee
S Purcell
Sarah Wakerell
SI Cunha
SM Gogarten
SN Hart
T Walsh
T Wei
The 1000 Genomes Project Consortium.
Tokhir Dadaev
X Chang
X Liu
X Zhan
X Zheng
Y Gong
Z Kote-Jarai
Zsofia Kote-Jarai
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2018

Background: Prostate cancer (PrCa) demonstrates a heterogeneous clinical presentation ranging from largely indolent to lethal. We sought to identify a signature of rare inherited variants that distinguishes between these two extreme phenotypes. Methods: We sequenced germline whole exomes from 139 aggressive (metastatic, age of diagnosis <60) and 141 non-aggressive (low clinical grade, age of diagnosis ≥60) PrCa cases. We conducted rare variant association analyses at gene and gene set levels using SKAT and Bayesian risk index techniques. GO term enrichment analysis was performed for genes with the highest differential burden of rare disruptive variants. Results: Protein truncating variants (PTVs) in specific DNA repair genes were significantly overrepresented among patients with the aggressive phenotype, with BRCA2, ATM and NBN the most frequently mutated genes. Differential burden of rare variants was identified between metastatic and non-aggressive cases for several genes implicated in angiogenesis, conferring both deleterious and protective effects. Conclusions: Inherited PTVs in several DNA repair genes distinguish aggressive from non-aggressive PrCa cases. Furthermore, inherited variants in genes with roles in angiogenesis may be potential predictors for risk of metastases. If validated in a larger dataset, these findings have potential for future clinical application.

St George's Online Research Archive

Implementing rapid, robust, cost-effective, patient-centred, routine genetic testing in ovarian cancer patients.

Author: Banerjee S
Cloke V
George A
Gore M
Hanson H
Kemp Z
Mahamdallie S
Rahman N
Riddell D
Ruark E
Seal S
Slade I
Strydom A
Talukdar S
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Advances in DNA sequencing have made genetic testing fast and affordable, but limitations of testing processes are impeding realisation of patient benefits. Ovarian cancer exemplifies the potential value of genetic testing and the shortcomings of current pathways to access testing. Approximately 15% of ovarian cancer patients have a germline BRCA1 or BRCA2 mutation which has substantial implications for their personal management and that of their relatives. Unfortunately, in most countries, routine implementation of BRCA testing for ovarian cancer patients has been inconsistent and largely unsuccessful. We developed a rapid, robust, mainstream genetic testing pathway in which testing is undertaken by the trained cancer team with cascade testing to relatives performed by the genetics team. 207 women with ovarian cancer were offered testing through the mainstream pathway. All accepted. 33 (16%) had a BRCA mutation. The result informed management of 79% (121/154) women with active disease. Patient and clinician feedback was very positive. The pathway offers a 4-fold reduction in time and 13-fold reduction in resource requirement compared to the conventional testing pathway. The mainstream genetic testing pathway we present is effective, efficient and patient-centred. It can deliver rapid, robust, large-scale, cost-effective genetic testing of BRCA1 and BRCA2 and may serve as an exemplar for other genes and other diseases

Oxford University Research Archive

Development of SNP markers present in expressed genes of the plant-pathogen interaction: Theobroma cacao - Moniliophthora perniciosa

Author: A Meindl
A Romero
Anthony Renwick
Antonis C Antoniou
C Loveday
Chey Loveday
Clare Turnbull
D Gareth Evans
Deborah Hughes
Diana Eccles
E Levy-Lahad
Elise Ruark
Emma Ramsay
ER Thompson
J Clague
Katie Snape
LM Pelttari
M Ahmed
M Vuorela
Margaret Warren-Perry
Martin Gore
MR Akbari
MW Wong
Nazneen Rahman
Rosa Maria Munoz Xicola
S Seal
Sheila Seal
V Silvestri
Y Zheng
Z Pang
Publication venue: 'Agrotropica'
Publication date: 26/04/2012

We report the detection, validation and analysis of SNPs in ESTs from the plant-pathogen interaction between cacao and Moniliophthora perniciosa using resequencing. This analysis of 73 EST sequences allowed the identification of 185 SNPs, 57% of them corresponding to transversions, 29% to transitions and 14% to indels. The ESTs containing SNPs were classified into 14 main functional categories. After validation, 91 SNPs were confirmed and categorized, and the parameters of nucleotide diversity and haplotype were calculated. Haplotype-based gene diversity and polymorphic information content (PIC) ranged from 0.559 to 0.56 and 0.115 to 0.12, respectively. Considering the haplotype structure of each locus, rather than single SNPs, was also advantageous. Most of the gene fragments had a major haplotype combined with a series of low-frequency haplotypes. Thus, the re-sequencing approach proved to be a valuable resource for identifying useful SNPs for wide genetic applications. Furthermore, the availability of the cacao genome sequence allows positional selection of the DNA fragments to be re-sequenced, enhancing the usefulness of the discovered SNPs. These results indicate the potential use of SNP markers to identify the allelic status of cacao resistance genes through marker-assisted selection, to support the development of promising genotypes with high resistance to witch's broom disease.

Southampton (e-Prints Soton)

Agritrop

The University of Manchester - Institutional Repository