219 research outputs found

    Germline mutations in the oncogene EZH2 cause Weaver syndrome and increased human height.

    Get PDF
    The biological processes controlling human growth are diverse, complex and poorly understood. Genetic factors are important and human height has been shown to be a highly polygenic trait to which common and rare genetic variation contributes. Weaver syndrome is a human overgrowth condition characterised by tall stature, dysmorphic facial features, learning disability and variable additional features. We performed exome sequencing in four individuals with Weaver syndrome, identifying a mutation in the histone methyltransferase, EZH2, in each case. Sequencing of EZH2 in additional individuals with overgrowth identified a further 15 mutations. The EZH2 mutation spectrum in Weaver syndrome shows considerable overlap with the inactivating somatic EZH2 mutations recently reported in myeloid malignancies. Our data establish EZH2 mutations as the cause of Weaver syndrome and provide further links between histone modifications and regulation of human growth

    ICR142 Benchmarker: evaluating, optimising and benchmarking variant calling performance using the ICR142 NGS validation series.

    Get PDF
    Evaluating, optimising and benchmarking of next generation sequencing (NGS) variant calling performance are essential requirements for clinical, commercial and academic NGS pipelines. Such assessments should be performed in a consistent, transparent and reproducible fashion, using independently, orthogonally generated data. Here we present ICR142 Benchmarker, a tool to generate outputs for assessing germline base substitution and indel calling performance using the ICR142 NGS validation series, a dataset of Illumina platform-based exome sequence data from 142 samples together with Sanger sequence data at 704 sites. ICR142 Benchmarker provides summary and detailed information on the sensitivity, specificity and false detection rates of variant callers. ICR142 Benchmarker also automatically generates a single page report highlighting key performance metrics and how performance compares to widely-used open-source tools. We used ICR142 Benchmarker with VCF files outputted by GATK, OpEx and DeepVariant to create a benchmark for variant calling performance. This evaluation revealed pipeline-specific differences and shared challenges in variant calling, for example in detecting indels in short repeating sequence motifs. We next used ICR142 Benchmarker to perform regression testing with DeepVariant versions 0.5.2 and 0.6.1. This showed that v0.6.1 improves variant calling performance, but there was evidence of minor changes in indel calling behaviour that may benefit from attention. The data also allowed us to evaluate filters to optimise DeepVariant calling, and we recommend using 30 as the QUAL threshold for base substitution calls when using DeepVariant v0.6.1. Finally, we used ICR142 Benchmarker with VCF files from two commercial variant calling providers to facilitate optimisation of their in-house pipelines and to provide transparent benchmarking of their performance. ICR142 Benchmarker consistently and transparently analyses variant calling performance based on the ICR142 NGS validation series, using the standard VCF input and outputting informative metrics to enable user understanding of pipeline performance. ICR142 Benchmarker is freely available at https://github.com/RahmanTeamDevelopment/ICR142_Benchmarker/releases.This article is freely available online from the publisher's site via Open Access

    OpEx - a validated, automated pipeline optimised for clinical exome sequence analysis.

    Get PDF
    We present an easy-to-use, open-source Optimised Exome analysis tool, OpEx (http://icr.ac.uk/opex) that accurately detects small-scale variation, including indels, to clinical standards. We evaluated OpEx performance with an experimentally validated dataset (the ICR142 NGS validation series), a large 1000 exome dataset (the ICR1000 UK exome series), and a clinical proband-parent trio dataset. The performance of OpEx for high-quality base substitutions and short indels in both small and large datasets is excellent, with overall sensitivity of 95%, specificity of 97% and low false detection rate (FDR) of 3%. Depending on the individual performance requirements the OpEx output allows one to optimise the inevitable trade-offs between sensitivity and specificity. For example, in the clinical setting one could permit a higher FDR and lower specificity to maximise sensitivity. In contexts where experimental validation is not possible, minimising the FDR and improving specificity may be a preferable trade-off for slightly lower sensitivity. OpEx is simple to install and use; the whole pipeline is run from a single command. OpEx is therefore well suited to the increasing research and clinical laboratories undertaking exome sequencing, particularly those without in-house dedicated bioinformatics expertise

    CoverView: a sequence quality evaluation tool for next generation sequencing data.

    Get PDF
    Quality assurance and quality control are essential for robust next generation sequencing (NGS). Here we present CoverView, a fast, flexible, user-friendly quality evaluation tool for NGS data. CoverView processes mapped sequencing reads and user-specified regions to report depth of coverage, base and mapping quality metrics with increasing levels of detail from a chromosome-level summary to per-base profiles. CoverView can flag regions that do not fulfil user-specified quality requirements, allowing suboptimal data to be systematically and automatically presented for review. It also provides an interactive graphical user interface (GUI) that can be opened in a web browser and allows intuitive exploration of results. We have integrated CoverView into our accredited clinical cancer predisposition gene testing laboratory that uses the TruSight Cancer Panel (TSCP). CoverView has been invaluable for optimisation and quality control of our testing pipeline, providing transparent, consistent quality metric information and automatic flagging of regions that fall below quality thresholds. We demonstrate this utility with TSCP data from the Genome in a Bottle reference sample, which CoverView analysed in 13 seconds. CoverView uses data routinely generated by NGS pipelines, reads standard input formats, and rapidly creates easy-to-parse output text (.txt) files that are customised by a simple configuration file. CoverView can therefore be easily integrated into any NGS pipeline. CoverView and detailed documentation for its use are freely available at github.com/RahmanTeamDevelopment/CoverView/releases and www.icr.ac.uk/CoverView

    The Quality Sequencing Minimum (QSM): providing comprehensive, consistent, transparent next generation sequencing  data quality assurance.

    Get PDF
    Next generation sequencing (NGS) is routinely used in clinical genetic testing. Quality management of NGS testing is essential to ensure performance is consistently and rigorously evaluated. Three primary metrics are used in NGS quality evaluation: depth of coverage, base quality and mapping quality. To provide consistency and transparency in the utilisation of these metrics we present the Quality Sequencing Minimum (QSM). The QSM defines the minimum quality requirement a laboratory has selected for depth of coverage (C), base quality (B) and mapping quality (M) and can be applied per base, exon, gene or other genomic region, as appropriate. The QSM format is CX_BY(P Y)_MZ(P Z). X is the parameter threshold for C, Y the parameter threshold for B, P Y the percentage of reads that must reach Y, Z the parameter threshold for M, P Z the percentage of reads that must reach Z. The data underlying the QSM is in the BAM file, so a QSM can be easily and automatically calculated in any NGS pipeline. We used the QSM to optimise cancer predisposition gene testing using the TruSight Cancer Panel (TSCP). We set the QSM as C50_B10(85)_M20(95). Test regions falling below the QSM were automatically flagged for review, with 100/1471 test regions QSM-flagged in multiple individuals. Supplementing these regions with 132 additional probes improved performance in 85/100. We also used the QSM to optimise testing of genes with pseudogenes such as PTEN and PMS2. In TSCP data from 960 individuals the median number of regions that passed QSM per sample was 1429 (97%).  Importantly, the QSM can be used at an individual report level to provide succinct, comprehensive quality assurance information about individual test performance. We believe many laboratories would find the QSM useful. Furthermore, widespread adoption of the QSM would facilitate consistent, transparent reporting of genetic test performance by different laboratories

    The ICR96 exon CNV validation series: a resource for orthogonal assessment of exon CNV calling in NGS data.

    Get PDF
    Detection of deletions and duplications of whole exons (exon CNVs) is a key requirement of genetic testing. Accurate detection of this variant type has proved very challenging in targeted next-generation sequencing (NGS) data, particularly if only a single exon is involved. Many different NGS exon CNV calling methods have been developed over the last five years. Such methods are usually evaluated using simulated and/or in-house data due to a lack of publicly-available datasets with orthogonally generated results. This hinders tool comparisons, transparency and reproducibility. To provide a community resource for assessment of exon CNV calling methods in targeted NGS data, we here present the ICR96 exon CNV validation series. The dataset includes high-quality sequencing data from a targeted NGS assay (the TruSight Cancer Panel) together with Multiplex Ligation-dependent Probe Amplification (MLPA) results for 96 independent samples. 66 samples contain at least one validated exon CNV and 30 samples have validated negative results for exon CNVs in 26 genes. The dataset includes 46 exon CNVs in BRCA1, BRCA2, TP53, MLH1, MSH2, MSH6, PMS2, EPCAM or PTEN, giving excellent representation of the cancer predisposition genes most frequently tested in clinical practice. Moreover, the validated exon CNVs include 25 single exon CNVs, the most difficult type of exon CNV to detect. The FASTQ files for the ICR96 exon CNV validation series can be accessed through the European-Genome phenome Archive (EGA) under the accession number EGAS00001002428

    CSN and CAVA: variant annotation tools for rapid, robust next-generation sequencing analysis in the clinical setting.

    Get PDF
    Next-generation sequencing (NGS) offers unprecedented opportunities to expand clinical genomics. It also presents challenges with respect to integration with data from other sequencing methods and historical data. Provision of consistent, clinically applicable variant annotation of NGS data has proved difficult, particularly of indels, an important variant class in clinical genomics. Annotation in relation to a reference genome sequence, the DNA strand of coding transcripts and potential alternative variant representations has not been well addressed. Here we present tools that address these challenges to provide rapid, standardized, clinically appropriate annotation of NGS data in line with existing clinical standards.We developed a clinical sequencing nomenclature (CSN), a fixed variant annotation consistent with the principles of the Human Genome Variation Society (HGVS) guidelines, optimized for automated variant annotation of NGS data. To deliver high-throughput CSN annotation we created CAVA (Clinical Annotation of VAriants), a fast, lightweight tool designed for easy incorporation into NGS pipelines. CAVA allows transcript specification, appropriately accommodates the strand of a gene transcript and flags variants with alternative annotations to facilitate clinical interpretation and comparison with other datasets. We evaluated CAVA in exome data and a clinical BRCA1/BRCA2 gene testing pipeline.CAVA generated CSN calls for 10,313,034 variants in the ExAC database in 13.44 hours, and annotated the ICR1000 exome series in 6.5 hours. Evaluation of 731 different indels from a single individual revealed 92 % had alternative representations in left aligned and right aligned data. Annotation of left aligned data, as performed by many annotation tools, would thus give clinically discrepant annotation for the 339 (46 %) indels in genes transcribed from the forward DNA strand. By contrast, CAVA provides the correct clinical annotation for all indels. CAVA also flagged the 370 indels with alternative representations of a different functional class, which may profoundly influence clinical interpretation. CAVA annotation of 50 BRCA1/BRCA2 gene mutations from a clinical pipeline gave 100 % concordance with Sanger data; only 8/25 BRCA2 mutations were correctly clinically annotated by other tools.CAVA is a freely available tool that provides rapid, robust, high-throughput clinical annotation of NGS data, using a standardized clinical sequencing nomenclature

    Rare germline variants in DNA repair genes and the angiogenesis pathway predispose prostate cancer patients to develop metastatic disease

    Get PDF
    Background Prostate cancer (PrCa) demonstrates a heterogeneous clinical presentation ranging from largely indolent to lethal. We sought to identify a signature of rare inherited variants that distinguishes between these two extreme phenotypes. Methods We sequenced germline whole exomes from 139 aggressive (metastatic, age of diagnosis < 60) and 141 non-aggressive (low clinical grade, age of diagnosis ≥60) PrCa cases. We conducted rare variant association analyses at gene and gene set levels using SKAT and Bayesian risk index techniques. GO term enrichment analysis was performed for genes with the highest differential burden of rare disruptive variants. Results Protein truncating variants (PTVs) in specific DNA repair genes were significantly overrepresented among patients with the aggressive phenotype, with BRCA2, ATM and NBN the most frequently mutated genes. Differential burden of rare variants was identified between metastatic and non-aggressive cases for several genes implicated in angiogenesis, conferring both deleterious and protective effects. Conclusions Inherited PTVs in several DNA repair genes distinguish aggressive from non-aggressive PrCa cases. Furthermore, inherited variants in genes with roles in angiogenesis may be potential predictors for risk of metastases. If validated in a larger dataset, these findings have potential for future clinical application

    Implementing rapid, robust, cost-effective, patient-centred, routine genetic testing in ovarian cancer patients.

    Get PDF
    Advances in DNA sequencing have made genetic testing fast and affordable, but limitations of testing processes are impeding realisation of patient benefits. Ovarian cancer exemplifies the potential value of genetic testing and the shortcomings of current pathways to access testing. Approximately 15% of ovarian cancer patients have a germline BRCA1 or BRCA2 mutation which has substantial implications for their personal management and that of their relatives. Unfortunately, in most countries, routine implementation of BRCA testing for ovarian cancer patients has been inconsistent and largely unsuccessful. We developed a rapid, robust, mainstream genetic testing pathway in which testing is undertaken by the trained cancer team with cascade testing to relatives performed by the genetics team. 207 women with ovarian cancer were offered testing through the mainstream pathway. All accepted. 33 (16%) had a BRCA mutation. The result informed management of 79% (121/154) women with active disease. Patient and clinician feedback was very positive. The pathway offers a 4-fold reduction in time and 13-fold reduction in resource requirement compared to the conventional testing pathway. The mainstream genetic testing pathway we present is effective, efficient and patient-centred. It can deliver rapid, robust, large-scale, cost-effective genetic testing of BRCA1 and BRCA2 and may serve as an exemplar for other genes and other diseases

    Development of SNP markers present in expressed genes of the plant-pathogen interaction: Theobroma cacao - Moniliophtora perniciosa

    Get PDF
    We report the detection, validation and analysis of SNPs in the plant-pathogen interaction between cacao and Moniliophthora perniciosa ESTs using resequencing. This analysis in 73 EST sequences allowed the identification of 185 SNPs, 57% of them corresponding to transversion, 29% to transition and 14% to indels. The ESTs containing SNPs were classified into 14 main functional categories. After validation, 91 SNPs were confirmed, categorized and the parameters of nucleotide diversity and haplotype were calculated. Haplotype-based gene diversity and polymorphic information content (PIC) ranged from 0.559 to 0.56 and 0.115 to 0.12; respectively. Also, it was the advantage when considering haplotypes structure for each locus in place of single SNPs. Most of the gene fragments had a major haplotype combined to a series of low frequency haplotypes. Thus, the re-sequencing approach proved to be a valuable resource to identify useful SNPs for wide genetic applications. Furthermore, the cacao genome sequence availability allow a positional selection of DNA fragments to be re-sequenced enhancing the usefulness of the discovered SNPs. These results indicate the potential use of SNPs markers to identify allelic status of cacao resistance genes through marker-assisted selection to support the development of promising genotypes with high resistance to witch's broom disease. (Résumé d'auteur
    • …
    corecore