856 research outputs found

    Next generation sequencing in cancer: opportunities and challenges for precision cancer medicine

    Over the past decade, testing the genes of patients and their specific cancer types has become standardized practice in medical oncology, since somatic mutations, changes in gene expression and epigenetic modifications are all hallmarks of cancer. However, while cancer genetic assessment has been limited to single biomarkers to guide the use of therapies, improvements in nucleic acid sequencing technologies and the implementation of different genome analysis tools have enabled clinicians to detect these genomic alterations and identify functional and disease-associated genomic variants. Next-generation sequencing (NGS) technologies have provided clues about therapeutic targets and genomic markers for novel clinical applications when standard therapy has failed. While Sanger sequencing, an accurate and sensitive approach, allows for the identification of potential novel variants, it is limited by the single amplicon being interrogated. Similarly, quantitative and qualitative profiling of gene expression changes also represents a challenge for the cancer field. Both RT-PCR and microarrays are efficient approaches, but are limited to the genes present on the array or being assayed. This leaves vast swaths of the transcriptome, including non-coding RNAs and other features, unexplored. With the advent of the ability to collect and analyze genomic sequence data in a timely fashion and at an ever-decreasing cost, many of these limitations have been overcome, and NGS is being incorporated into cancer research and diagnostics, giving patients and clinicians new hope for targeted and personalized treatment. Below we highlight the various applications of next-generation sequencing in precision cancer medicine.

    Genetic screening and molecular characterisation of biomarkers in hepatocellular carcinoma

    Hepatocellular carcinoma (HCC) is the most common type of liver cancer and accounts for 4.7% of the total number of new cases of cancer worldwide every year. HCC is a highly heterogeneous and complex disease with an estimated 5-year survival rate of only 18%. A better understanding of the mechanisms involved in the development, progression and recurrence of this tumour could guide us not only in the improvement of preventive strategies but also in the expansion of alternative targeted therapies for HCC patients. The aim of this thesis is to investigate new diagnostic and prognostic markers, at both the genetic and molecular levels, in the context of HCC. The results section is divided into two parts, Chapter I and Chapter II. HCC presents a distinct mutational landscape, and Chapter I describes how we developed an HCC-specific custom-made sequencing panel containing the genes most commonly affected by somatic mutations and copy number alterations (CNAs) in the disease. The panel was tested in different kinds of patient biopsies: frozen tissues, formalin-fixed paraffin-embedded (FFPE) tissues and liquid biopsies. Moreover, to obtain reliable and reproducible sequencing data, we created a robust and user-friendly somatic variant calling pipeline specific for Ion Torrent sequencing data. In Chapter II, we aimed to investigate the molecular mechanism of HMGA1 in HCC and to explore its molecular targets. HMGA1 is an architectural transcription factor that is often found overexpressed in HCCs. We explored its DNA-binding landscape and, after deregulating HMGA1 in an HCC in vitro environment, its expression signature at both the RNA and protein levels. By analysing the binding partners of HMGA1, we recognised the vast range of mechanisms of action of this complex protein. We identified several RNA regulators that bind HMGA1, including Alyref, which plays a role in the regulation of transcription.
Further work should aim to determine the non-canonical role of HMGA1 in binding and regulation not only at the DNA level but also at the RNA level. Both chapters describe the steps of this work on the identification and functional understanding of HCC biomarkers. This may lead in the future to more individualised treatment approaches, which in cancers with a low survival rate such as HCC are not only highly desirable but a necessity.

    Bioinformatics and computational tools for next-generation sequencing analysis in clinical genetics

    Clinical genetics has an important role in the healthcare system in providing a definitive diagnosis for many rare syndromes. It can also influence genetic prevention and disease prognosis, and assist in selecting the best care and treatment options for patients. Next-generation sequencing (NGS) has transformed clinical genetics, making it possible to analyze hundreds of genes at an unprecedented speed and at a lower price compared with conventional Sanger sequencing. Despite the growing literature concerning NGS in a clinical setting, this review aims to fill the gap that exists among (bio)informaticians, molecular geneticists and clinicians by presenting a general overview of the NGS technology and workflow. First, we review the current NGS platforms, focusing on the two main platforms, Illumina and Ion Torrent, and discussing the major strengths and weaknesses intrinsic to each platform. Next, the NGS analytical bioinformatic pipelines are dissected, with some emphasis on the algorithms commonly used to generate process data and to analyze sequence variants. Finally, the main challenges around NGS bioinformatics are placed in perspective for future developments. Even with the huge achievements made in NGS technology and bioinformatics, further improvements in bioinformatic algorithms are still required to deal with complex and genetically heterogeneous disorders.

    Bioinformatics for personal genomics: development and application of bioinformatic procedures for the analysis of genomic data

    In the last decade, the sharp decrease in sequencing cost due to the development of high-throughput technologies completely changed the way genetic problems are approached. In particular, whole exome and whole genome sequencing are contributing to the extraordinary progress in the study of human variants, opening up new perspectives in personalized medicine. As this is a relatively new and fast-developing field, appropriate tools and specialized knowledge are required for efficient data production and analysis. In line with the times, in 2014 the University of Padua funded the BioInfoGen Strategic Project with the goal of developing technology and expertise in bioinformatics and molecular biology applied to personal genomics. The aim of my PhD was to contribute to this challenge by implementing a series of innovative tools and by applying them to investigate and possibly solve the case studies included in the project. I first developed an automated pipeline for Illumina data, able to sequentially perform each step necessary to go from raw reads to somatic or germline variant detection. The system performance was tested by means of internal controls and by its application to a cohort of patients affected by gastric cancer, obtaining interesting results. Once variants are called, they have to be annotated in order to define their properties, such as the position at transcript and protein level, the impact on protein sequence, the pathogenicity and more. As most of the publicly available annotators were affected by systematic errors causing low consistency in the final annotation, I implemented VarPred, a new tool for variant annotation, which guarantees the best accuracy (>99%) compared to the state-of-the-art programs, while also showing good processing times. To make VarPred easy to use, I equipped it with an intuitive web interface that allows not only graphical evaluation of results but also a simple filtration strategy.
Furthermore, for a valuable user-driven prioritization of human genetic variations, I developed QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. The prioritization is achieved by a global positive selection process that promotes the emergence of the most reliable variants, rather than filtering out those not satisfying the applied criteria. QueryOR has been used to analyze the two case studies framed within the BioInfoGen project. In particular, it allowed the detection of causative variants in patients affected by lysosomal storage diseases, also highlighting the efficacy of the designed sequencing panel. In addition, QueryOR simplified the recognition of the LRP2 gene as a possible candidate to explain subjects with a Dent disease-like phenotype but with no mutation in the previously identified disease-associated genes, CLCN5 and OCRL. As a final corollary, an extensive analysis of recurrent exome variants was performed, showing that their origin can be mainly explained by inaccuracies in the reference genome, including misassembled regions and uncorrected bases, rather than by platform-specific errors.

    Standardization of sequencing coverage depth in NGS: Recommendation for detection of clonal and subclonal mutations in cancer diagnostics

    The insufficient standardization of diagnostic next-generation sequencing (NGS) still limits its implementation in clinical practice, with the correct detection of mutations at low variant allele frequencies (VAF) facing particular challenges. We address here the standardization of sequencing coverage depth in order to minimize the probability of false positive and false negative results, the latter being underestimated in clinical NGS. There is currently no consensus on the minimum coverage depth, and so each laboratory has to set its own parameters. To assist laboratories with the determination of the minimum coverage parameters, we provide here a user-friendly coverage calculator. Using the sequencing error only, we recommend a minimum depth of coverage of 1,650 together with a threshold of at least 30 mutated reads for a targeted NGS mutation analysis of >= 3% VAF, based on the binomial probability distribution. Moreover, our calculator also allows adding assay-specific errors occurring during DNA processing and library preparation, thus accounting for the overall error of a specific NGS assay. The estimation of the correct coverage depth is recommended as a starting point when assessing the thresholds of an NGS assay. Our study also points to the need for guidance regarding the minimum technical requirements, which based on our experience should include the limit of detection (LOD), overall NGS assay error, input, source and quality of DNA, coverage depth, number of variant-supporting reads, and total number of target reads covering the variant region. Further studies are needed to define the minimum technical requirements and their reporting in diagnostic NGS.
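
The binomial reasoning in this abstract can be illustrated with a minimal sketch. It is not the authors' calculator: the function names are hypothetical, and the 1% per-base error rate is an assumed placeholder, since the abstract does not state the sequencing error it used. Only the depth (1,650), the read threshold (30) and the VAF (3%) come from the abstract.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(n: int, k: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), computed in log space (needs 0 < p < 1)."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))
    return exp(log_pmf)


def binom_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), as one minus the lower tail."""
    lower = sum(binom_pmf(n, i, p) for i in range(min(k, n + 1)))
    return max(0.0, 1.0 - lower)


def false_negative(depth: int, min_alt_reads: int, vaf: float) -> float:
    """Chance that a true variant at `vaf` yields fewer than `min_alt_reads` reads."""
    return 1.0 - binom_tail(depth, min_alt_reads, vaf)


def false_positive(depth: int, min_alt_reads: int, error_rate: float) -> float:
    """Chance that errors alone produce at least `min_alt_reads` variant reads."""
    return binom_tail(depth, min_alt_reads, error_rate)


# Values taken from the abstract.
DEPTH, MIN_ALT_READS, VAF = 1650, 30, 0.03
# Assumption for illustration only; the abstract gives no error rate.
ASSUMED_ERROR_RATE = 0.01

print("false negative:", false_negative(DEPTH, MIN_ALT_READS, VAF))
print("false positive:", false_positive(DEPTH, MIN_ALT_READS, ASSUMED_ERROR_RATE))
```

Raising the read threshold lowers the false positive probability but raises the false negative probability at a fixed depth, which is why depth and threshold have to be chosen together.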

    Identification of single nucleotide variants using position-specific error estimation in deep sequencing data

    Background: Targeted deep sequencing is a highly effective technology to identify known and novel single nucleotide variants (SNVs) with many applications in translational medicine, disease monitoring and cancer profiling. However, identification of SNVs using deep sequencing data is a challenging computational problem as different sequencing artifacts limit the analytical sensitivity of SNV detection, especially at low variant allele frequencies (VAFs). Methods: To address the problem of relatively high noise levels in amplicon-based deep sequencing data (e.g. with the Ion AmpliSeq technology) in the context of SNV calling, we have developed a new bioinformatics tool called AmpliSolve. AmpliSolve uses a set of normal samples to model position-specific, strand-specific and nucleotide-specific background artifacts (noise), and deploys a Poisson model-based statistical framework for SNV detection. Results: Our tests on both synthetic and real data indicate that AmpliSolve achieves a good trade-off between precision and sensitivity, even at VAF below 5% and as low as 1%. We further validate AmpliSolve by applying it to the detection of SNVs in 96 circulating tumor DNA samples at three clinically relevant genomic positions and compare the results to digital droplet PCR experiments. Conclusions: AmpliSolve is a new tool for in-silico estimation of background noise and for detection of low frequency SNVs in targeted deep sequencing data. Although AmpliSolve has been specifically designed for and tested on amplicon-based libraries sequenced with the Ion Torrent platform, it can, in principle, be applied to other sequencing platforms as well. AmpliSolve is freely available at https://github.com/dkleftogi/AmpliSolve
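
The general idea of a Poisson test against a background estimated from normal samples can be sketched as follows. This is an illustration of the technique the abstract names, not AmpliSolve's actual code: the function names, the pseudocount, and the significance threshold are all assumptions made here.

```python
from math import exp, lgamma, log


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), as one minus the lower tail (lam > 0).

    Subtracting from 1 loses precision for extremely small tails, which is
    acceptable for a sketch but not for production p-values.
    """
    if k <= 0:
        return 1.0
    lower = sum(exp(-lam + i * log(lam) - lgamma(i + 1)) for i in range(k))
    return max(0.0, 1.0 - lower)


def background_rate(normal_counts: list[tuple[int, int]]) -> float:
    """Estimate the error rate for one (position, strand, nucleotide) key.

    `normal_counts` holds (alt_reads, depth) pairs from normal samples.
    The 0.5 pseudocount is an assumption that keeps the rate above zero.
    """
    alt = sum(a for a, _ in normal_counts)
    depth = sum(d for _, d in normal_counts)
    return (alt + 0.5) / (depth + 1)


def call_snv(alt_reads: int, depth: int, rate: float, alpha: float = 1e-3) -> bool:
    """Call a variant if the alt read count is unlikely under background noise."""
    expected_noise = rate * depth
    return poisson_sf(alt_reads, expected_noise) < alpha


# Hypothetical counts for a single genomic position on one strand.
normals = [(2, 2000), (1, 1800), (3, 2200)]
rate = background_rate(normals)
print(call_snv(alt_reads=30, depth=3000, rate=rate))
print(call_snv(alt_reads=4, depth=3000, rate=rate))
```

Keeping a separate rate per position, strand and nucleotide lets a noisy site demand more supporting reads than a clean one, instead of applying one global VAF cutoff.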

    Computational modeling for identification of low-frequency single nucleotide variants

    Indiana University-Purdue University Indianapolis (IUPUI). Reliable detection of low-frequency single nucleotide variants (SNVs) carries great significance in many applications. In cancer genetics, the frequencies of somatic variants from tumor biopsies tend to be low due to contamination with normal tissue and tumor heterogeneity. Circulating tumor DNA monitoring also faces the challenge of detecting low-frequency variants due to the small percentage of tumor DNA in blood. Moreover, in population genetics, although pooled sequencing is cost-effective compared with individual sequencing, pooling dilutes the signals of variants from any individual. Detection of low-frequency variants is difficult and can be confounded by multiple sources of error, especially next-generation sequencing artifacts. Existing methods are limited in sensitivity and mainly focus on frequencies around 5%; most fail to consider differential, context-specific sequencing artifacts. To face this challenge, we developed a computational and experimental framework, RareVar, to reliably identify low-frequency SNVs from high-throughput sequencing data. For optimized performance, RareVar utilized a supervised learning framework to model artifacts originating from different components of a specific sequencing pipeline. This is enabled by a customized, comprehensive benchmark dataset enriched with known low-frequency SNVs from the sequencing pipeline of interest. A genomic-context-specific sequencing error model was trained on the benchmark data to characterize the systematic sequencing artifacts and to derive the position-specific detection limit for sensitive low-frequency SNV detection. Further, a machine-learning algorithm utilized sequencing quality features to refine SNV candidates for higher specificity. RareVar outperformed existing approaches, especially at 0.5% to 5% frequency.
We further explored the influence of statistical modeling on position-specific error modeling and showed the zero-inflated negative binomial to be the best-performing statistical distribution. When replicating analyses on an Illumina MiSeq benchmark dataset, our method seamlessly adapted to technologies with different biochemistries. RareVar enables sensitive detection of low-frequency SNVs across different sequencing platforms and will facilitate research and clinical applications such as pooled sequencing, early cancer detection, prognostic assessment, metastatic monitoring, and identification of relapse or acquired resistance.
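
For readers unfamiliar with the zero-inflated negative binomial mentioned here, a minimal sketch of its probability mass function follows. The parameterization (mixing weight `pi`, size `r`, success probability `p`) is one common textbook form and is an assumption; the abstract does not say how RareVar parameterizes or fits the model.

```python
from math import exp, lgamma, log


def nb_pmf(k: int, r: float, p: float) -> float:
    """Negative binomial P(X = k): k failures before r successes, 0 < p < 1."""
    log_pmf = (lgamma(k + r) - lgamma(k + 1) - lgamma(r)
               + r * log(p) + k * log(1.0 - p))
    return exp(log_pmf)


def zinb_pmf(k: int, pi: float, r: float, p: float) -> float:
    """Zero-inflated negative binomial.

    With probability `pi` the count is a structural zero (for example a
    position with no error reads at all); otherwise it follows NB(r, p).
    """
    base = (1.0 - pi) * nb_pmf(k, r, p)
    return pi + base if k == 0 else base


# Hypothetical parameters, chosen only to show the shape of the distribution.
PI, R, P = 0.3, 2.0, 0.4
for k in range(5):
    print(k, zinb_pmf(k, PI, R, P))
```

The extra mass at zero suits per-position error counts, where many positions show no erroneous reads while a minority show overdispersed counts.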
