6 research outputs found

    Precise variant calling in the clinical settings

    Identifying high-quality variants in whole exome sequencing (WES) analysis can be very complex because of the different modifications that can be made to the sample sequencing preparation protocol. These modifications can adversely affect the bioinformatics analysis used to identify variants. Evaluating and correlating the quality parameters of each analysis stage could help achieve better accuracy and precision in variant identification. Furthermore, once high-quality variants have been identified, reference databases in which the clinical significance and frequency of the variants can be consulted allow for a more accurate diagnosis. During laboratory and bioinformatics analysis, many metrics can be calculated to evaluate the quality of the data being processed. These data are usually examined separately, and their history is lost over time. Moreover, comparing a new workflow with existing ones can be very time-consuming when done manually. In addition, for a significant diagnosis of rare variants, it is important to consider the variant frequency in the sample population. For this reason, a database is needed that incorporates all quality metrics from the entire WES analysis over time and collects population-specific variants for accurate clinical variant identification. This thesis aims to optimise the evaluation of quality metrics and the classification of variants in the Italian population through the creation of a Structured Query Language (SQL) database directly linked to a website for more intuitive use. The thesis sets out the structure of the database and the configuration of the web page created. Furthermore, during the writing of the thesis, approximately 2,500 exomes were analysed and all quality control parameters derived from both laboratory and bioinformatics analyses were collected.
All the data obtained were uploaded to the database to verify the usefulness of the application in monitoring data quality trends over time and in identifying possible problems. Two examples of problems identified by the implemented application and subsequently solved by modifications to the laboratory protocol are presented. Moreover, the potential of the database to simplify comparisons between existing and new laboratory protocols by storing quality control parameters is shown. All variants identified in the analysed samples were uploaded to create an accessible reference of genetic variation in Italians. The correct classification of the Italian variants is shown in relation to well-known databases that report only a broader view of the European population. This approach enables researchers to classify variants that are not observed in the most widely used databases (gnomAD Exomes, ExAC, 1KgPhase3). It also allows the identification of rare variants that are generally classified as common and might represent a disease predisposition in the Italian population. In addition, it is possible to recognize common and non-damaging variants in the Italian population that are classified as rare in the European population. In conclusion, the reported results and examples show how the new application (an extended database with its own website) simplifies and facilitates the identification of problems in clinical WES analysis. It also makes the comparison between the various laboratory protocols easier, allowing for more precision in exome analysis aimed at identifying variants. Finally, the specific investigation of the Italian variants could improve diagnostic accuracy in this specific population.
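
The idea of keeping every quality metric in one SQL database and watching its trend over time can be sketched as follows. This is a minimal illustration, not the thesis's actual schema: the table name `qc_metric`, its columns, the helper functions, and the z-score cutoff are all assumptions made for the example, and SQLite stands in for whatever SQL engine was really used.

```python
import sqlite3
from statistics import mean, stdev


def create_qc_table(conn):
    """Create a long-format table: one row per sample, stage and metric (hypothetical schema)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS qc_metric (
            sample_id TEXT NOT NULL,
            run_date  TEXT NOT NULL,  -- ISO 8601 date, so text order is time order
            stage     TEXT NOT NULL,  -- e.g. 'laboratory' or 'bioinformatics'
            metric    TEXT NOT NULL,  -- e.g. 'mean_coverage'
            value     REAL NOT NULL,
            PRIMARY KEY (sample_id, stage, metric)
        )
        """
    )


def add_metric(conn, sample_id, run_date, stage, metric, value):
    """Store one quality-control measurement."""
    conn.execute(
        "INSERT INTO qc_metric VALUES (?, ?, ?, ?, ?)",
        (sample_id, run_date, stage, metric, value),
    )


def flag_outliers(conn, metric, z_cutoff=2.0):
    """Return sample IDs whose value lies more than z_cutoff standard
    deviations from the historical mean of that metric (assumed rule)."""
    rows = conn.execute(
        "SELECT sample_id, value FROM qc_metric "
        "WHERE metric = ? ORDER BY run_date, sample_id",
        (metric,),
    ).fetchall()
    values = [value for _, value in rows]
    if len(values) < 3:
        return []  # too little history to judge
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [sid for sid, value in rows if abs(value - mu) / sigma > z_cutoff]
```

Because every measurement carries its sample, date, and analysis stage, the same table could also be queried to compare a new laboratory protocol against the historical values of an existing one. A plain z-score is only one possible rule and needs a reasonable number of samples to be meaningful.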

    Exosomes from Plasma of Neuroblastoma Patients Contain Doublestranded DNA Reflecting the Mutational Status of Parental Tumor Cells

    Neuroblastoma (NB) is an aggressive tumor of infancy and a leading cause of death among diseases of preschool age. Here we focused on the characterization of exosomal DNA (exo-DNA) isolated from plasma cell-derived exosomes of neuroblastoma patients, and on its potential use for the detection of somatic mutations present in the parental tumor cells. Exosomes are small extracellular membrane vesicles secreted by most cells, playing an important role in intercellular communication. Using an enzymatic method, we provided evidence for the presence of double-stranded DNA in the NB exosomes. Moreover, by whole exome sequencing, we demonstrated that NB exo-DNA represents the entire exome and that it carries tumor-specific genetic mutations, including those occurring in known oncogenes and tumor suppressor genes in neuroblastoma (ALK, CHD5, SHANK2, PHOX2B, TERT, FGFR1, and BRAF). NB exo-DNA can be useful to identify variants responsible for acquired resistance, such as mutations of ALK, TP53, and RAS/MAPK genes that appear in relapsed patients. The possibility of isolating and enriching NB-derived exosomes from plasma using surface markers, together with the quick and easy extraction of exo-DNA, gives this methodology translational potential in the clinic. Exo-DNA can be an attractive non-invasive biomarker for NB molecular diagnostics, especially when a tissue biopsy is not readily available.

    STArS (STrain-Amplicon-Seq), a targeted nanopore sequencing workflow for SARS-CoV-2 diagnostics and genotyping

    Diagnostic tests based on reverse transcription-quantitative polymerase chain reaction (RT-qPCR) are the gold standard approach to detect severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection from clinical specimens. However, unless specifically optimized, this method is usually unable to recognize the specific viral strain responsible for coronavirus disease 2019, crucial information that is proving increasingly important in relation to virus spread and treatment effectiveness. Even if some RT-qPCR commercial assays are currently being developed for the detection of viral strains, they focus only on single/few genetic variants that may not be sufficient to uniquely identify a specific strain. Therefore, genome sequencing approaches remain the most comprehensive solution for virus genotyping and for recognizing viral strains, but their application is much less widespread due to higher costs. Starting from the well-established ARTIC protocol coupled to nanopore sequencing, in this work, we developed STArS (STrain-Amplicon-Seq), a cost/time-effective sequencing-based workflow for both SARS-CoV-2 diagnostics and genotyping. A set of 10 amplicons was initially selected from the ARTIC tiling panel, to cover: (i) all the main biologically relevant genetic variants located on the Spike gene; (ii) a minimal set of variants to uniquely identify the currently circulating strains; (iii) genomic sites usually amplified by RT-qPCR methods to identify SARS-CoV-2 presence. PCR-amplified clinical samples (both positive and negative for SARS-CoV-2 presence) were pooled together with a serially diluted exogenous amplicon at known concentration and sequenced on a MinION device. Thanks to a scoring rule, STArS was able to accurately classify positive samples in agreement with RT-qPCR results, both at the qualitative and quantitative level.
Moreover, the method effectively genotyped strain-specific variants and thus also returned the phylogenetic classification of SARS-CoV-2-positive samples. Thanks to the reduced turnaround time and costs, the proposed approach represents a step towards simplifying the clinical application of sequencing for viral genotyping, hopefully aiding in combatting the global pandemic.
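
The abstract does not describe the STArS scoring rule, so the sketch below does not attempt to reproduce it. It only illustrates one generic way a serially diluted spike-in of known concentration could support quantification: fitting a log-log standard curve of read counts against concentration and inverting it for a sample. The function names and the choice of ordinary least squares are assumptions made for this example.

```python
import math


def fit_standard_curve(concentrations, read_counts):
    """Least-squares fit of log10(reads) = intercept + slope * log10(concentration).

    Both inputs come from the diluted spike-in series and must be positive.
    """
    if len(concentrations) != len(read_counts) or len(concentrations) < 2:
        raise ValueError("need at least two matched dilution points")
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(r) for r in read_counts]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("dilution points must span more than one concentration")
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def estimate_concentration(read_count, slope, intercept):
    """Invert the fitted curve to estimate a concentration from a read count."""
    return 10 ** ((math.log10(read_count) - intercept) / slope)
```

The estimate is expressed in whatever unit the dilution series used, and it is only trustworthy for read counts that fall within the range covered by the spike-in points.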

    Shedding light on dark genes: enhanced targeted resequencing by optimizing the combination of enrichment technology and DNA fragment length

    The exome contains many obscure regions that are difficult to explore with current short-read sequencing methods. Repetitious genomic regions prevent the unique alignment of reads, which is essential for the identification of clinically relevant genetic variants. Long-read technologies attempt to resolve multiple-mapping regions, but they still produce many sequencing errors. Thus, a new approach is required to enlighten the obscure regions of the genome and rescue variants that would otherwise be neglected. This work aims to improve the alignment of multiple-mapping reads through the extension of the standard DNA fragment size. As Illumina can sequence fragments up to 550 bp, we tested different DNA fragment lengths using four major commercial WES platforms and found that longer DNA fragments achieved a higher genotypability. This metric, which indicates base calling calculated by combining depth of coverage with the confidence of read alignment, increased from hundreds to thousands of genes, including several associated with clinical phenotypes. While depth of coverage has been considered crucial for the assessment of WES performance, we demonstrated that genotypability has a greater impact in revealing obscure regions, with a ~1% increase in variant calling with respect to shorter DNA fragments. Results confirmed that this approach enlightened many regions previously not explored.
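
To make the contrast between depth of coverage and genotypability concrete, here is a minimal sketch. It assumes that a position counts as genotypable when enough confidently aligned reads cover it. The thresholds (`min_depth=10`, `min_mapq=20`) and the input layout (one list of read mapping qualities per target position) are illustrative assumptions, not the definition used in the paper.

```python
def covered_fraction(position_mapqs, min_depth=10):
    """Fraction of target positions reaching min_depth, ignoring alignment confidence."""
    if not position_mapqs:
        raise ValueError("no target positions given")
    covered = sum(1 for mapqs in position_mapqs if len(mapqs) >= min_depth)
    return covered / len(position_mapqs)


def genotypable_fraction(position_mapqs, min_depth=10, min_mapq=20):
    """Fraction of target positions covered by at least min_depth reads
    whose mapping quality is at least min_mapq (assumed criterion)."""
    if not position_mapqs:
        raise ValueError("no target positions given")
    genotypable = sum(
        1
        for mapqs in position_mapqs
        if sum(1 for q in mapqs if q >= min_mapq) >= min_depth
    )
    return genotypable / len(position_mapqs)
```

A position in a repetitive region can be deeply covered by reads that all map ambiguously (mapping quality 0). Such a position raises the covered fraction but not the genotypable fraction, which is why the two measures can disagree.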

    Characterization of FMR1 Repeat Expansion and Intragenic Variants by Indirect Sequence Capture

    Traditional methods for the analysis of repeat expansions, which underlie genetic disorders such as fragile X syndrome (FXS), lack single-nucleotide resolution in repeat analysis and the ability to characterize causative variants outside the repeat array. These drawbacks can be overcome by long-read and short-read sequencing, respectively. However, the routine application of next-generation sequencing in the clinic requires target enrichment, and none of the available methods allows parallel analysis of long DNA fragments using both sequencing technologies. In this study, we investigated the use of indirect sequence capture (Xdrop technology) coupled to Nanopore and Illumina sequencing to characterize FMR1, the gene responsible for FXS. We achieved efficient enrichment (> 200x) of large target DNA fragments (~60-80 kbp) encompassing the entire FMR1 gene. The analysis of Xdrop-enriched samples by Nanopore long-read sequencing allowed the complete characterization of repeat lengths in samples with normal, pre-mutation, and full mutation status (> 1 kbp), and correctly identified repeat interruptions relevant for disease prognosis and transmission. Single-nucleotide variants (SNVs) and small insertions/deletions (indels) could be detected in the same samples by Illumina short-read sequencing, completing the mutational testing through the identification of pathogenic variants within the FMR1 gene when no typical CGG repeat expansion is detected. The study successfully demonstrated the parallel analysis of repeat expansions and SNVs/indels in the FMR1 gene at single-nucleotide resolution by combining Xdrop enrichment with two next-generation sequencing approaches. With the appropriate optimization necessary for clinical settings, the system could both facilitate the study of genotype-phenotype correlation in FXS and enable more efficient diagnosis and genetic counseling for patients and their relatives.

    ACoRE: Accurate SARS-CoV-2 genome reconstruction for the characterization of intra-host and inter-host viral diversity in clinical samples and for the evaluation of re-infections

    Sequencing the SARS-CoV-2 genome from clinical samples can be challenging, especially in specimens with low viral titer. Here we report Accurate SARS-CoV-2 genome Reconstruction (ACoRE), an amplicon-based viral genome sequencing workflow for the complete and accurate reconstruction of SARS-CoV-2 sequences from clinical samples, including suboptimal ones that would usually be excluded even if unique and irreplaceable. The protocol was optimized to improve flexibility, and the combination of technical replicates was established as the central strategy to achieve accurate analysis of low-titer/suboptimal samples. We demonstrated the utility of the approach by achieving complete genome reconstruction and the identification of false-positive variants in >170 clinical samples, thus avoiding the generation of inaccurate and/or incomplete sequences. Most importantly, ACoRE was crucial to identify the correct viral strain responsible for a relapse case, which would otherwise be misclassified as a re-infection due to missing or incorrect variant identification by a standard workflow.
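
The abstract names the combination of technical replicates as the central strategy but does not give the combination rule. The sketch below shows one simple possibility under that caveat: keep a variant only if it is called in a minimum number of replicates, and report the rest as candidate false positives. The function names, the `(position, ref, alt)` tuple layout, and the `min_support` parameter are assumptions for illustration, not ACoRE's actual procedure.

```python
from collections import Counter


def consensus_variants(replicates, min_support=2):
    """Return variants called in at least min_support replicates.

    replicates: iterable of collections of (position, ref, alt) tuples,
    one collection per technical replicate of the same sample.
    """
    # set() ensures a variant is counted at most once per replicate
    counts = Counter(variant for calls in replicates for variant in set(calls))
    return sorted(v for v, n in counts.items() if n >= min_support)


def unsupported_variants(replicates, min_support=2):
    """Return variants seen in fewer than min_support replicates (candidate artifacts)."""
    counts = Counter(variant for calls in replicates for variant in set(calls))
    return sorted(v for v, n in counts.items() if n < min_support)
```

With two replicates and `min_support=2`, a variant that appears in only one replicate is set aside rather than written into the reconstructed genome. A real workflow would presumably also weigh coverage and allele frequency at each site.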