
    FAVR (Filtering and Annotation of Variants that are Rare): methods to facilitate the analysis of rare germline genetic variants from massively parallel sequencing datasets

    BACKGROUND: Characterising genetic diversity through the analysis of massively parallel sequencing (MPS) data offers enormous potential to improve our understanding of the genetic basis for observed phenotypes, including predisposition to and progression of complex human disease. Great challenges remain in distinguishing genuine genetic variants from the millions of artefactual signals. RESULTS: FAVR is a suite of new methods designed to work with commonly used MPS analysis pipelines to assist in resolving some of the issues related to the analysis of the vast amount of resulting data, with a focus on relatively rare genetic variants. To the best of our knowledge, no equivalent method has previously been described. The most important and novel aspect of FAVR is the use of signatures in comparator sequence alignment files during variant filtering, and annotation of variants potentially shared between individuals. The FAVR methods use these signatures to facilitate filtering of (i) platform and/or mapping-specific artefacts, (ii) common genetic variants, and, where relevant, (iii) artefacts derived from imbalanced paired-end sequencing, as well as annotation of genetic variants based on evidence of co-occurrence in individuals. We applied conventional variant calling to whole-exome sequencing datasets, produced using both SOLiD and TruSeq chemistries, with or without downstream processing by FAVR methods. We demonstrate a 3-fold smaller rare single nucleotide variant shortlist with no detected reduction in sensitivity. This analysis included Sanger sequencing of rare variant signals not evident in dbSNP131, assessment of known variant signal preservation, and comparison of observed and expected rare variant numbers across a range of first cousin pairs.
The principles described herein were applied in our recent publication identifying XRCC2 as a new breast cancer risk gene and have been made publicly available as a suite of software tools. CONCLUSIONS: FAVR is a platform-agnostic suite of methods that significantly enhances the analysis of large volumes of sequencing data for the study of rare genetic variants and their influence on phenotypes.

    Next generation sequencing to find genetic risk factors in familial cancer

    In 2015, cancer was the second leading cause of death worldwide. Genetic predisposition in familial cancer cases is largely unexplained. At the same time, rapid development in sequencing technology has resulted in an unprecedented increase in the amount of whole-exome and whole-genome sequencing data. The studies in this thesis take advantage of the technology and explore possibilities to identify genetic factors behind cancer development. In paper I, we identified 12 novel non-synonymous single nucleotide variants, which were shared among 5 affected members of a family with gastric and rectal cancer. The mutations were found in 12 different genes: DZIP1L, PCOLCE2, IGSF10, SUCNR1, OR13C8, EPB41L4B, SEC16A, NOTCH1, TAS2R7, SF3A1, GAL3ST1, and TRIOBP. None of the mutations was suggested to be a highly penetrant mutation. We propose that this family, suggested to segregate dominant disease, could be an example of complex inheritance. In paper II, we identified a pathogenic variant in PTEN in a patient with Cowden syndrome. We confirmed a pathogenic variant in PMS2, found in one of the samples, that had been suggested by another study. In addition, the study proposed 3 candidate missense variants in known cancer susceptibility genes (BMPR1A, BRIP1 and SRC), 3 truncating variants in possibly novel cancer genes (CLSPN, SEC24B and SSH2), 4 candidate missense variants (ACACA, NR2C2, INPP4A and DIDO1), and 5 possible autosomal recessive genes (ATP10B, PKHD1, UGGT2, MYH13 and TFF3). The aim of paper III was to provide a comprehensive local reference database of 1,000 whole-genome sequenced Swedish individuals. The samples were selected by principal component analysis from the Swedish Twin Registry (n=942) and The Northern Sweden Population Health Study (n=58). The results illustrated that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, confirming the importance of this database.
The aim of paper IV was to identify combinations of both known and unknown cancer processes in humans based on the integration of base substitution, copy number variation, structural rearrangement and microsatellite instability profiles in 74 whole-genome sequencing tumor-normal pairs from The Cancer Genome Atlas project (TCGA). The results illustrated correlated mutational structure both between and within mutation types, suggesting that integrating profiles of several mutation types can enhance accuracy in mutational pattern discovery. In conclusion, advances in sequencing and computational technology demonstrated their capability in identifying cancer-causative mutations, proposing candidate genes, providing infrastructure for medical research, and visualizing processes underlying cancer development.