Genomic and machine-learning analysis of germline variants in cancer

Abstract

Cancer often develops from specific DNA alterations, and these cancer-associated mutations influence precision cancer treatment. These alterations can be specific to the tumor DNA (somatic mutations), or they can be heritable and present in both normal and tumor DNA (germline mutations). Germline variants can affect how patients respond to therapy and can influence clinical surveillance of patients and their families. While identifying cancer-associated germline variants traditionally required studying families with inherited cancer predispositions, large-scale cancer sequencing cohorts enable alternative approaches to germline variant analysis. In this dissertation, we develop and apply multiple strategies for analyzing germline DNA from cancer sequencing cohorts. First, we develop the Tumor-Only Boosting Identification framework (TOBI) to learn biological features of true somatic mutations and generate a classification model that identifies DNA variants with somatic characteristics. TOBI has high sensitivity in identifying true somatic variants across several cancer types, particularly in known driver genes. After predicting somatic variants with TOBI, we assess the identified somatic-like germline variants for known oncogenic germline variants and for enrichment in biological pathways. We find germline and somatic variants inactivating the Fanconi anemia pathway in 11% of patients with bladder cancer. Finally, we investigate germline, diagnosis, and relapse variants in a large cohort of patients with pediatric acute lymphoblastic leukemia (ALL). Our somatic analysis captures known ALL driver genes, and we describe the sequential order of diagnosis and relapse mutations, including late events in NT5C2. We apply both the TOBI framework and the guidelines of the American College of Medical Genetics and Genomics to identify potentially cancer-associated germline variants, and nominate nonsynonymous variants in TERT and ATM
