Recommended from our members
Genomic and machine-learning analysis of germline variants in cancer
Cancer often develops from specific DNA alterations, and these cancer-associated mutations influence precision cancer treatment. These alterations can be specific to the tumor DNA (somatic mutations) or they can be heritable and present in normal and tumor DNA (germline mutations). Germline variants can affect how patients respond to therapy and can influence clinical surveillance of patients and their families. While identifying cancer-associated germline variants traditionally required studying families with inherited cancer predispositions, large-scale cancer sequencing cohorts enable alternative analysis of germline variants.
In this dissertation, we develop and apply multiple strategies for analyzing germline DNA from cancer sequencing cohorts. First, we develop the Tumor-Only Boosting Identification framework (TOBI) to learn biological features of true somatic mutations and generate a classification model that identifies DNA variants with somatic characteristics. TOBI has high sensitivity in identifying true somatic variants across several cancer types, particularly in known driver genes. After predicting somatic variants with TOBI, we assess the identified somatic-like germline variants for known oncogenic germline variants and enrichment in biological pathways. We find germline and somatic variants inactivating the Fanconi anemia pathway in 11% of patients with bladder cancer.
Finally, we investigate germline, diagnosis, and relapse variants in a large cohort of patients with pediatric acute lymphoblastic leukemia (ALL). Our somatic analysis captures known ALL driver genes, and we describe the sequential order of diagnosis and relapse mutations, including late events in NT5C2. We apply both the TOBI framework and the American College of Medical Genetics and Genomics guidelines to identify potentially cancer-associated germline variants, and nominate nonsynonymous variants in TERT and ATM
Germline loss-of-function variants in the base-excision repair gene MBD4 cause a Mendelian recessive syndrome of adenomatous colorectal polyposis and acute myeloid leukaemia
Inherited defects in base-excision repair (BER) predispose to adenomatous polyposis and colorectal cancer (CRC), yet our understanding of this important DNA repair pathway remains incomplete. By combining detailed clinical, histological and molecular profiling, we show that biallelic germline loss-of-function (LOF) variants in the BER gene MBD4 predispose to adenomatous polyposis and, uniquely amongst CRC predisposition syndromes, to myeloid neoplasms. Neoplasms from MBD4-deficient patients almost exclusively accumulate somatic CpG>TpG mutations, resembling mutational signature SBS1. MBD4-deficient adenomas harbour mutations in known CRC driver genes, although AMER1 mutations were more common and KRAS mutations less frequent. We did not find an increased risk for colorectal tumours in individuals with a monoallelic MBD4 LOF variant. We suggest that this condition should be termed MBD4-associated neoplasia syndrome (MANS) and that MBD4 be included in testing for the genetic diagnosis of polyposis and/or early-onset AM
Whole genome sequence analysis of platelet traits in the NHLBI trans-omics for precision medicine initiative
Platelets play a key role in thrombosis and hemostasis. Platelet count (PLT) and mean platelet volume (MPV) are highly heritable quantitative traits, with hundreds of genetic signals previously identified, mostly in European ancestry populations. We here utilize whole genome sequencing from NHLBI's Trans-Omics for Precision Medicine Initiative (TOPMed) in a large multi-ethnic sample to further explore common and rare variation contributing to PLT (n = 61 200) and MPV (n = 23 485). We identified and replicated secondary signals at MPL (rs532784633) and PECAM1 (rs73345162), both more common in African ancestry populations. We also observed rare variation in Mendelian platelet-related disorder genes influencing variation in platelet traits in TOPMed cohorts (not enriched for blood disorders). For example, association of GP9 with lower PLT and higher MPV was partly driven by a pathogenic Bernard-Soulier syndrome variant (rs5030764, p.Asn61Ser), and the signals at TUBB1 and CD36 were partly driven by loss-of-function variants not annotated as pathogenic in ClinVar (rs199948010 and rs571975065). However, residual signal remained for these gene-based signals after adjusting for lead variants, suggesting that additional variants in Mendelian genes with impacts in general population cohorts remain to be identified. Gene-based signals were also identified at several GWAS-identified loci for genes not annotated for Mendelian platelet disorders (PTPRH, TET2, CHEK2), with somatic variation driving the result at TET2. These results highlight the value of whole genome sequencing in populations of diverse genetic ancestry to identify novel regulatory and coding signals, even for well-studied traits like platelet traits
Computational genomics and genetics of developmental disorders
Computational genomics is at the intersection of computational applied physics, math, statistics, computer science and biology. With the advances in sequencing technology, large amounts of comprehensive genomic data are generated every year. However, genomic data is messy, complex and unstructured, so it becomes extremely challenging to explore, analyze and understand with traditional methods. The need to develop new quantitative methods to analyze large-scale genomics datasets is urgent. By collecting, processing and organizing clean genomics datasets and using these datasets to extract insights and relevant information, we are able to develop novel methods and strategies to address specific genetics questions using the tools of applied mathematics, statistics, and human genetics.
This thesis describes genetic and bioinformatics studies focused on utilizing and developing state-of-the-art computational methods and strategies in order to identify and interpret de novo mutations that are likely causing developmental disorders. We performed whole exome sequencing as well as whole genome sequencing on congenital diaphragmatic hernia parent-child trios and identified a new candidate risk gene, MYRF. Additionally, we found that male and female patients carry a different burden of likely-gene-disrupting mutations, and isolated and complex patients carry different gene expression levels in early development of diaphragm tissues for likely-gene-disrupting mutations.
To increase the power to detect risk genes and risk variants, we developed a deep neural network classifier called MVP to accurately predict the pathogenicity of missense variants. MVP uses a ResNet-based architecture, and on two independent data sets it prioritized pathogenic variants clearly better than other methods. Additionally, we studied the genetic connection between developmental disorders and cancer. We found that, in developmental disorder patients, predicted deleterious de novo mutations are more enriched in cancer driver genes than in non-cancer-driver genes. A Hidden Markov Model was implemented to discover cancer somatic missense mutation hotspots, and we demonstrated that many cancer driver genes share a similar mode of action in developmental disorders and cancer. By improving the ability to interpret missense mutations and leveraging cancer genomics data, we can improve risk gene inference in developmental disorders
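The abstract above mentions a Hidden Markov Model for finding missense mutation hotspots but gives no details of its design. As a rough illustration of the general idea only, the sketch below decodes a two-state HMM (background vs. hotspot) over per-residue mutation counts with the Viterbi algorithm. The Poisson emissions, the rates, the transition and start probabilities, and the toy counts are all assumptions made for this example; none of them come from the thesis.

```python
import math


def poisson_logpmf(k, rate):
    """Log-probability of observing k events under a Poisson(rate) model."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def viterbi_hotspots(counts, rates=(0.5, 6.0), stay=0.9, start=(0.9, 0.1)):
    """Most likely state path (0 = background, 1 = hotspot) for the counts.

    All parameter defaults are illustrative assumptions, not fitted values.
    """
    n_states = len(rates)
    # Log transition matrix: probability `stay` of remaining in a state.
    log_trans = [
        [math.log(stay if s == t else (1 - stay) / (n_states - 1))
         for t in range(n_states)]
        for s in range(n_states)
    ]
    # Best log-score of any path ending in each state at the first position.
    score = [math.log(start[s]) + poisson_logpmf(counts[0], rates[s])
             for s in range(n_states)]
    back = []  # back[i][t] = best previous state for state t at position i + 1
    for k in counts[1:]:
        prev = score
        pointers, score = [], []
        for t in range(n_states):
            best = max(range(n_states), key=lambda s: prev[s] + log_trans[s][t])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][t]
                         + poisson_logpmf(k, rates[t]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Toy per-residue missense counts along a hypothetical protein.
toy_counts = [0, 0, 1, 0, 6, 8, 7, 0, 0, 1, 0, 0]
toy_path = viterbi_hotspots(toy_counts)
hotspot_positions = [i for i, s in enumerate(toy_path) if s == 1]
```

With these toy numbers the decoded hotspot covers the three consecutive high-count residues, while the isolated single-count residues stay in the background state. In practice the rates and transition probabilities would be estimated from data rather than fixed by hand.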
mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome
A new algorithm and Web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrate the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D Web interface. On a large scale, we show that clustering with mutation3D is able to separate functional from nonfunctional mutations by analyzing a combination of 8,869 known inherited disease mutations and 2,004 SNPs overlaid together upon the same sets of crystal structures and homology models. Further, we present a systematic analysis of whole-genome and whole-exome cancer datasets to demonstrate that mutation3D identifies many known cancer genes as well as previously underexplored target genes. The mutation3D Web interface allows users to analyze their own mutation data in a variety of popular formats and provides seamless access to explore mutation clusters derived from over 975,000 somatic mutations reported by 6,811 cancer sequencing studies. The mutation3D Web interface is freely available with all major browsers supported
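To make the idea of clustering substitutions in tertiary structure concrete, here is a minimal sketch that groups mutated residues by 3D distance using complete-linkage agglomeration with a maximum cluster diameter. It is not the mutation3D implementation: the function name, the diameter threshold, the minimum cluster size, and the toy coordinates are all hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def complete_linkage_clusters(coords, max_diameter):
    """Merge residues into clusters whose largest pairwise distance
    stays at or below max_diameter (same units as the coordinates)."""
    clusters = [[name] for name in coords]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(coords[p], coords[q]) for p in a for q in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > max_diameter:
            break  # no remaining merge respects the diameter limit
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Toy coordinates for residues carrying substitutions (made-up values).
toy_coords = {
    "res10": (0.0, 0.0, 0.0),
    "res11": (3.0, 0.0, 0.0),
    "res60": (0.0, 4.0, 0.0),
    "res150": (30.0, 30.0, 30.0),
    "res200": (60.0, 0.0, 0.0),
}
toy_clusters = complete_linkage_clusters(toy_coords, max_diameter=6.0)
# Keep only clusters with at least two mutated residues.
candidate_clusters = [c for c in toy_clusters if len(c) >= 2]
```

In this toy case residues 10, 11 and 60 are far apart in sequence terms but close in space, so they form one candidate cluster, while the two distant residues remain singletons. A real analysis would also need a statistical test for whether such a cluster is larger than expected by chance.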
Consensus Statement on next-generation-sequencing-based diagnostic testing of hereditary phaeochromocytomas and paragangliomas
Genome Instability and Cance
Known allosteric proteins have central roles in genetic disease
Allostery is a form of protein regulation, where ligands that bind sites
located apart from the active site can modify the activity of the protein. The
molecular mechanisms of allostery have been extensively studied, because
allosteric sites are less conserved than active sites, and drugs targeting them
are more specific than drugs binding the active sites. Here we quantify the
importance of allostery in genetic disease. We show that 1) known allosteric
proteins are central in disease networks, and contribute to genetic disease and
comorbidities much more than non-allosteric proteins, in many major disease
types like hematopoietic diseases, cardiovascular diseases, cancers, diabetes,
or diseases of the central nervous system; 2) variants from cancer genome-wide
association studies are enriched near allosteric proteins, indicating their
importance to polygenic traits; and 3) the importance of allosteric proteins in
disease is due, at least partly, to their central positions in protein-protein
interaction networks, and probably not due to their dynamical properties
The road ahead in genetics and genomics
In celebration of the 20th anniversary of Nature Reviews Genetics, we asked 12 leading researchers to reflect on the key challenges and opportunities faced by the field of genetics and genomics. Keeping their particular research area in mind, they take stock of the current state of play and emphasize the work that remains to be done over the next few years so that, ultimately, the benefits of genetic and genomic research can be felt by everyone