
    Discerning Drivers of Cancer: Computational Approaches to Somatic Exome Sequencing Data

    Paired tumor-normal sequencing of thousands of patients’ exomes has revealed millions of somatic mutations, but functional characterization and clinical decision making are stymied because biologically neutral ‘passenger’ mutations greatly outnumber pathogenic ‘driver’ mutations. Since most mutations will return negative results if tested, conventional resource-intensive experiments are reserved for mutations that are observed in multiple patients or for rarer mutations found in well-established cancer genes. Most mutations are therefore never tested, diminishing the potential to discover new mechanisms of cancer development and treatment opportunities. Computational methods that reliably prioritize mutations for testing would greatly increase the translation of sequencing results to clinical care. The goal of this thesis is to develop new approaches that use datasets of protein-coding somatic mutations to identify putative cancer-causing genes and mutations, and to validate these predictions in silico and experimentally. The work is split among several inter-related efforts, which taken together will help experimental biologists and clinicians focus on hypotheses that can yield novel insights into cancer biology, development, and treatment.

    Deep mutation modelling in cancer driver mutation and cancer driver gene detection

    Cancer is a leading cause of death worldwide. Contrary to what its name suggests, cancer is not a single disease. It is a group of diseases that arise from the expansion of a somatic cell clone. This expansion is thought to result from mutations that confer a selective advantage on the cell clone. Mutations that give cells this advantage, leading to their proliferation and escape from normal cell constraints, are called driver mutations. The genes that contain driver mutations are known as driver genes. Studying these mutations and genes is important for understanding how cancer forms and evolves. Various methods have been developed to discover these mutations and genes. This thesis focuses on a method called Deep Mutation Modelling, a deep learning based approach to predicting the probability of mutations. Deep Mutation Modelling’s output probabilities offer the possibility of creating sample-specific and cancer-type-specific probability scores for mutations that reflect the pathogenicity of the mutations. Most earlier methods produce scores that are the same for all cancer types, so Deep Mutation Modelling offers the opportunity to make a more personalised score. The main objectives of this thesis were to examine the Deep Mutation Modelling output, since its features were unknown, to see how the output compares against other scoring methods, and to see how the probabilities behave in mutation hotspots. A final objective was to test whether the probabilities could be used in a common driver gene discovery method. Overall, the goal was to see if Deep Mutation Modelling works and if it is competitive with other known methods. The findings indicate that Deep Mutation Modelling can predict driver mutations, but that it does not have sufficient power to do this reliably and requires further improvements.

    CODA: Accurate Detection of Functional Associations between Proteins in Eukaryotic Genomes Using Domain Fusion

    Background: In order to understand how biological systems function it is necessary to determine the interactions and associations between proteins. Gene fusion prediction is one approach to detection of such functional relationships. Its use is however known to be problematic in higher eukaryotic genomes due to the presence of large homologous domain families. Here we introduce CODA (Co-Occurrence of Domains Analysis), a method to predict functional associations based on the gene fusion idiom. Methodology/Principal Findings: We apply a novel scoring scheme which takes account of the genome-specific size of homologous domain families involved in fusion to improve accuracy in predicting functional associations. We show that CODA is able to accurately predict functional similarities in human with comparison to state-of-the-art methods and show that different methods can be complementary. CODA is used to produce evidence that a currently uncharacterised human protein may be involved in pathways related to depression and that another is involved in DNA replication. Conclusions/Significance: The relative performance of different gene fusion methodologies has not previously been explored. We find that they are largely complementary, with different methods being more or less appropriate in different genomes. Our method is the only one currently available for download and can be run on an arbitrary dataset by the user. The CODA software and datasets are freely available from ftp://ftp.biochem.ucl.ac.uk/pub/gene3d_data/v6.1.0/CODA/. Predictions are also available via web services from http://funcnet.eu/
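
    The gene fusion idea in this abstract can be illustrated with a minimal sketch. It is not the CODA implementation: the function name `fusion_links`, the toy protein and domain-family labels, and in particular the score `1 / (size_a * size_b)` are assumptions chosen only to show how larger domain families might be down-weighted.

```python
from itertools import combinations


def fusion_links(query_proteins, target_proteins):
    """Link query proteins whose domain families co-occur in one target protein.

    Both arguments map a protein name to a list of domain-family labels.
    The score is a hypothetical stand-in, not the published CODA score:
    pairs drawn from large families receive a lower value.
    """
    # Index the query genome: domain family -> proteins carrying it.
    family_to_queries = {}
    for protein, families in query_proteins.items():
        for family in families:
            family_to_queries.setdefault(family, set()).add(protein)

    links = {}
    for families in target_proteins.values():
        # Every pair of distinct families inside one target protein is a fusion event.
        for fam_a, fam_b in combinations(sorted(set(families)), 2):
            members_a = family_to_queries.get(fam_a, set())
            members_b = family_to_queries.get(fam_b, set())
            for p in members_a:
                for q in members_b:
                    if p == q:
                        continue  # one protein holding both domains is not a link
                    score = 1.0 / (len(members_a) * len(members_b))
                    key = tuple(sorted((p, q)))
                    links[key] = max(links.get(key, 0.0), score)
    return links


# Toy data (hypothetical): family B has two members in the query genome.
query = {"P1": ["A"], "P2": ["B"], "P3": ["B"], "P4": ["C"]}
target = {"F1": ["A", "B"], "F2": ["A", "C"]}
print(fusion_links(query, target))
```

    In this toy example the P1/P4 pair scores higher than P1/P2 or P1/P3, because family B is larger and a fusion involving it is weaker evidence for any single pair.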

    A More Accessible Drosophila Genome to Study Fly CNS Development: A Dissertation

    Understanding the complex mechanisms to assemble a functional brain demands sophisticated experimental designs. Drosophila melanogaster, a model organism equipped with powerful genetic tools and evolutionarily conserved developmental programs, is ideal for such mechanistic studies. Valuable insights were learned from research in Drosophila ventral nerve cord, such as spatial patterning, temporal coding, and lineage diversification. However, the blueprint of Drosophila cerebrum development remains largely unknown. Neural progenitor cells, called neuroblasts (NBs), serially and stereotypically produce neurons and glia in the Drosophila cerebrum. Neuroblasts inherit specific sets of early patterning genes, which likely determine their individual identities when neuroblasts delaminate from neuroectoderm. Unique neuroblasts may hence acquire the abilities to differentially interpret the temporal codes and deposit characteristic progeny lineages. We believe resolving this age-old speculation requires a tracing system that links patterning genes to neuroblasts and corresponding lineages, and further allows specific manipulations. Using modern transgenic systems, one can immortalize transient NB gene expressions into continual labeling of their offspring. Having a collection of knockin drivers that capture endogenous gene expression patterns would open the door for tracing specific NBs and their progenies based on the combinatorial expression of various early patterning genes. Anticipating the need for a high throughput gene targeting system, we created Golic+ (gene targeting during oogenesis with lethality inhibitor and CRISPR/Cas “plus”), which features efficient homologous recombination in cystoblasts and a lethality selection for easy targeting candidate recovery. Using Golic+, we successfully generated T2AGal4 knock-ins for 6 representative early patterning genes, including lab, unpg, hkb, vnd, ind, and msh. 
    They faithfully recapitulated the expression patterns of the targeted genes. After preserving initial NB expressions by triggering irreversible genetic labeling, we revealed the lineages founded by the NBs expressing a particular early patterning gene. Identifying the neuroblasts and lineages that express a particular early patterning gene should elucidate the genetic origin of neuroblast diversity. We believe such an effort will lead to a deeper understanding of brain development and evolution.

    Network Approaches to the Study of Genomic Variation in Cancer

    Advances in genomic sequencing technologies opened the door to a wider study of cancer etiology. By analyzing datasets with thousands of exomes (or genomes), researchers gained a better understanding of the genomic alterations that confer a selective advantage towards cancerous growth. A predominant narrative in the field has been based on a dichotomy between alterations that confer a strong selective advantage, called cancer drivers, and the bulk of other alterations assumed to have a neutral effect, called passengers. Yet a series of studies questioned this narrative and assigned potential roles to passengers, whether in facilitating tumorigenesis or in countering the effect of drivers. Consequently, the passenger mutational landscape received more attention, in an attempt to prioritize the possible effects of its alterations and to identify new therapeutic targets. In this dissertation, we introduce interpretable network approaches to the study of genomic variation in cancer. We rely on two types of networks, namely functional biological networks and artificial neural nets. In the first chapter, we describe a propagation method that prioritizes 230 infrequently mutated genes with respect to their potential contribution to cancer development. In the second chapter, we further transcend the driver-passenger dichotomy and demonstrate a gradient of cancer relevance across human genes. In the last two chapters, we present methods that simplify neural network models to render them more interpretable, with a focus on functional genomic applications in cancer and beyond.
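
    The abstract does not say which propagation scheme the first chapter uses. As a rough illustration of the general idea, the sketch below uses random walk with restart, a common stand-in, on a toy undirected network. The function name `propagate`, the node names, and the parameter values are all assumptions.

```python
def propagate(adjacency, seed_scores, restart=0.5, iterations=50):
    """Spread seed scores over an undirected network by random walk with restart.

    adjacency maps each node to a list of neighbours and is assumed symmetric.
    seed_scores maps some nodes to a positive prior weight (for example,
    genes already known to be mutated). Returns a score for every node.
    """
    nodes = sorted(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    total = sum(seed_scores.values())
    # Normalise the seeds so that the scores sum to 1 throughout.
    prior = {n: seed_scores.get(n, 0.0) / total for n in nodes}

    scores = dict(prior)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour passes on an equal share of its current score.
            incoming = sum(scores[m] / degree[m] for m in adjacency[n])
            updated[n] = restart * prior[n] + (1.0 - restart) * incoming
        scores = updated
    return scores


# Toy network (hypothetical): a chain G1 - G2 - G3 and a separate pair G4 - G5.
network = {
    "G1": ["G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2"],
    "G4": ["G5"],
    "G5": ["G4"],
}
ranked = propagate(network, {"G1": 1.0})
for gene in sorted(ranked, key=ranked.get, reverse=True):
    print(gene, round(ranked[gene], 3))
```

    Nodes close to the seed receive higher scores than distant ones, and nodes in a disconnected component receive none, which is how an infrequently mutated gene can still be ranked by its network neighbourhood.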

    Knowledge Driven Approaches and Machine Learning Improve the Identification of Clinically Relevant Somatic Mutations in Cancer Genomics

    For cancer genomics to fully expand its utility from research discovery to clinical adoption, somatic variant detection pipelines must be optimized and standardized to ensure identification of clinically relevant mutations and to reduce laborious and error-prone post-processing steps. To address the need for improved catalogues of clinically and biologically important somatic mutations, we developed DoCM, a Database of Curated Mutations in Cancer (http://docm.info), as described in Chapter 2. DoCM is an open source, openly licensed resource that enables the cancer research community to aggregate, store and track biologically and clinically important cancer variants. DoCM currently comprises 1,364 variants in 132 genes across 122 cancer subtypes, based on the curation of 876 publications. To demonstrate the utility of this resource, the mutations in DoCM were used to identify variants of established significance in cancer that were missed by standard variant discovery pipelines (Chapter 3). Sequencing data from 1,833 cases across four TCGA projects were reanalyzed and 1,228 putative variants that were missed in the original TCGA reports were identified. Validation sequencing data were produced from 93 of these cases to confirm the putative variants we detected with DoCM. Here, we demonstrated that at least one functionally important variant in DoCM was recovered in 41% of cases studied. A major bottleneck in the DoCM analysis in Chapter 3 was the filtering and manual review of somatic variants. Several steps in this post-processing phase of somatic variant calling have already been automated. However, false positive filtering and manual review of variant candidates remain a major challenge, especially in high-throughput discovery projects or in clinical cancer diagnostics. 
    In Chapter 4, an approach that systematized and standardized the post-processing of somatic variant calls using machine learning algorithms, trained on 41,000 manually reviewed variants from 20 cancer genome projects, is outlined. The approach accurately reproduced the manual review process on hold out test samples, and accurately predicted which variants would be confirmed by orthogonal validation sequencing data. When compared to traditional manual review, this approach increased identification of clinically actionable variants by 6.2%. These chapters outline studies that result in substantial improvements in the identification and interpretation of somatic variants, the use of which can standardize and streamline cancer genomics, enabling its use at high throughput as well as clinically.

    Differential exon usage of developmental genes is associated with deregulated epigenetic marks

    Alternative exon usage is known to affect a large portion of genes in mammalian genomes. Importantly, different splice isoforms sometimes possess distinctly different protein functions. Here, we analyzed data from the Human Epigenome Atlas for 11 different human adult tissues and for 8 cultured cells that mimic early developmental stages. We found a significant enrichment of cases where differential usage of exons in various developmental stages of human cells and tissues is associated with differential epigenetic modifications in the flanking regions of individual exons. Many of the genes that were differentially regulated at the exon level and showed deregulated histone marks at the respective exon flanks are functionally associated with development and metabolism.

    Polymorphisms in folate-metabolizing genes, chromosome damage, and risk of Down syndrome in Italian women: identification of key factors using artificial neural networks

    Background: Studies in mothers of Down syndrome individuals (MDS) point to a role for polymorphisms in folate metabolic genes in increasing chromosome damage and maternal risk for a Down syndrome (DS) pregnancy, suggesting complex gene-gene interactions. This study aimed to analyze a dataset of genetic and cytogenetic data in an Italian group of MDS and mothers of healthy children (control mothers) to assess the capacity of artificial neural networks assembled in the TWIST system to consistently distinguish these two conditions, and to identify the variables carrying the most relevant information about being the mother of a DS child. The dataset consisted of the following variables: the frequency of chromosome damage in peripheral lymphocytes (BNMN frequency) and the genotype for 7 common polymorphisms in folate metabolic genes (MTHFR 677C>T and 1298A>C, MTRR 66A>G, MTR 2756A>G, RFC1 80G>A, and TYMS 28bp repeats and 1494 6bp deletion). Data were analysed using the TWIST system in combination with supervised artificial neural networks and a semantic connectivity map. Results: The TWIST system selected 6 variables (BNMN frequency and the MTHFR 677TT, RFC1 80AA, TYMS 1494 6bp +/+, TYMS 28bp 3R/3R and MTR 2756AA genotypes) that were subsequently used to discriminate between MDS and control mothers with 90% accuracy. The semantic connectivity map provided important information on the complex biological connections between the studied variables and the two conditions (being MDS or control mother). Conclusions: Overall, the study suggests a link between polymorphisms in folate metabolic genes and DS risk in Italian women.