89 research outputs found

    Characterisation of chromatin modifiers in endometrial cancer

    Chromatin organization is a critical regulator of gene expression and cell phenotype, and is frequently dysregulated in cancer. Endometrial cancer (EC) is the most common gynecological malignancy, and causes significant morbidity and mortality. EC is notable for recurrent alterations in chromatin regulators, including ARID1A, a key component of the SWI/SNF remodeling complex that has emerged as a prevalent driver in EC, along with other remodelers such as CHD4 and BCOR. However, a systematic analysis of chromatin modifier alterations and their functional consequences in EC has not been performed. This thesis presents a comprehensive investigation of genomic alterations in chromatin modifiers using whole genome sequencing (WGS) data from the Genomics England (GEL) 100,000 Genomes Project, the largest EC cohort to date. I demonstrate that while mutational processes vary across molecular subtypes, numerous chromatin modifiers are consistently altered across all subtypes. These genomic alterations frequently occur in different subunits of the same complex, such as alterations in CHD3, CHD4 and MBD3, subunits of the ATP-dependent chromatin remodeling complex NuRD. Additionally, I examine the correlation between driver mutations and patient survival, revealing that mutations in PBRM1 and CHD4 are associated with an increased risk of death after accounting for age, molecular subtype, and tumor mutation and copy number alteration burden. To complement the correlative analysis, I employ CRISPR-Cas9 gene editing to study the functional consequences of perturbations in selected chromatin modifiers (ARID1A, ARID1B, ARID5B, EP300, KMT2C, and SETD1B) in normal and malignant endometrial cells using transcriptomic and chromatin accessibility data. Furthermore, I explore the implications of the N1459S BCOR mutation, a hotspot mutation near-unique to EC.
Considering the frequent occurrence of ARID1A mutations in malignant tissues and their absence in normal endometrium, I investigate tumor heterogeneity in endometrial cancer. I discuss the limitations of current methodologies and propose a deep learning approach to uncover hidden evolutionary trajectories. With additional research, this approach could help clarify the sequence in which driver alterations occur. In summary, this work presents a resource for investigating chromatin organization in EC. The functional analyses using gene editing techniques confirm that EC-associated drivers disrupt essential cellular processes involved in oncogenesis. By providing the first systematic correlative and functional analyses of chromatin modifiers in EC, this thesis offers novel insights into EC biology.

    Systems Analytics and Integration of Big Omics Data

    A “genotype” is essentially an organism's full hereditary information, which is obtained from its parents. A “phenotype” is an organism's actual observed physical and behavioral properties. These may include traits such as morphology, size, height, eye color, metabolism, etc. One of the pressing challenges in computational and systems biology is genotype-to-phenotype prediction. This is challenging given the amount of data generated by modern Omics technologies. This “Big Data” is so large and complex that traditional data processing applications are not up to the task. Challenges arise in collection, analysis, mining, sharing, transfer, visualization, archiving, and integration of these data. In this Special Issue, there is a focus on the systems-level analysis of Omics data, recent developments in gene ontology annotation, and advances in biological pathways and network biology. The integration of Omics data with clinical and biomedical data using machine learning is explored. This Special Issue covers new methodologies in the context of gene–environment interactions, tissue-specific gene expression, and how external factors or host genetics impact the microbiome.

    NOVEL COMPUTATIONAL METHODS FOR CANCER GENOMICS DATA ANALYSIS

    Cancer is a genetic disease responsible for one in eight deaths worldwide. The advancement of next-generation sequencing (NGS) technology has revolutionized cancer research, allowing comprehensive profiling of the cancer genome at high resolution. Large-scale cancer genomics research has sparked the need for efficient and accurate bioinformatics methods to analyze the data. The research presented in this dissertation focuses on three areas in cancer genomics: cancer somatic mutation detection; cancer driver gene identification; and transcriptome profiling at the single-cell level. NGS data analysis involves a series of complicated data transformations that convert raw sequencing data into information that is interpretable by cancer researchers. The first project in the dissertation established a robust, reproducible and scalable cancer genomics data analysis workflow management system that automates the best-practice mutation calling pipelines to detect somatic single nucleotide polymorphisms, insertions, deletions and copy number variations from NGS data. It integrates mutation annotation, clinically actionable therapy prediction and data visualization, streamlining the sequence-to-report data transformation. In order to differentiate the driver mutations buried among a vast pool of passenger mutations from a somatic mutation calling project, we developed MEScan in the second project, a novel method that enables genome-scale driver mutation identification based on a mutual exclusivity test using cancer somatic mutation data. MEScan implements an efficient statistical framework to screen de novo for mutually exclusive patterns while taking into account the patient-specific and gene-specific background mutation rates and adjusting for heterogeneous mutation frequency. It outperforms several existing methods based on simulation studies and real-world datasets.
Genome-wide screening using existing TCGA somatic mutation data discovers novel cancer-specific and pan-cancer mutually exclusive patterns. Bulk RNA sequencing (RNA-Seq) has become one of the most commonly used techniques for transcriptome profiling in a wide spectrum of biomedical and biological research. Analyzing bulk RNA-Seq reads to quantify expression at each gene locus is the first step towards the identification of differentially expressed genes for downstream biological interpretation. Recent advances in single-cell RNA-seq (scRNA-seq) technology allow cancer biologists to profile gene expression at higher, cellular-level resolution. Preprocessing scRNA-seq data to quantify UMI-based gene counts is the key to characterizing intra-tumor cellular heterogeneity and identifying rare cells that govern tumor progression, metastasis and treatment resistance. Despite its popularity, summarizing gene counts from raw sequencing reads remains one of the most time-consuming steps with existing tools. Current pipelines do not balance efficiency and accuracy in large-scale gene count summarization in both bulk and scRNA-seq experiments. In the third project, we developed a lightweight k-mer based gene counting algorithm, FastCount, to accurately and efficiently quantify gene-level abundance using bulk RNA-seq or UMI-based scRNA-seq data. It achieves at least an order-of-magnitude speed improvement over the current gold standard pipelines while providing competitive accuracy.

    METABOLIC MODELING AND OMICS-INTEGRATIVE ANALYSIS OF SINGLE AND MULTI-ORGANISM SYSTEMS: DISCOVERY AND REDESIGN

    Computations and modeling have emerged as indispensable tools that drive the process of understanding, discovery, and redesign of biological systems. With the accelerating pace of genome sequencing and annotation information generation, the development of computational pipelines for the rapid reconstruction of high-quality genome-scale metabolic networks has received significant attention. These models provide a rich tapestry for computational tools to quantitatively assess the metabolic phenotypes for various systems-level studies and to develop engineering interventions at the DNA, RNA, or enzymatic level by careful tuning in the biophysical modeling frameworks. In silico genome-scale metabolic modeling algorithms based on the concept of optimization, along with the incorporation of multi-level omics information, provide a diverse array of toolboxes for new discovery in the metabolism of living organisms (which include single-cell microbes, plants, animals, and microbial ecosystems) and allow for the reprogramming of metabolism for desired output(s). Throughout my doctoral research, I used genome-scale metabolic models and omics-integrative analysis tools to study how microbes, plants, animals, and microbial ecosystems respond or adapt to diverse environmental cues, and how to leverage the knowledge gleaned from that to answer important biological questions. Each chapter in this dissertation provides a detailed description of the methodology, results, and conclusions from one specific research project. The research presented in this dissertation represents important foundational advances in Systems Biology and is crucial for sustainable development in food, pharmaceuticals and bioproduction of the future. Advisor: Rajib Sah

    Bayesian meta-analysis models for heterogeneous genomics data

    The accumulation of high-throughput data from vast sources has drawn a lot of attention to developing methods for extracting meaningful information from massive data. More interesting questions arise from how to combine disparate information, which goes beyond modeling sparsity and dimension reduction. This dissertation focuses on innovations in the area of heterogeneous data integration. Chapter 1 contextualizes this dissertation by introducing different aspects of meta-analysis and model frameworks for high-dimensional genomic data. Chapter 2 introduces a novel technique, a joint Bayesian sparse factor analysis model, to vertically integrate multi-dimensional genomic data from different platforms. Chapter 3 extends this model to a nonparametric Bayesian formulation that directly infers the number of factors using a model-based approach. Chapter 4 deals with horizontal integration of diverse gene expression data; the model infers pathway activities across various experimental conditions. All of these methods are demonstrated in both simulation studies and real data applications in chapters 2-4. Finally, chapter 5 summarizes the dissertation and discusses future directions.

    GENETIC EVOLUTION AND PROGNOSTIC DETERMINANTS OF PANCREATIC CANCER ON LONGITUDINAL LIQUID BIOPSIES

    Pancreatic ductal adenocarcinoma (PDAC) has one of the lowest 5-year survival rates amongst solid tumors. As early detection of PDAC is unusual and typically incidental, most patients present with locally advanced and metastatic disease where effective therapeutic strategies remain a significant unmet need. Specifically, surrogate biomarkers for tumor monitoring of PDAC may lead to improved elucidation of clinical actionability and prognostic potential. On the other hand, tumor tissue is rarely sampled in patients presenting with de novo or recurrent metastatic PDAC, apart from a fine needle aspiration or a core needle biopsy performed for diagnosis. This precludes the opportunity for elucidating molecular underpinnings of cancer recurrence, chemoresistance, and therapeutic decision-making in advanced disease patients over the course of their therapy. For this reason, we aim to use so-called “liquid biopsies” in the form of circulating nucleic acids and exosomes as a strategy that is amenable to longitudinal, relatively non-invasive sampling. Circulating tumor DNA and circulating exosomes contain genetic cargo representative of the neoplastic cells from which they are released and can serve as a reliable surrogate of the patient’s tumor DNA, enabling a new way of interrogating cancers. We demonstrate that serial quantitative measurements of these tumor nucleic acid sources in circulation can provide clinically relevant predictive and prognostic information in pancreatic cancer patients, including anticipation of impending disease progression and putative mechanisms of resistance to ongoing therapy. We also describe our ability to specifically capture tumor material in circulation following a comprehensive characterization of the pancreatic cancer exosomal “surfaceome”.
By leveraging an immune-capture approach paired with ultrasensitive molecular barcoding techniques, we are able to increase the sensitivity of detecting rare tumor-derived molecules in circulation. Ultimately, this has implications for stratification of patients into therapeutic “buckets” through a personalized approach that may lead to greater survival benefits in PDAC.

    COMPUTATIONAL GENOMIC MODELS FOR SPATIO-TEMPORAL INVESTIGATION OF EARLY LUNG CANCER PATHOLOGY

    Lung cancer, of which non-small cell lung cancer (NSCLC) is the most common form, is the second most prevalent cancer and the leading cause of cancer-related deaths. NSCLCs primarily comprise adenocarcinomas (LUAD) and squamous cell carcinomas (LUSC). Advances in early detection and prevention have been limited by the lack of early-stage biomarkers and targets. A comprehensive molecular characterization of premalignant lesions and tumor-adjacent normal tissue can aid in better understanding NSCLC pathogenesis. However, these investigations are further challenged by limited tissue availability and low cellular fractions of detectable somatic mutations. Therefore, there is a dearth of knowledge about the pathogenesis of premalignant lung lesions, especially for atypical adenomatous hyperplasia (AAH), the only known precursor to LUADs. We performed a cross-platform integrative analysis comprising targeted DNA sequencing, genotype array profiling and transcriptome sequencing of matched AAHs, LUADs and normal tissues from 23 early-stage patients. The study revealed potentially divergent pathways based on the mutation status of AAH (BRAF vs KRAS), recurrent chromosomal aberrations (17p loss) and the presence of immune deregulation early in the pathogenesis of AAHs. Molecular changes, characteristic of NSCLCs, might also occur in normal tissues, preceding identifiable premalignancy-associated morphological changes. We sought to comprehensively survey the somatic mutational architecture of the normal airway in early-stage NSCLCs. Targeted DNA sequencing allowed us to capture driver mutations at low cellular fractions, typical of these non-malignant tissues. Additionally, genotype array profiling helped characterize subtle chromosomal aberrations in these tissues. 
This multi-region study included tumor-adjacent and -distant airways, nasal epithelia and uninvolved normal lung (collectively, the cancerized field) along with matched multi-region NSCLCs and blood cells from 48 patients. Integrative computational analysis revealed genomic airway field carcinogenesis in 52% of cases. The airway field exhibited mutations in known drivers that were present at lower frequencies compared to NSCLCs, suggestive of selection-driven clonal expansion. These driver events also comprised somatic “two-hit” alterations in matched airway field and NSCLCs. Our study design offers spatiotemporal insights into NSCLC development and suggests potential targets for early detection and treatment, in possibly less hostile environments of premalignancy. To validate and enhance the utility of the bioinformatic techniques devised and implemented for these investigations, I also provide methods to expand such analyses across multiple tumor sites.