33 research outputs found

    Interpreting microarray experiments via co-expressed gene groups analysis

    Microarray technology produces vast amounts of data by simultaneously measuring the expression levels of thousands of genes under hundreds of biological conditions. One of the principal challenges in bioinformatics today is interpreting these huge datasets using different sources of information. We propose a novel data analysis method named CGGA (Co-expressed Gene Groups Analysis) that automatically finds groups of genes that are functionally enriched, i.e. have the same functional annotations, and are co-expressed. CGGA automatically integrates the information of microarrays, i.e. gene expression profiles, with the functional annotations of the genes obtained from genome-wide information sources such as Gene Ontology (GO). By applying CGGA to well-known microarray experiments, we have identified the principal functionally enriched and co-expressed gene groups, and we have shown that this approach enhances and accelerates the interpretation of DNA microarray experiments

    A B7-CD28 Family-Based Signature Demonstrates Significantly Different Prognosis and Immunological Characteristics in Diffuse Gliomas

    The B7-CD28 gene family plays a crucial role in modulating immune functions and has served as a potential target for immunotherapeutic strategies. Therefore, we systematically analyzed B7-CD28 family gene expression profiles and constructed a B7-CD28 family-based prognostic signature to predict survival and host immune status in diffuse gliomas. The TCGA dataset was used as a training cohort, and three CGGA datasets (mRNAseq_325, mRNAseq_693 and mRNA-array) were employed as validation cohorts to corroborate the findings from the TCGA dataset. Ultimately, we developed a B7-CD28 family-based signature that consisted of CD276, CD274, PDCD1LG2 and CD80 using LASSO Cox analysis. This gene signature was validated to have significant prognostic value, and could be used as a biomarker to distinguish pathological grade and IDH mutation status in diffuse glioma. Additionally, we found that the gene signature was significantly related to the intensity of the immune response and immune cell populations, as well as to several other important immune checkpoint genes, and so holds great potential as a predictive immune marker for immunotherapy and the tumor microenvironment. Finally, a B7-CD28 family-based nomogram was established to predict patient life expectancy and thereby help personalize therapy for patients. In summary, this is the first mathematical model based on this gene family with the aim of providing novel insights into immunotherapy for diffuse glioma

    ZNF521 Enhances MLL-AF9-Dependent Hematopoietic Stem Cell Transformation in Acute Myeloid Leukemias by Altering the Gene Expression Landscape

    Leukemias derived from the MLL-AF9 rearrangement rely on dysfunctional transcriptional networks. ZNF521, a transcription co-factor implicated in the control of hematopoiesis, has been proposed to sustain leukemic transformation in collaboration with other oncogenes. Here, we demonstrate that ZNF521 mRNA levels correlate with specific genetic aberrations: in particular, the highest expression is observed in AMLs bearing MLL rearrangements, while the lowest is detected in AMLs with FLT3-ITD, NPM1, or CEBPα double mutations. In cord blood-derived CD34(+) cells, enforced expression of ZNF521 provides a significant proliferative advantage and enhances MLL-AF9 effects on the induction of proliferation and the expansion of leukemic progenitor cells. Transcriptome analysis of primary CD34(+) cultures displayed subsets of genes up-regulated by MLL-AF9 or ZNF521 single transgene overexpression as well as in MLL-AF9/ZNF521 combinations, at either the early or late time points of an in vitro leukemogenesis model. The silencing of ZNF521 in the MLL-AF9+ THP-1 cell line consistently results in an impairment of growth and clonogenicity, recapitulating the effects observed in primary cells. Taken together, these results underscore a role for ZNF521 in sustaining the self-renewal of the immature AML compartment, most likely through the perturbation of the gene expression landscape, which ultimately favors the expansion of MLL-AF9-transformed leukemic clones

    A Robust Unified Graph Model Based on Molecular Data Binning for Subtype Discovery in High-dimensional Spaces

    Machine learning (ML) is a subfield of artificial intelligence (AI) that has already revolutionised the world around us. It is a widely employed process for discovering patterns and groups within datasets. It has a wide range of applications including disease subtyping, which aims to discover intrinsic subtypes of disease in large-scale unlabelled data. Whilst the groups discovered in multi-view high-dimensional data by ML algorithms are promising, their capacity to identify pertinent and meaningful groups is limited by the presence of data variability and outliers. Since outlier values represent potential but unlikely outcomes, they are statistically and philosophically fascinating. Therefore, the primary aim of this thesis was to propose a robust approach that discovers meaningful groups while considering the presence of data variability and outliers in the data. To achieve this aim, a novel robust approach (ROMDEX) was developed that utilised the proposed intermediate graph models (IMGs) for robust computation of proximity between observations in the data. Finally, a robust multi-view graph-based clustering approach was developed based on ROMDEX that improved the discovery of meaningful groups that were hidden behind the noise in the data. The proposed approach was validated on real-world and synthetic data for disease subtyping. Additionally, the stability of the approach was assessed by evaluating its performance across different levels of noise in clustering data. The results were evaluated through Kaplan-Meier survival time analysis for disease subtyping. The concordance index (CI) and normalised mutual information (NMI) were also used to evaluate the predictive ability of the proposed clustering model. Additionally, the accuracy, Kappa statistic and Rand index were computed to evaluate the clustering stability against various levels of Gaussian noise. The proposed approach outperformed the existing state-of-the-art approaches MRGC, PINS, SNF, Consensus Clustering, and Icluster+ on these datasets. The findings for all datasets were outstanding, demonstrating the predictive ability of the proposed unsupervised graph-based clustering approach

    Nonlinear Dependence in the Discovery of Differentially Expressed Genes

    Aggressive PDACs show hypomethylation of repetitive elements and the execution of an intrinsic IFN program linked to a ductal cell of origin

    Pancreatic ductal adenocarcinoma (PDAC) is characterized by extensive desmoplasia, which challenges the molecular analyses of bulk tumor samples. Here we FACS-purified epithelial cells from human PDAC and normal pancreas and derived their genome-wide transcriptome and DNA methylome landscapes. Clustering based on DNA methylation revealed two distinct PDAC groups displaying different methylation patterns at regions encoding repeat elements. Methylation(low) tumors are characterized by higher expression of endogenous retroviral (ERV) transcripts and dsRNA sensors which leads to a cell intrinsic activation of an interferon signature (IFNsign). This results in a pro-tumorigenic microenvironment and poor patient outcome. Methylation(low)/IFNsign(high) and Methylation(high)/IFNsign(low) PDAC cells preserve lineage traits, respective of normal ductal or acinar pancreatic cells. Moreover, ductal-derived Kras(G12D)/Trp53(−/−) mouse PDACs show higher expression of IFNsign compared to acinar-derived counterparts. Collectively, our data point to two different origins and etiologies of human PDACs, with the aggressive Methylation(low)/IFNsign(high) subtype potentially targetable by agents blocking intrinsic IFN-signaling

    Investigating mechanisms and indicators of sensitivity to replication stress-targeting therapies in glioblastoma

    Introduction: Evidence suggests a subpopulation of treatment-resistant glioblastoma (GBM) cancer stem cells (GSCs) is responsible for tumour recurrence, an almost universally deadly characteristic of this cancer of extreme unmet need. Current treatments fail to eradicate GSCs and novel GSC-targeting therapies are a clinical priority. Elevated DNA replication stress (RS) in GSCs has been described, leading to constitutive DNA damage response activation and treatment resistance, and targeting RS with combined ATR and PARP inhibition (CAiPi) has provided potent GSC cytotoxicity. Nevertheless, there is a relative lack of studies investigating the underlying mechanisms of response to CAiPi in GBM and a lack of robust transcriptional signatures or genomic biomarkers correlated with CAiPi response in GSCs. Aims: This thesis aims to investigate RS as a targetable vulnerability of GSCs. It aims to achieve this by studying the mechanisms of sensitivity to inhibition of the RS response to inform transcriptional indicators of sensitivity. Lastly, it aims to investigate the feasibility of this therapeutic strategy in a preclinical model. Methods: Paired GSC-enriched and GSC-depleted, differentiated ('bulk') populations, derived from resected GBM specimens, were maintained in serum-free, stem-enriching conditions or differentiating conditions respectively. WGS and RNAseq were utilised to characterise the genomic and transcriptomic landscape of the cell line panel. Responses to CAiPi were assessed by clonogenic and cell viability assays and validated in a CD133-sorted population by neurosphere assay. Replication dynamics in paired GSC and bulk cells were investigated by a DNA fibre assay. Dysregulated S phase was analysed by quantification of 53BP1 nuclear bodies (53BP1NB), indicative of under-replication of the genome, and quantification of re-replicating cells by flow cytometry. Chromosomal instability was interrogated by quantification of chromatin bridges and micronuclei. Novel mechanistic discoveries prevalent in GSCs with potent CAiPi-sensitivity were used to curate a transcriptional marker of sensitivity for interrogation in GBM cell lines and in published clinical datasets. Lastly, the feasibility of CAiPi was investigated in an in vivo preclinical model, assessing tolerability and tumour penetration. Results: CAiPi was potently cytotoxic to a population of GSCs but highly heterogeneous responses to CAiPi were observed across a panel of seven paired GSCs and bulk cells. Sensitivity was not predicted by elevated RS in GSCs or any previously defined biomarkers of RS or CAiPi sensitivity. Differential sensitivity was exploited for further investigations which identified transcriptional dysregulation of DNA replication, specifically in a CAiPi-responsive GSC line. Subsequent analysis of DNA replication identified PARPi-induced increase in origin firing, associated with PARP trapping. GSCs with this origin firing phenotype also exhibited an increase in both under-replicated DNA and re-replication in response to CAiPi, with an increase in chromosomal aberrations and instability. A curated transcriptional signature, based on mechanistic discoveries in CAiPi-sensitive GSCs, predicted GSC sensitivity and identified populations of GBM patients with poor survival who may respond to CAiPi treatment. In vivo studies demonstrated murine blood-brain barrier (BBB) penetration of a PARPi and an ATRi with minimal toxicity; however, optimal dosing and scheduling remain a challenge. Conclusions: We propose that CAiPi-sensitivity is marked by loss of replication coordination leading to chromosomal damage as cells move through S phase. Additionally, we propose a model whereby under-replication and re-replication can occur due to spatial and temporal uncoupling during S phase. Targeting RS via CAiPi represents a promising therapeutic strategy for selectively targeting recurrence-driving GSCs to improve clinical outcomes in GBM

    Patterns and Complexity in Biological Systems: A Study of Sequence Structure and Ontology-based Networks

    Biological information can be explored at many different levels, with the most basic information encoded in patterns within the DNA sequence. Through molecular level processes, these patterns are capable of controlling the states of genes, resulting in a complex network of interactions between genes. Key features of biological systems can be determined by evaluating properties of this gene regulatory network. More specifically, a network-based approach helps us to understand how the collective behavior of genes corresponds to patterns in genetic function. We combine Chromatin-Immunoprecipitation microarray (ChIP-chip) data with genomic sequence data to determine how DNA sequence works to recruit various proteins. We quantify this information using a value termed "nmer-association." "Nmer-association" measures how strongly individual DNA sequences are associated with a protein in a given ChIP-chip experiment. We also develop the "split-motif" algorithm to study the underlying structural properties of DNA sequence independent of wet-lab data. The "split-motif" algorithm finds pairs of DNA motifs which preferentially localize relative to one another. These pairs are primarily composed of known transcription factor binding sites and their co-occurrence is indicative of higher-order structure. This kind of structure has largely been missed in standard motif-finding algorithms despite emerging evidence of the importance of complex regulation. In both simple and complex regulation, two genes that are connected in a regulatory fashion are likely to have shared functions. The Gene Ontology (GO) provides biologists with a controlled terminology with which to describe how genes are associated with function and how those functional terms are related to each other. We introduce a method for processing functional information in GO to produce a gene network. We find that the edges in this network are correlated with known regulatory interactions and that the strength of the functional relationship between two genes can be used as an indicator of how informationally important that link is in the regulatory network. We also investigate the network structure of gene-term annotations found in GO and use these associations to establish an alternate natural way to group the functional terms. These groups of terms are drastically different from the hierarchical structure established by the Gene Ontology and provide an alternative framework with which to describe and predict the functions of experimentally identified groups of genes