
    Simultaneous non-negative matrix factorization for multiple large scale gene expression datasets in toxicology

    Non-negative matrix factorization is a useful tool for reducing the dimension of large datasets. This work considers simultaneous non-negative matrix factorization of multiple sources of data. In particular, we perform the first study that involves more than two datasets. We discuss the algorithmic issues required to convert the approach into a practical computational tool and apply the technique to new gene expression data quantifying the molecular changes in four tissue types due to different dosages of an experimental panPPAR agonist in mouse. This study is of interest in toxicology because, whilst PPARs form potential therapeutic targets for diabetes, it is known that they can induce serious side-effects. Our results show that the practical simultaneous non-negative matrix factorization developed here can add value to the data analysis. In particular, we find that factorizing the data as a single object allows us to distinguish between the four tissue types, but does not correctly reproduce the known dosage level groups. Applying our new approach, which treats the four tissue types as providing distinct, but related, datasets, we find that the dosage level groups are respected. The new algorithm then provides separate gene list orderings that can be studied for each tissue type, and compared with the ordering arising from the single factorization. We find that many of our conclusions can be corroborated with known biological behaviour, and others offer new insights into the toxicological effects. Overall, the algorithm shows promise for early detection of toxicity in the drug discovery process.
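
A minimal sketch of the simultaneous-factorization idea, assuming each tissue's data form a genes × samples matrix X_t, that the sample (dosage) factor H is shared across tissues, and that each tissue keeps its own gene factor W_t. These modelling choices, the Frobenius-norm objective and the Lee-Seung multiplicative updates are illustrative assumptions rather than the exact algorithm developed in the paper; `simultaneous_nmf` and the synthetic data are hypothetical.

```python
import numpy as np


def simultaneous_nmf(datasets, rank, n_iter=500, eps=1e-10, seed=0):
    """Jointly factorize every X_t as W_t @ H with non-negative factors.

    Illustrative assumption: each X_t is genes x samples, the sample factor H
    (rank x samples) is shared, and each dataset keeps its own W_t.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((rank, datasets[0].shape[1]))
    Ws = [rng.random((X.shape[0], rank)) for X in datasets]
    errors = []
    for _ in range(n_iter):
        # Multiplicative update of each dataset-specific W_t (H held fixed).
        for t, X in enumerate(datasets):
            Ws[t] = Ws[t] * (X @ H.T) / (Ws[t] @ H @ H.T + eps)
        # Shared H: pool numerator and denominator over all datasets, i.e. the
        # gradient of sum_t ||X_t - W_t H||_F^2 split into its two signs.
        numerator = sum(W.T @ X for W, X in zip(Ws, datasets))
        denominator = sum(W.T @ W for W in Ws) @ H + eps
        H = H * numerator / denominator
        errors.append(sum(np.linalg.norm(X - W @ H) ** 2
                          for W, X in zip(Ws, datasets)))
    return Ws, H, errors


# Synthetic example: four "tissues" (50 genes each) sharing 12 samples.
rng = np.random.default_rng(1)
H_true = rng.random((3, 12))
tissues = [rng.random((50, 3)) @ H_true for _ in range(4)]
Ws, H, errors = simultaneous_nmf(tissues, rank=3)
# One gene ordering per tissue, by loading on the first component.
ranked_genes = [np.argsort(-W[:, 0]) for W in Ws]
```

Sorting a column of each tissue-specific W_t, as in `ranked_genes`, yields one gene ordering per tissue, analogous to the separate gene lists described above; the random matrices only stand in for real expression data.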

    Cancer Subtyping Detection using Biomarker Discovery in Multi-Omics Tensor Datasets

    This thesis begins with a thorough review of research trends from 2015 to 2022, examining the challenges and issues related to biomarker discovery in multi-omics datasets. The review covers areas of application, proposed methodologies, the evaluation criteria used to assess performance, and the limitations and drawbacks that require further investigation and improvement. This overview provides a deeper understanding of the current state of research in this field and of the opportunities for future research, and should be particularly useful for readers seeking to expand their knowledge of this area. In the second part of this thesis, a novel methodology is proposed for identifying significant biomarkers in a multi-omics colon cancer dataset. Integrating clinical features with biomarker discovery has the potential to facilitate early identification of mortality risk and the development of personalized therapies for a range of diseases, including cancer and stroke. Recent advances in “omics” technologies have opened new avenues for identifying disease biomarkers through system-level analysis. Because the complexity of biological systems makes the integrative analysis of multi-omics data challenging, machine learning methods, particularly those based on tensor decomposition, have gained popularity. Despite extensive efforts to discover disease-associated biomolecules by analyzing data from various “omics” experiments, such as genomics, transcriptomics and metabolomics, the poor integration of diverse forms of omics data has made integrative multi-omics analysis a daunting task. Our research uses ANOVA simultaneous component analysis (ASCA) and Tucker3 modeling to analyze a multivariate dataset with an underlying experimental design.
By comparing the spaces spanned by different model components, we showed how the two methods can be used for confirmatory analysis and provide complementary information. We demonstrated the novel use of ASCA to analyze the residuals of Tucker3 models in order to find the optimal model. Increasing the model complexity to more factors removed the last remaining ASCA-detectable structure from the residuals. Bootstrap analysis of the core matrix values of the Tucker3 models was used to confirm that additional triads of eigenvectors were needed to describe the remaining structure in the residuals. We also developed a simple new strategy for aligning Tucker3 bootstrap models with the Tucker3 model of the original data, so that the eigenvectors of the three modes, the order of the values in the core matrix and their algebraic signs match the original model without complicated bookkeeping or rotational transformations. To avoid an overparameterized Tucker3 model, we used the bootstrap to determine 95% confidence intervals for the loadings and core values, and important variables for classification were identified by inspecting the loading confidence intervals. Experimental results on the NIH colon cancer dataset demonstrate that the proposed methodology improves the performance of biomarker discovery in a multi-omics cancer dataset, and its successful application to cancer subtype classification provides a foundation for investigating its utility in other disease areas. Overall, our study highlights the potential of integrating multi-omics data with machine learning methods to gain deeper insights into the complex biological mechanisms underlying cancer and other diseases.
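
For readers unfamiliar with Tucker3, the sketch below fits a three-way Tucker3 model X ≈ G ×₁ A ×₂ B ×₃ C with the standard higher-order orthogonal iteration (HOOI) in NumPy and returns the residual tensor, which is the object the thesis inspects with ASCA. It does not reproduce the thesis's ASCA residual analysis, its bootstrap confidence intervals or its alignment strategy; the helper names (`unfold`, `mode_dot`, `tucker3_hooi`, `reconstruct`) and the synthetic data are hypothetical.

```python
import numpy as np


def unfold(X, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the other modes."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def mode_dot(X, M, mode):
    """Multiply tensor X along `mode` by matrix M (new_dim x X.shape[mode])."""
    return np.moveaxis(np.tensordot(M, X, axes=(1, mode)), 0, mode)


def tucker3_hooi(X, ranks, n_iter=50):
    """Fit X ~= G x1 A x2 B x3 C with orthonormal loadings via HOOI."""
    # Initialise with the truncated higher-order SVD.
    factors = [np.linalg.svd(unfold(X, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    for _ in range(n_iter):
        for n in range(3):
            # Project X onto the current subspaces of the other two modes,
            # then keep the leading left singular vectors for mode n.
            Y = X
            for m in range(3):
                if m != n:
                    Y = mode_dot(Y, factors[m].T, m)
            U = np.linalg.svd(unfold(Y, n), full_matrices=False)[0]
            factors[n] = U[:, :ranks[n]]
    core = X
    for m in range(3):
        core = mode_dot(core, factors[m].T, m)
    return core, factors


def reconstruct(core, factors):
    Xhat = core
    for m, F in enumerate(factors):
        Xhat = mode_dot(Xhat, F, m)
    return Xhat


# Synthetic (2, 2, 2) Tucker3 tensor plus a little noise.
rng = np.random.default_rng(0)
G = rng.standard_normal((2, 2, 2))
A, B, C = (np.linalg.qr(rng.standard_normal((n, 2)))[0] for n in (10, 8, 6))
X = reconstruct(G, [A, B, C]) + 0.001 * rng.standard_normal((10, 8, 6))
core, (A_hat, B_hat, C_hat) = tucker3_hooi(X, (2, 2, 2))
residual = X - reconstruct(core, [A_hat, B_hat, C_hat])
```

A bootstrap in the spirit of the abstract would refit this model on resampled data and collect core values and loadings; because Tucker3 loadings are determined only up to rotation and sign, each refit has to be aligned with the original model before confidence intervals are computed.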

    Structure-revealing data fusion

    BACKGROUND: Analysis of data from multiple sources has the potential to enhance knowledge discovery by capturing underlying structures, which are, otherwise, difficult to extract. Fusing data from multiple sources has already proved useful in many applications in social network analysis, signal processing and bioinformatics. However, data fusion is challenging since data from multiple sources are often (i) heterogeneous (i.e., in the form of higher-order tensors and matrices), (ii) incomplete, and (iii) have both shared and unshared components. In order to address these challenges, in this paper, we introduce a novel unsupervised data fusion model based on joint factorization of matrices and higher-order tensors. RESULTS: While the traditional formulation of coupled matrix and tensor factorizations modeling only shared factors fails to capture the underlying structures in the presence of both shared and unshared factors, the proposed data fusion model has the potential to automatically reveal shared and unshared components through modeling constraints. Using numerical experiments, we demonstrate the effectiveness of the proposed approach in terms of identifying shared and unshared components. Furthermore, we measure a set of mixtures with known chemical composition using both LC-MS (Liquid Chromatography - Mass Spectrometry) and NMR (Nuclear Magnetic Resonance) and demonstrate that the structure-revealing data fusion model can (i) successfully capture the chemicals in the mixtures and extract the relative concentrations of the chemicals accurately, (ii) provide promising results in terms of identifying shared and unshared chemicals, and (iii) reveal the relevant patterns in LC-MS by coupling with the diffusion NMR data. 
CONCLUSIONS: We have proposed a structure-revealing data fusion model that can jointly analyze heterogeneous, incomplete data sets with shared and unshared components and demonstrated its promising performance as well as potential limitations on both simulated and real data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2105-15-239) contains supplementary material, which is available to authorized users.
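
As a point of reference, the sketch below implements the traditional coupled matrix and tensor factorization (CMTF) that the abstract contrasts with its proposed model: a CP model of a third-order tensor X and a factorization of a matrix Y that share the first-mode factor A, fitted by alternating least squares. It omits the sparsity-penalised component weights that let the structure-revealing model separate shared from unshared components, and it does not handle missing data; `khatri_rao`, `cmtf_als` and the synthetic data are hypothetical.

```python
import numpy as np


def khatri_rao(P, Q):
    """Column-wise Kronecker product; row index is p * Q.shape[0] + q."""
    return np.einsum("pr,qr->pqr", P, Q).reshape(-1, P.shape[1])


def cmtf_als(X, Y, rank, n_iter=200, seed=0):
    """Plain CMTF: X[i,j,k] ~= sum_r A[i,r] B[j,r] C[k,r] and Y ~= A @ V.T."""
    I, J, K = X.shape
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    V = rng.standard_normal((Y.shape[1], rank))
    X1 = X.reshape(I, J * K)                     # mode-1 unfolding, column j*K + k
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # mode-2 unfolding, column i*K + k
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # mode-3 unfolding, column i*J + j
    losses = []
    for _ in range(n_iter):
        # A appears in both terms, so its normal equations add both Gram matrices.
        gram_a = (B.T @ B) * (C.T @ C) + V.T @ V
        A = np.linalg.solve(gram_a, (X1 @ khatri_rao(B, C) + Y @ V).T).T
        B = np.linalg.solve((A.T @ A) * (C.T @ C), (X2 @ khatri_rao(A, C)).T).T
        C = np.linalg.solve((A.T @ A) * (B.T @ B), (X3 @ khatri_rao(A, B)).T).T
        V = np.linalg.solve(A.T @ A, A.T @ Y).T
        fit_x = np.linalg.norm(X1 - A @ khatri_rao(B, C).T) ** 2
        fit_y = np.linalg.norm(Y - A @ V.T) ** 2
        losses.append(fit_x + fit_y)
    return A, B, C, V, losses


# Synthetic coupled data: tensor and matrix share the first-mode factor.
rng = np.random.default_rng(42)
A0 = rng.standard_normal((20, 2))
B0, C0, V0 = (rng.standard_normal((n, 2)) for n in (15, 10, 8))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
Y = A0 @ V0.T
A, B, C, V, losses = cmtf_als(X, Y, rank=2)
```

Each block update solves its least-squares subproblem exactly, so the recorded loss should not increase from one sweep to the next, which is a convenient sanity check when experimenting with coupled models.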

    Whole-genome sequencing of chronic lymphocytic leukemia identifies subgroups with distinct biological and clinical features

    The value of genome-wide over targeted driver analyses for predicting clinical outcomes of cancer patients is debated. Here, we report the whole-genome sequencing of 485 chronic lymphocytic leukemia patients enrolled in clinical trials as part of the United Kingdom's 100,000 Genomes Project. We identify an extended catalog of recurrent coding and noncoding genetic mutations that represents a source for future studies and provide the most complete high-resolution map of structural variants, copy number changes and global genome features including telomere length, mutational signatures and genomic complexity. We demonstrate the relationship of these features with clinical outcome and show that integration of 186 distinct recurrent genomic alterations defines five genomic subgroups that associate with response to therapy, refining conventional outcome prediction. While requiring independent validation, our findings highlight the potential of whole-genome sequencing to inform future risk stratification in chronic lymphocytic leukemia.

    Cytokine Networks And Immunosurveillance In Cancer

    The cytokine milieu in the tumor microenvironment plays a key role in modulating the immune response either in favor of or against tumorigenesis. For many tumors, this complex network of cytokine and immune interactions represents a formidable means of escape from immune surveillance. These cytokine networks are particularly important in pancreatic ductal adenocarcinoma (PDA), which shows prominent infiltration by immunosuppressive immune populations. Myeloid-derived suppressor cells (MDSCs) have previously been shown to be potent suppressors of anti-tumor immunity in PDA, but the cytokine networks regulating their recruitment to the tumor microenvironment remain incompletely understood. Here, I found that CXCR2 ligand expression is specifically correlated with enrichment of the granulocytic subset of MDSCs (G-MDSCs) in human PDAs. Using a genetically engineered mouse model of PDA, I showed that CXCR2 is required for G-MDSC trafficking to the tumor microenvironment, but not for their systemic differentiation and expansion. The specific lack of G-MDSCs in the tumor microenvironment led to a T cell-dependent inhibition of tumor growth. Expression of CXCR2 ligands in PDA tumor cells can be potently induced by NF-κB activation. These findings describe a cytokine network in PDA where inflammatory signals in the tumor microenvironment drive the expression of CXCR2 ligands and the recruitment of immunosuppressive G-MDSCs. To discover other potentially important cytokine networks, I developed a novel analysis pipeline to reconstruct and compare cytokine networks from whole tumor gene expression data. Using expression of cytolytic genes as a gauge for anti-tumor immune activity, I found that PDA patients with high cytolytic activity have a slight survival advantage compared to those with lower activity.
While macrophages were the most influential in tumors with low cytolytic activity, tumors with high cytolytic activity were characterized by increased activity of NK cells, recruitment of B cells, and increased importance of CD8 T cells, CD4 T helper cells, and B cells, among others. I further highlighted the cytokines that might be associated with these immune populations. This analysis thus identified potentially important components of the cytokine network associated with high and low cytolytic activity. Collectively, the work in this thesis suggests that cytokine networks are crucial for maintaining an immunosuppressive microenvironment in cancer. Furthermore, disrupting key components of these networks can tip the balance in favor of cancer immunosurveillance.
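
The abstract gauges anti-tumor immune activity from cytolytic gene expression without naming the genes. A minimal sketch, assuming the widely used definition of cytolytic activity as the geometric mean of GZMA and PRF1 expression (for example in TPM) and a median split into high and low groups; the gene pair, the pseudocount default and the median cut-off are all assumptions, and `cytolytic_score` and `split_high_low` are hypothetical helpers rather than the thesis's pipeline.

```python
import numpy as np


def cytolytic_score(gzma_tpm, prf1_tpm, pseudocount=0.01):
    """Geometric mean of GZMA and PRF1 expression per tumor sample.

    The gene pair and the pseudocount are assumptions, not values
    stated in the thesis.
    """
    gzma = np.asarray(gzma_tpm, dtype=float) + pseudocount
    prf1 = np.asarray(prf1_tpm, dtype=float) + pseudocount
    return np.sqrt(gzma * prf1)


def split_high_low(scores):
    """Label samples strictly above the cohort median as high-cytolytic."""
    scores = np.asarray(scores, dtype=float)
    return scores > np.median(scores)


# Four hypothetical tumors; pseudocount set to 0 to keep the arithmetic exact.
scores = cytolytic_score([4.0, 1.0, 16.0, 0.25], [1.0, 4.0, 4.0, 1.0],
                         pseudocount=0.0)
high = split_high_low(scores)
```

The geometric mean keeps the score high only when both genes are expressed, which is one reason this two-gene summary is popular; a survival comparison between the resulting groups would be a separate step.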

    INTEGRATED GENOMIC MARKERS FOR CHEMOTHERAPEUTICS

    Ph.D. (Doctor of Philosophy)

    Identification of new candidate genes for germline predisposition to familial colorectal cancer using somatic mutational profiling

    [eng] Colorectal cancer (CRC) is one of the malignant neoplasms with the highest incidence and mortality in Spain, Europe and worldwide. As a complex disease, CRC predisposition is influenced by both environmental and genetic factors. Up to 35% of CRC patients present familial aggregation for the disease, whereas only around 2-8% of cases are linked to a well-known hereditary syndrome associated with pathogenic germline alterations in specific genes, namely APC, MUTYH, POLE, POLD1 or the DNA mismatch repair genes. In recent years, next-generation sequencing (NGS) techniques such as whole-exome sequencing (WES) have been used to address this missing heritability. Characterization of somatic mutational profiles, by applying NGS to both germline and tumor DNA, has also recently been established as a powerful tool to identify novel genes linked to CRC predisposition. However, although some bioinformatic packages have been developed for this analysis, it remains inaccessible to a substantial proportion of the scientific community. Accordingly, the main purpose of this doctoral thesis was to identify new genes involved in germline predisposition to familial CRC by using an integrated germline-tumor WES analysis and somatic mutational profiling, and to make these genomic analyses more accessible to the scientific community. As a first step, a bioinformatic tool for somatic mutational profiling was developed: MuSiCa, a user-friendly, freely accessible web application built with the Shiny framework and potentially useful for non-specialist researchers. It provides tumor mutational burden calculation and mutational signature refitting based on the COSMIC database, as well as different options for sample classification through clustering and principal component analysis.
Subsequently, an integrated germline-tumor analysis was performed in a cohort of 18 unrelated familial CRC patients. WES data from both germline and tumor DNA were available, allowing the identification of new potential tumor suppressor genes according to Knudson’s two-hit hypothesis. Somatic mutational profiles were also analyzed with MuSiCa, uncovering five hypermutated samples. An enrichment of DNA repair-associated genes was found, as well as of some genes previously linked to predisposition syndromes for other cancer types. BRCA2, BLM, ERCC2, RECQL, REV3L and RIF1 were identified as the most promising candidate genes for germline CRC predisposition. Interestingly, a germline mutation in the DNA repair gene RECQL was found in a patient with one of the hypermutated tumors, reinforcing the putative role of this gene in hereditary CRC. These findings could help improve genetic counseling of affected families in clinical practice. [spa] Colorectal cancer (CRC) is one of the neoplasms with the highest incidence and mortality in Spain and worldwide. Although 35% of patients show familial aggregation, only 2-8% are associated with a known hereditary syndrome caused by germline mutations in genes such as APC, MUTYH, POLE, POLD1 or the DNA mismatch repair genes. In recent years, next-generation sequencing (NGS) techniques, such as whole-exome sequencing (WES), have been used to discover new genes involved in CRC predisposition. Characterization of somatic mutational profiles, applying NGS to germline and tumor DNA, has also recently been used in this process. However, although some bioinformatic packages have been developed for this analysis, it still remains inaccessible to a large part of the scientific community.
Consequently, the main objective of this doctoral thesis has been to identify new genes involved in germline predisposition to familial CRC, using a germline-tumor WES analysis and somatic mutational characterization, as well as to facilitate the application of these genomic analyses to the scientific community. First, a bioinformatic tool named Mutational Signatures in Cancer (MuSiCa) was developed: an easy-to-use, freely accessible web application built on the Shiny platform that enables tumor mutational burden calculation and mutational signature characterization according to the information available in the COSMIC database. Subsequently, an integrated germline-tumor WES analysis was performed in a cohort of 18 familial CRC patients, complemented by somatic mutational characterization made possible by MuSiCa. Five hypermutated tumors were detected, as well as an enrichment of germline mutations in genes previously implicated in predisposition syndromes for other cancer types and in DNA repair. The genes BRCA2, BLM, ERCC2, RECQL, REV3L and RIF1 were prioritized as the most promising candidates for CRC predisposition. These findings could be useful in clinical practice, improving genetic counseling in the affected families.
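
The two computations MuSiCa offers can be sketched under stated assumptions: tumor mutational burden as somatic mutations per megabase of covered territory, and signature refitting as one common formulation, a non-negative least-squares fit of a sample's 96-channel mutation catalogue onto a fixed signature matrix. The toy signatures are made up and are not COSMIC signatures; the refit uses multiplicative updates as a dependency-free approximation to NNLS, and the function names and the 40 Mb covered territory are hypothetical, not MuSiCa's implementation.

```python
import numpy as np


def tumor_mutational_burden(n_somatic_mutations, covered_megabases):
    """Somatic mutations per megabase of sequenced territory."""
    return n_somatic_mutations / covered_megabases


def refit_signatures(catalog, signatures, n_iter=2000, eps=1e-12):
    """Estimate non-negative exposures h with catalog ~= signatures @ h.

    Multiplicative updates with the signature matrix held fixed; this
    approximates the non-negative least-squares solution.
    """
    W = np.asarray(signatures, dtype=float)   # channels x signatures
    v = np.asarray(catalog, dtype=float)      # mutation counts per channel
    h = np.full(W.shape[1], v.sum() / W.shape[1])
    WtW, Wtv = W.T @ W, W.T @ v
    for _ in range(n_iter):
        h *= Wtv / (WtW @ h + eps)
    return h


# Hypothetical toy signatures: each concentrates 90% of its mass on its own
# block of 32 channels and spreads the rest uniformly over all 96 channels.
channels = 96
toy = np.full((channels, 3), 0.1 / channels)
for k in range(3):
    toy[32 * k:32 * (k + 1), k] += 0.9 / 32
true_exposures = np.array([120.0, 0.0, 60.0])
catalog = toy @ true_exposures
exposures = refit_signatures(catalog, toy)
tmb = tumor_mutational_burden(50, 40.0)
```

With exact toy data the refit recovers the planted exposures closely; on real catalogues an exact solver such as `scipy.optimize.nnls` could replace the iterative loop.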
