    Development of Biclustering Techniques for Gene Expression Data Modeling and Mining

    Next-generation sequencing technologies can generate large-scale biological data with higher resolution, better accuracy, and lower technical variation than their array-based counterparts. RNA sequencing (RNA-Seq) can generate genome-scale gene expression data in biological samples at a given moment, facilitating a better understanding of cell functions at genetic and cellular levels. The abundance of gene expression datasets provides an opportunity to identify genes with similar expression patterns across multiple conditions, i.e., co-expression gene modules (CEMs). Genome-scale identification of CEMs can be modeled and solved by biclustering, a two-dimensional data mining technique that clusters the rows and columns of a gene expression matrix simultaneously. Compared with traditional clustering, which targets global patterns, biclustering can detect local patterns. This feature makes biclustering very useful for big gene expression data, since genes that participate in a cellular process are active only in specific conditions and are thus usually co-expressed under a subset of all conditions. The combination of biclustering and large-scale gene expression data holds promising potential for condition-specific functional pathway/network analysis. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-Seq data, mainly because they lack (i) a consideration of the high sparsity of RNA-Seq data, especially single-cell RNA-Seq (scRNA-Seq) data, and (ii) an understanding of the underlying transcriptional regulation signals of the observed gene expression values. QUBIC2, a novel biclustering algorithm, is designed for the analysis of large-scale bulk RNA-Seq and scRNA-Seq data. Critical novelties of the algorithm include (i) a truncated model to handle the unreliable quantification of genes with low or moderate expression; (ii) a Gaussian mixture distribution and an information-divergency objective function to capture shared transcriptional regulation signals among a set of genes; (iii) a Dual strategy to expand the core biclusters, aiming to save dropouts from the background; and (iv) a statistical framework to evaluate the significance of all identified biclusters. Method validation on comprehensive data sets suggests that QUBIC2 had superior performance in functional module detection and cell type classification. Applications to temporal and spatial data demonstrated that QUBIC2 could derive meaningful biological information from scRNA-Seq data. Also presented in this dissertation is QUBICR. This R package is characterized by an average 82% improvement in efficiency compared to the source C code of QUBIC. It provides a comprehensive set of functions to facilitate biclustering-based biological studies, including the discretization of expression data, query-based biclustering, bicluster expansion, bicluster comparison, heatmap visualization of any identified biclusters, and co-expression network elucidation. Finally, a systematic summary is provided of the primary applications of biclustering to biological data and of more advanced applications to biomedical data. It will help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights with higher efficiency.
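To make the notion of a "local pattern" concrete, here is a minimal toy sketch of biclustering in Python. It is not QUBIC2 and does not use the QUBICR API. The function name `find_bicluster`, the fixed expression threshold, and the area-based score are all assumptions chosen for illustration. The search is brute force over column subsets, so it is only feasible for a handful of conditions.

```python
from itertools import combinations


def find_bicluster(matrix, threshold, min_cols=2):
    """Toy bicluster search (hypothetical helper, not QUBIC2).

    Finds the subset of columns (conditions) and rows (genes) with the
    largest area rows * cols such that every selected entry exceeds
    `threshold`. Exponential in the number of columns.
    """
    n_cols = len(matrix[0])
    best_rows, best_cols, best_area = (), (), 0
    for k in range(min_cols, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            # Keep the genes that are "on" under every chosen condition.
            rows = tuple(
                i for i, row in enumerate(matrix)
                if all(row[j] > threshold for j in cols)
            )
            area = len(rows) * k
            if area > best_area:
                best_rows, best_cols, best_area = rows, cols, area
    return best_rows, best_cols


# Rows are genes, columns are conditions. Genes 0-2 are co-expressed,
# but only under conditions 0 and 1; elsewhere they look like noise.
expr = [
    [9, 8, 1, 0, 1],
    [8, 9, 0, 1, 0],
    [9, 9, 1, 1, 0],
    [0, 1, 0, 9, 1],
    [1, 0, 1, 0, 1],
]

rows, cols = find_bicluster(expr, threshold=5)
print(rows, cols)
```

On this matrix the search selects genes 0 to 2 under conditions 0 and 1, a module that a clustering over all five conditions would have to separate from the noise in the remaining columns.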

    Multi-Omics Factor Analysis-a framework for unsupervised integration of multi-omics data sets.

    Multi-omics studies promise the improved characterization of biological processes across molecular layers. However, methods for the unsupervised integration of the resulting heterogeneous data sets are lacking. We present Multi-Omics Factor Analysis (MOFA), a computational method for discovering the principal sources of variation in multi-omics data sets. MOFA infers a set of (hidden) factors that capture biological and technical sources of variability. It disentangles axes of heterogeneity that are shared across multiple modalities from those specific to individual data modalities. The learnt factors enable a variety of downstream analyses, including identification of sample subgroups, data imputation and the detection of outlier samples. We applied MOFA to a cohort of 200 patient samples of chronic lymphocytic leukaemia, profiled for somatic mutations, RNA expression, DNA methylation and ex vivo drug responses. MOFA identified major dimensions of disease heterogeneity, including immunoglobulin heavy-chain variable region status, trisomy of chromosome 12 and previously underappreciated drivers, such as response to oxidative stress. In a second application, we used MOFA to analyse single-cell multi-omics data, identifying coordinated transcriptional and epigenetic changes along cell differentiation.
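One way to read "infers a set of (hidden) factors" is as a matrix factorization applied jointly to all data modalities. The following is a sketch only; the symbols are assumptions introduced here and do not appear in the abstract:

```latex
\mathbf{Y}^{(m)} = \mathbf{Z}\,\mathbf{W}^{(m)\top} + \boldsymbol{\varepsilon}^{(m)},
\qquad m = 1, \dots, M
```

Here $\mathbf{Y}^{(m)}$ is the $N \times D_m$ data matrix for modality $m$ (samples by features), $\mathbf{Z}$ is an $N \times K$ matrix of factor values shared by all modalities, $\mathbf{W}^{(m)}$ is a $D_m \times K$ matrix of modality-specific weights, and $\boldsymbol{\varepsilon}^{(m)}$ is residual noise. Under this reading, a factor with non-zero weights in several modalities corresponds to a shared axis of heterogeneity, while a factor with weights in only one modality is specific to that modality.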

    Early downregulation of hsa-miR-144-3p in serum from drug-naïve Parkinson’s disease patients

    Advanced age represents one of the major risk factors for Parkinson’s Disease. Recent biomedical studies posit a role for microRNAs, also known to be remodelled during ageing. However, the relationship between microRNA remodelling and ageing in Parkinson’s Disease has not been fully elucidated. Therefore, the aim of the present study is to unravel the relevance of microRNAs as biomarkers of Parkinson’s Disease within the ageing framework. We employed Next Generation Sequencing to profile serum microRNAs from samples informative for Parkinson’s Disease (recently diagnosed, drug-naïve) and healthy ageing (centenarians) plus healthy controls, age-matched with Parkinson’s Disease patients. Candidate microRNA markers, emerging from the combination of differential expression and network analyses, were further validated in an independent cohort including both drug-naïve and advanced Parkinson’s Disease patients, and healthy siblings of Parkinson’s Disease patients at higher genetic risk for developing the disease. While we did not find evidence of microRNAs co-regulated in Parkinson’s Disease and ageing, we report that hsa-miR-144-3p is consistently down-regulated in early Parkinson’s Disease patients. Moreover, functional analysis revealed that hsa-miR-144-3p is involved in the regulation of coagulation, a process known to be altered in Parkinson’s Disease. Our results consistently show the down-regulation of hsa-miR-144-3p in early Parkinson’s Disease, robustly confirmed across a variety of analytical and experimental approaches. These promising results call for further research to unveil the functional details of the involvement of hsa-miR-144-3p in Parkinson’s Disease. This work was supported by the Horizon 2020 Framework Programme (Grant number 634821, PROPAG-AGING).
    Zago E.; Dal Molin A.; Dimitri G.M.; Xumerle L.; Pirazzini C.; Bacalini M.G.; Maturo M.G.; Azevedo T.; Spasov S.; Gomez-Garre P.; Perinan M.T.; Jesus S.; Baldelli L.; Sambati L.; Calandra Buonaura G.; Garagnani P.; Provini F.; Cortelli P.; Mir P.; Trenkwalder C.; Mollenhauer B.; Franceschi C.; Lio P.; Nardini C.; Adarmes-Gomez A.; Azevedo T.; Bacalini M.G.; Baldelli L.; Bartoletti-Stella A.; Bhatia K.P.; Marta B.-T.; Boninsegna C.; Broli M.; Dolores B.-R.; Calandra-Buonaura G.; Capellari S.; Carrion-Claro M.; Cilea R.; Clayton R.; Cortelli P.; Molin A.D.; De Luca S.; De Massis P.; Dimitri G.M.; Doykov I.; Escuela-Martin R.; Fabbri G.; Franceschi C.; Gabellini A.; Garagnani P.; Giuliani C.; Gomez-Garre P.; Guaraldi P.; Hagg S.; Hallqvist J.; Halsband C.; Heywood W.; Houlden H.; Huertas I.; Jesus S.; Jylhava J.; Labrador-Espinosa M.A.; Licari C.; Lio P.; Luchinat C.; Macias D.; Macri S.; Magrinelli F.; Rodriguez J.F.M.; Massimo D.; Maturo M.G.; Mengozzi G.; Meoni G.; Mignani F.; Milazzo M.; Mills K.; Mir P.; Mollenhauer B.; Nardini C.; Nassetti S.A.; Pedersen N.L.; Perinan-Tocino M.T.; Pirazzini C.; Provini F.; Ravaioli F.; Sala C.; Sambati L.; Scaglione C.L.M.; Schade S.; Schreglmann S.; Spasov S.; Strom S.; Tejera-Parrado C.; Tenori L.; Trenkwalder C.; Turano P.; Valzania F.; Ortega R.V.; Williams D.; Xumerle L.; Zago E.

    PTRF/Cavin-1 and MIF Proteins Are Identified as Non-Small Cell Lung Cancer Biomarkers by Label-Free Proteomics

    With the completion of the human genome sequence, biomedical sciences have entered the “omics” era, mainly due to high-throughput genomics techniques and the recent application of mass spectrometry to proteomics analyses. However, there is still a time lag between these technological advances and their application in the clinical setting. Our work is designed to build bridges between high-performance proteomics and clinical routine. Protein extracts were obtained from fresh frozen normal lung and non-small cell lung cancer samples. We applied phosphopeptide enrichment followed by LC-MS/MS. Subsequent label-free quantification and bioinformatics analyses were performed. We assessed protein patterns in these samples, showing dozens of differential markers between normal and tumor tissue. Gene ontology and interactome analyses identified signaling pathways altered in tumor tissue. We have identified two proteins, PTRF/cavin-1 and MIF, which are differentially expressed between normal lung and non-small cell lung cancer. These potential biomarkers were validated using western blot and immunohistochemistry. The application of discovery-based proteomics analyses to clinical samples allowed us to identify new potential biomarkers and therapeutic targets in non-small cell lung cancer.

    INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE

    Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines that leverage massive genomic variation and integrate it with clinical data to identify “at risk” individuals for the sake of prevention, diagnosis and therapeutic intervention. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, applying high-dimensional omics data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has focused on the following three topics in translational genomics. 1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identified an association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by an increased presence of tumor-associated microglia. I then traced the source of the microglial signal to the intrinsic tumor bulk rather than to the corresponding neurosphere cells, through both transcriptional and protein-level analysis of a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, through longitudinal analysis of paired primary and recurrent GBM samples, I have demonstrated my hypothesis that the phenotypic alterations of GBM subtypes are not due to an intrinsic proneural-to-mesenchymal transition in tumor cells; rather, they are intertwined with an increased level of microglia upon disease recurrence. Collectively, I have elucidated the critical role of the tumor microenvironment (microglia and macrophages from the central nervous system) in intra-tumor heterogeneity and in the accurate classification of GBM patients based on transcriptomic profiling, which will not only have a significant clinical impact but also pave the way for preclinical cancer research. 2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. I then identified inflammatory response and acetylation activity as associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed that this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other glioma subsets. My efforts on integrative data analysis of this highly curated data set using optimized statistical models will reflect the pending update to the WHO classification system of tumors in the central nervous system (CNS). 3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all TCGA cancer types through transcriptomic profiling. I then predicted fusion proteins with kinase activity and hub function in pathway networks, based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts by integrating genomic structure and somatic variants and by delineating the distinct genomic patterns of fusion events across different cancer types. Overall, my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes. Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, and distinct patterns of genomic structure alterations and oncogenic events in different cancer types, thereby facilitating our understanding of genomic alterations and moving us towards the development of precision medicine.

    Rare and common genetic determinants of metabolic individuality and their effects on human health

    Garrod’s concept of ‘chemical individuality’ has contributed to comprehension of the molecular origins of human diseases. Untargeted high-throughput metabolomic technologies provide an in-depth snapshot of human metabolism at scale. We studied the genetic architecture of the human plasma metabolome using 913 metabolites assayed in 19,994 individuals and identified 2,599 variant–metabolite associations (P < 1.25 × 10⁻¹¹) within 330 genomic regions, with rare variants (minor allele frequency ≤ 1%) explaining 9.4% of associations. Jointly modeling metabolites in each region, we identified 423 regional, co-regulated, variant–metabolite clusters called genetically influenced metabotypes. We assigned causal genes for 62.4% of these genetically influenced metabotypes, providing new insights into fundamental metabolite physiology and clinical relevance, including metabolite-guided discovery of potential adverse drug effects (DPYD and SRD5A2). We show strong enrichment of inborn errors of metabolism-causing genes, with examples of metabolite associations and clinical phenotypes of non-pathogenic variant carriers matching characteristics of the inborn errors of metabolism. Systematic, phenotypic follow-up of metabolite-specific genetic scores revealed multiple potential etiological relationships.

    Computational Integrative Models for Cellular Conversion: Application to Cellular Reprogramming and Disease Modeling

    The groundbreaking identification of only four transcription factors that are able to induce pluripotency in any somatic cell upon perturbation stimulated the discovery of copious amounts of instructive factors triggering different cellular conversions. Such conversions are highly significant to regenerative medicine, with its ultimate goal of replacing or regenerating damaged and lost cells. Precise directed conversion of damaged cells into healthy cells offers the tantalizing prospect of promoting regeneration in situ. With the advent of high-throughput sequencing technologies, the distinct transcriptional and accessible chromatin landscapes of several cell types have been characterized. This characterization provided clear evidence for the existence of cell-type-specific gene regulatory networks, determined by their distinct epigenetic landscapes, that control cellular phenotypes. Further, these networks are known to change dynamically during the ectopic expression of genes initiating cellular conversions and to stabilize again to represent the desired phenotype. Over the years, several computational approaches have been developed to leverage the large amounts of high-throughput data for the systematic prediction of instructive factors that can potentially induce desired cellular conversions. To date, the most promising approaches rely on the reconstruction of gene regulatory networks for a panel of well-studied cell types, based predominantly on transcriptional data alone. Though useful, these methods are not designed for newly identified cell types, as their frameworks are restricted to the panel of cell types originally incorporated. More importantly, these approaches rely mainly on gene expression data and cannot account for the cell-type-specific regulation modulated by the interplay of the transcriptional and epigenetic landscapes. In this thesis, a computational method for reconstructing cell-type-specific gene regulatory networks is proposed that aims to address the aforementioned limitations of current approaches. This method integrates transcriptomics, chromatin accessibility assays and available prior knowledge about gene regulatory interactions to predict instructive factors that can potentially induce desired cellular conversions. Its application to the prioritization of drugs for reverting pathologic phenotypes and to the identification of instructive factors for inducing the conversion of adipocytes into osteoblasts underlines its potential to assist in the discovery of novel therapeutic interventions.