6 research outputs found

    Integrative Pathway Analysis Pipeline For Mirna And Mrna Data

    The identification of pathways that are involved in a particular phenotype helps us understand the underlying biological processes. Traditional pathway analysis techniques aim to infer the impact on individual pathways using only mRNA levels. However, recent studies showed that gene expression alone is unable to capture the whole picture of biological phenomena. At the same time, microRNAs (miRNAs) are newly discovered gene regulators that have been shown to play an important role in the diagnosis and prognosis of different types of diseases. Current pathway analysis techniques do not take miRNAs into consideration. In this project, we investigate the effect of integrating miRNA and mRNA expression in pathway analysis. In order to analyze biological pathways using miRNA expression data, we developed a novel method that augments KEGG pathways with microRNAs targeting genes. To validate our method, we analyzed nine GEO datasets. We also performed the analyses using just mRNA as well as using the integrative state-of-the-art method (microGraphite) to compare the results. In each case, we monitored the position of the pathway describing the given condition. We observed that our method outperforms the state-of-the-art approach.
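The idea of augmenting a pathway with miRNAs can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `augment_pathway`, the edge-tuple layout, the "repression" edge label, and the placeholder gene and miRNA names are all assumptions made for illustration. The sketch adds a miRNA node, plus one edge per target, only when the miRNA targets at least one gene already in the pathway.

```python
def augment_pathway(pathway_genes, pathway_edges, mirna_targets):
    """Return (nodes, edges) of a pathway extended with targeting miRNAs.

    pathway_genes: iterable of gene identifiers in the pathway.
    pathway_edges: list of (source, target, interaction_type) tuples.
    mirna_targets: dict mapping a miRNA identifier to a set of target genes.
    All three structures are hypothetical, chosen only for this sketch.
    """
    genes = set(pathway_genes)
    nodes = set(genes)
    edges = list(pathway_edges)
    for mirna, targets in sorted(mirna_targets.items()):
        # Keep only the targets that belong to this pathway.
        hits = sorted(genes.intersection(targets))
        if not hits:
            continue  # miRNA does not touch this pathway; leave it out
        nodes.add(mirna)
        # Assumption: a miRNA-to-gene edge is modeled as repression.
        edges.extend((mirna, gene, "repression") for gene in hits)
    return nodes, edges


# Placeholder identifiers, not real annotation data.
nodes, edges = augment_pathway(
    {"GENE1", "GENE2"},
    [("GENE1", "GENE2", "activation")],
    {"miR-A": {"GENE2", "GENE9"}, "miR-B": {"GENE9"}},
)
```

In this toy input, `miR-A` is added because it targets `GENE2`, while `miR-B` is skipped because none of its targets are in the pathway.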

    Horizontal And Vertical Integration Of Bio-Molecular Data

    Modern biomedical research lies at the crossroads of data gathering, interpretation, and hypothesis testing. Due to noise, study bias, or changes in biological signals between disease and healthy states that are too small, individual studies often fail to identify the true phenomenon. Data integration is the key to obtaining the power needed to pinpoint the biological mechanisms of disease states. Given this, we tried to make important contributions in both horizontal and vertical integration of high-throughput data; the former is meta-analysis of independent studies, while the latter is the integration of multi-omics data. For horizontal meta-analysis, we developed two frameworks: DANUBE and the bi-level meta-analysis. In DANUBE, we pointed out that most pathway analysis approaches make incorrect assumptions about bio-molecular data, which lead to non-uniformity of p-values under the null hypothesis. DANUBE proposed a way to correct the biased p-values before combining them using the Central Limit Theorem. In the bi-level meta-analysis, we added another level of meta-analysis to make better use of the available number of samples within individual studies. Both techniques were validated using thousands of real samples obtained from independent studies related to three human diseases: Alzheimer's disease, acute myeloid leukemia, and type II diabetes mellitus. These frameworks outperformed classical approaches to consistently identify pathways that are relevant to the given phenotypes. Via extensive simulation studies, we also demonstrate that the proposed techniques are sufficiently general to be applied outside the scope of biomedical research. For vertical integrative analysis, we integrated transcriptomics, epigenomics, and non-coding RNA data to identify disease subtypes. Successful subtyping of complex diseases can lead to identifying biomarkers and targets of new drugs. 
We developed a perturbation clustering method to accurately subtype patients using high-dimensional gene expression data. The framework was also extended to combine complementary information available in multi-omics data, by adapting techniques in network partitioning and cluster ensembles. The algorithm was validated on thousands of real cancer samples, using mRNA, methylation, and microRNA data available on Gene Expression Omnibus, the Broad Institute, and the Cancer Genome Atlas. This simultaneous subtyping approach accurately identifies known cancer subtypes and predicts the survival of novel subgroups of patients. We also developed a meta-analysis framework that combines two orthogonal types of data integration: horizontal and vertical meta-analysis. Integrative analyses of omics data often require all data types to be available for each individual patient. This reduces their practical availability since sample-matched data is relatively rare and difficult or expensive to obtain. We proposed an orthogonal meta-analysis framework that is able to overcome the sample-matched data bottleneck, by successfully integrating datasets of different types generated in independent laboratories from different sets of patients. The proposed framework was validated using 1,471 samples from 15 mRNA and 14 miRNA expression datasets related to two human cancers, colorectal cancer and pancreatic cancer. The orthogonal approach reliably identifies signaling pathways that are impacted by the two cancers. While validated in the context of pathway analysis, the framework can be modified to adapt to other domains or applications.
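The step of combining p-values with the Central Limit Theorem, mentioned above for DANUBE, can be sketched as follows. This is a generic illustration and not the DANUBE code: it assumes independent p-values that are already uniform under the null hypothesis (the bias-correction step is not shown), and the function name `combine_pvalues_clt` is hypothetical. Under those assumptions, the mean of n p-values is approximately normal with mean 0.5 and variance 1/(12n), so a small mean maps to a small combined p-value.

```python
import math
from statistics import NormalDist


def combine_pvalues_clt(pvalues):
    """Combine independent p-values via a normal approximation of their mean.

    Assumes each p-value is Uniform(0, 1) under the null hypothesis, so the
    mean has expectation 0.5 and variance 1 / (12 * n). The approximation
    improves as n grows and is rough for very small n.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(p < 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")
    mean_p = sum(pvalues) / n
    sigma = math.sqrt(1.0 / (12.0 * n))
    # Lower-tail probability: consistently small p-values give a small result.
    return NormalDist(mu=0.5, sigma=sigma).cdf(mean_p)


combined_null = combine_pvalues_clt([0.5] * 10)    # no evidence either way
combined_small = combine_pvalues_clt([0.01] * 10)  # consistent small p-values
```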

    A Multi-Cohort and Multi-Omics Meta-Analysis Framework to Identify Network-Based Gene Signatures

    Although massive amounts of condition-specific molecular profiles are being accumulated in public repositories every day, meaningful interpretation of these data remains a major challenge. In an effort to identify the biomarkers that describe the key biological phenomena for a given condition, several approaches have been developed over the past few years. However, the majority of these approaches either (i) do not consider the known intermolecular interactions, or (ii) do not integrate molecular data of multiple types (e.g., genomics, transcriptomics, proteomics, epigenomics, etc.), and thus potentially fail to capture the true biological changes responsible for complex diseases (e.g., cancer). In addition, these approaches often ignore the heterogeneity and study bias present in independent molecular cohorts. In this manuscript, we propose a novel multi-cohort and multi-omics meta-analysis framework that overcomes all three limitations mentioned above in order to identify robust molecular subnetworks that capture the key dynamic nature of a given biological condition. Our framework integrates multiple independent gene expression studies, unmatched DNA methylation studies, and protein-protein interactions to identify methylation-driven subnetworks. We demonstrate the proposed framework by constructing subnetworks related to two complex diseases: glioblastoma and low-grade gliomas. We validate the identified subnetworks by showing their ability to predict patients' clinical outcome on multiple independent validation cohorts.

    A Single-Subject Method to Detect Pathways Enriched With Alternatively Spliced Genes

    RNA-Sequencing data offers an opportunity to enable precision medicine, but most methods rely on gene expression alone. To date, no methodology exists to identify and interpret alternative splicing patterns within pathways for an individual patient. This study develops methodology and conducts computational experiments to test the hypothesis that pathway aggregation of subject-specific alternatively spliced genes (ASGs) can inform upon disease mechanisms and predict survival. We propose the N-of-1-pathways Alternatively Spliced (N1PAS) method that takes an individual patient’s paired-sample RNA-Seq isoform expression data (e.g., tumor vs. non-tumor, before-treatment vs. during-therapy) and pathway annotations as inputs. N1PAS quantifies the degree of alternative splicing via Hellinger distances followed by two-stage clustering to determine pathway enrichment. We provide a clinically relevant “odds ratio” along with statistical significance to quantify pathway enrichment. We validate our method in clinical samples and find that our method selects relevant pathways (p < 0.05 in 4/6 data sets). Extensive Monte Carlo studies show N1PAS powerfully detects pathway enrichment of ASGs while adequately controlling false discovery rates. Importantly, our studies also unveil highly heterogeneous single-subject alternative splicing patterns that cohort-based approaches overlook. Finally, we apply our patient-specific results to predict cancer survival (FDR < 20%) while providing diagnostics in pursuit of translating transcriptome data into clinically actionable information. Software available at https://github.com/grizant/n1pas/tree/master
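The Hellinger distance used above to quantify alternative splicing can be sketched for a single gene. This is a minimal illustration rather than the N1PAS implementation: the function name `hellinger_distance` is hypothetical, and the sketch assumes that isoform expression values are non-negative and are converted to relative proportions within each sample. How genes with zero total expression are handled in the published method is not stated here, so the sketch simply rejects them.

```python
import math


def hellinger_distance(isoforms_a, isoforms_b):
    """Hellinger distance between two isoform expression profiles of one gene.

    Each argument lists the expression of the same isoforms, in the same
    order, in one of the two paired samples. Values are normalized to
    proportions, so the result lies in [0, 1]: 0 means identical isoform
    usage, 1 means the two samples use disjoint sets of isoforms.
    """
    if len(isoforms_a) != len(isoforms_b):
        raise ValueError("both samples must list the same isoforms")
    total_a = sum(isoforms_a)
    total_b = sum(isoforms_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("the gene must be expressed in both samples")
    squared = sum(
        (math.sqrt(a / total_a) - math.sqrt(b / total_b)) ** 2
        for a, b in zip(isoforms_a, isoforms_b)
    )
    return math.sqrt(squared / 2.0)


same_usage = hellinger_distance([1.0, 3.0], [2.0, 6.0])    # same proportions
full_switch = hellinger_distance([10.0, 0.0], [0.0, 10.0])  # isoform switch
```

Because the profiles are normalized first, the distance reflects a change in relative isoform usage and not a change in overall gene expression.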

    Novel Techniques for Single-cell RNA Sequencing Data Imputation and Clustering

    Advances in single-cell technologies have shifted genomics research from the analysis of bulk tissues toward a comprehensive characterization of individual cells. These cutting-edge approaches enable the in-depth analysis of individual cells, unveiling the remarkable heterogeneity and complexity of cellular systems. By unraveling the unique signatures and functions of distinct cell types, single-cell technologies have not only deepened our understanding of fundamental biological processes but also unlocked new avenues for disease diagnostics and therapeutic interventions. The applications of single-cell technologies extend beyond basic research, with significant implications for precision medicine, drug discovery, and regenerative medicine. By capturing the cellular heterogeneity within tumors, these methods have shed light on the mechanisms of tumor evolution, metastasis, and therapy resistance. Additionally, they have facilitated the identification of rare cell populations with specialized functions, such as stem cells and tissue-resident immune cells, which hold great promise for cell-based therapies. However, one of the major challenges in analyzing scRNA-seq data is the prevalence of dropouts, which are instances where gene expression is not detected despite being present in the cell. Dropouts occur due to technical limitations and can introduce excessive noise into the data, obscuring the true biological signals. As a result, imputation methods are used to estimate missing values and reduce the impact of dropouts on downstream analyses. Furthermore, the high dimensionality of scRNA-seq data presents additional challenges in effectively partitioning cell populations. 
Thus, robust computational approaches are required to overcome these challenges and extract meaningful biological insights from single-cell data. There have been numerous imputation and clustering methods developed specifically to address the unique challenges associated with scRNA-seq data analysis. These methods aim to reduce the impact of dropouts and high dimensionality, allowing for accurate cell population partitioning and the discovery of meaningful biological insights. While these methods have unquestionably advanced the field of single-cell transcriptomics, they are not without limitations. Some methods may be computationally intensive, resulting in scalability issues with large datasets, whereas others may introduce biases or overfit the data, potentially affecting the accuracy of subsequent analyses. Furthermore, the performance of these methods can vary depending on the dataset's complexity and heterogeneity. As a result, ongoing research is required to improve existing methodologies and create new algorithms that address these limitations while retaining robustness and accuracy in scRNA-seq data analysis. In this work, we propose three imputation approaches that incorporate statistical and deep learning frameworks. We robustly reconstruct the gene expression matrix, effectively mitigating dropout effects and reducing noise. This results in enhanced recovery of true biological signals from scRNA-seq data and better leverages the transcriptomic profiles of single cells. In addition, we introduce a clustering method, which exploits the scRNA-seq data to identify cellular subpopulations. Our method employs a combination of dimensionality reduction and network fusion algorithms to generate a cell similarity graph. 
This approach accounts for both local and global structure within the data, enabling the discovery of rare and previously unidentified cell populations. We plan to assess the imputation and clustering methods through rigorous benchmarking on simulated and more than 30 real scRNA-seq datasets against existing state-of-the-art techniques. We will show that the imputed data generated from our method can enhance the quality of downstream analyses. We also demonstrate that our clustering algorithm is efficient in accurately identifying cell populations and capable of analyzing big datasets. In conclusion, this thesis proposes alternative approaches to advance the current state of scRNA-seq data analysis by developing innovative imputation and clustering methods that enable a more comprehensive and accurate characterization of cellular subpopulations. These advancements potentially have broad applicability in diverse research fields, including developmental biology, immunology, and oncology, where understanding cellular heterogeneity is crucial.

    Qualitative Change Detection Approach For Preventive Therapies

    Currently, most diseases are diagnosed only after disease-associated changes have occurred. In this PhD dissertation, we propose a paradigm shift from treating the disease to maintaining the healthy state. The proposed approach is able to identify when systemic qualitative changes in biological systems happen, thus opening the possibility of therapeutic interventions before the occurrence of symptoms. The change detection method exploits knowledge from biological networks and longitudinal data using a system impact analysis approach. This approach is validated on eight datasets, for seven different model organisms and eight biological phenomena. On these data, our proposed method performs well, consistently identifying the qualitative change in each dataset. Most importantly, the method accurately detected the transition from the control stage (benign) to the early stage of hepatocellular carcinoma on an eight-stage disease dataset. Knowing when a transition (qualitative change) from healthy to disease occurs may help preserve the healthy state. We also propose a novel analysis approach for metabolic pathway analysis that uses an impact analysis approach and leverages the stoichiometry of bio-chemical reactions to identify which pathways are significantly disrupted by the change in metabolite levels in disease samples versus healthy controls. Our approach outperforms the over-representation approach when evaluated on simulated data. We applied our proposed method to biological experiment data that compares samples from pregnant women to non-pregnant control samples. Our method was able to identify biologically relevant results on real high-throughput data better than the classical approach. In summary, we developed two novel methods for the analysis of high-throughput biological data, gene expression and metabolite concentration, respectively. 
The proposed methods can be adapted to work together in order to capture relevant complementary information stored in time-course datasets for gene expression or metabolite levels that may be available for complex diseases in order to identify when a qualitative change happens, before the physiological onset of the disease.
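For context, the classical over-representation approach that the abstract above uses as a baseline is commonly computed as a one-sided hypergeometric test. The sketch below shows that generic baseline only, not the proposed impact analysis. The function name `ora_pvalue` and its parameter names are hypothetical, and the "items" may be genes or metabolites depending on the pathway type.

```python
from math import comb


def ora_pvalue(n_background, n_in_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_background: number of measured items (the reference set).
    n_in_pathway: number of background items that belong to the pathway.
    n_selected:   number of items flagged as significantly changed.
    n_overlap:    number of flagged items that belong to the pathway.
    Returns P(X >= n_overlap) when n_selected items are drawn at random
    without replacement from the background.
    """
    if not 0 <= n_in_pathway <= n_background:
        raise ValueError("pathway size must be within the background size")
    if not 0 <= n_selected <= n_background:
        raise ValueError("selection size must be within the background size")
    upper = min(n_in_pathway, n_selected)
    if not 0 <= n_overlap <= upper:
        raise ValueError("overlap is larger than the pathway or the selection")
    total = comb(n_background, n_selected)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(n_in_pathway, k) * comb(n_background - n_in_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


p_all_hits = ora_pvalue(10, 5, 5, 5)  # every flagged item is in the pathway
p_no_hits = ora_pvalue(10, 5, 5, 0)   # an overlap of zero is never surprising
```

This test treats pathway members as interchangeable and ignores pathway topology and reaction stoichiometry, which is the kind of information the impact analysis described above is said to exploit.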