
    TEAK: A Novel Computational and GUI Software Pipeline for Reconstructing Biological Networks, Detecting Activated Biological Subnetworks, and Querying Biological Networks

    As high-throughput gene expression data becomes ever cheaper, researchers face a deluge of data from which biological insights must be extracted, and the rate of data accumulation far exceeds the rate of data analysis. Computational frameworks are needed to bridge this gap and assist researchers in their tasks. The Topology Enrichment Analysis frameworK (TEAK) is an open-source GUI and software pipeline that seeks to be one of the tools that fill this gap; it consists of three major modules. The first module, the Gene Set Cultural Algorithm, infers biological networks de novo from gene sets, using the KEGG pathways as prior knowledge. The second and third modules query the KEGG pathways using molecular profiling data and query graphs, respectively. In particular, the second module, also called TEAK, is a network partitioning module that partitions the KEGG pathways into both linear and nonlinear subpathways. The subpathways are then ranked using molecular profiling data and displayed to the user within the TEAK GUI. Using a public yeast microarray data set, TEAK found previously unreported fitness defects for dpl1 delta and lag1 delta mutants under conditions of nitrogen limitation. Finally, the third module, the Query Structure Enrichment Analysis framework, is a network query module that allows researchers to query their biological hypotheses, in the form of Directed Acyclic Graphs, against the KEGG pathways.
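To make the idea of linear subpathways concrete, the sketch below enumerates every root-to-leaf path in a small directed acyclic graph. This is only one plausible reading of "linear subpathway", not TEAK's actual partitioning code; the function name `linear_subpathways`, the adjacency-dictionary representation, and the toy node labels are all hypothetical, and no real KEGG data is used.

```python
def linear_subpathways(graph):
    """Enumerate all root-to-leaf paths in a DAG.

    `graph` maps each node to a list of its successors.
    Assumption: the graph is acyclic; a cycle would cause infinite recursion.
    """
    targets = {t for successors in graph.values() for t in successors}
    nodes = set(graph) | targets
    roots = sorted(nodes - targets)  # nodes with no incoming edge

    paths = []

    def walk(node, path):
        successors = graph.get(node, [])
        if not successors:  # leaf reached: the path is complete
            paths.append(path)
            return
        for nxt in successors:
            walk(nxt, path + [nxt])

    for root in roots:
        walk(root, [root])
    return paths


# Hypothetical toy pathway: A activates B and C, which both act on D.
toy_pathway = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}

for path in linear_subpathways(toy_pathway):
    print(" -> ".join(path))
```

In a pipeline of this kind, each enumerated path would then be scored against expression data to produce a ranking; the scoring itself is not described in the abstract and is left out here.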

    Horizontal And Vertical Integration Of Bio-Molecular Data

    Modern biomedical research lies at the crossroads of data gathering, interpretation, and hypothesis testing. Because of noise, study bias, or changes in biological signals between disease and healthy states that are too small to detect, individual studies often fail to identify the true phenomenon. Data integration is the key to obtaining the power needed to pinpoint the biological mechanisms of disease states. Given this, we sought to contribute to both horizontal and vertical integration of high-throughput data; the former is the meta-analysis of independent studies, while the latter is the integration of multi-omics data. For horizontal meta-analysis, we developed two frameworks: DANUBE and the bi-level meta-analysis. In DANUBE, we pointed out that most pathway analysis approaches make incorrect assumptions about bio-molecular data, which leads to non-uniformity of p-values under the null hypothesis. DANUBE proposed a way to correct the biased p-values before combining them using the Central Limit Theorem. In the bi-level meta-analysis, we added another level of meta-analysis to make better use of the samples available within individual studies. Both techniques were validated using thousands of real samples obtained from independent studies related to three human diseases: Alzheimer's disease, acute myeloid leukemia, and type II diabetes mellitus. These frameworks outperformed classical approaches in consistently identifying pathways that are relevant to the given phenotypes. Via extensive simulation studies, we also demonstrate that the proposed techniques are sufficiently general to be applied outside the scope of biomedical research. For vertical integrative analysis, we integrated transcriptomics, epigenomics, and non-coding RNA data to identify disease subtypes. Successful subtyping of complex diseases can lead to the identification of biomarkers and targets for new drugs.
    We developed a perturbation clustering method to accurately subtype patients using high-dimensional gene expression data. The framework was also extended to combine complementary information available in multi-omics data by adapting techniques from network partitioning and cluster ensembles. The algorithm was validated on thousands of real cancer samples, using mRNA, methylation, and microRNA data available from Gene Expression Omnibus, the Broad Institute, and the Cancer Genome Atlas. This simultaneous subtyping approach accurately identifies known cancer subtypes and predicts the survival of novel subgroups of patients. We also developed a meta-analysis framework that combines two orthogonal types of data integration: horizontal and vertical meta-analysis. Integrative analyses of omics data often require all data types to be available for each individual patient. This reduces their practical applicability, since sample-matched data is relatively rare and difficult or expensive to obtain. We proposed an orthogonal meta-analysis framework that overcomes the sample-matched data bottleneck by integrating datasets of different types generated in independent laboratories from different sets of patients. The proposed framework was validated using 1,471 samples from 15 mRNA and 14 miRNA expression datasets related to two human cancers, colorectal cancer and pancreatic cancer. The orthogonal approach reliably identifies signaling pathways that are impacted by the two cancers. While validated in the context of pathway analysis, the framework can be adapted to other domains or applications.
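The abstract mentions combining p-values from independent studies using the Central Limit Theorem. A minimal sketch of that general idea follows, assuming the p-values are independent and uniform on [0, 1] under the null hypothesis: their mean is then approximately normal with mean 1/2 and variance 1/(12n). The function name `combine_pvalues_clt` is hypothetical, and this is not DANUBE's implementation; in particular, the bias-correction step that the abstract says precedes the combination is not shown.

```python
from math import sqrt
from statistics import NormalDist, fmean


def combine_pvalues_clt(pvalues):
    """Combine independent p-values with a normal approximation to their mean.

    Under the null, each p-value is Uniform(0, 1), with mean 1/2 and
    variance 1/12, so the mean of n of them is approximately
    Normal(1/2, 1/(12 n)). A small mean gives a small combined p-value.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 <= p <= 1.0 for p in pvalues):
        raise ValueError("p-values must lie in [0, 1]")

    mean_p = fmean(pvalues)
    z = (mean_p - 0.5) / sqrt(1.0 / (12.0 * n))
    # Lower tail: probability of a mean this small or smaller under the null.
    return NormalDist().cdf(z)


# Illustrative inputs only, not values from any study.
print(combine_pvalues_clt([0.04, 0.10, 0.03, 0.20, 0.08]))
```

The normal approximation is rough when only a few p-values are combined; in that case the exact distribution of the sum of uniforms (Irwin-Hall) would be the more careful choice.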