26 research outputs found

    PAMOGK: A pathway graph kernel based multi-omics clustering approach for discovering cancer patient subgroups

    Accurate classification of patients into homogeneous molecular subgroups is critical for the development of effective therapeutics and for deciphering what drives these different subtypes to cancer. However, the extensive molecular heterogeneity observed among cancer patients presents a challenge. The availability of multi-omic data catalogs for large cohorts of cancer patients provides multiple views into the molecular biology of the tumors with unprecedented resolution. In this work, we develop PAMOGK, which integrates multi-omics patient data and incorporates the existing knowledge on biological pathways. PAMOGK is well suited to deal with the sparsity of alterations in assessing patient similarities. We develop a novel graph kernel, which we denote the smoothed shortest path graph kernel, that evaluates patient similarities based on a single molecular alteration type in the context of a pathway. To corroborate multiple views of patients evaluated by hundreds of pathways and molecular alteration combinations, PAMOGK uses multi-view kernel clustering. We apply PAMOGK to find subgroups of kidney renal clear cell carcinoma (KIRC) patients, which results in four clusters with significantly different survival times (p-value = 7.4e-10). The patient subgroups also differ with respect to other clinical parameters such as tumor stage and grade, and primary tumor and metastasis tumor spreads. When we compare PAMOGK to 8 other state-of-the-art existing multi-omics clustering methods, PAMOGK consistently outperforms these in terms of its ability to partition patients into groups with different survival distributions. PAMOGK enables extracting the relative importance of pathways and molecular data types. PAMOGK is available at github.com/tastanlab/pamog

    GLANET: genomic loci annotation and enrichment tool

    Motivation: Genomic studies identify genomic loci representing genetic variations, transcription factor (TF) occupancy, or histone modification through next generation sequencing (NGS) technologies. Interpreting these loci requires evaluating them with known genomic and epigenomic annotations

    Refining literature curated protein interactions using expert opinions

    The availability of high-quality physical interaction datasets is a prerequisite for system-level analysis of interactomes and for supervised models that predict protein-protein interactions (PPIs). One source is literature-curated PPI databases, in which pairwise associations of proteins published in the scientific literature are deposited. However, PPIs may not be clearly labelled as physical interactions, which affects the quality of the entire dataset. In order to obtain a high-quality gold standard dataset for PPIs between human immunodeficiency virus (HIV-1) and its human host, we adopted a crowd-sourcing approach. We collected expert opinions and utilized an expectation-maximization based approach to estimate expert labeling quality. These estimates are used to infer the probability of a reported PPI actually being a direct physical interaction given the set of expert opinions. The effectiveness of our approach is demonstrated through synthetic data experiments, and a high-quality physical interaction network between HIV and human proteins is obtained. Since many literature-curated databases suffer from similar challenges, the framework described herein could be utilized in refining other databases. The curated data is available at http://www.cs.bilkent.edu.tr/~oznur.tastan/supp/psb2015/
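The idea of estimating expert labeling quality with expectation-maximization can be illustrated with a minimal sketch. The abstract does not spell out the model, so the code below assumes a simple "one-coin" formulation in which each expert has a single accuracy value and each reported interaction has a hidden binary label. The function name `em_expert_quality`, its parameters, and the toy votes are hypothetical and are not taken from the paper or its data.

```python
def em_expert_quality(labels, n_iter=50, eps=1e-6):
    """Simplified one-coin EM sketch (an assumption, not the paper's exact model).

    labels[i][j] is expert j's vote on item i:
    1 = physical interaction, 0 = not, None = no opinion given.
    Returns (posteriors, accuracies).
    """
    n_items = len(labels)
    n_experts = len(labels[0])

    # Initialise each item's posterior with its fraction of positive votes.
    post = []
    for row in labels:
        votes = [v for v in row if v is not None]
        post.append(sum(votes) / len(votes) if votes else 0.5)

    acc = [0.5] * n_experts
    for _ in range(n_iter):
        # M-step: re-estimate the class prior and each expert's accuracy
        # as the expected fraction of votes that agree with the hidden label.
        prior = min(max(sum(post) / n_items, eps), 1.0 - eps)
        for j in range(n_experts):
            agree, seen = 0.0, 0
            for i, row in enumerate(labels):
                v = row[j]
                if v is None:
                    continue
                seen += 1
                agree += post[i] if v == 1 else 1.0 - post[i]
            if seen:
                # Clamp away from 0 and 1 so the E-step never divides by zero.
                acc[j] = min(max(agree / seen, eps), 1.0 - eps)

        # E-step: posterior probability that each item is a true interaction.
        for i, row in enumerate(labels):
            p1, p0 = prior, 1.0 - prior
            for j, v in enumerate(row):
                if v is None:
                    continue
                p1 *= acc[j] if v == 1 else 1.0 - acc[j]
                p0 *= acc[j] if v == 0 else 1.0 - acc[j]
            post[i] = p1 / (p1 + p0)
    return post, acc


# Toy votes (made up): experts 0 and 1 always agree, expert 2 is erratic.
votes = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 1],
]
posteriors, accuracies = em_expert_quality(votes)
```

In this toy setting the two consistent experts end up with high estimated accuracy and the erratic one is down-weighted, so the posteriors follow the consistent pair. A fuller model would typically use separate sensitivity and specificity per expert.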