9 research outputs found

    Distributed Learning, Prediction and Detection in Probabilistic Graphs.

    Exploiting the structure in the data distribution is critical to high-dimensional statistical estimation. Probabilistic graphical models provide an efficient framework for representing complex joint distributions of random variables through their conditional dependency graph, and can be adapted to many high-dimensional machine learning applications. This dissertation develops the probabilistic graphical modeling technique for three statistical estimation problems arising in real-world applications: distributed and parallel learning in networks, missing-value prediction in recommender systems, and emerging topic detection in text corpora. The common theme behind all proposed methods is a combination of a parsimonious representation of uncertainties in the data, an optimization surrogate that leads to computationally efficient algorithms, and fundamental limits of estimation performance in high dimensions. More specifically, the dissertation makes the following theoretical contributions: (1) We propose a distributed and parallel framework for learning the parameters in Gaussian graphical models that is free of iterative global message passing. The proposed distributed estimator is shown to be asymptotically consistent, to improve with increasing local neighborhood sizes, and to have a high-dimensional error rate comparable to that of the centralized maximum likelihood estimator. (2) We present a family of latent variable Gaussian graphical models whose marginal precision matrix has a "low-rank plus sparse" structure. Under mild conditions, we analyze the high-dimensional parameter error bounds for learning this family of models using regularized maximum likelihood estimation. (3) We consider a hypothesis testing framework for detecting emerging topics in topic models, and propose a novel surrogate test statistic for the standard likelihood ratio. By leveraging the theory of empirical processes, we prove asymptotic consistency for the proposed test and provide guarantees of the detection performance.
    PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/110499/1/mengzs_1.pd

    Distributed learning of Gaussian graphical models via marginal likelihoods

    We consider distributed estimation of the inverse covariance matrix, also called the concentration matrix, in Gaussian graphical models. Traditional centralized estimation often requires iterative and expensive global inference and is therefore difficult in large distributed networks. In this paper, we propose a general framework for distributed estimation based on a maximum marginal likelihood (MML) approach. Each node independently computes a local estimate by maximizing a marginal likelihood defined with respect to data collected from its local neighborhood. Due to the non-convexity of the MML problem, we derive and consider solving a convex relaxation. The local estimates are then combined into a global estimate without the need for iterative message-passing between neighborhoods. We prove that this relaxed MML estimator is asymptotically consistent. Through numerical experiments on several synthetic and real-world data sets, we demonstrate that the two-hop version of the proposed estimator is significantly better than the one-hop version, and nearly closes the gap to the centralized maximum likelihood estimator in many situations.
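The local-then-combine idea in this abstract can be illustrated with a much simpler stand-in for the relaxed MML estimator. The sketch below is not the paper's method: it assumes each node inverts the covariance of its one-hop neighborhood and keeps only its own row, relying on the fact that marginalizing out nodes that are not adjacent to node i leaves row i of the precision matrix unchanged. The function names (`local_row_estimate`, `combine_local_estimates`, `chain_precision`) and the chain-graph test case are hypothetical and chosen only for illustration.

```python
import numpy as np

def local_row_estimate(cov, adjacency, i):
    """Estimate row i of the precision matrix from node i's one-hop neighborhood.

    Assumption (not from the paper): plain inversion of the local covariance
    block, used here as a stand-in for the relaxed MML estimator.
    """
    neighbors = np.flatnonzero(adjacency[i])
    hood = np.concatenate(([i], neighbors[neighbors != i]))  # node i comes first
    local_precision = np.linalg.inv(cov[np.ix_(hood, hood)])
    row = np.zeros(cov.shape[0])
    row[hood] = local_precision[0]  # keep only node i's own row
    return row

def combine_local_estimates(cov, adjacency):
    """Stack the per-node rows and symmetrize; no iterative message passing."""
    p = cov.shape[0]
    K = np.vstack([local_row_estimate(cov, adjacency, i) for i in range(p)])
    return (K + K.T) / 2

def chain_precision(p, off=-0.4):
    """Hypothetical test case: tridiagonal precision matrix of a chain graph."""
    K = np.eye(p)
    idx = np.arange(p - 1)
    K[idx, idx + 1] = off
    K[idx + 1, idx] = off
    return K

K_true = chain_precision(6)
adjacency = (K_true != 0) & ~np.eye(6, dtype=bool)
cov_true = np.linalg.inv(K_true)  # exact covariance, so recovery is exact
K_hat = combine_local_estimates(cov_true, adjacency)
max_error = float(np.max(np.abs(K_hat - K_true)))
```

With the exact covariance the combined estimate matches the true precision matrix up to floating-point error. In practice one would pass a sample covariance such as `np.cov(X, rowvar=False)`, in which case the estimate is only approximate and improves with more samples.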

    DISTRIBUTED PRINCIPAL COMPONENT ANALYSIS ON NETWORKS VIA DIRECTED GRAPHICAL MODELS

    We introduce an efficient algorithm for performing distributed principal component analysis (PCA) on directed Gaussian graphical models. By exploiting structured sparsity in the Cholesky factor of the inverse covariance (concentration) matrix, our proposed DDPCA algorithm computes a global principal subspace estimate through local computation and message passing. We show significant performance and computation/communication advantages of DDPCA for online principal subspace estimation and distributed anomaly detection in real-world computer networks. Index Terms: graphical models, principal component analysis, anomaly detection, distributed PCA, subspace tracking.

    An interleukin-17–mediated paracrine network promotes tumor resistance to anti-angiogenic therapy

    Although angiogenesis inhibitors have provided substantial clinical benefit as cancer therapeutics, their use is limited by resistance to their therapeutic effects. While ample evidence indicates that such resistance can be influenced by the tumor microenvironment, the underlying mechanisms remain incompletely understood. Here, we have uncovered a paracrine signaling network between the adaptive and innate immune systems that is associated with resistance in multiple tumor models: lymphoma, lung and colon. Tumor-infiltrating T helper type 17 (T(H)17) cells and interleukin-17 (IL-17) induced the expression of granulocyte colony-stimulating factor (G-CSF) through nuclear factor κB (NF-κB) and extracellular-related kinase (ERK) signaling, leading to immature myeloid-cell mobilization and recruitment into the tumor microenvironment. The occurrence of T(H)17 cells and Bv8-positive granulocytes was also observed in clinical tumor specimens. Tumors resistant to treatment with antibodies to VEGF were rendered sensitive in IL-17 receptor (IL-17R)-knockout hosts deficient in T(H)17 effector function. Furthermore, pharmacological blockade of T(H)17 cell function sensitized resistant tumors to therapy with antibodies to VEGF. These findings indicate that IL-17 promotes tumor resistance to VEGF inhibition, suggesting that immunomodulatory strategies could improve the efficacy of anti-angiogenic therapy.

    Comprehensive molecular characterization of human colon and rectal cancer

    To characterize somatic alterations in colorectal carcinoma, we conducted a genome-scale analysis of 276 samples, analysing exome sequence, DNA copy number, promoter methylation and messenger RNA and microRNA expression. A subset of these samples (97) underwent low-depth-of-coverage whole-genome sequencing. In total, 16% of colorectal carcinomas were found to be hypermutated: three-quarters of these had the expected high microsatellite instability, usually with hypermethylation and MLH1 silencing, and one-quarter had somatic mismatch-repair gene and polymerase ε (POLE) mutations. Excluding the hypermutated cancers, colon and rectum cancers were found to have considerably similar patterns of genomic alteration. Twenty-four genes were significantly mutated, and in addition to the expected APC, TP53, SMAD4, PIK3CA and KRAS mutations, we found frequent mutations in ARID1A, SOX9 and FAM123B. Recurrent copy-number alterations include potentially drug-targetable amplifications of ERBB2 and newly discovered amplification of IGF2. Recurrent chromosomal translocations include the fusion of NAV2 and WNT pathway member TCF7L1. Integrative analyses suggest new markers for aggressive colorectal carcinoma and an important role for MYC-directed transcriptional activation and repression.
    National Institutes of Health (U.S.) (Grants U24CA143799, U24CA143835, U24CA143840, U24CA143843, U24CA143845, U24CA143848, U24CA143858, U24CA143866, U24CA143867, U24CA143882, U24CA143883, U24CA144025, U54HG003067, U54HG003079, U54HG003273)