10 research outputs found

    Probabilistic boolean tensor decomposition

    Get PDF

    Algorithmic Results for Clustering and Refined Physarum Analysis

    Get PDF
    In the first part of this thesis, we study the Binary ℓ0\ell_0-Rank-kk problem which given a binary matrix AA and a positive integer kk, seeks to find a rank-kk binary matrix BB minimizing the number of non-zero entries of A−BA-B. A central open question is whether this problem admits a polynomial time approximation scheme. We give an affirmative answer to this question by designing the first randomized almost-linear time approximation scheme for constant kk over the reals, F2\mathbb{F}_2, and the Boolean semiring. In addition, we give novel algorithms for important variants of ℓ0\ell_0-low rank approximation. The second part of this dissertation, studies a popular and successful heuristic, known as Approximate Spectral Clustering (ASC), for partitioning the nodes of a graph GG into clusters with small conductance. We give a comprehensive analysis, showing that ASC runs efficiently and yields a good approximation of an optimal kk-way node partition of GG. In the final part of this thesis, we present two results on slime mold computations: i) the continuous undirected Physarum dynamics converges for undirected linear programs with a non-negative cost vector; and ii) for the discrete directed Physarum dynamics, we give a refined analysis that yields strengthened and close to optimal convergence rate bounds, and shows that the model can be initialized with any strongly dominating point.Im ersten Teil dieser Arbeit untersuchen wir das Binary ℓ0\ell_0-Rank-kk Problem. Hier sind eine bin{\"a}re Matrix AA und eine positive ganze Zahl kk gegeben und gesucht wird eine bin{\"a}re Matrix BB mit Rang kk, welche die Anzahl von nicht null Eintr{\"a}gen in A−BA-B minimiert. Wir stellen das erste randomisierte, nahezu lineare Aproximationsschema vor konstantes kk {\"u}ber die reellen Zahlen, F2\mathbb{F}_2 und den Booleschen Semiring. Zus{\"a}tzlich erzielen wir neue Algorithmen f{\"u}r wichtige Varianten der ℓ0\ell_0-low rank Approximation. Der zweite Teil dieser Dissertation besch{\"a}ftigt sich mit einer beliebten und erfolgreichen Heuristik, die unter dem Namen Approximate Spectral Cluster (ASC) bekannt ist. ASC partitioniert die Knoten eines gegeben Graphen GG in Cluster kleiner Conductance. Wir geben eine umfassende Analyse von ASC, die zeigt, dass ASC eine effiziente Laufzeit besitzt und eine gute Approximation einer optimale kk-Weg-Knoten Partition f{\"u}r GG berechnet. Im letzten Teil dieser Dissertation pr{\"a}sentieren wir zwei Ergebnisse {\"u}ber Berechnungen mit Hilfe von Schleimpilzen: i) die kontinuierliche ungerichtete Physarum Dynamik konvergiert f{\"u}r ungerichtete lineare Programme mit einem nicht negativen Kostenvektor; und ii) f{\"u}r die diskrete gerichtete Physikum Dynamik geben wir eine verfeinerte Analyse, die st{\"a}rkere und beinahe optimale Schranken f{\"u}r ihre Konvergenzraten liefert und zeigt, dass das Model mit einem beliebigen stark dominierender Punkt initialisiert werden kann

    Statistical Analysis of Structured Latent Attribute Models

    Full text link
    In modern psychological and biomedical research with diagnostic purposes, scientists often formulate the key task as inferring the fine-grained latent information under structural constraints. These structural constraints usually come from the domain experts' prior knowledge or insight. The emerging family of Structured Latent Attribute Models (SLAMs) accommodate these modeling needs and have received substantial attention in psychology, education, and epidemiology. SLAMs bring exciting opportunities and unique challenges. In particular, with high-dimensional discrete latent attributes and structural constraints encoded by a structural matrix, one needs to balance the gain in the model's explanatory power and interpretability, against the difficulty of understanding and handling the complex model structure. This dissertation studies such a family of structured latent attribute models from theoretical, methodological, and computational perspectives. On the theoretical front, we present identifiability results that advance the theoretical knowledge of how the structural matrix influences the estimability of SLAMs. The new identifiability conditions guide real-world practices of designing diagnostic tests and also lay the foundation for drawing valid statistical conclusions. On the methodology side, we propose a statistically consistent penalized likelihood approach to selecting significant latent patterns in the population in high dimensions. Computationally, we develop scalable algorithms to simultaneously recover both the structural matrix and the dependence structure of the latent attributes in ultrahigh dimensional scenarios. These developments explore an exponentially large model space involving many discrete latent variables, and they address the estimation and computation challenges of high-dimensional SLAMs arising from large-scale scientific measurements. The application of the proposed methodology to the data from international educational assessments reveals meaningful knowledge structures of the student population.PHDStatisticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155196/1/yuqigu_1.pd
    corecore