10 research outputs found
Identifiability and Unmixing of Latent Parse Trees
This paper explores unsupervised learning of parsing models along two
directions. First, which models are identifiable from infinite data? We use a
general technique for numerically checking identifiability based on the rank of
a Jacobian matrix, and apply it to several standard constituency and dependency
parsing models. Second, for identifiable models, how do we estimate the
parameters efficiently? EM suffers from local optima, while recent work using
spectral methods cannot be directly applied since the topology of the parse
tree varies across sentences. We develop a strategy, unmixing, which deals with
this additional complexity for restricted classes of parsing models.
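The Jacobian rank check lends itself to a short numerical illustration. The sketch below applies it to a toy three-view mixture of two Bernoulli components rather than to the paper's parsing models; the function names, the toy model, and the finite-difference settings are our own choices.

```python
# Minimal sketch of the numerical Jacobian rank test for local identifiability.
# Toy model (NOT one of the paper's parsing models): a 2-component mixture in which
# each component emits three conditionally i.i.d. binary symbols; the observable
# moments are E[X1], E[X1 X2], E[X1 X2 X3].
import numpy as np

def toy_moments(theta):
    """Map parameters (pi, p, q) to observable moments of the toy mixture."""
    pi, p, q = theta
    return np.array([
        pi * p + (1 - pi) * q,          # E[X1]
        pi * p**2 + (1 - pi) * q**2,    # E[X1 X2]
        pi * p**3 + (1 - pi) * q**3,    # E[X1 X2 X3]
    ])

def numerical_jacobian(f, theta, eps=1e-6):
    """Forward-difference Jacobian of f at theta."""
    theta = np.asarray(theta, dtype=float)
    f0 = f(theta)
    J = np.zeros((len(f0), len(theta)))
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        J[:, j] = (f(theta + step) - f0) / eps
    return J

# If the Jacobian has full column rank at generic parameter values, the map from
# parameters to moments is locally invertible, i.e. the model is locally
# identifiable (up to discrete symmetries such as label swapping).
rng = np.random.default_rng(0)
for _ in range(5):
    theta = rng.uniform(0.1, 0.9, size=3)
    J = numerical_jacobian(toy_moments, theta)
    print(f"theta={theta.round(3)}  rank={np.linalg.matrix_rank(J, tol=1e-6)} of {len(theta)}")
```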
Simple Hardware-Efficient PCFGs with Independent Left and Right Productions
Scaling dense PCFGs to thousands of nonterminals via a low-rank
parameterization of the rule probability tensor has been shown to be beneficial
for unsupervised parsing. However, PCFGs scaled this way still perform poorly
as a language model, and even underperform similarly-sized HMMs. This work
introduces \emph{SimplePCFG}, a simple PCFG formalism with independent left and
right productions. Despite imposing a stronger independence assumption than the
low-rank approach, we find that this formalism scales more effectively both as
a language model and as an unsupervised parser. As an unsupervised parser, our
simple PCFG obtains an average F1 of 65.1 on the English PTB, and as a language
model, it obtains a perplexity of 119.0, outperforming similarly-sized low-rank
PCFGs. We further introduce \emph{FlashInside}, a hardware IO-aware
implementation of the inside algorithm for efficiently scaling simple PCFGs.
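To make the independence assumption concrete, here is a minimal reference sketch of the inside recursion when the binary rule probabilities factorize as P(A -> B C) = P_L(B | A) * P_R(C | A). It is not FlashInside (which is a fused, IO-aware GPU kernel), it glosses over the split between binary and lexical rule types, and the array names and shapes are our own assumptions.

```python
# Reference sketch of the inside algorithm for a PCFG whose binary rule tensor
# factorizes as P(A -> B C) = P_L(B|A) * P_R(C|A), i.e. independent left and
# right productions. Illustrative only; a practical version would work in log
# space (or with rescaling) and run batched on a GPU.
import numpy as np

def inside_prob(sent, root, left, right, emit):
    """
    sent  : list of word ids, length n
    root  : (m,)   start distribution over nonterminals
    left  : (m, m) left[A, B]  = P_L(B | A)
    right : (m, m) right[A, C] = P_R(C | A)
    emit  : (m, V) emit[A, w]  = P(A -> w), preterminal emissions
    Returns the total inside probability of the sentence.
    """
    n, m = len(sent), root.shape[0]
    beta = np.zeros((n, n + 1, m))          # beta[i, j, A] = inside prob of span [i, j)
    for i, w in enumerate(sent):
        beta[i, i + 1] = emit[:, w]
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            total = np.zeros(m)
            for k in range(i + 1, j):
                # Contract each child with its own factor, then multiply:
                # O(m^2) work per split point instead of O(m^3) for a full rule tensor.
                left_score = left @ beta[i, k]     # sum_B P_L(B|A) * beta(B, i, k)
                right_score = right @ beta[k, j]   # sum_C P_R(C|A) * beta(C, k, j)
                total += left_score * right_score
            beta[i, j] = total
    return float(root @ beta[0, n])
```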
Unsupervised spectral learning of WCFG as low-rank matrix completion
We derive a spectral method for unsupervised learning of Weighted Context Free
Grammars. We frame WCFG induction as finding a Hankel matrix that has low rank
and is linearly constrained to represent a function computed by inside-outside
recursions. The proposed algorithm picks the grammar that agrees with a sample
and is the simplest with respect to the nuclear norm of the Hankel matrix.
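The optimization at the core of this formulation can be sketched with generic nuclear-norm matrix completion via singular value thresholding. The sketch below covers only the "agree with observed entries, minimize the nuclear norm" part; the linear constraints tying the Hankel matrix to inside-outside recursions are not modeled, and all names and hyperparameters are our own.

```python
# Generic nuclear-norm-regularized matrix completion by proximal gradient
# (singular value thresholding). Illustrative stand-in for the low-rank part
# of the WCFG Hankel formulation.
import numpy as np

def svt(M, tau):
    """Soft-threshold the singular values of M by tau (prox of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete_low_rank(observed, mask, tau=0.5, step=1.0, iters=500):
    """
    observed : (p, q) matrix; entries where mask is False are ignored
    mask     : (p, q) boolean array, True where an entry is constrained
    Minimizes 0.5 * ||mask * (X - observed)||_F^2 + tau * ||X||_* by proximal gradient.
    """
    X = np.zeros_like(observed, dtype=float)
    for _ in range(iters):
        grad = mask * (X - observed)
        X = svt(X - step * grad, step * tau)
    return X

# Toy usage: recover a rank-2 matrix from roughly 60% of its entries.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))
mask = rng.random(A.shape) < 0.6
X_hat = complete_low_rank(A * mask, mask)
print("relative error:", np.linalg.norm(X_hat - A) / np.linalg.norm(A))
```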
Contrastive Learning Using Spectral Methods
In many natural settings, the analysis goal is not to characterize a single data set in isolation, but rather to understand the difference between one set of observations and another. For example, given a background corpus of news articles together with writings of a particular author, one may want a topic model that explains word patterns and themes specific to the author. Another example comes from genomics, in which biological signals may be collected from different regions of a genome, and one wants a model that captures the differential statistics observed in these regions. This paper formalizes this notion of contrastive learning for mixture models, and develops spectral algorithms for inferring mixture components specific to a foreground data set when contrasted with a background data set. The method builds on recent moment-based estimators and tensor decompositions for latent variable models, and has the intuitive feature of using background data statistics to appropriately modify moments estimated from foreground data. A key advantage of the method is that the background data need only be coarsely modeled, which is important when the background is too complex, noisy, or not of interest. The method is demonstrated on applications in contrastive topic modeling and genomic sequence analysis.
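The "use background statistics to correct foreground moments" idea can be illustrated at second order, essentially contrastive PCA: subtract a scaled background covariance from the foreground covariance and take the leading eigenvectors. The paper's estimators work with higher-order mixture-model moments and tensor decompositions; this sketch, with names and the contrast weight gamma chosen by us, only shows the pattern.

```python
# Second-order illustration of contrastive spectral analysis (contrastive PCA),
# as a simplified stand-in for the paper's moment-based estimators.
import numpy as np

def contrastive_directions(foreground, background, gamma=1.0, k=2):
    """Top-k eigenvectors of C_fg - gamma * C_bg: directions whose variance is
    specific to the foreground data relative to the background."""
    C_fg = np.cov(foreground, rowvar=False)
    C_bg = np.cov(background, rowvar=False)
    vals, vecs = np.linalg.eigh(C_fg - gamma * C_bg)
    order = np.argsort(vals)[::-1]          # largest contrastive variance first
    return vecs[:, order[:k]], vals[order[:k]]

# Toy usage: the foreground adds one extra latent direction on top of the shared
# background structure; the top contrastive eigenvector should recover it.
rng = np.random.default_rng(0)
d = 20
shared = rng.normal(size=(d, 3))
extra = rng.normal(size=(d, 1))
background = rng.normal(size=(2000, 3)) @ shared.T + 0.1 * rng.normal(size=(2000, d))
foreground = (rng.normal(size=(2000, 3)) @ shared.T
              + rng.normal(size=(2000, 1)) @ extra.T
              + 0.1 * rng.normal(size=(2000, d)))
dirs, _ = contrastive_directions(foreground, background, k=1)
cos = abs(dirs[:, 0] @ extra[:, 0]) / np.linalg.norm(extra)
print("alignment with foreground-specific direction:", round(float(cos), 3))
```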
Tensor decompositions for learning latent variable models
This work considers a computationally and statistically efficient parameter
estimation method for a wide class of latent variable models---including
Gaussian mixture models, hidden Markov models, and latent Dirichlet
allocation---which exploits a certain tensor structure in their low-order
observable moments (typically, of second- and third-order). Specifically,
parameter estimation is reduced to the problem of extracting a certain
(orthogonal) decomposition of a symmetric tensor derived from the moments; this
decomposition can be viewed as a natural generalization of the singular value
decomposition for matrices. Although tensor decompositions are generally
intractable to compute, the decomposition of these specially structured tensors
can be efficiently obtained by a variety of approaches, including power
iterations and maximization approaches (similar to the case of matrices). A
detailed analysis of a robust tensor power method is provided, establishing an
analogue of Wedin's perturbation theorem for the singular vectors of matrices.
This implies a robust and computationally tractable estimation approach for
several popular latent variable models.
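A compact version of the tensor power method with deflation looks as follows. Real estimators first whiten an empirical moment tensor so that its components become orthonormal; the sketch below skips that step and builds an orthogonally decomposable tensor directly, and the restart and iteration counts are illustrative.

```python
# Tensor power method with deflation on a symmetric, orthogonally decomposable
# third-order tensor. Whitening of empirical moments is omitted for brevity.
import numpy as np

def tensor_apply(T, v):
    """Contract a symmetric 3-tensor with v twice: T(I, v, v)."""
    return np.einsum('ijk,j,k->i', T, v, v)

def power_iteration(T, n_restarts=10, n_iters=100, rng=None):
    """Return the (eigenvalue, eigenvector) pair with the largest eigenvalue found."""
    if rng is None:
        rng = np.random.default_rng()
    best = (-np.inf, None)
    for _ in range(n_restarts):
        v = rng.normal(size=T.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(n_iters):
            v = tensor_apply(T, v)
            v /= np.linalg.norm(v)
        lam = float(np.einsum('ijk,i,j,k->', T, v, v, v))
        if lam > best[0]:
            best = (lam, v)
    return best

def decompose(T, rank):
    """Recover `rank` components by repeated power iteration plus deflation."""
    pairs = []
    T = T.copy()
    for _ in range(rank):
        lam, v = power_iteration(T)
        pairs.append((lam, v))
        T -= lam * np.einsum('i,j,k->ijk', v, v, v)   # deflate the found component
    return pairs

# Toy usage: build T = sum_r w_r * u_r (x) u_r (x) u_r with orthonormal u_r.
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.normal(size=(5, 3)))
w = np.array([3.0, 2.0, 1.0])
T = sum(w[r] * np.einsum('i,j,k->ijk', U[:, r], U[:, r], U[:, r]) for r in range(3))
for lam, v in decompose(T, 3):
    print(round(lam, 3), np.round(np.abs(U.T @ v).max(), 3))   # eigenvalue, alignment
```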
Uniqueness of Tensor Decompositions with Applications to Polynomial Identifiability
We give a robust version of the celebrated result of Kruskal on the
uniqueness of tensor decompositions: we prove that given a tensor whose
decomposition satisfies a robust form of Kruskal's rank condition, it is
possible to approximately recover the decomposition if the tensor is known up
to a sufficiently small (inverse polynomial) error.
Kruskal's theorem has found many applications in proving the identifiability
of parameters for various latent variable models and mixture models such as
Hidden Markov models, topic models etc. Our robust version immediately implies
identifiability using only polynomially many samples in many of these settings.
This polynomial identifiability is an essential first step towards efficient
learning algorithms for these models.
Recently, algorithms based on tensor decompositions have been used to
estimate the parameters of various hidden variable models efficiently in
special cases as long as they satisfy certain "non-degeneracy" properties. Our
methods give a way to go beyond this non-degeneracy barrier, and establish
polynomial identifiability of the parameters under much milder conditions.
Given the importance of Kruskal's theorem in the tensor literature, we expect
that this robust version will have several applications beyond the settings we
explore in this work.
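For reference, the exact (non-robust) Kruskal condition that the paper strengthens to a robust version can be stated as follows; this is standard background, not text from the paper.

```latex
% Kruskal (1977): k_A denotes the Kruskal rank of A = [a_1, ..., a_R], i.e. the
% largest k such that every set of k columns of A is linearly independent.
T = \sum_{r=1}^{R} a_r \otimes b_r \otimes c_r ,
\qquad
k_A + k_B + k_C \;\ge\; 2R + 2
\;\Longrightarrow\;
\text{the decomposition is unique up to permutation and scaling of the components.}
```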
Statistical Analysis of Structured Latent Attribute Models
In modern psychological and biomedical research with diagnostic purposes, scientists often formulate the key task as inferring the fine-grained latent information under structural constraints. These structural constraints usually come from the domain experts' prior knowledge or insight. The emerging family of Structured Latent Attribute Models (SLAMs) accommodate these modeling needs and have received substantial attention in psychology, education, and epidemiology. SLAMs bring exciting opportunities and unique challenges. In particular, with high-dimensional discrete latent attributes and structural constraints encoded by a structural matrix, one needs to balance the gain in the model's explanatory power and interpretability, against the difficulty of understanding and handling the complex model structure. This dissertation studies such a family of structured latent attribute models from theoretical, methodological, and computational perspectives. On the theoretical front, we present identifiability results that advance the theoretical knowledge of how the structural matrix influences the estimability of SLAMs. The new identifiability conditions guide real-world practices of designing diagnostic tests and also lay the foundation for drawing valid statistical conclusions. On the methodology side, we propose a statistically consistent penalized likelihood approach to selecting significant latent patterns in the population in high dimensions. Computationally, we develop scalable algorithms to simultaneously recover both the structural matrix and the dependence structure of the latent attributes in ultrahigh dimensional scenarios. These developments explore an exponentially large model space involving many discrete latent variables, and they address the estimation and computation challenges of high-dimensional SLAMs arising from large-scale scientific measurements. The application of the proposed methodology to the data from international educational assessments reveals meaningful knowledge structures of the student population.PHDStatisticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155196/1/yuqigu_1.pd