Rethinking LDA: moment matching for discrete ICA
We consider moment matching techniques for estimation in Latent Dirichlet
Allocation (LDA). By drawing explicit links between LDA and discrete versions
of independent component analysis (ICA), we first derive a new set of
cumulant-based tensors, with an improved sample complexity. Moreover, we reuse
standard ICA techniques such as joint diagonalization of tensors to improve
over existing methods based on the tensor power method. In an extensive set of
experiments on both synthetic and real datasets, we show that our new
combination of tensors and orthogonal joint diagonalization techniques
outperforms existing moment matching methods.
Comment: 30 pages; added plate diagrams and clarifications, changed style,
corrected typos, updated figures. In Proceedings of the 29-th Conference on
Neural Information Processing Systems (NIPS), 201
Fourth Moments and Independent Component Analysis
In independent component analysis it is assumed that the components of the
observed random vector are linear combinations of latent independent random
variables, and the aim is then to find an estimate for a transformation matrix
back to these independent components. In the engineering literature, there are
several traditional estimation procedures based on the use of fourth moments,
such as FOBI (fourth order blind identification), JADE (joint approximate
diagonalization of eigenmatrices), and FastICA, but the statistical properties
of these estimates are not well known. In this paper various independent
component functionals based on the fourth moments are discussed in detail,
starting with the corresponding optimization problems, deriving the estimating
equations and estimation algorithms, and finding asymptotic statistical
properties of the estimates. Comparisons of the asymptotic variances of the
estimates in wide independent component models show that in most cases JADE and
the symmetric version of FastICA perform better than their competitors.
Comment: Published at http://dx.doi.org/10.1214/15-STS520 in the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
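As a rough illustration of the fourth-moment approach described in this abstract, the simplest of the named procedures, FOBI, can be sketched as below. This is a minimal sketch and not the paper's implementation: the function name `fobi`, the array layout, and the use of NumPy are assumptions made here, and the method only separates sources whose kurtoses are distinct.

```python
import numpy as np


def fobi(x):
    """Fourth order blind identification (sketch).

    x: array of shape (n_samples, n_features) holding the observed mixtures.
    Returns (w, sources), where w is the estimated unmixing matrix and
    sources = centered_x @ w.T. Sources are recovered only up to order and sign.
    """
    x = x - x.mean(axis=0)

    # Step 1: whiten with the symmetric inverse square root of the covariance.
    cov = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    whitener = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = x @ whitener  # whitener is symmetric, so no transpose is needed

    # Step 2: fourth-moment matrix E[||z||^2 z z^T] of the whitened data.
    sq_norms = np.sum(z ** 2, axis=1)
    b = (z * sq_norms[:, None]).T @ z / z.shape[0]

    # Step 3: its eigenvectors give the remaining rotation.
    _, u = np.linalg.eigh(b)
    w = u.T @ whitener
    return w, x @ w.T
```

JADE and FastICA replace step 3 with joint diagonalization of several cumulant matrices or with a fixed-point iteration, respectively; neither is shown here.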
OV Graphs Are (Probably) Hard Instances
© Josh Alman and Virginia Vassilevska Williams. A graph G on n nodes is an Orthogonal Vectors (OV) graph of dimension d if there are vectors v_1, ..., v_n ∈ {0, 1}^d such that nodes i and j are adjacent in G if and only if ⟨v_i, v_j⟩ = 0 over Z. In this paper, we study a number of basic graph algorithm problems, except where one is given as input the vectors defining an OV graph instead of a general graph. We show that for each of the following problems, an algorithm solving it faster on such OV graphs G of dimension only d = O(log n) than in the general case would refute a plausible conjecture about the time required to solve sparse MAX-k-SAT instances: determining whether G contains a triangle; more generally, determining whether G contains a directed k-cycle for any k ≥ 3; computing the square of the adjacency matrix of G over Z or F_2; and maintaining the shortest distance between two fixed nodes of G, or whether G has a perfect matching, when G is a dynamically updating OV graph. We also prove some complementary results about OV graphs. We show that any problem which is NP-hard on constant-degree graphs is also NP-hard on OV graphs of dimension O(log n), and we give two problems which can be solved faster on OV graphs than in general: Maximum Clique, and Online Matrix-Vector Multiplication.
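To make the OV graph definition in this abstract concrete, the following minimal sketch builds the graph from a list of 0/1 vectors and runs a brute-force triangle check. The function names `ov_graph` and `has_triangle` are hypothetical and chosen here for illustration; this is the naive quadratic construction, not an algorithm from the paper.

```python
from itertools import combinations


def dot(u, v):
    """Integer inner product of two equal-length 0/1 vectors."""
    return sum(a * b for a, b in zip(u, v))


def ov_graph(vectors):
    """Return adjacency sets: distinct i and j are adjacent iff <v_i, v_j> = 0."""
    adj = {i: set() for i in range(len(vectors))}
    for i, j in combinations(range(len(vectors)), 2):
        if dot(vectors[i], vectors[j]) == 0:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def has_triangle(adj):
    """Brute-force check: some edge (i, j) whose endpoints share a neighbor."""
    for i in adj:
        for j in adj[i]:
            if j > i and adj[i] & adj[j]:
                return True
    return False
```

For example, the three standard basis vectors in dimension 3 are pairwise orthogonal, so `has_triangle(ov_graph([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))` is true.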