Learning Graphical Models Using Multiplicative Weights
We give a simple, multiplicative-weight update algorithm for learning
undirected graphical models or Markov random fields (MRFs). The approach is
new, and for the well-studied case of Ising models or Boltzmann machines, we
obtain an algorithm that uses a nearly optimal number of samples and has
quadratic running time (up to logarithmic factors), subsuming and improving on
all prior work. Additionally, we give the first efficient algorithm for
learning Ising models over general alphabets.
Our main application is an algorithm for learning the structure of t-wise
MRFs with nearly-optimal sample complexity (up to polynomial losses in
necessary terms that depend on the weights) and running time that is
. In addition, given samples, we can also learn the
parameters of the model and generate a hypothesis that is close in statistical
distance to the true MRF. All prior work runs in time for
graphs of bounded degree d and does not generate a hypothesis close in
statistical distance even for t=3. We observe that our runtime has the correct
dependence on n and t assuming the hardness of learning sparse parities with
noise.
Our algorithm, the Sparsitron, is easy to implement (it has only one parameter)
and holds in the online setting. Its analysis applies a regret bound from
Freund and Schapire's classic Hedge algorithm. It also gives the first solution
to the problem of learning sparse Generalized Linear Models (GLMs).
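Because the Sparsitron's analysis rests on Hedge, a minimal sketch of Hedge itself may help. Each of N experts carries a weight that is multiplied by β^loss every round, and for losses in [0, 1] Freund and Schapire's bound caps the cumulative expected loss at (ln(1/β)·L_best + ln N)/(1 − β), where L_best is the loss of the best single expert. This is not the Sparsitron: how it builds loss vectors from samples and sets its one parameter are specified in the paper and not reproduced here; the function names are illustrative.

```python
import math


def hedge(loss_rounds, beta):
    """Hedge over N experts; loss_rounds is a list of loss vectors with entries in [0, 1]."""
    n = len(loss_rounds[0])
    weights = [1.0] * n
    played = []
    for losses in loss_rounds:
        total = sum(weights)
        p = [w / total for w in weights]  # distribution played this round
        played.append(p)
        # Multiplicative update: expert i's weight shrinks by a factor beta ** loss_i.
        weights = [w * beta ** l for w, l in zip(weights, losses)]
    return played, weights


def expected_loss(played, loss_rounds):
    """Cumulative expected loss of the distributions Hedge played."""
    return sum(sum(pi * li for pi, li in zip(p, l)) for p, l in zip(played, loss_rounds))


def hedge_loss_bound(best_expert_loss, n_experts, beta):
    """Freund and Schapire's upper bound on Hedge's cumulative expected loss."""
    return (math.log(1.0 / beta) * best_expert_loss + math.log(n_experts)) / (1.0 - beta)
```

For example, with two experts where expert 0 never errs and expert 1 always does, `hedge([[0.0, 1.0]] * 10, 0.5)` moves almost all probability onto expert 0 within a few rounds, and the total expected loss stays below `hedge_loss_bound(0.0, 2, 0.5)`.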
Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors
We show tight lower bounds for the entire trade-off between space and query
time for the Approximate Near Neighbor search problem. Our lower bounds hold in
a restricted model of computation, which captures all hashing-based approaches.
In particular, our lower bound matches the upper bound recently shown in
[Laarhoven 2015] for the random instance on a Euclidean sphere (which we show
in fact extends to the entire space using the techniques from
[Andoni, Razenshteyn 2015]).
We also show tight, unconditional cell-probe lower bounds for one and two
probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder
2010]. In particular, this is the first space lower bound (for any static data
structure) for two probes which is not polynomially smaller than for one probe.
To show the result for two probes, we establish and exploit a connection to
locally-decodable codes.
Comment: 47 pages, 2 figures; v2: substantially revised introduction, lots of
small corrections; subsumed by arXiv:1608.03580 [cs.DS] (along with
arXiv:1511.07527 [cs.DS])
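To make "hashing-based approaches" concrete, here is a toy index built on random-hyperplane LSH (Charikar's SimHash for angular similarity), used as a stand-in; it is not Laarhoven's construction or anything from the paper. Each of `tables` hash tables keys a point by k sign bits, and a query re-ranks only the points that collide with it in some table. The class and parameter names are hypothetical.

```python
import math
import random


def make_hyperplane_hash(dim, k, rng):
    """One hash function: k random Gaussian hyperplanes through the origin, output = k sign bits."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]

    def h(x):
        return tuple(int(sum(a * b for a, b in zip(p, x)) >= 0.0) for p in planes)

    return h


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


class HyperplaneLSH:
    """Toy LSH index (hypothetical): several tables, each keyed by k sign bits."""

    def __init__(self, dim, k=8, tables=4, seed=0):
        rng = random.Random(seed)
        self.hashes = [make_hyperplane_hash(dim, k, rng) for _ in range(tables)]
        self.tables = [{} for _ in range(tables)]
        self.points = []

    def insert(self, x):
        idx = len(self.points)
        self.points.append(x)
        for h, table in zip(self.hashes, self.tables):
            table.setdefault(h(x), []).append(idx)
        return idx

    def query(self, q):
        # Candidates: stored points that share a bucket with q in at least one table.
        candidates = set()
        for h, table in zip(self.hashes, self.tables):
            candidates.update(table.get(h(q), []))
        if not candidates:
            return None
        # Exact re-ranking of the candidate list by cosine similarity.
        return max(candidates, key=lambda i: cosine(self.points[i], q))
```

Loosely, adding tables spends space to raise the chance that a true near neighbor collides with the query, while larger k shrinks buckets and hence query work; this knob is a small-scale picture of the space/query-time trade-off the lower bounds address.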
Explicit Correlation Amplifiers for Finding Outlier Correlations in Deterministic Subquadratic Time
We derandomize G. Valiant's [J.ACM 62(2015) Art.13] subquadratic-time algorithm for finding outlier correlations in binary data. Our derandomized algorithm gives deterministic subquadratic scaling essentially for the same parameter range as Valiant's randomized algorithm, but the precise constants we save over quadratic scaling are more modest. Our main technical tool for derandomization is an explicit family of correlation amplifiers built via a family of zigzag-product expanders in Reingold, Vadhan, and Wigderson [Ann. of Math 155(2002), 157-187]. We say that a function $f:\{-1,1\}^d\rightarrow\{-1,1\}^D$ is a correlation amplifier with threshold $0\leq\tau\leq 1$, tolerance $\gamma\geq 1$, and strength $p$ an even positive integer if for all pairs of vectors $x,y\in\{-1,1\}^d$ it holds that (i) $|\langle x,y\rangle|<\tau d$ implies $|\langle f(x),f(y)\rangle|\leq(\tau\gamma)^p D$; and (ii) $|\langle x,y\rangle|\geq\tau d$ implies $\bigl(\frac{\langle x,y\rangle}{\gamma d}\bigr)^p D\leq\langle f(x),f(y)\rangle\leq\bigl(\frac{\gamma\langle x,y\rangle}{d}\bigr)^p D$.
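The simplest function that meets these conditions is the p-fold tensor power f(x) = x ⊗ ⋯ ⊗ x, since ⟨f(x), f(y)⟩ = ⟨x, y⟩^p exactly and D = d^p, so both conditions hold with tolerance γ = 1. The catch is that D grows as d^p; the paper's explicit expander-based amplifiers exist to avoid that blow-up, and they are not reproduced here. The sketch below builds the tensor power and checks conditions (i) and (ii) for a single pair; the helper names are illustrative.

```python
from itertools import product
from math import prod


def tensor_power(x, p):
    """f(x) = x (x) ... (x) x with p factors; each entry is a product of p coordinates of x."""
    return [prod(x[i] for i in idx) for idx in product(range(len(x)), repeat=p)]


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def satisfies_amplifier_conditions(x, y, f, p, tau, gamma):
    """Check the correlation-amplifier conditions (i)/(ii) for one pair x, y in {-1,1}^d."""
    d = len(x)
    fx, fy = f(x), f(y)
    D = len(fx)
    s, t = inner(x, y), inner(fx, fy)
    if abs(s) < tau * d:
        # (i): a weak input correlation stays small after amplification.
        return abs(t) <= (tau * gamma) ** p * D
    # (ii): a strong correlation is raised to the p-th power, up to tolerance gamma.
    return (s / (gamma * d)) ** p * D <= t <= (gamma * s / d) ** p * D
```

For instance, with x = [1, -1, 1, 1], y = [1, 1, 1, 1] and p = 2, the inner product 2 becomes 4 in dimension D = 16.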
Fast Heavy Inner Product Identification Between Weights and Inputs in Neural Network Training
In this paper, we consider a heavy inner product identification problem,
which generalizes the Light Bulb problem [prr89]: Given two sets and with , if there
are exact pairs whose inner product passes a certain threshold, i.e.,
such that , for a threshold , the goal is to identify those heavy inner products. We provide an
algorithm that runs in time to find the inner
product pairs that surpass threshold with high probability,
where is the current matrix multiplication exponent. By solving this
problem, our method speeds up the training of neural networks with ReLU
activation functions.
Comment: IEEE BigData 202
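For contrast with the paper's matrix-multiplication-based algorithm, the naive baseline simply scans all |A|·|B| pairs at O(d) cost each. The sketch below assumes ±1 vectors and takes a generic `threshold` argument; both are illustrative choices, and the exact input domain and threshold form are given in the paper.

```python
def heavy_pairs(A, B, threshold):
    """Naive O(|A| * |B| * d) scan: all index pairs (i, j) with <A[i], B[j]> >= threshold."""
    return [
        (i, j)
        for i, a in enumerate(A)
        for j, b in enumerate(B)
        if sum(x * y for x, y in zip(a, b)) >= threshold
    ]
```

Quadratic cost in n is exactly what subquadratic algorithms for this problem aim to beat.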
Fast Sketch-based Recovery of Correlation Outliers
Many data sources can be interpreted as time series, and a key problem is to identify which pairs out of a large collection of signals are highly correlated. We expect that there will be few, large, interesting correlations, while most signal pairs do not have any strong correlation. We abstract this as the problem of identifying the highly correlated pairs in a collection of n mostly pairwise uncorrelated random variables, where observations of the variables arrive as a stream. Dimensionality reduction can remove dependence on the number of observations, but further techniques are required to tame the quadratic (in n) cost of a search through all possible pairs.
We develop a new algorithm for rapidly finding large correlations based on sketch techniques with an added twist: we quickly generate sketches of random combinations of signals, and use these in concert with ideas from coding theory to decode the identity of correlated pairs. We prove correctness and compare performance and effectiveness with the best LSH (locality-sensitive hashing) based approach.
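The dimensionality-reduction step mentioned above can be sketched with a dense random ±1 projection S (an AMS/JL-style sketch chosen here for illustration; the paper's sketches of random signal combinations and its coding-theoretic decoder are not reproduced). Each signal keeps only k numbers however long the stream runs, one fresh random column of S is drawn per time step, and ⟨Sx_i, Sx_j⟩/k is an unbiased estimate of ⟨x_i, x_j⟩ because the entries of S are independent. The class name is hypothetical.

```python
import random


class StreamingSketch:
    """Maintains S @ x_i for every signal i, where S is a k x m random +/-1 matrix
    generated one column per time step, so memory is independent of stream length m."""

    def __init__(self, n_signals, k, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        self.sketches = [[0.0] * k for _ in range(n_signals)]

    def update(self, observation):
        """observation: one new value per signal at the current time step."""
        column = [self.rng.choice((-1.0, 1.0)) for _ in range(self.k)]
        for sk, v in zip(self.sketches, observation):
            for r in range(self.k):
                sk[r] += column[r] * v

    def estimate_inner(self, i, j):
        # E[<S x_i, S x_j>] = k * <x_i, x_j> for independent +/-1 entries of S.
        a, b = self.sketches[i], self.sketches[j]
        return sum(u * w for u, w in zip(a, b)) / self.k
```

Comparing all n(n − 1)/2 pairs of sketches is still quadratic in n, which is the cost the paper's decoding approach is designed to avoid.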
A Faster Algorithm for Finding Closest Pairs in Hamming Metric
We study the Closest Pair Problem in Hamming metric, which asks to find the pair with the smallest Hamming distance in a collection of binary vectors. We give a new randomized algorithm for the problem on uniformly random input outperforming previous approaches whenever the dimension of input points is small compared to the dataset size. For moderate to large dimensions, our algorithm matches the time complexity of the previously best-known locality-sensitive hashing based algorithms. Technically, our algorithm follows similar design principles as Dubiner (IEEE Trans. Inf. Theory 2010) and May-Ozerov (Eurocrypt 2015). Besides improving the time complexity in the aforementioned areas, we significantly simplify the analysis of these previous works. We give a modular analysis, which allows us to investigate the performance of the algorithm also on non-uniform input distributions. Furthermore, we give a proof-of-concept implementation of our algorithm which performs well in comparison to a quadratic search baseline. This is the first step towards answering an open question raised by May and Ozerov regarding the practicability of algorithms following these design principles.
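The quadratic search baseline mentioned above can be sketched as follows, assuming the binary vectors are packed into Python integers so that Hamming distance is the popcount of an XOR. This is only the baseline, not the paper's algorithm; the function name is illustrative.

```python
def closest_pair_hamming(vectors):
    """Quadratic baseline: return (distance, i, j) for the closest pair, or None if < 2 vectors."""
    best = None
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            # Hamming distance = number of positions where the bit strings differ.
            dist = bin(vectors[i] ^ vectors[j]).count("1")
            if best is None or dist < best[0]:
                best = (dist, i, j)
    return best
```

For example, among `0b1010`, `0b0101` and `0b1011`, the closest pair is the first and third vectors, at distance 1.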