Binary Jumbled String Matching for Highly Run-Length Compressible Texts
The Binary Jumbled String Matching problem is defined as follows: Given a string s over the alphabet {a, b} of length n and a query (x, y), with x, y non-negative integers, decide whether s has a substring with exactly x a's and y b's. Previous solutions created an index of size O(n) in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time O(n^2 / log n) [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or O(n^2 / log^2 n) in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of s. The construction time of our index is O(n + ρ^2 log ρ), where O(n) is the time for computing the run-length encoding of s and ρ is the length of this encoding---this is no worse than previous solutions if ρ = O(n / sqrt(log n)) and better if ρ = o(n / sqrt(log n)). Our index can be queried in O(log ρ) time. While the size of the index is O(min(n, ρ^2)) in the worst case, preliminary investigations have indicated that it may often be close to ρ. Furthermore, the algorithm for constructing the index is conceptually simple and easy to implement. In an attempt to shed light on the structure and size of our index, we characterize it in terms of the prefix normal forms of s introduced in [Fici and Lipták, DLT 2011].
Comment: v2: only small cosmetic changes; v3: new title, weakened conjectures on size of Corner Index (we no longer conjecture it to be always linear in size of RLE); removed experimental part on random strings (these are valid but limited in their predictive power w.r.t. general strings); v3 published in IPL.
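To make the problem concrete, here is a minimal Python sketch of the classic linear-size index mentioned above: for a binary string, the number of b's over all substrings of a fixed length forms a contiguous interval, so storing the minimum and maximum count per length answers any query (x, y) in constant time. The naive quadratic construction below is for illustration only; it is neither the cited subquadratic algorithms nor the paper's RLE-based corner index.

```python
def build_index(s):
    """Naive O(n^2) construction of the classic O(n)-size index:
    for each substring length m, the number of b's over all windows
    of length m forms a contiguous interval [min_b[m], max_b[m]]."""
    n = len(s)
    ones = [1 if c == 'b' else 0 for c in s]
    min_b = [0] * (n + 1)
    max_b = [0] * (n + 1)
    for m in range(1, n + 1):
        cnt = sum(ones[:m])         # b's in the first window of length m
        lo = hi = cnt
        for i in range(m, n):       # slide the window one step at a time
            cnt += ones[i] - ones[i - m]
            lo, hi = min(lo, cnt), max(hi, cnt)
        min_b[m], max_b[m] = lo, hi
    return min_b, max_b

def query(index, x, y):
    """Decide in O(1) whether some substring has exactly x a's and y b's."""
    min_b, max_b = index
    m = x + y
    if m == 0 or m >= len(min_b):
        return m == 0               # only the empty substring matches (0, 0)
    return min_b[m] <= y <= max_b[m]

# Example: "aabba" has the substring "abb" with one 'a' and two 'b's.
idx = build_index("aabba")
print(query(idx, 1, 2))  # True
print(query(idx, 0, 3))  # False: no substring with three b's and no a's
```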
Learning Dynamics of Linear Denoising Autoencoders
Denoising autoencoders (DAEs) have proven useful for unsupervised representation learning, but a thorough theoretical understanding of how input noise influences learning is still lacking. Here we develop a theory of how noise influences learning in DAEs. By focusing on linear DAEs, we are able
to derive analytic expressions that exactly describe their learning dynamics.
to derive analytic expressions that exactly describe their learning dynamics.
We verify our theoretical predictions with simulations as well as experiments
on MNIST and CIFAR-10. The theory illustrates how, when tuned correctly, noise
allows DAEs to ignore low variance directions in the inputs while learning to
reconstruct them. Furthermore, in a comparison of the learning dynamics of DAEs
to standard regularised autoencoders, we show that noise has a similar
regularisation effect to weight decay, but with faster training dynamics. We
also show that our theoretical predictions approximate learning dynamics on
real-world data and qualitatively match observed dynamics in nonlinear DAEs.
Comment: 14 pages, 7 figures, accepted at the 35th International Conference on Machine Learning (ICML) 2018.
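The regularisation effect described above can be reproduced in a few lines. The following sketch (our illustration, not the paper's code or exact setup) trains a single-matrix linear DAE on toy two-dimensional data whose directions have high and low variance; with isotropic input noise of variance sigma^2, gradient descent converges towards scaling each data direction by lambda / (lambda + sigma^2), so the low-variance direction is suppressed, mirroring weight decay's shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two independent directions with high (3.0) and low (0.1) std.
n, d = 2000, 2
X = rng.normal(size=(n, d)) * np.array([3.0, 0.1])

# Single-matrix linear DAE: reconstruct clean x from corrupted x + eps.
W = rng.normal(scale=0.01, size=(d, d))
noise_std, lr = 1.0, 0.01

for step in range(2000):
    eps = rng.normal(scale=noise_std, size=X.shape)
    X_noisy = X + eps
    R = X_noisy @ W.T                   # reconstructions from noisy inputs
    W -= lr * (R - X).T @ X_noisy / n   # gradient of mean squared error

# Fixed point per data direction is lambda / (lambda + sigma^2): here
# roughly diag(9/(9+1), 0.01/(0.01+1)) = diag(0.9, ~0.01), so the
# low-variance direction is shrunk almost to zero, as with weight decay.
print(np.round(W, 3))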
Stochastic Gradient Annealed Importance Sampling for Efficient Online Marginal Likelihood Estimation
We consider estimating the marginal likelihood in settings with independent
and identically distributed (i.i.d.) data. In such settings, we propose estimating the predictive distributions in a sequential factorization of the marginal likelihood using stochastic gradient Markov chain Monte Carlo techniques. This
approach is far more efficient than traditional marginal likelihood estimation
techniques such as nested sampling and annealed importance sampling due to its
use of mini-batches to approximate the likelihood. Stability of the estimates
is provided by an adaptive annealing schedule. The resulting stochastic
gradient annealed importance sampling (SGAIS) technique, which is the key
contribution of our paper, enables us to estimate the marginal likelihood of a
number of models considerably faster than traditional approaches, with no
noticeable loss of accuracy. An important benefit of our approach is that the
marginal likelihood is calculated in an online fashion as data becomes
available, allowing the estimates to be used for applications such as online
weighted model combination.
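As a concrete illustration of the sequential factorization, the sketch below estimates log p(y_1:N) = sum_i log p(y_i | y_1:i-1) online for a toy conjugate Gaussian model, approximating each predictive by simple Monte Carlo over posterior samples drawn with stochastic gradient Langevin dynamics. This is a deliberately simplified stand-in: it uses plain SGLD posterior sampling at each step rather than the paper's annealed importance weights and adaptive annealing schedule, and the model, step sizes, and sample counts are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy conjugate model (illustrative): y_i ~ N(theta, 1), theta ~ N(0, 1).
N = 100
y = rng.normal(loc=0.5, scale=1.0, size=N)

def sgld_samples(data, n_samples=200, step=1e-3, thin=5):
    # Stochastic gradient Langevin dynamics on the posterior of theta.
    # grad log p(theta | data) = -theta + sum_i (y_i - theta), estimated
    # from a mini-batch; no burn-in or annealing (kept minimal on purpose).
    theta, out = 0.0, []
    for t in range(n_samples * thin):
        batch = rng.choice(data, size=min(32, len(data)), replace=False)
        grad = -theta + len(data) * np.mean(batch - theta)
        theta += 0.5 * step * grad + np.sqrt(step) * rng.normal()
        if t % thin == 0:
            out.append(theta)
    return np.array(out)

# Online sequential factorization: log p(y_1:N) = sum_i log p(y_i | y_1:i-1),
# each predictive approximated by averaging the likelihood over samples.
log_ml = 0.0
for i in range(N):
    thetas = rng.normal(size=200) if i == 0 else sgld_samples(y[:i])
    p_pred = np.mean(np.exp(-0.5 * (y[i] - thetas) ** 2) / np.sqrt(2 * np.pi))
    log_ml += np.log(p_pred)

# Exact log marginal likelihood for this conjugate model, as a check:
# jointly, y_1:N ~ N(0, I + 11^T).
cov = np.eye(N) + np.ones((N, N))
exact = -0.5 * (y @ np.linalg.solve(cov, y)
                + np.linalg.slogdet(cov)[1] + N * np.log(2 * np.pi))
print(f"SG-MCMC estimate: {log_ml:.2f}   exact: {exact:.2f}")
```

Because each predictive term is computed as soon as the next observation arrives, the running total is exactly the online marginal likelihood estimate the abstract describes, here checked against the closed form available for this particular conjugate model.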