31 research outputs found

Binary Jumbled String Matching for Highly Run-Length Compressible Texts

Authors: Badkobeh Golnaz, Fici Gabriele, Kroon Steve, Lipták Zsuzsanna
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size $O(n)$ in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index is $O(n+\rho^2\log \rho)$, where $O(n)$ is the time for computing the run-length encoding of $s$ and $\rho$ is the length of this encoding; this is no worse than previous solutions if $\rho = O(n/\log n)$ and better if $\rho = o(n/\log n)$. Our index $L$ can be queried in $O(\log \rho)$ time. While $|L| = O(\min(n, \rho^{2}))$ in the worst case, preliminary investigations have indicated that $|L|$ may often be close to $\rho$. Furthermore, the algorithm for constructing the index is conceptually simple and easy to implement. In an attempt to shed light on the structure and size of our index, we characterize it in terms of the prefix normal forms of $s$ introduced in [Fici and Lipták, DLT 2011].

Comment: v2: only small cosmetic changes; v3: new title, weakened conjectures on size of Corner Index (we no longer conjecture it to be always linear in size of RLE); removed experimental part on random strings (these are valid but limited in their predictive power w.r.t. general strings); v3 published in IP
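The problem statement in the abstract above can be made concrete with a small sketch. The code below is not the run-length-based index proposed in the paper; it is a naive quadratic-time construction of the classical O(n)-size index, which stores, for every substring length, the minimum and maximum number of a's. It relies on the well-known interval property of binary strings: a substring with x a's and y b's exists exactly when x lies between those two bounds for length x + y. The function names (`run_length_encode`, `build_naive_index`, `query`) are hypothetical and chosen only for illustration.

```python
from itertools import groupby


def run_length_encode(s):
    """Return the run-length encoding of s as (character, run length) pairs.

    The number of pairs is the quantity called rho in the abstract.
    """
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def build_naive_index(s):
    """Build the classical index over {a, b} by brute force in O(n^2) time.

    For each length k, min_a[k] and max_a[k] hold the fewest and the most
    a's found in any substring of s of length k.
    """
    n = len(s)
    # prefix[i] = number of a's in s[:i]
    prefix = [0] * (n + 1)
    for i, ch in enumerate(s):
        prefix[i + 1] = prefix[i] + (ch == "a")

    min_a = [0] * (n + 1)
    max_a = [0] * (n + 1)
    for k in range(1, n + 1):
        counts = [prefix[i + k] - prefix[i] for i in range(n - k + 1)]
        min_a[k] = min(counts)
        max_a[k] = max(counts)
    return min_a, max_a


def query(index, x, y):
    """Decide in constant time whether some substring has x a's and y b's."""
    min_a, max_a = index
    k = x + y
    if k >= len(min_a):
        return False  # requested substring is longer than the text
    return min_a[k] <= x <= max_a[k]


if __name__ == "__main__":
    text = "ababbaab"
    index = build_naive_index(text)
    print(run_length_encode(text))
    print(query(index, 2, 1), query(index, 3, 0))
```

The brute-force construction is only a baseline for understanding the query semantics; the construction-time bounds quoted in the abstract refer to more refined algorithms.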

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

Archivio istituzionale della ricerca - Università di Palermo

Learning Dynamics of Linear Denoising Autoencoders

Authors: Kamper Herman, Kroon Steve, Pretorius Arnu
Publication date: 01/01/2018

Denoising autoencoders (DAEs) have proven useful for unsupervised representation learning, but a thorough theoretical understanding of how the input noise influences learning is still lacking. Here we develop theory for how noise influences learning in DAEs. By focusing on linear DAEs, we are able to derive analytic expressions that exactly describe their learning dynamics. We verify our theoretical predictions with simulations as well as experiments on MNIST and CIFAR-10. The theory illustrates how, when tuned correctly, noise allows DAEs to ignore low variance directions in the inputs while learning to reconstruct them. Furthermore, in a comparison of the learning dynamics of DAEs to standard regularised autoencoders, we show that noise has a similar regularisation effect to weight decay, but with faster training dynamics. We also show that our theoretical predictions approximate learning dynamics on real-world data and qualitatively match observed dynamics in nonlinear DAEs.

Comment: 14 pages, 7 figures, accepted at the 35th International Conference on Machine Learning (ICML) 201
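The claim that noise lets a linear denoiser ignore low variance directions can be illustrated with a deliberately simplified sketch. The setting below is an assumption made for illustration and is not the model analysed in the paper: a single scalar weight w per input direction, an input with variance `lam` along that direction, and independent additive noise with variance `noise_var`. Under these assumptions the expected reconstruction loss is lam * (1 - w)^2 + noise_var * w^2, whose minimiser is lam / (lam + noise_var). Gradient descent on this loss is sketched with hypothetical names (`expected_loss_gradient`, `train_direction`).

```python
def expected_loss_gradient(w, lam, noise_var):
    """Gradient of lam * (1 - w)**2 + noise_var * w**2 with respect to w."""
    return -2.0 * lam * (1.0 - w) + 2.0 * noise_var * w


def train_direction(lam, noise_var, lr=0.05, steps=500, w0=0.0):
    """Run plain gradient descent on the expected loss for one direction.

    lam is the input variance along the direction (an assumed quantity),
    noise_var is the variance of the injected input noise.
    """
    w = w0
    for _ in range(steps):
        w -= lr * expected_loss_gradient(w, lam, noise_var)
    return w


if __name__ == "__main__":
    noise_var = 1.0
    for lam in (4.0, 1.0, 0.01):
        learned = train_direction(lam, noise_var)
        optimum = lam / (lam + noise_var)
        print(f"variance={lam}: learned w={learned:.4f}, optimum={optimum:.4f}")
```

In this toy setting, directions whose variance is well above the noise variance are reconstructed almost fully, while directions well below it are shrunk towards zero, which mirrors the qualitative behaviour described in the abstract.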

arXiv.org e-Print Archive

Stellenbosch University SUNScholar Repository

Stochastic Gradient Annealed Importance Sampling for Efficient Online Marginal Likelihood Estimation

Authors: Cameron Scott A., Eggers Hans C., Kroon Steve
Publication venue: 'MDPI AG'
Publication date: 12/11/2019

We consider estimating the marginal likelihood in settings with independent and identically distributed (i.i.d.) data. We propose estimating the predictive distributions in a sequential factorization of the marginal likelihood in such settings by using stochastic gradient Markov Chain Monte Carlo techniques. This approach is far more efficient than traditional marginal likelihood estimation techniques such as nested sampling and annealed importance sampling due to its use of mini-batches to approximate the likelihood. Stability of the estimates is provided by an adaptive annealing schedule. The resulting stochastic gradient annealed importance sampling (SGAIS) technique, which is the key contribution of our paper, enables us to estimate the marginal likelihood of a number of models considerably faster than traditional approaches, with no noticeable loss of accuracy. An important benefit of our approach is that the marginal likelihood is calculated in an online fashion as data becomes available, allowing the estimates to be used for applications such as online weighted model combination.
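The sequential factorization mentioned in the abstract writes the marginal likelihood as a product of one-step-ahead predictive probabilities, so its logarithm can be accumulated as each observation arrives. The sketch below shows only this factorization, not SGAIS itself: it assumes a Beta-Bernoulli model, chosen because the predictive distributions are available in closed form, whereas the paper estimates them with stochastic gradient MCMC. The function names and the default prior parameters `alpha` and `beta` are hypothetical.

```python
import math


def online_log_marginal_likelihood(data, alpha=1.0, beta=1.0):
    """Accumulate log p(y_1..y_N) as a sum of log predictive probabilities.

    data is a sequence of 0/1 observations; alpha and beta are the
    parameters of the assumed Beta prior on the success probability.
    Returns the final estimate and the running value after each point.
    """
    log_z = 0.0
    ones = zeros = 0
    running = []
    for y in data:
        # Exact posterior predictive p(y_n = 1 | y_1..y_{n-1}).
        p_one = (alpha + ones) / (alpha + beta + ones + zeros)
        log_z += math.log(p_one if y == 1 else 1.0 - p_one)
        running.append(log_z)
        if y == 1:
            ones += 1
        else:
            zeros += 1
    return log_z, running


def closed_form_log_marginal_likelihood(data, alpha=1.0, beta=1.0):
    """Batch reference value: log B(alpha + ones, beta + zeros) - log B(alpha, beta)."""
    ones = sum(data)
    zeros = len(data) - ones

    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    return log_beta(alpha + ones, beta + zeros) - log_beta(alpha, beta)


if __name__ == "__main__":
    observations = [1, 0, 1, 1, 0, 1]
    online, running = online_log_marginal_likelihood(observations)
    batch = closed_form_log_marginal_likelihood(observations)
    print(online, batch)
```

Because the running list holds the log marginal likelihood after every observation, values from competing models could be compared at any point in the stream, which is the kind of online use the abstract refers to.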

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Stellenbosch University SUNScholar Repository