
    Cell-Probe Bounds for Online Edit Distance and Other Pattern Matching Problems

    We give cell-probe bounds for the computation of edit distance, Hamming distance, convolution and longest common subsequence in a stream. In this model, a fixed string of $n$ symbols is given and one $\delta$-bit symbol arrives at a time in a stream. After each symbol arrives, the distance between the fixed string and a suffix of the most recent symbols of the stream is reported. The cell-probe model is perhaps the strongest model of computation for showing data structure lower bounds, subsuming in particular the popular word-RAM model.
    * We first give an $\Omega((\delta \log n)/(w+\log\log n))$ lower bound on the time to produce each output for both online Hamming distance and convolution, where $w$ is the word size. This bound relies on a new encoding scheme and, for the first time, holds even when $w$ is as small as a single bit.
    * We then consider the online edit distance and longest common subsequence problems in the bit-probe model ($w=1$) with a constant-sized input alphabet. We give a lower bound of $\Omega(\sqrt{\log n}/(\log\log n)^{3/2})$ which applies to both problems. This second set of results relies both on our new encoding scheme and on a carefully constructed hard distribution.
    * Finally, for the online edit distance problem we show that there is an $O((\log n)^2/w)$ upper bound in the cell-probe model. This bound contrasts with our new lower bound and also establishes an exponential gap between the known cell-probe and RAM model complexities.
    Comment: 32 pages, 4 figures
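    To make the streaming model concrete, here is a minimal Python sketch of the online Hamming distance problem as defined above. It is a naive simulation in ordinary RAM, not the cell-probe data structure the bounds concern; the function name and the handling of the first few arrivals (before a full window of n symbols exists) are our own assumptions.

```python
from collections import deque

def online_hamming_stream(fixed, stream):
    """Naive simulation of the online Hamming distance problem: `fixed` is
    the pre-given string of n symbols; symbols of `stream` arrive one at a
    time, and after each arrival we output the Hamming distance between
    `fixed` and the suffix of the most recent (up to n) stream symbols."""
    n = len(fixed)
    window = deque(maxlen=n)   # the n most recent stream symbols
    outputs = []
    for symbol in stream:
        window.append(symbol)
        suffix = list(window)
        # Align the current suffix against the end of the fixed string.
        dist = sum(a != b for a, b in zip(fixed[n - len(suffix):], suffix))
        outputs.append(dist)
    return outputs

# One output is produced before the next symbol is read.
print(online_hamming_stream("abab", "ababbb"))  # [1, 0, 3, 0, 3, 1]
```

    Each output here costs O(n) symbol comparisons; the results in the abstract bound how much better any data structure can do per output in the cell-probe model.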

    Joint Structure Learning of Multiple Non-Exchangeable Networks

    Several methods have recently been developed for joint structure learning of multiple (related) graphical models or networks. These methods treat the individual networks as exchangeable, so that every pair of networks is equally encouraged to have similar structure. However, in many practical applications exchangeability in this sense may not hold, as some pairs of networks may be more closely related than others, for example due to group and sub-group structure in the data. Here we present a novel Bayesian formulation that generalises joint structure learning beyond the exchangeable case. In addition to a general framework for joint learning, we (i) provide a novel default prior over the joint structure space that requires no user input; (ii) allow for latent networks; and (iii) give an efficient, exact algorithm for the case of time series data and dynamic Bayesian networks. We present empirical results on non-exchangeable populations, including a real data example from biology in which cell-line-specific networks are related according to genomic features.
    Comment: To appear in Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS)
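    As a rough illustration of how non-exchangeability can enter a joint formulation, the sketch below scores a collection of network structures with a pairwise penalty whose weight depends on a relatedness matrix (for example reflecting group and sub-group membership). The function, the penalty form and the parameter names are hypothetical and only convey the general idea; they are not the paper's actual default prior.

```python
import numpy as np

def joint_structure_log_prior(adjacencies, relatedness, strength=1.0):
    """Toy log-prior over K network structures in which pairs of networks
    are NOT exchangeable: structural disagreement between networks i and j
    is penalised in proportion to a relatedness weight (e.g. larger within
    a sub-group, smaller across groups). Illustrative sketch only."""
    K = len(adjacencies)
    log_prior = 0.0
    for i in range(K):
        for j in range(i + 1, K):
            disagreement = np.sum(adjacencies[i] != adjacencies[j])
            log_prior -= strength * relatedness[i, j] * disagreement
    return log_prior

# Three 4-node networks; networks 0 and 1 belong to the same sub-group,
# so their pairwise relatedness weight is higher.
rng = np.random.default_rng(0)
A = [rng.integers(0, 2, size=(4, 4)) for _ in range(3)]
R = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.2],
              [0.2, 0.2, 0.0]])
print(joint_structure_log_prior(A, R))
```

    Setting all relatedness weights equal recovers the exchangeable case, which is the special situation the paper generalises away from.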

    Joint estimation of multiple related biological networks

    Graphical models are widely used to make inferences concerning interplay in multivariate systems. In many applications, data are collected from multiple related but nonidentical units whose underlying networks may differ but are likely to share features. Here we present a hierarchical Bayesian formulation for joint estimation of multiple networks in this nonidentically distributed setting. The approach is general: given a suitable class of graphical models, it uses an exchangeability assumption on networks to provide a corresponding joint formulation. Motivated by emerging experimental designs in molecular biology, we focus on time-course data with interventions, using dynamic Bayesian networks as the graphical models. We introduce a computationally efficient, deterministic algorithm for exact joint inference in this setting. We provide an upper bound on the gains that joint estimation offers relative to separate estimation for each network and empirical results that support and extend the theory, including an extensive simulation study and an application to proteomic data from human cancer cell lines. Finally, we describe approximations that are still more computationally efficient than the exact algorithm and that also demonstrate good empirical performance.
    Comment: Published at http://dx.doi.org/10.1214/14-AOAS761 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
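    The following sketch illustrates the hierarchical idea in miniature: each network is scored by its own data fit plus a shared term that shrinks it towards a latent prototype structure, so that estimation borrows strength across related units. The prototype, the penalty form and the toy likelihoods are assumptions for illustration, not the paper's exact dynamic Bayesian network algorithm.

```python
import numpy as np

def joint_log_score(adjacencies, prototype, logliks, kappa=1.0):
    """Toy joint score for K related networks: each network's own data fit
    (logliks[k]) is combined with a penalty on edges that differ from a
    shared prototype network. Illustrative sketch only."""
    score = 0.0
    for k, G in enumerate(adjacencies):
        data_fit = logliks[k](G)               # per-network (log) likelihood
        deviation = np.sum(G != prototype)     # disagreement with the prototype
        score += data_fit - kappa * deviation  # shrinkage towards shared structure
    return score

# Toy example: two 3-node networks and made-up data-fit functions.
G1 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
G2 = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
prototype = G1
logliks = [lambda G: -0.5 * np.sum(G), lambda G: -0.7 * np.sum(G)]
print(joint_log_score([G1, G2], prototype, logliks, kappa=1.0))
```

    Taking kappa to zero decouples the networks and recovers separate estimation, which is the baseline the paper's upper bound on the gains of joint estimation is measured against.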

    Driving Performance Associated with the Morning Commute Improves Over a Week of Simulated Night Shifts

    This item is only available electronically. Commuters driving home from a night shift are at greater risk of having a motor vehicle accident due to extended wake episodes, sleep loss and circadian misalignment. Over consecutive night shifts, driving performance may improve as the circadian system adapts to the sleep-wake schedule, or decline with the accumulation of sleep loss. The aim of this study was to investigate driving performance associated with the post-night-shift commute over seven consecutive night shifts. Sixty-seven subjects undertook seven simulated night shifts under laboratory conditions. Following each shift, participants performed a 20-minute simulated driving task. Driving performance was assessed using lane variability (i.e. standard deviation of lateral position), speed variability (i.e. standard deviation of speed), and the likelihood of crashing and speeding relative to a daytime drive. Lane variability, speed variability and the likelihood of crashing declined over the seven consecutive night shifts. The likelihood of speeding exhibited no change. These findings indicate that driving performance improved over the seven consecutive night shifts. The trend in performance likely reflected the adaptation of the circadian system. These results indicate that the relatively short sequences of night shifts that dominate most Occupational Health and Safety guidelines may not always be optimal in minimizing fatigue-related risk.
    Thesis (B.PsychSc(Hons)) -- University of Adelaide, School of Psychology, 201

    The k-mismatch problem revisited

    We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No". We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k-mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows:
    1) Our first result is a deterministic $O(nk^2\log{k}/m + n\,\text{polylog}\,m)$ time offline algorithm for k-mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 by Amihood Amir et al.
    2) We then give a randomised and online algorithm which runs in the same time complexity but requires only $O(k^2\,\text{polylog}\,m)$ space in total.
    3) Next we give a randomised $(1+\epsilon)$-approximation algorithm for the streaming k-mismatch problem which uses $O(k^2\,\text{polylog}\,m/\epsilon^2)$ space and runs in $O(\text{polylog}\,m/\epsilon^2)$ worst-case time per arriving symbol.
    4) Finally we combine our new results to derive a randomised $O(k^2\,\text{polylog}\,m)$ space algorithm for the streaming k-mismatch problem which runs in $O(\sqrt{k}\log{k} + \text{polylog}\,m)$ worst-case time per arriving symbol. This improves the best previous space complexity for streaming k-mismatch from FOCS 2009 by Benny Porat and Ely Porat by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors).
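    A naive reference implementation helps pin down the problem definition (the algorithms above are far faster). The only assumptions here are the function name and the choice to return the literal string "No" for alignments whose distance exceeds k.

```python
def k_mismatch(text, pattern, k):
    """Naive O(nm) reference for the offline k-mismatch problem: for every
    alignment of the pattern against an m-length substring of the text,
    report the Hamming distance if it is at most k, and "No" otherwise."""
    m = len(pattern)
    outputs = []
    for i in range(len(text) - m + 1):
        dist = sum(a != b for a, b in zip(text[i:i + m], pattern))
        outputs.append(dist if dist <= k else "No")
    return outputs

print(k_mismatch("abracadabra", "abra", k=1))
# [0, 'No', 'No', 'No', 'No', 'No', 'No', 0]
```

    In the streaming variant the text symbols would arrive one at a time and each entry of this output list would have to be produced before the next symbol is read, which is what the space- and time-bounded algorithms in the abstract achieve.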