Cell-Probe Bounds for Online Edit Distance and Other Pattern Matching Problems
We give cell-probe bounds for the computation of edit distance, Hamming
distance, convolution and longest common subsequence in a stream. In this
model, a fixed string of symbols is given and one -bit symbol
arrives at a time in a stream. After each symbol arrives, the distance between
the fixed string and a suffix of most recent symbols of the stream is reported.
The cell-probe model is perhaps the strongest model of computation for showing
data structure lower bounds, subsuming in particular the popular word-RAM
model.
* We first give an lower bound for
the time to give each output for both online Hamming distance and convolution,
where is the word size. This bound relies on a new encoding scheme and for
the first time holds even when is as small as a single bit.
* We then consider the online edit distance and longest common subsequence
problems in the bit-probe model () with a constant sized input alphabet.
We give a lower bound of which
applies for both problems. This second set of results relies both on our new
encoding scheme as well as a carefully constructed hard distribution.
* Finally, for the online edit distance problem we show that there is an
upper bound in the cell-probe model. This bound gives a
contrast to our new lower bound and also establishes an exponential gap between
the known cell-probe and RAM model complexities.
Comment: 32 pages, 4 figure
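To make the streaming model above concrete, the following is a minimal sketch of the naive approach to online Hamming distance: after each symbol arrives, compare the fixed string with the most recent symbols of the stream. It is not the method of the paper, and it says nothing about the cell-probe bounds; the function name `online_hamming` and the convention of reporting `None` until a full window has arrived are assumptions made for illustration.

```python
from collections import deque


def online_hamming(pattern, stream):
    """Naive illustration of the streaming model (hypothetical helper).

    After each arriving symbol, yield the Hamming distance between
    `pattern` and the most recent len(pattern) symbols of `stream`.
    Yields None while fewer than len(pattern) symbols have arrived
    (an assumption; the abstract does not specify this case).
    """
    m = len(pattern)
    window = deque(maxlen=m)  # keeps only the m most recent symbols
    for symbol in stream:
        window.append(symbol)
        if len(window) < m:
            yield None
        else:
            # O(m) work per arriving symbol: count mismatching positions.
            yield sum(p != w for p, w in zip(pattern, window))


if __name__ == "__main__":
    print(list(online_hamming("ab", "abba")))
```

Each output costs time proportional to the length of the fixed string, which is the baseline against which per-output bounds of the kind discussed in the abstract are measured.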
Joint Structure Learning of Multiple Non-Exchangeable Networks
Several methods have recently been developed for joint structure learning of
multiple (related) graphical models or networks. These methods treat individual
networks as exchangeable, such that each pair of networks is equally
encouraged to have similar structures. However, in many practical applications,
exchangeability in this sense may not hold, as some pairs of networks may be
more closely related than others, for example due to group and sub-group
structure in the data. Here we present a novel Bayesian formulation that
generalises joint structure learning beyond the exchangeable case. In addition
to a general framework for joint learning, we (i) provide a novel default prior
over the joint structure space that requires no user input; (ii) allow for
latent networks; (iii) give an efficient, exact algorithm for the case of time
series data and dynamic Bayesian networks. We present empirical results on
non-exchangeable populations, including a real data example from biology, where
cell-line-specific networks are related according to genomic features.
Comment: To appear in Proceedings of the Seventeenth International Conference
on Artificial Intelligence and Statistics (AISTATS
Joint estimation of multiple related biological networks
Graphical models are widely used to make inferences concerning interplay in
multivariate systems. In many applications, data are collected from multiple
related but nonidentical units whose underlying networks may differ but are
likely to share features. Here we present a hierarchical Bayesian formulation
for joint estimation of multiple networks in this nonidentically distributed
setting. The approach is general: given a suitable class of graphical models,
it uses an exchangeability assumption on networks to provide a corresponding
joint formulation. Motivated by emerging experimental designs in molecular
biology, we focus on time-course data with interventions, using dynamic
Bayesian networks as the graphical models. We introduce a computationally
efficient, deterministic algorithm for exact joint inference in this setting.
We provide an upper bound on the gains that joint estimation offers relative to
separate estimation for each network and empirical results that support and
extend the theory, including an extensive simulation study and an application
to proteomic data from human cancer cell lines. Finally, we describe
approximations that are still more computationally efficient than the exact
algorithm and that also demonstrate good empirical performance.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS761 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Driving Performance Associated with the Morning Commute Improves Over a Week of Simulated Night Shifts
Commuters driving home from a night shift are at greater risk of having a motor vehicle accident due to extended wake episodes, sleep loss and circadian misalignment. Over consecutive night shifts, driving performance may improve as the circadian system adapts to the sleep-wake schedule, or decline with the accumulation of sleep loss. The aim of this study was to investigate driving performance associated with the post night shift commute over seven consecutive night shifts. Sixty-seven subjects undertook seven simulated night shifts under laboratory conditions. Following each shift, participants performed a 20-minute simulated driving task. Driving performance was assessed using lane variability (i.e. standard deviation of lateral position), speed variability (i.e. standard deviation of speed), and the likelihood of crashing and speeding relative to a daytime drive. Lane variability, speed variability and the likelihood of crashing declined over seven consecutive night shifts. The likelihood of speeding exhibited no change. These findings indicate that driving performance improved over the seven consecutive night shifts. The trend in performance likely reflected the adaptation of the circadian system. These results indicate that relatively short sequences of night shifts that dominate most Occupational Health and Safety guidelines may not always be optimal in minimizing fatigue-related risk.
Thesis (B.PsychSc(Hons)) -- University of Adelaide, School of Psychology, 201
The k-mismatch problem revisited
We revisit the complexity of one of the most basic problems in pattern
matching. In the k-mismatch problem we must compute the Hamming distance
between a pattern of length m and every m-length substring of a text of length
n, as long as that Hamming distance is at most k. Where the Hamming distance is
greater than k at some alignment of the pattern and text, we simply output
"No".
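As a reading aid for the problem definition above, here is a minimal sketch of the naive offline solution: for every alignment, count mismatches and give up once the count exceeds k. It only illustrates the input and output of the k-mismatch problem and is not one of the algorithms from the paper; the function name `k_mismatch` and the choice of returning the string "No" in a Python list are assumptions.

```python
def k_mismatch(pattern, text, k):
    """Naive k-mismatch (hypothetical helper, O(n * m) time).

    For each m-length substring of `text`, report its Hamming distance
    to `pattern` if that distance is at most k, otherwise "No".
    """
    m, n = len(pattern), len(text)
    outputs = []
    for i in range(n - m + 1):
        mismatches = 0
        for p, t in zip(pattern, text[i:i + m]):
            if p != t:
                mismatches += 1
                if mismatches > k:
                    break  # distance already exceeds k at this alignment
        outputs.append(mismatches if mismatches <= k else "No")
    return outputs


if __name__ == "__main__":
    print(k_mismatch("aba", "abababb", 1))
```

With the pattern "aba", the text "abababb" and k = 1, the alignments starting at positions 0 and 2 match exactly, the alignment at position 4 has one mismatch, and the other two exceed k.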
We study this problem in both the standard offline setting and also as a
streaming problem. In the streaming k-mismatch problem the text arrives one
symbol at a time and we must give an output before processing any future
symbols. Our main results are as follows:
1) Our first result is a deterministic time offline algorithm for k-mismatch on a text of length n. This is a
factor of k improvement over the fastest previous result of this form from SODA
2000 by Amihood Amir et al.
2) We then give a randomised and online algorithm which runs in the same time
complexity but requires only space in total.
3) Next we give a randomised -approximation algorithm for the
streaming k-mismatch problem which uses
space and runs in worst-case time per
arriving symbol.
4) Finally we combine our new results to derive a randomised
space algorithm for the streaming k-mismatch problem
which runs in worst-case time per
arriving symbol. This improves the best previous space complexity for streaming
k-mismatch from FOCS 2009 by Benny Porat and Ely Porat by a factor of k. We
also improve the time complexity of this previous result by an even greater
factor to match the fastest known offline algorithm (up to logarithmic
factors)