Distance Oracles for Time-Dependent Networks
We present the first approximate distance oracle for sparse directed networks
with time-dependent arc-travel-times determined by continuous, piecewise
linear, positive functions possessing the FIFO property.
Our approach precomputes approximate distance summaries from
selected landmark vertices to all other vertices in the network. Our oracle
uses subquadratic space and time preprocessing, and provides two sublinear-time
query algorithms that deliver constant and approximate
shortest-travel-times, respectively, for arbitrary origin-destination pairs in
the network, for any constant . Our oracle is based only on
the sparsity of the network, along with two quite natural assumptions about
travel-time functions which allow the smooth transition towards asymmetric and
time-dependent distance metrics.
Comment: A preliminary version appeared as Technical Report ECOMPASS-TR-025 of
EU funded research project eCOMPASS (http://www.ecompass-project.eu/). An
extended abstract also appeared in the 41st International Colloquium on
Automata, Languages, and Programming (ICALP 2014, track-A
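To make the setting concrete, the following is a minimal sketch of the exact time-dependent query that such an oracle approximates; it is not the oracle from the abstract. It assumes each arc carries a continuous, piecewise linear, positive travel-time function stored as sorted (time, value) breakpoints, held constant outside the breakpoint range. The names travel_time and earliest_arrival and the graph layout are hypothetical. Under the FIFO property, a Dijkstra-style search on arrival times is correct.

```python
import heapq
from bisect import bisect_right


def travel_time(breakpoints, t):
    """Evaluate a continuous piecewise linear function at time t.

    breakpoints: list of (time, value) pairs sorted by time.
    Assumption: the function is constant outside the breakpoint range.
    """
    times = [b[0] for b in breakpoints]
    if t <= times[0]:
        return breakpoints[0][1]
    if t >= times[-1]:
        return breakpoints[-1][1]
    i = bisect_right(times, t) - 1
    (t0, v0), (t1, v1) = breakpoints[i], breakpoints[i + 1]
    # Linear interpolation between the two surrounding breakpoints.
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def earliest_arrival(graph, source, depart):
    """Earliest arrival time at every vertex reachable from source.

    graph: {u: [(v, breakpoints), ...]} (hypothetical layout).
    Correct when every arc function satisfies FIFO (slope > -1).
    """
    arrival = {source: depart}
    heap = [(depart, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, float("inf")):
            continue  # stale queue entry
        for v, bp in graph.get(u, []):
            t_v = t + travel_time(bp, t)
            if t_v < arrival.get(v, float("inf")):
                arrival[v] = t_v
                heapq.heappush(heap, (t_v, v))
    return arrival


if __name__ == "__main__":
    graph = {
        "a": [("b", [(0, 10), (10, 2)]), ("c", [(0, 3), (10, 3)])],
        "c": [("b", [(0, 4), (10, 4)])],
    }
    # The best route from a to b depends on the departure time.
    print(earliest_arrival(graph, "a", 0)["b"])
    print(earliest_arrival(graph, "a", 10)["b"])
```

Each such query explores the network from the origin; the oracle described above aims to answer in sublinear time by using precomputed landmark summaries instead.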
Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms
Let be an unknown and arbitrary probability distribution over . We
consider the problem of density estimation, in which a learning algorithm
is given i.i.d. draws from and must (with high probability) output a
hypothesis distribution that is close to . The main contribution of this
paper is a highly efficient density estimation algorithm for learning using a
variable-width histogram, i.e., a hypothesis distribution with a piecewise
constant probability density function.
In more detail, for any and , we give an algorithm that makes
draws from , runs in
time, and outputs a hypothesis distribution that is piecewise constant with
pieces. With high probability the hypothesis
satisfies ,
where denotes the total variation distance (statistical
distance), is a universal constant, and is the smallest
total variation distance between and any -piecewise constant
distribution. The sample size and running time of our algorithm are optimal up
to logarithmic factors. The "approximation factor" in our result is
inherent in the problem, as we prove that no algorithm with sample size bounded
in terms of and can achieve regardless of what kind of
hypothesis distribution it uses.
Comment: conference version appears in NIPS 201
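As an illustration of what a variable-width histogram hypothesis looks like, here is a minimal sketch of a simple equal-mass (quantile) histogram. This is a textbook heuristic, not the algorithm of the paper, and it carries none of the stated guarantees. The function name equal_mass_histogram, the number of pieces k, and the default domain [0, 1) are assumptions made for the example.

```python
def equal_mass_histogram(samples, k, lo=0.0, hi=1.0):
    """Build a piecewise constant density with at most k pieces.

    Bin edges are placed at empirical quantiles, so bins are narrow
    where samples are dense and wide where they are sparse.
    Assumption: all samples lie in the half-open interval [lo, hi).
    Returns a list of (left, right, density) triples.
    """
    xs = sorted(samples)
    n = len(xs)
    edges = [lo] + [xs[(i * n) // k] for i in range(1, k)] + [hi]
    pieces = []
    for a, b in zip(edges, edges[1:]):
        if b <= a:
            continue  # skip empty bins caused by repeated sample values
        count = sum(1 for x in xs if a <= x < b)
        # Density = probability mass of the bin divided by its width.
        pieces.append((a, b, count / (n * (b - a))))
    return pieces


if __name__ == "__main__":
    data = [0.05, 0.1, 0.15, 0.2, 0.6, 0.7, 0.8, 0.9]
    for left, right, density in equal_mass_histogram(data, 2):
        print(left, right, density)
```

The returned pieces integrate to one, so they define a valid hypothesis distribution whose total variation distance to a target density can then be evaluated piece by piece.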
New efficient algorithms for multiple change-point detection with kernels
Several statistical approaches based on reproducing kernels have been
proposed to detect abrupt changes arising in the full distribution of the
observations and not only in the mean or variance. Some of these approaches
enjoy good statistical properties (oracle inequalities, etc.). Nonetheless,
they have a high computational cost both in terms of time and memory. This
makes their application difficult even for small and medium sample sizes (). This computational issue is addressed by first describing a new
efficient and exact algorithm for kernel multiple change-point detection with
an improved worst-case complexity that is quadratic in time and linear in
space. It allows dealing with medium size signals (up to ).
Second, a faster but approximate algorithm is described. It is based on a
low-rank approximation to the Gram matrix. It is linear in time and space. This
approximation algorithm can be applied to large-scale signals ().
Both the exact and the approximate algorithms have been implemented in R
and C for various kernels. Their computational and statistical
performance has been assessed through empirical
experiments. The new algorithms are observed to run faster than
the other procedures considered. Finally, simulations confirmed the higher
statistical accuracy of kernel-based approaches to detect changes that are not
only in the mean. These simulations also illustrate the flexibility of
kernel-based approaches to analyze complex biological profiles made of DNA copy
number and allele B frequencies. An R package implementing the approach will be
made available on GitHub.
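The exact approach can be illustrated with a minimal dynamic-programming sketch. It assumes the usual kernel least-squares segment cost (the sum of k(x_i, x_i) over a segment minus the mean of all pairwise kernel values inside it) and a Gaussian kernel; both are assumptions, as are the function names. Unlike the algorithm in the abstract, this baseline stores prefix sums of the full Gram matrix and therefore uses quadratic rather than linear space.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def kernel_change_points(x, n_segments, kernel=gaussian_kernel):
    """Return the change-point indices of the best split into n_segments."""
    n = len(x)
    # S[i][j] = sum of kernel(x[a], x[b]) over a < i, b < j (2-D prefix sums).
    S = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            S[i + 1][j + 1] = (kernel(x[i], x[j]) + S[i][j + 1]
                               + S[i + 1][j] - S[i][j])
    diag = [0.0] * (n + 1)
    for i in range(n):
        diag[i + 1] = diag[i] + kernel(x[i], x[i])

    def cost(s, e):
        # Kernel least-squares cost of the segment x[s:e].
        block = S[e][e] - S[s][e] - S[e][s] + S[s][s]
        return (diag[e] - diag[s]) - block / (e - s)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for d in range(1, n_segments + 1):
        for e in range(d, n + 1):
            for s in range(d - 1, e):
                c = best[d - 1][s] + cost(s, e)
                if c < best[d][e]:
                    best[d][e] = c
                    back[d][e] = s
    # Walk the back-pointers to recover the segment boundaries.
    change_points, e = [], n
    for d in range(n_segments, 1, -1):
        e = back[d][e]
        change_points.append(e)
    return sorted(change_points)


if __name__ == "__main__":
    signal = [0.0, 0.1, -0.1, 0.05, 5.0, 5.1, 4.9, 5.05]
    print(kernel_change_points(signal, 2))
```

Because the cost depends on the data only through kernel evaluations, swapping in another kernel lets the same recursion target changes in the full distribution rather than only in the mean.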
Small space and streaming pattern matching with k edits
In this work, we revisit the fundamental and well-studied problem of
approximate pattern matching under edit distance. Given an integer , a
pattern of length , and a text of length , the task is to
find substrings of that are within edit distance from . Our main
result is a streaming algorithm that solves the problem in
space and amortised time per character of the text, providing
answers correct with high probability. (Hereafter, hides a
factor.) This answers a decade-old question: since the
discovery of a -space streaming algorithm for pattern
matching under Hamming distance by Porat and Porat [FOCS 2009], the existence
of an analogous result for edit distance remained open. Up to this work, no
-space algorithm was known even in the simpler
semi-streaming model, where comes as a stream but is available for
read-only access. In this model, we give a deterministic algorithm that
achieves slightly better complexity.
In order to develop the fully streaming algorithm, we introduce a new edit
distance sketch parametrised by integers . For any string of length at
most , the sketch is of size and it can be computed with an
-space streaming algorithm. Given the sketches of two strings,
in time we can compute their edit distance or certify that it
is larger than . This result improves upon -size sketches of
Belazzougui and Zhu [FOCS 2016] and very recent -size sketches
of Jin, Nelson, and Wu [STACS 2021]
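For orientation, the "compute the edit distance or certify that it exceeds the threshold" primitive has a classical offline counterpart, sketched below. It works on the full strings with a banded dynamic program, not on sketches, and it is not a streaming algorithm. The function name bounded_edit_distance and the parameter name k (taken from the title) are assumptions.

```python
def bounded_edit_distance(a, b, k):
    """Return the edit distance of a and b if it is at most k, else None.

    Only cells within k of the main diagonal are evaluated, since any
    alignment leaving that band already costs more than k edits.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    cap = k + 1  # every value above k is stored as k + 1
    prev = [j if j <= k else cap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [cap] * (m + 1)
        lo, hi = max(0, i - k), min(m, i + k)
        if lo == 0:
            cur[0] = i
        for j in range(max(1, lo), hi + 1):
            substitute = prev[j - 1] + (a[i - 1] != b[j - 1])
            cur[j] = min(substitute, prev[j] + 1, cur[j - 1] + 1, cap)
        prev = cur
    return prev[m] if prev[m] <= k else None


if __name__ == "__main__":
    print(bounded_edit_distance("kitten", "sitting", 3))
    print(bounded_edit_distance("kitten", "sitting", 2))
```

This baseline needs random access to both strings, which is exactly what the streaming and sketching results above avoid.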