1,584 research outputs found

Distance Oracles for Time-Dependent Networks

Authors: A. Orda; B.C. Dean; D. Delling; E. Porat; F. Dehne; G. Nannicini; H.D. Sherali; K. Cooke; M. Thorup; S.E. Dreyfus
Publication date: 01/01/2014

We present the first approximate distance oracle for sparse directed networks with time-dependent arc-travel-times determined by continuous, piecewise linear, positive functions possessing the FIFO property. Our approach precomputes $(1+\epsilon)$-approximate distance summaries from selected landmark vertices to all other vertices in the network. Our oracle uses subquadratic space and time preprocessing, and provides two sublinear-time query algorithms that deliver constant and $(1+\sigma)$-approximate shortest-travel-times, respectively, for arbitrary origin-destination pairs in the network, for any constant $\sigma > \epsilon$. Our oracle is based only on the sparsity of the network, along with two quite natural assumptions about travel-time functions which allow the smooth transition towards asymmetric and time-dependent distance metrics.

Comment: A preliminary version appeared as Technical Report ECOMPASS-TR-025 of EU funded research project eCOMPASS (http://www.ecompass-project.eu/). An extended abstract also appeared in the 41st International Colloquium on Automata, Languages, and Programming (ICALP 2014, track-A
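For intuition about the setting, here is a minimal sketch of the exact computation that such an oracle approximates: an earliest-arrival search (time-dependent Dijkstra) over arcs whose travel times are continuous piecewise linear functions. It is not the oracle from the abstract. The names `travel_time` and `earliest_arrival`, the breakpoint-list representation, and the choice to hold the function constant outside its breakpoint range are all assumptions made for illustration. The search is only correct when every arc function is FIFO (slope at least -1), which the sketch does not check.

```python
import bisect
import heapq


def travel_time(breakpoints, t):
    """Evaluate a continuous piecewise linear function at time t.

    breakpoints: list of (time, value) pairs sorted by time.
    Assumption: the function is constant outside the breakpoint range.
    """
    times = [bp[0] for bp in breakpoints]
    if t <= times[0]:
        return breakpoints[0][1]
    if t >= times[-1]:
        return breakpoints[-1][1]
    i = bisect.bisect_right(times, t)
    (t0, v0), (t1, v1) = breakpoints[i - 1], breakpoints[i]
    # Linear interpolation between the two surrounding breakpoints.
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def earliest_arrival(graph, source, departure):
    """Earliest arrival times from source when leaving at `departure`.

    graph: dict mapping a vertex to a list of (neighbour, breakpoints).
    Relies on the FIFO property: leaving later never means arriving earlier.
    """
    arrival = {source: departure}
    heap = [(departure, source)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > arrival.get(u, float("inf")):
            continue  # stale queue entry
        for v, bps in graph.get(u, []):
            candidate = t + travel_time(bps, t)
            if candidate < arrival.get(v, float("inf")):
                arrival[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return arrival


# Hypothetical three-vertex network: the detour via "c" beats the direct arc.
example_graph = {
    "a": [("b", [(0, 10), (10, 20)]), ("c", [(0, 5), (10, 5)])],
    "c": [("b", [(0, 2), (10, 2)])],
}
example_arrivals = earliest_arrival(example_graph, "a", 0)
```

Each such query costs roughly as much as a full Dijkstra run, which is the cost that precomputed landmark summaries are meant to avoid.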

arXiv.org e-Print Archive

CiteSeerX

Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

Authors: Chan Siu On; Diakonikolas Ilias; Servedio Rocco A; Sun Xiaorui
Publication date: 01/01/2014

Let $p$ be an unknown and arbitrary probability distribution over $[0,1)$. We consider the problem of density estimation, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a hypothesis distribution that is close to $p$. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any $k$ and $\epsilon$, we give an algorithm that makes $\tilde{O}(k/\epsilon^2)$ draws from $p$, runs in $\tilde{O}(k/\epsilon^2)$ time, and outputs a hypothesis distribution $h$ that is piecewise constant with $O(k \log^2(1/\epsilon))$ pieces. With high probability the hypothesis $h$ satisfies $d_{\mathrm{TV}}(p,h) \leq C \cdot \mathrm{opt}_k(p) + \epsilon$, where $d_{\mathrm{TV}}$ denotes the total variation distance (statistical distance), $C$ is a universal constant, and $\mathrm{opt}_k(p)$ is the smallest total variation distance between $p$ and any $k$-piecewise constant distribution. The sample size and running time of our algorithm are optimal up to logarithmic factors. The "approximation factor" $C$ in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of $k$ and $\epsilon$ can achieve $C<2$ regardless of what kind of hypothesis distribution it uses.

Comment: conference version appears in NIPS 201
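To make the error measure concrete, the following sketch computes the total variation distance between two piecewise constant densities, $d_{\mathrm{TV}}(p,h) = \tfrac{1}{2}\int |p(x) - h(x)|\,dx$. It only illustrates the quantity in the guarantee above; it is not the learning algorithm from the abstract. The function names and the representation of a histogram as a list of edges plus a list of per-piece density values are assumptions made for illustration.

```python
import bisect


def density_at(edges, densities, x):
    """Value at x of a piecewise constant density.

    edges: sorted breakpoints e_0 < e_1 < ... < e_m.
    densities: m values, densities[i] holds on [e_i, e_{i+1}).
    Returns 0.0 outside [e_0, e_m).
    """
    i = bisect.bisect_right(edges, x) - 1
    if 0 <= i < len(densities):
        return densities[i]
    return 0.0


def tv_distance(edges_p, dens_p, edges_h, dens_h):
    """Total variation distance between two piecewise constant densities."""
    # On the common refinement of both partitions, both densities are
    # constant on every cell, so the integral is a finite sum.
    cuts = sorted(set(edges_p) | set(edges_h))
    total = 0.0
    for a, b in zip(cuts, cuts[1:]):
        mid = (a + b) / 2
        gap = density_at(edges_p, dens_p, mid) - density_at(edges_h, dens_h, mid)
        total += abs(gap) * (b - a)
    return 0.5 * total


# Uniform density on [0, 1) versus a two-piece variable-width histogram.
example_tv = tv_distance([0, 1], [1.0], [0, 0.5, 1], [1.5, 0.5])
```

The sum has one term per cell of the common refinement, so for two histograms with at most $k$ pieces each it takes $O(k \log k)$ time including the sort.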

arXiv.org e-Print Archive

CiteSeerX

New efficient algorithms for multiple change-point detection with kernels

Authors: Celisse Alain; Marot Guillemette; Pierre-Jean Morgane; Rigaill Guillem
Publication date: 01/09/2016

Several statistical approaches based on reproducing kernels have been proposed to detect abrupt changes arising in the full distribution of the observations and not only in the mean or variance. Some of these approaches enjoy good statistical properties (oracle inequality, ...). Nonetheless, they have a high computational cost both in terms of time and memory. This makes their application difficult even for small and medium sample sizes ($n < 10^4$). This computational issue is addressed by first describing a new efficient and exact algorithm for kernel multiple change-point detection with an improved worst-case complexity that is quadratic in time and linear in space. It allows dealing with medium size signals (up to $n \approx 10^5$). Second, a faster but approximate algorithm is described. It is based on a low-rank approximation to the Gram matrix. It is linear in time and space. This approximation algorithm can be applied to large-scale signals ($n \geq 10^6$). These exact and approximation algorithms have been implemented in R and C for various kernels. The computational and statistical performances of these new algorithms have been assessed through empirical experiments. The runtime of the new algorithms is observed to be faster than that of other considered procedures. Finally, simulations confirmed the higher statistical accuracy of kernel-based approaches to detect changes that are not only in the mean. These simulations also illustrate the flexibility of kernel-based approaches to analyze complex biological profiles made of DNA copy number and allele B frequencies. An R package implementing the approach will be made available on github
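As a rough illustration of the problem being solved, the sketch below runs the textbook dynamic program for kernel change-point detection with a fixed number of segments. It is a naive baseline, not the algorithm from the abstract: it stores the full Gram matrix and so needs quadratic memory, which is exactly the cost the abstract says the new exact algorithm avoids. The Gaussian kernel, the bandwidth, the function names, and the segment cost (sum of $K_{ii}$ over the segment minus the mean of the segment's Gram block times its length) are assumptions chosen for illustration.

```python
import math


def gaussian_gram(xs, bandwidth=1.0):
    """Full Gram matrix of a Gaussian kernel (quadratic memory)."""
    return [[math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2)) for b in xs]
            for a in xs]


def kernel_changepoints(xs, n_segments, bandwidth=1.0):
    """Change-points splitting xs into n_segments with least kernel cost."""
    n = len(xs)
    K = gaussian_gram(xs, bandwidth)
    # 2D prefix sums: S[i][j] = sum of K[r][c] for r < i and c < j.
    S = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        row = 0.0
        for j in range(n):
            row += K[i][j]
            S[i + 1][j + 1] = S[i][j + 1] + row
    diag = [0.0] * (n + 1)
    for i in range(n):
        diag[i + 1] = diag[i] + K[i][i]

    def cost(a, b):
        # Within-segment scatter in feature space for indices a..b-1.
        block = S[b][b] - S[a][b] - S[b][a] + S[a][a]
        return (diag[b] - diag[a]) - block / (b - a)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    arg = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for d in range(1, n_segments + 1):
        for b in range(d, n + 1):
            for a in range(d - 1, b):
                c = best[d - 1][a] + cost(a, b)
                if c < best[d][b]:
                    best[d][b] = c
                    arg[d][b] = a
    # Backtrack the segment boundaries from the last segment to the first.
    boundaries = []
    b = n
    for d in range(n_segments, 1, -1):
        b = arg[d][b]
        boundaries.append(b)
    return sorted(boundaries)


# Toy signal with one abrupt change after the fifth observation.
example_cps = kernel_changepoints([0.0] * 5 + [10.0] * 5, 2)
```

The returned indices are the starting positions of the second and later segments. The triple loop costs $O(D n^2)$ time for $D$ segments, so this version is only practical for short signals.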

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Small space and streaming pattern matching with k edits

Authors: Kociumaka Tomasz; Porat Ely; Starikovskaya Tatiana
Publication date: 10/06/2021

In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $P$ of length $m$, and a text $T$ of length $n \ge m$, the task is to find substrings of $T$ that are within edit distance $k$ from $P$. Our main result is a streaming algorithm that solves the problem in $\tilde{O}(k^5)$ space and $\tilde{O}(k^8)$ amortised time per character of the text, providing answers correct with high probability. (Hereafter, $\tilde{O}(\cdot)$ hides a $\mathrm{poly}(\log n)$ factor.) This answers a decade-old question: since the discovery of a $\mathrm{poly}(k\log n)$-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Up to this work, no $\mathrm{poly}(k\log n)$-space algorithm was known even in the simpler semi-streaming model, where $T$ comes as a stream but $P$ is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity. In order to develop the fully streaming algorithm, we introduce a new edit distance sketch parametrised by integers