Deterministic Fully Dynamic SSSP and More
We present the first non-trivial fully dynamic algorithm maintaining exact
single-source distances in unweighted graphs. This resolves an open problem
stated by Sankowski [COCOON 2005] and van den Brand and Nanongkai [FOCS 2019].
Previous fully dynamic single-source distances data structures were all
approximate, but so far, non-trivial dynamic algorithms for the exact setting
could only be ruled out for polynomially weighted graphs (Abboud and
Vassilevska Williams, [FOCS 2014]). The exact unweighted case remained the main
case for which neither a subquadratic dynamic algorithm nor a quadratic lower
bound was known.
Our dynamic algorithm works on directed graphs, is deterministic, and can
report a single-source shortest paths tree in subquadratic time as well. Thus
we also obtain the first deterministic fully dynamic data structure for
reachability (transitive closure) with subquadratic update and query time. This
answers an open problem of van den Brand, Nanongkai, and Saranurak [FOCS 2019].
Finally, using the same framework we obtain the first fully dynamic data
structure maintaining all-pairs -approximate distances within
non-trivial sub- worst-case update time while supporting optimal-time
approximate shortest path reporting at the same time. This data structure is
also deterministic and therefore implies the first known non-trivial
deterministic worst-case bound for recomputing the transitive closure of a
digraph.
Comment: Extended abstract to appear in FOCS 202
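For intuition, here is a minimal sketch of the trivial quadratic baseline this abstract improves upon: rebuild a BFS from the source after every update. This is only the naive approach the paper beats, not its algorithm, and all names are my own:

```python
from collections import deque

def bfs_dists(adj, s):
    """Exact single-source distances in an unweighted digraph via BFS."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

class NaiveDynamicSSSP:
    """Trivial fully dynamic exact SSSP: rerun BFS on demand.
    Every query costs O(n + m), which can be quadratic in n."""
    def __init__(self, n, source):
        self.adj = {u: set() for u in range(n)}
        self.s = source

    def insert(self, u, v):
        self.adj[u].add(v)

    def delete(self, u, v):
        self.adj[u].discard(v)

    def distance(self, t):
        return bfs_dists(self.adj, self.s).get(t, float("inf"))
```

The paper's contribution is precisely that this per-operation cost can be made subquadratic while staying exact and deterministic.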
Fully Dynamic Shortest Path Reporting Against an Adaptive Adversary
Algebraic data structures are the main subroutine for maintaining distances
in fully dynamic graphs in subquadratic time. However, these dynamic algebraic
algorithms generally cannot maintain the shortest paths, especially against
adaptive adversaries. We present the first fully dynamic algorithm that
maintains the shortest paths against an adaptive adversary in subquadratic
update time. This is obtained via a combinatorial reduction that allows
reconstructing the shortest paths with only a few distance estimates. Using
this reduction, we obtain the following:
On weighted directed graphs with real edge weights in , we can
maintain approximate shortest paths in
update and query time. This improves upon the approximate distance
data structures from [v.d.Brand, Nanongkai, FOCS'19], which only returned a
distance estimate, by matching their complexity and returning an approximate
shortest path.
On unweighted directed graphs, we can maintain exact shortest paths in
update and query time. This
improves upon [Bergamaschi, Henzinger, P.Gutenberg, V.Williams, Wein, SODA'21]
who could report the path only against oblivious adversaries. We improve both
their update and query time while also handling adaptive adversaries.
On unweighted undirected graphs, our reduction holds not just against
adaptive adversaries but is also deterministic. We maintain a
-approximate -shortest path in
time per update, and -approximate single source shortest paths in
time per update. Previous deterministic results by
[v.d.Brand, Nazari, Forster, FOCS'22] could only maintain distance estimates
but no paths.
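The core trick of reconstructing a shortest path from distance estimates alone can be illustrated on an unweighted undirected graph: from the current vertex, any neighbor whose distance to t is exactly one smaller lies on a shortest path. A hedged sketch, where an exact BFS stands in for the paper's distance estimates and all names are my own:

```python
from collections import deque

def bfs(adj, src):
    """Exact hop distances from src in an undirected graph."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def path_from_distance_oracle(adj, s, t):
    """Walk from s toward t using only distance values: a neighbor v of u
    with dist(v, t) == dist(u, t) - 1 always lies on a shortest s-t path."""
    dist_to_t = bfs(adj, t)  # stands in for the oracle's distance estimates
    if s not in dist_to_t:
        return None
    path, u = [s], s
    while u != t:
        u = next(v for v in adj[u]
                 if dist_to_t.get(v, float("inf")) == dist_to_t[u] - 1)
        path.append(u)
    return path
```

The reduction in the abstract is subtler (it must survive adaptive adversaries with only a few estimate queries), but the next-hop invariant above is the underlying principle.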
Fast Deterministic Fully Dynamic Distance Approximation
In this paper, we develop deterministic fully dynamic algorithms for
computing approximate distances in a graph with worst-case update time
guarantees. In particular, we obtain improved dynamic algorithms that, given an
unweighted and undirected graph undergoing edge insertions and
deletions, and a parameter , maintain
-approximations of the -distance between a given pair of
nodes and , the distances from a single source to all nodes
("SSSP"), the distances from multiple sources to all nodes ("MSSP"), or the
distances between all nodes ("APSP").
Our main result is a deterministic algorithm for maintaining
-approximate -distance with worst-case update time
(for the current best known bound on the matrix multiplication
exponent ). This even improves upon the fastest known randomized
algorithm for this problem. Similar to several other well-studied dynamic
problems whose state-of-the-art worst-case update time is , this
matches a conditional lower bound [BNS, FOCS 2019]. We further give a
deterministic algorithm for maintaining -approximate
single-source distances with worst-case update time , which also
matches a conditional lower bound.
At the core, our approach is to combine algebraic distance maintenance data
structures with near-additive emulator constructions. This also leads to novel
dynamic algorithms for maintaining -emulators that improve
upon the state of the art, which might be of independent interest. Our
techniques also lead to improved randomized algorithms for several problems
such as exact -distances and diameter approximation.
Comment: Changes to the previous version: improved bounds for approximate
st-distances using a new algebraic data structure
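As a toy stand-in for the near-additive emulators used here, the classic greedy multiplicative spanner shows the general pattern: keep an edge only if the sparse graph built so far cannot already approximate it, then answer distance queries on the sparse graph. This is the textbook static construction, not the paper's dynamic one, and all names are my own:

```python
from collections import deque

def bounded_bfs_dist(adj, s, t, cutoff):
    """Hop distance from s to t, giving up beyond `cutoff` hops."""
    if s == t:
        return 0
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        if dist[u] >= cutoff:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == t:
                    return dist[v]
                q.append(v)
    return float("inf")

def greedy_spanner(n, edges, t):
    """Greedy t-spanner of an unweighted graph: an edge is kept only if
    the sparse graph so far cannot t-approximate its endpoints' distance."""
    adj = {u: set() for u in range(n)}
    kept = []
    for u, v in edges:
        if bounded_bfs_dist(adj, u, v, t) > t:
            adj[u].add(v)
            adj[v].add(u)
            kept.append((u, v))
    return kept
```

A (1+eps, beta) near-additive emulator plays the analogous role in the paper, but combined with algebraic maintenance of short-range distances rather than built statically.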
Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models
Large language models (LLMs) have made fundamental changes in human life. The
attention scheme is one of the key components over all the LLMs, such as BERT,
GPT-1, Transformers, GPT-2, 3, 3.5 and 4. Inspired by previous theoretical
study of the static version of the attention multiplication problem [Zandieh,
Han, Daliri, and Karbasi, arXiv 2023; Alman and Song, arXiv 2023], in this
work we formally define a dynamic version of the attention matrix
multiplication problem.
There are matrices ; they represent the query,
key and value in LLMs. In each iteration we update one entry in or . In
the query stage, we receive as input and want to
answer , where is a square matrix and is a diagonal matrix. Here denotes a
length- vector whose entries are all ones.
We provide two results: an algorithm and a conditional lower bound.
On one hand, inspired by the lazy update idea from [Demetrescu and
Italiano FOCS 2000, Sankowski FOCS 2004, Cohen, Lee and Song STOC 2019, Brand
SODA 2020], we provide a data structure that uses
amortized update time and
worst-case query time.
On the other hand, we show that unless the hinted matrix-vector
multiplication conjecture [Brand, Nanongkai and Saranurak FOCS 2019] is false,
no algorithm can achieve both the amortized update time and the worst-case
query time above.
In conclusion, our algorithmic result is conditionally optimal unless the
hinted matrix-vector multiplication conjecture is false.
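A hedged sketch of the lazy-update idea in its simplest form: buffer entry updates to K or V and recompute the normalized attention row only when queried. The actual data structure amortizes this far more cleverly; all names here are my own, and the softmax row below corresponds to the D^{-1} exp(QK^T) V expression restricted to a single query row:

```python
import numpy as np

class LazyAttention:
    """Buffer entry updates; pay the recomputation cost only at query time."""
    def __init__(self, Q, K, V):
        self.Q, self.K, self.V = (np.array(M, dtype=float) for M in (Q, K, V))
        self.pending = []

    def update(self, which, i, j, value):
        """Record a single-entry update to 'K' or 'V' without applying it."""
        self.pending.append((which, i, j, value))

    def query(self, i):
        """Return the attention output row for query index i."""
        mats = {"K": self.K, "V": self.V}
        for which, r, c, value in self.pending:  # apply buffered updates
            mats[which][r, c] = value
        self.pending.clear()
        scores = np.exp(self.Q[i] @ self.K.T)    # exp(QK^T), row i
        return (scores / scores.sum()) @ self.V  # normalize, multiply by V
```

The interesting regime in the paper is when updates and queries interleave, so that neither eager recomputation nor full laziness is optimal; the lazy-update framework balances the two.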
Dynamic Maxflow via Dynamic Interior Point Methods
In this paper we provide an algorithm for maintaining a
-approximate maximum flow in a dynamic, capacitated graph
undergoing edge additions. Over a sequence of -additions to an -node
graph where every edge has capacity our algorithm runs in
time . To obtain this result we
design dynamic data structures for the more general problem of detecting when
the value of the minimum cost circulation in a dynamic graph undergoing edge
additions obtains value at most (exactly) for a given threshold . Over a
sequence of -additions to an -node graph where every edge has capacity
and cost we solve this thresholded
minimum cost flow problem in . Both of our algorithms
succeed with high probability against an adaptive adversary. We obtain these
results by dynamizing the recent interior point method used to obtain an almost
linear time algorithm for minimum cost flow (Chen, Kyng, Liu, Peng, Probst
Gutenberg, Sachdeva 2022), and introducing a new dynamic data structure for
maintaining minimum ratio cycles in an undirected graph that succeeds with high
probability against adaptive adversaries.
Comment: 30 pages
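For contrast with the dynamic interior-point approach, the naive baseline for the thresholded problem simply recomputes a max flow (here via Edmonds-Karp) after each edge addition and compares it against the threshold F. This is only the obvious baseline, not the paper's method; all names are my own:

```python
from collections import deque

def max_flow(n, cap, s, t):
    """Edmonds-Karp on a capacity matrix (mutated into the residual graph)."""
    flow = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:           # BFS for an augmenting path
            u = q.popleft()
            for v in range(n):
                if v not in parent and cap[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        v, bottleneck = t, float("inf")        # find the bottleneck capacity
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t                                  # push flow along the path
        while parent[v] is not None:
            cap[parent[v]][v] -= bottleneck
            cap[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

class ThresholdFlow:
    """Naive dynamic baseline: recompute max flow after each edge addition
    and report whether it has reached the threshold F."""
    def __init__(self, n, s, t, F):
        self.n, self.s, self.t, self.F = n, s, t, F
        self.edges = []

    def add_edge(self, u, v, c):
        self.edges.append((u, v, c))
        cap = [[0] * self.n for _ in range(self.n)]
        for a, b, w in self.edges:
            cap[a][b] += w
        return max_flow(self.n, cap, self.s, self.t) >= self.F
```

The paper replaces the per-addition recomputation with a dynamized interior-point method plus a dynamic minimum-ratio-cycle data structure, avoiding the from-scratch cost entirely.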
On Dynamic Graph Algorithms with Predictions
We study dynamic algorithms in the model of algorithms with predictions. We
assume the algorithm is given imperfect predictions regarding future updates,
and we ask how such predictions can be used to improve the running time. This
can be seen as a model interpolating between classic online and offline dynamic
algorithms. Our results give smooth tradeoffs between these two extreme
settings.
First, we give algorithms for incremental and decremental transitive closure
and approximate APSP that take as an additional input a predicted sequence of
updates (edge insertions, or edge deletions, respectively). They preprocess it
in time, and then handle updates in
worst-case time and queries in worst-case
time. Here is an error measure that can be bounded by the maximum
difference between the predicted and actual insertion (deletion) time of an
edge, i.e., by the -error of the predictions.
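The error measure described here (bounded by the maximum difference between predicted and actual update times of any edge) is straightforward to compute from the two update sequences. A small sketch with names of my own; it assumes every actual update also appears in the prediction:

```python
def prediction_error(predicted, actual):
    """Max displacement of any element between the predicted and actual
    update sequences, i.e. the l_inf error of the predictions."""
    pred_time = {e: i for i, e in enumerate(predicted)}
    act_time = {e: i for i, e in enumerate(actual)}
    return max(abs(pred_time[e] - act_time[e]) for e in act_time)
```

With a perfect prediction the error is 0 and the claimed update time degenerates to the offline bound; the worse the prediction, the closer the running time drifts toward the online case.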
The second group of results concerns fully dynamic problems with vertex
updates, where the algorithm has access to a predicted sequence of the next
updates. We show how to solve fully dynamic triangle detection, maximum
matching, single-source reachability, and more, in
worst-case update time. Here denotes how much earlier the -th
update occurs than predicted.
Our last result is a reduction that transforms a worst-case incremental
algorithm without predictions into a fully dynamic algorithm which is given a
predicted deletion time for each element at the time of its insertion. As a
consequence we can, e.g., maintain fully dynamic exact APSP with such
predictions in worst-case vertex insertion time and
worst-case vertex deletion time (for the prediction
error defined as above).
Comment: To appear in proceedings of SODA 2024. Abstract shortened to meet
arXiv requirements
Training (Overparametrized) Neural Networks in Near-Linear Time
The slow convergence rate and pathological curvature issues of first-order
gradient methods for training deep neural networks initiated an ongoing effort
for developing faster second-order optimization algorithms beyond SGD, without
compromising the generalization error. Despite their remarkable convergence
rate (independent of the training batch size), second-order algorithms incur a
daunting slowdown in the cost per iteration (inverting the Hessian matrix of
the loss function), which renders them impractical. Very recently, this
computational overhead was mitigated by the works of [ZMG19, CGH+19],
yielding an -time second-order algorithm for training two-layer
overparametrized neural networks of polynomial width .
We show how to speed up the algorithm of [CGH+19], achieving an
-time backpropagation algorithm for training (mildly
overparametrized) ReLU networks, which is near-linear in the dimension
of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to
reformulate the Gauss-Newton iteration as an -regression problem, and
then use a Fast-JL type dimension reduction to precondition the
underlying Gram matrix in time independent of , allowing us to find a
sufficiently good approximate solution via first-order conjugate gradient. Our
result provides a proof-of-concept that advanced machinery from randomized
linear algebra -- which led to recent breakthroughs in convex optimization
(ERM, LPs, regression) -- can be carried over to the realm of deep learning as
well.
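The sketch-and-solve pattern behind this result can be illustrated on ordinary least squares: compress the tall regression problem with a random projection and solve the small one. For simplicity the example uses a dense Gaussian sketch rather than a Fast-JL transform, and it is only a generic illustration, not the paper's preconditioning pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def sketched_lstsq(A, b, sketch_rows):
    """Sketch-and-solve regression: replace min ||Ax - b|| by the much
    smaller problem min ||S A x - S b|| for a random sketch S."""
    n = A.shape[0]
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x
```

With enough sketch rows relative to the number of columns, the compressed solution approximates the true least-squares solution with high probability, while the dominant cost shifts from the tall dimension n to the sketch size.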