Hardness of Exact Distance Queries in Sparse Graphs Through Hub Labeling

Author: Alstrup S.
Köhler E.
P.
Ruzsa I. Z.
Twigg A. D.
Publication date: 01/01/2019

A distance labeling scheme is an assignment of bit-labels to the vertices of an undirected, unweighted graph such that the distance between any pair of vertices can be decoded solely from their labels. An important class of distance labeling schemes is that of hub labelings, where a node $v \in G$ stores its distance to the so-called hubs $S_v \subseteq V$, chosen so that for any $u,v \in V$ there is $w \in S_u \cap S_v$ belonging to some shortest $uv$ path. Notice that for most existing graph classes, the best known distance labeling constructions use a hub labeling scheme at some point, at least as a key building block. Our interest lies in hub labelings of sparse graphs, i.e., those with $|E(G)| = O(n)$, for which we show a lower bound of $\frac{n}{2^{O(\sqrt{\log n})}}$ for the average size of the hubsets. Additionally, we show a hub-labeling construction for sparse graphs of average size $O(\frac{n}{RS(n)^{c}})$ for some $0 < c < 1$, where $RS(n)$ is the so-called Ruzsa-Szemerédi function, linked to the structure of induced matchings in dense graphs. This implies that further improving the lower bound on hub labeling size to $\frac{n}{2^{(\log n)^{o(1)}}}$ would require a breakthrough in the study of lower bounds on $RS(n)$, which have resisted substantial improvement in the last 70 years. For general distance labeling of sparse graphs, we show a lower bound of $\frac{1}{2^{O(\sqrt{\log n})}} SumIndex(n)$, where $SumIndex(n)$ is the communication complexity of the Sum-Index problem over $Z_n$. Our results suggest that the best achievable hub-label size and distance-label size in sparse graphs may be $\Theta(\frac{n}{2^{(\log n)^c}})$ for some $0 < c < 1$.
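The hub-labeling definition in this abstract can be illustrated with a minimal Python sketch. It is not the paper's construction: the graph, the hand-picked hub sets, and the helper names `bfs_distances`, `build_labels`, and `query` are all assumptions made for illustration. It only shows how a distance is decoded from two labels once every pair of vertices shares a hub on some shortest path.

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted single-source distances by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def build_labels(adj, hubs):
    """Label of v = distances from v to each of its hubs S_v."""
    labels = {}
    for v in adj:
        dist = bfs_distances(adj, v)
        labels[v] = {w: dist[w] for w in hubs[v]}
    return labels

def query(label_u, label_v):
    """Decode d(u, v) from the two labels alone: minimise over common hubs."""
    common = label_u.keys() & label_v.keys()
    if not common:
        return None  # the hub sets do not cover this pair
    return min(label_u[w] + label_v[w] for w in common)

# Hypothetical example: the path 0-1-2-3-4 with hand-chosen hub sets.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
hubs = {0: {0, 1, 2}, 1: {1, 2}, 2: {2}, 3: {2, 3}, 4: {2, 3, 4}}
labels = build_labels(adj, hubs)
```

Here `query(labels[0], labels[4])` uses only the two labels; the average hub-set size, which the abstract bounds, is the average of `len(hubs[v])` over all vertices.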

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

HAL: Hyper Article en Ligne

Hal-Diderot

The Tree Inclusion Problem: In Linear Space and Faster

Author: Alstrup S.
Bender M. A.
Cole R.
Demaine E. D.
Ferragina P.
Inge Li Gortz
Muthukrishnan S.
Philip Bille
Schlieder T.
Termier A.
Yang L. H.
Zezula P.
Publication date: 01/01/2011

Given two rooted, ordered, and labeled trees $P$ and $T$ the tree inclusion problem is to determine if $P$ can be obtained from $T$ by deleting nodes in $T$. This problem has recently been recognized as an important query primitive in XML databases. Kilpeläinen and Mannila [SIAM J. Comput. 1995] presented the first polynomial time algorithm using quadratic time and space. Since then several improved results have been obtained for special cases when $P$ and $T$ have a small number of leaves or small depth. However, in the worst case these algorithms still use quadratic time and space. Let $n_S$, $l_S$, and $d_S$ denote the number of nodes, the number of leaves, and the depth of a tree $S \in \{P, T\}$. In this paper we show that the tree inclusion problem can be solved in space $O(n_T)$ and time $O(\min(l_Pn_T, l_Pl_T\log \log n_T + n_T, \frac{n_Pn_T}{\log n_T} + n_{T}\log n_{T}))$. This improves or matches the best known time complexities while using only linear space instead of quadratic. This is particularly important in practical applications, such as XML databases, where the space is likely to be a bottleneck. Comment: Minor updates from last tim
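To make the problem statement concrete, the following is a naive Python sketch that decides ordered tree inclusion by a memoised forest recursion. It is not the linear-space algorithm of the paper and makes no attempt to match its time bounds; the tuple encoding of trees and the names `forest_included` and `tree_included` are assumptions for illustration.

```python
from functools import lru_cache

# A tree is encoded as (label, children), where children is a tuple of trees.
# A forest is a tuple of trees in left-to-right order.

@lru_cache(maxsize=None)
def forest_included(p_forest, t_forest):
    """True if p_forest can be obtained from t_forest by deleting nodes."""
    if not p_forest:
        return True            # delete everything that remains in T
    if not t_forest:
        return False           # nothing left in T to match against
    p_label, p_children = p_forest[-1]
    t_label, t_children = t_forest[-1]
    # Option 1: delete the root of the rightmost tree of T;
    # its children take its place in the forest.
    if forest_included(p_forest, t_forest[:-1] + t_children):
        return True
    # Option 2: map the rightmost root of P to the rightmost root of T.
    return (p_label == t_label
            and forest_included(p_children, t_children)
            and forest_included(p_forest[:-1], t_forest[:-1]))

def tree_included(p, t):
    return forest_included((p,), (t,))

def leaf(label):
    return (label, ())

# Hypothetical example: T = a(b(c, d), e).
T = ("a", (("b", (leaf("c"), leaf("d"))), leaf("e")))
P_yes = ("a", (leaf("c"), leaf("e")))   # delete b and d from T
P_no = ("a", (leaf("e"), leaf("c")))    # violates left-to-right order
```

With this encoding, `tree_included(P_yes, T)` holds while `tree_included(P_no, T)` does not, because deletions preserve the left-to-right order of the remaining nodes.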

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Smart City Analytics: Ensemble-Learned Prediction of Citizen Home Care

Author: Alstrup Stephen
Hansen Casper
Hansen Christian
Lioma Christina
Publication date: 01/01/2017

We present an ensemble learning method that predicts large increases in the hours of home care received by citizens. The method is supervised, and uses different ensembles of either linear (logistic regression) or non-linear (random forests) classifiers. Experiments with data available from 2013 to 2017 for every citizen in Copenhagen receiving home care (27,775 citizens) show that prediction can achieve state of the art performance as reported in similar health related domains (AUC=0.715). We further find that competitive results can be obtained by using limited information for training, which is very useful when full records are not accessible or available. Smart city analytics does not necessarily require full city records. To our knowledge this preliminary study is the first to predict large increases in home care for smart city analytics

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Sequence Modelling For Analysing Student Interaction with Educational Systems

Author: Alstrup Stephen
Hansen Casper
Hansen Christian
Hjuler Niklas
Lioma Christina
Publication date: 25/06/2017

The analysis of log data generated by online educational systems is an important task for improving the systems, and furthering our knowledge of how students learn. This paper uses previously unseen log data from Edulab, the largest provider of digital learning for mathematics in Denmark, to analyse the sessions of its users, where 1.08 million student sessions are extracted from a subset of their data. We propose to model students as a distribution of different underlying student behaviours, where the sequence of actions from each session belongs to an underlying student behaviour. We model student behaviour as Markov chains, such that a student is modelled as a distribution of Markov chains, which are estimated using a modified k-means clustering algorithm. The resulting Markov chains are readily interpretable, and in a qualitative analysis around 125,000 student sessions are identified as exhibiting unproductive student behaviour. Based on our results this student representation is promising, especially for educational systems offering many different learning usages, and offers an alternative to common approaches like modelling student behaviour as a single Markov chain often done in the literature. Comment: The 10th International Conference on Educational Data Mining 201
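As a rough illustration of the Markov-chain building block mentioned in this abstract, the sketch below estimates first-order transition probabilities from a set of sessions. It covers only the estimation step for one group of sessions, not the modified k-means clustering; the action names and the function `estimate_transitions` are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_transitions(sessions):
    """Maximum-likelihood first-order transition probabilities.

    sessions: iterable of action sequences (lists of hashable actions).
    Returns {state: {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        # Count each consecutive (current, next) pair of actions.
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    chain = {}
    for state, row in counts.items():
        total = sum(row.values())
        chain[state] = {nxt: c / total for nxt, c in row.items()}
    return chain

# Hypothetical sessions with made-up action names.
sessions = [
    ["login", "task", "task", "hint"],
    ["login", "hint", "task"],
]
chain = estimate_transitions(sessions)
```

In a clustering setting, one such chain would be re-estimated per cluster from the sessions currently assigned to it.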

arXiv.org e-Print Archive

Copenhagen University Research Information System

Sublinear Distance Labeling

Author: Alstrup Stephen
Dahlgaard Søren
Knudsen Mathias Bæk Tejs
Porat Ely
Publication date: 01/01/2016

A distance labeling scheme labels the $n$ nodes of a graph with binary strings such that, given the labels of any two nodes, one can determine the distance in the graph between the two nodes by looking only at the labels. A $D$-preserving distance labeling scheme only returns precise distances between pairs of nodes that are at distance at least $D$ from each other. In this paper we consider distance labeling schemes for the classical case of unweighted graphs with both directed and undirected edges. We present an $O(\frac{n}{D}\log^2 D)$ bit $D$-preserving distance labeling scheme, improving the previous bound by Bollobás et al. [SIAM J. Discrete Math. 2005]. We also give an almost matching lower bound of $\Omega(\frac{n}{D})$. With our $D$-preserving distance labeling scheme as a building block, we additionally achieve the following results: 1. We present the first distance labeling scheme of size $o(n)$ for sparse graphs (and hence bounded degree graphs). This addresses an open problem by Gavoille et al. [J. Algo. 2004], hereby separating the complexity from distance labeling in general graphs which require $\Omega(n)$ bits, Moon [Proc. of Glasgow Math. Association 1965]. 2. For approximate $r$-additive labeling schemes, that return distances within an additive error of $r$, we show a scheme of size $O\left( \frac{n}{r} \cdot \frac{\operatorname{polylog} (r\log n)}{\log n} \right)$ for $r \ge 2$. This improves on the current best bound of $O\left(\frac{n}{r}\right)$ by Alstrup et al. [SODA 2016] for sub-polynomial $r$, and is a generalization of a result by Gawrychowski et al. [arXiv preprint 2015] who showed this for $r=2$. Comment: A preliminary version of this paper appeared at ESA'1

arXiv.org e-Print Archive

Copenhagen University Research Information System

DROPS Dagstuhl Research Online Publication Server

Neural Speed Reading with Structural-Jump-LSTM

Author: Alstrup Stephen
Hansen Casper
Hansen Christian
Lioma Christina
Simonsen Jakob Grue
Publication date: 01/01/2019

Recurrent neural networks (RNNs) can model natural language by sequentially 'reading' input tokens and outputting a distributed representation of each token. Due to the sequential nature of RNNs, inference time is linearly dependent on the input length, and all inputs are read regardless of their importance. Efforts to speed up this inference, known as 'neural speed reading', either ignore or skim over part of the input. We present Structural-Jump-LSTM: the first neural speed reading model to both skip and jump text during inference. The model consists of a standard LSTM and two agents: one capable of skipping single words when reading, and one capable of exploiting punctuation structure (sub-sentence separators (,:), sentence end symbols (.!?), or end of text markers) to jump ahead after reading a word. A comprehensive experimental evaluation of our model against all five state-of-the-art neural reading models shows that Structural-Jump-LSTM achieves the best overall floating point operations (FLOP) reduction (hence is faster), while keeping the same accuracy or even improving it compared to a vanilla LSTM that reads the whole text. Comment: 10 page

arXiv.org e-Print Archive

Copenhagen University Research Information System

Near-Optimal Induced Universal Graphs for Bounded Degree Graphs

Author: Abrahamsen Mikkel
Alstrup Stephen
Holm Jacob
Knudsen Mathias Bæk Tejs
Stöckel Morten
Publication date: 21/07/2016

A graph $U$ is an induced universal graph for a family $F$ of graphs if every graph in $F$ is a vertex-induced subgraph of $U$. For the family of all undirected graphs on $n$ vertices Alstrup, Kaplan, Thorup, and Zwick [STOC 2015] give an induced universal graph with $O\!\left(2^{n/2}\right)$ vertices, matching a lower bound by Moon [Proc. Glasgow Math. Assoc. 1965]. Let $k= \lceil D/2 \rceil$. Improving asymptotically on previous results by Butler [Graphs and Combinatorics 2009] and Esperet, Arnaud and Ochem [IPL 2008], we give an induced universal graph with $O\!\left(\frac{k2^k}{k!}n^k \right)$ vertices for the family of graphs with $n$ vertices of maximum degree $D$. For constant $D$, Butler gives a lower bound of $\Omega\!\left(n^{D/2}\right)$. For an odd constant $D\geq 3$, Esperet et al. and Alon and Capalbo [SODA 2008] give a graph with $O\!\left(n^{k-\frac{1}{D}}\right)$ vertices. Using their techniques for any (including constant) even values of $D$ gives asymptotically worse bounds than we present. For large $D$, i.e. when $D = \Omega\left(\log^3 n\right)$, the previous best upper bound was ${n\choose\lceil D/2\rceil} n^{O(1)}$ due to Adjiashvili and Rotbart [ICALP 2014]. We give upper and lower bounds showing that the size is ${\lfloor n/2\rfloor\choose\lfloor D/2 \rfloor}2^{\pm\tilde{O}\left(\sqrt{D}\right)}$. Hence the optimal size is $2^{\tilde{O}(D)}$ and our construction is within a factor of $2^{\tilde{O}\left(\sqrt{D}\right)}$ from this. The previous results were larger by at least a factor of $2^{\Omega(D)}$. As a part of the above, proving a conjecture by Esperet et al., we construct an induced universal graph with $2n-1$ vertices for the family of graphs with max degree $2$. In addition, we give results for acyclic graphs with max degree $2$ and cycle graphs. Our results imply the first labeling schemes that for any $D$ are at most $o(n)$ bits from optimal.

arXiv.org e-Print Archive

Copenhagen University Research Information System

DROPS Dagstuhl Research Online Publication Server

Constructing Light Spanners Deterministically in Near-Linear Time

Author: Alstrup Stephen
Filtser Arnold
Wulff-Nilsen Christian
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 27th Annual European Symposium on Algorithms (ESA 2019)
Publication date: 01/01/2019

Graph spanners are well-studied and widely used both in theory and practice. In a recent breakthrough, Chechik and Wulff-Nilsen [Shiri Chechik and Christian Wulff-Nilsen, 2018] improved the state-of-the-art for light spanners by constructing a (2k-1)(1+epsilon)-spanner with O(n^(1+1/k)) edges and O_epsilon(n^(1/k)) lightness. Soon after, Filtser and Solomon [Arnold Filtser and Shay Solomon, 2016] showed that the classic greedy spanner construction achieves the same bounds. The major drawback of the greedy spanner is its running time of O(mn^(1+1/k)) (which is faster than [Shiri Chechik and Christian Wulff-Nilsen, 2018]). This makes the construction impractical even for graphs of moderate size. Much faster spanner constructions do exist but they only achieve lightness Omega_epsilon(kn^(1/k)), even when randomization is used. The contribution of this paper is deterministic spanner constructions that are fast, and achieve similar bounds as the state-of-the-art slower constructions. Our first result is an O_epsilon(n^(2+1/k+epsilon')) time spanner construction which achieves the state-of-the-art bounds. Our second result is an O_epsilon(m + n log n) time construction of a spanner with (2k-1)(1+epsilon) stretch, O(log k * n^(1+1/k)) edges and O_epsilon(log k * n^(1/k)) lightness. This is an exponential improvement in the dependence on k compared to the previous result with such running time. Finally, for the important special case where k=log n, for every constant epsilon>0, we provide an O(m+n^(1+epsilon)) time construction that produces an O(log n)-spanner with O(n) edges and O(1) lightness which is asymptotically optimal. This is the first known sub-quadratic construction of such a spanner for any k = omega(1). To achieve our constructions, we show a novel deterministic incremental approximate distance oracle. Our new oracle is crucial in our construction, as known randomized dynamic oracles require the assumption of a non-adaptive adversary. This is a strong assumption, which has seen recent attention in prolific venues. Our new oracle allows the order of the edge insertions to not be fixed in advance, which is critical as our spanner algorithm chooses which edges to insert based on the answers to distance queries. We believe our new oracle is of independent interest.
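For readers unfamiliar with the classic greedy spanner that this abstract takes as its baseline, here is a minimal Python sketch. It is the textbook greedy procedure (scan edges by increasing weight, keep an edge only if the current spanner does not already connect its endpoints within the stretch bound), not the fast deterministic construction of the paper; the names `bounded_distance` and `greedy_spanner` and the example graph are assumptions.

```python
import heapq

def bounded_distance(adj, source, target, limit):
    """Dijkstra from source, ignoring any path longer than limit."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if x == target:
            return d
        if d > dist.get(x, float("inf")):
            continue  # stale heap entry
        for y, w in adj.get(x, []):
            nd = d + w
            if nd <= limit and nd < dist.get(y, float("inf")):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return float("inf")

def greedy_spanner(edges, stretch):
    """edges: list of (u, v, weight); stretch: e.g. 2k-1 for an integer k."""
    adj = {}       # adjacency lists of the spanner built so far
    spanner = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        # Keep the edge only if the spanner cannot already route
        # u to v within stretch * w.
        if bounded_distance(adj, u, v, stretch * w) > stretch * w:
            spanner.append((u, v, w))
            adj.setdefault(u, []).append((v, w))
            adj.setdefault(v, []).append((u, w))
    return spanner

# Hypothetical example: a unit-weight triangle.
triangle = [(0, 1, 1), (1, 2, 1), (0, 2, 1)]
```

On the triangle, a stretch of 3 drops the third edge because the two kept edges already give a path of length 2, while a stretch of 1 keeps all three. The adaptive pattern the abstract describes is visible here: which edge is inserted next depends on the answers to earlier distance queries.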

Copenhagen University Research Information System

DROPS Dagstuhl Research Online Publication Server

2-Vertex Connectivity in Directed Graphs

Author: AL Buchsbaum
GF Italiano
H Nagamochi
K Menger
RE Tarjan
S Alstrup
Publication date: 19/02/2015

We complement our study of 2-connectivity in directed graphs, by considering the computation of the following 2-vertex-connectivity relations: We say that two vertices v and w are 2-vertex-connected if there are two internally vertex-disjoint paths from v to w and two internally vertex-disjoint paths from w to v. We also say that v and w are vertex-resilient if the removal of any vertex different from v and w leaves v and w in the same strongly connected component. We show how to compute the above relations in linear time so that we can report in constant time if two vertices are 2-vertex-connected or if they are vertex-resilient. We also show how to compute in linear time a sparse certificate for these relations, i.e., a subgraph of the input graph that has O(n) edges and maintains the same 2-vertex-connectivity and vertex-resilience relations as the input graph, where n is the number of vertices. Comment: arXiv admin note: substantial text overlap with arXiv:1407.304
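The vertex-resilience definition above can be checked directly, if slowly, by removing each other vertex in turn. The Python sketch below does exactly that in O(n(n+m)) time per pair; it is a brute-force reading of the definition, not the linear-time method of the paper, and the names `reachable` and `vertex_resilient` and the example graphs are assumptions.

```python
def reachable(adj, start, removed):
    """Vertices reachable from start in the digraph with `removed` deleted."""
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj.get(x, []):
            if y != removed and y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def vertex_resilient(adj, v, w):
    """True if deleting any single vertex other than v and w keeps
    v and w in the same strongly connected component."""
    vertices = set(adj) | {y for ys in adj.values() for y in ys}
    for x in vertices - {v, w}:
        # Same strongly connected component: each reaches the other.
        if w not in reachable(adj, v, x) or v not in reachable(adj, w, x):
            return False
    return True

# Hypothetical examples on four vertices.
one_way_cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
two_way_cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

In the one-way cycle, deleting vertex 1 cuts every path from 0 to 2, so the pair (0, 2) is not vertex-resilient; in the two-way cycle it is.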

arXiv.org e-Print Archive

Crossref

ART

Simpler, faster and shorter labels for distances in graphs

Author: Alstrup Stephen
Gavoille Cyril
Halvorsen Esben Bistrup
Petersen Holger
Publication date: 17/04/2015

We consider how to assign labels to any undirected graph with n nodes such that, given the labels of two nodes and no other information regarding the graph, it is possible to determine the distance between the two nodes. The challenge in such a distance labeling scheme is primarily to minimize the maximum label length and secondarily to minimize the time needed to answer distance queries (decoding). Previous schemes have offered different trade-offs between label lengths and query time. This paper presents a simple algorithm with shorter labels and shorter query time than any previous solution, thereby improving the state-of-the-art with respect to both label length and query time in one single algorithm. Our solution addresses several open problems concerning label length and decoding time and is the first improvement of label length for more than three decades. More specifically, we present a distance labeling scheme with label size (log 3)n/2 + o(n) (logarithms are in base 2) and O(1) decoding time. This outperforms all existing results with respect to both size and decoding time, including Winkler's (Combinatorica 1983) decades-old result, which uses labels of size (log 3)n and O(n/log n) decoding time, and Gavoille et al. (SODA'01), which uses labels of size 11n + o(n) and O(loglog n) decoding time. In addition, our algorithm is simpler than the previous ones. In the case of integral edge weights of size at most W, we present almost matching upper and lower bounds for label sizes. For r-additive approximation schemes, where distances can be off by an additive constant r, we give both upper and lower bounds. In particular, we present an upper bound for 1-additive approximation schemes which, in the unweighted case, has the same size (ignoring second order terms) as an adjacency scheme: n/2. We also give results for bipartite graphs and for exact and 1-additive distance oracles.

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System