A Faster Implementation of Online Run-Length Burrows-Wheeler Transform
Run-length encoding Burrows-Wheeler Transformed strings, resulting in
Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive
strings. We propose a new algorithm for online RLBWT working in run-compressed
space, which runs in O(n log r) time and O(r log n) bits of space, where n
is the length of the input string S received so far and r is the number of runs
in the BWT of the reversed S. We improve the state-of-the-art algorithm for
online RLBWT in terms of empirical construction time. By adopting a dynamic list
that maintains a total order, we can replace rank queries on a dynamic wavelet
tree over a run-length compressed string with direct comparisons of labels in
the dynamic list. Empirical results on various benchmarks show the efficiency
of our algorithm, especially for highly repetitive strings.
Comment: In Proc. IWOCA201
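To make the quantity r concrete, here is a minimal offline sketch in Python. It assumes a naive rotation sort and a `$` sentinel that does not occur in the input and sorts before the other characters in ASCII. It is not the online, run-compressed algorithm described above, which never materializes the full BWT. It only computes the BWT of a string and run-length encodes it, so that the number of runs can be inspected on small inputs. The function names `bwt` and `run_length_encode` are illustrative.

```python
def bwt(s: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations (O(n^2 log n))."""
    # Assumption: the sentinel is absent from s and sorts before its characters.
    if sentinel in s:
        raise ValueError("the sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    # The BWT is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs
```

For example, `bwt("banana")` is `"annb$aa"`, which has 5 runs. For a prefix S, r as defined above corresponds to `len(run_length_encode(bwt(S[::-1])))`. Highly repetitive inputs tend to give far fewer runs than characters, which is what makes run-compressed space attractive.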
Parallel Graph Connectivity in Log Diameter Rounds
We study the graph connectivity problem in the MPC model. On an undirected graph with
n nodes and m edges, O(log n)-round connectivity algorithms have been
known for over 35 years. However, no algorithms with better complexity bounds
were known. In this work, we give fully scalable, faster algorithms for the
connectivity problem, by parameterizing the time complexity as a function of
the diameter of the graph. Our main result is an O(log D log log_{m/n} n)
time connectivity algorithm for diameter-D graphs, using Theta(m) total
memory. If our algorithm can use more memory, it can terminate in fewer rounds,
and there is no lower bound on the memory per processor.
We extend our results to related graph problems such as spanning forest,
finding a DFS sequence, exact/approximate minimum spanning forest, and
bottleneck spanning forest. We also show that achieving similar bounds for
reachability in directed graphs would imply faster boolean matrix
multiplication algorithms.
We introduce several new algorithmic ideas. We describe a general technique
called double exponential speed problem size reduction which roughly means that
if we can use total memory N to reduce a problem from size n to n/k, for
k = (N/n)^{Theta(1)}, in one phase, then we can solve the problem in
O(log log_{N/n} n) phases. In order to achieve this fast reduction for graph
connectivity, we use a multistep algorithm. One key step is a carefully
constructed truncated broadcasting scheme where each node broadcasts neighbor
sets to its neighbors in a way that limits the size of the resulting neighbor
sets. Another key step is random leader contraction, where we choose a smaller
set of leaders than many previous works do.
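The random leader contraction step can be illustrated with a small sequential simulation in Python. This sketch makes simplifying assumptions. In each round, every remaining vertex independently becomes a leader with a hypothetical probability `p`. Each non-leader that has a leader neighbour then merges into its smallest such neighbour. The sketch does not model MPC memory limits, the truncated broadcasting step, or the smaller leader set chosen in the work above. The function name `random_leader_contraction` is illustrative.

```python
import random


def random_leader_contraction(n: int, edges: list[tuple[int, int]],
                              p: float = 0.5, seed: int = 0) -> list[int]:
    """Return a component representative for each vertex 0..n-1."""
    # Assumption: a fixed leader probability p in (0, 1) per round.
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    rng = random.Random(seed)
    rep = list(range(n))  # current representative of each original vertex
    cur = {(u, v) for u, v in edges if u != v}  # edges of the contracted graph
    while cur:
        nodes = {x for e in cur for x in e}
        is_leader = {x: rng.random() < p for x in nodes}
        # Each non-leader picks the smallest adjacent leader (if any).
        target: dict[int, int] = {}
        for u, v in cur:
            for a, b in ((u, v), (v, u)):
                if not is_leader[a] and is_leader[b]:
                    target[a] = min(target.get(a, b), b)
        # Contract: relabel vertices and edges, then drop self-loops.
        rep = [target.get(r, r) for r in rep]
        cur = {(target.get(u, u), target.get(v, v)) for u, v in cur}
        cur = {(u, v) for u, v in cur if u != v}
    return rep
```

Each contraction merges only vertices joined by an edge, so representatives never cross component boundaries. The loop ends, with probability 1, once no edge joins two distinct representatives.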
Monotone Drawings of k-Inner Planar Graphs
A k-inner planar graph is a planar graph that has a plane drawing with at
most k internal vertices, i.e., vertices that do not lie on the boundary of
the outer face of its drawing. An outerplanar graph is a 0-inner planar
graph. In this paper, we show how to construct a monotone drawing of a
k-inner planar graph on a 2(k+1)n × 2(k+1)n grid. In the special case
of an outerplanar graph, we can produce a planar monotone drawing on an n × n grid, improving previously known results.
Comment: Appears in the Proceedings of the 26th International Symposium on
Graph Drawing and Network Visualization (GD 2018). Revised introductio
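A monotone drawing requires every pair of vertices to be joined by a path that is monotone in some direction d, meaning the projections of its vertices onto d strictly increase. The Python sketch below checks that condition for a single polyline with integer grid coordinates. It does not construct drawings. It relies on the fact that a suitable d exists exactly when all edge vectors lie in a common open half-plane. It then tests a finite set of candidate directions built from edge perpendiculars, using exact integer arithmetic. The name `is_monotone_path` is hypothetical.

```python
from itertools import product

Point = tuple[int, int]


def _dot(a: Point, b: Point) -> int:
    return a[0] * b[0] + a[1] * b[1]


def is_monotone_path(points: list[Point]) -> bool:
    """True if some direction d strictly orders the path's vertices by projection."""
    edges = [(q[0] - p[0], q[1] - p[1]) for p, q in zip(points, points[1:])]
    if not edges:
        return True
    if any(e == (0, 0) for e in edges):
        return False  # a zero-length edge cannot strictly increase any projection
    # Boundary rays of the feasible cone {d : d.e > 0 for all e} are perpendiculars
    # of edges. If the cone is non-empty, either all edges point the same way
    # (edges[0] is a witness) or the sum of its two boundary rays lies inside it.
    perps = [(-e[1], e[0]) for e in edges] + [(e[1], -e[0]) for e in edges]
    candidates = [edges[0]] + [(a[0] + b[0], a[1] + b[1])
                               for a, b in product(perps, repeat=2)]
    return any(all(_dot(d, e) > 0 for e in edges) for d in candidates)
```

The candidate set has quadratic size, so this check takes cubic time in the path length. That is fine for verifying small examples but is not meant as an efficient test.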
Improved Periodicity Mining in Time Series Databases
Time series data represents information about real-world phenomena, and periodicity mining explores the interesting periodic behavior that is inherent in the data. Periodicity mining has numerous applications, such as weather forecasting, stock market prediction and analysis, and pattern recognition. Recently, the suffix tree, a powerful data structure that efficiently solves many string-related problems, has been used to gather information about repeated substrings in the text and then perform periodicity mining. However, periodicity mining deals with large amounts of data, which makes it difficult to perform mining in main memory due to the space requirements of the suffix tree. Thus, we first propose the use of the Compressed Suffix Tree (CST) for space-efficient periodicity mining in very large datasets. Given the time-space trade-off that comes with any practical use of the CST, we provide a comprehensive empirical analysis of the practical use of CSTs and traditional suffix trees for periodicity mining.
Noise is an inherent part of practical time series data, and it is important to mine periods in spite of the noise. This leads to the problem of approximate periodicity mining. Existing algorithms have dealt with noise introduced between the occurrences of the periodic pattern, but not with noise introduced in the structure of the pattern itself. We present a taxonomy for approximate periodicity and then propose an algorithm that performs periodicity mining in the presence of noise introduced simultaneously in both the structure of the pattern and between the periodic occurrences of the pattern.
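As a rough illustration of periodicity mining on a symbol sequence, the Python sketch below scores a candidate period p and starting position s for a pattern. The score is the fraction of positions s, s+p, s+2p, ... at which the pattern actually occurs. A naive scan stands in for the occurrence lists that a suffix tree or CST would provide. The confidence measure and the `min_conf` threshold are simplifying assumptions, not the algorithm proposed above. Missing occurrences lower the score rather than rule a period out, so the sketch tolerates noise between occurrences but not noise inside the pattern itself.

```python
def occurrences(text: str, pattern: str) -> list[int]:
    """Start positions of pattern in text (naive scan standing in for a suffix tree/CST)."""
    return [i for i in range(len(text) - len(pattern) + 1) if text.startswith(pattern, i)]


def periodicity_confidence(text: str, pattern: str, period: int, start: int) -> float:
    """Fraction of positions start, start+period, ... where pattern occurs."""
    occ = set(occurrences(text, pattern))
    expected = range(start, len(text) - len(pattern) + 1, period)
    if len(expected) == 0:
        return 0.0
    return sum(1 for i in expected if i in occ) / len(expected)


def mine_periods(text: str, pattern: str, min_conf: float = 0.75) -> list[tuple[int, int, float]]:
    """(period, start, confidence) triples at or above an assumed threshold."""
    occ = occurrences(text, pattern)
    found = []
    for period in range(1, len(text) // 2 + 1):
        # Only starts that are actual occurrences within the first period.
        for start in (i for i in occ if i < period):
            conf = periodicity_confidence(text, pattern, period, start)
            if conf >= min_conf:
                found.append((period, start, conf))
    return found
```

In `"abcabcabdabc"`, the pattern `"abc"` occurs at positions 0, 3, and 9 but not at 6. Period 3 from position 0 therefore still scores 0.75 despite one corrupted occurrence.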
Maintaining the Union of Unit Discs Under Insertions with Near-Optimal Overhead
We present efficient data structures for problems on unit discs and arcs of their boundary in the plane. (i) We give an output-sensitive algorithm for the dynamic maintenance of the union of n unit discs under insertions in O(k log^2 n) update time and O(n) space, where k is the combinatorial complexity of the structural change in the union due to the insertion of the new disc. (ii) As part of the solution of (i) we devise a fully dynamic data structure for the maintenance of lower envelopes of pseudo-lines, which we believe is of independent interest. The structure has O(log^2 n) update time and O(log n) vertical ray shooting query time. To achieve this performance, we devise a new algorithm for finding the intersection between two lower envelopes of pseudo-lines in O(log n) time, using tentative binary search; the lower envelopes are special in that at x=-infty any pseudo-line contributing to the first envelope lies below every pseudo-line contributing to the second envelope. (iii) We also present a dynamic range searching structure for a set of circular arcs of unit radius (not necessarily on the boundary of the union of the corresponding discs), where the ranges are unit discs, with O(n log n) preprocessing time, O(n^{1/2+epsilon} + l) query time and O(log^2 n) amortized update time, where l is the size of the output and epsilon>0 is arbitrary. The structure requires O(n) storage space.
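The role of vertical ray shooting on a lower envelope can be sketched for the simplest case: a static set of non-vertical straight lines, which are a special case of pseudo-lines. The Python sketch below builds the envelope with the standard slope-sorted stack construction. It answers a query at x by binary search over the envelope's breakpoints, and exact `Fraction` arithmetic avoids rounding errors at intersections. It does not support updates, general pseudo-lines, or the tentative binary search mentioned above. The function names are illustrative.

```python
from bisect import bisect_right
from fractions import Fraction

Line = tuple[int, int]  # (slope m, intercept b) for y = m*x + b


def cross_x(a: Line, b: Line) -> Fraction:
    """x-coordinate where lines a and b meet; requires slope(a) > slope(b)."""
    return Fraction(b[1] - a[1], a[0] - b[0])


def build_lower_envelope(lines: list[Line]) -> tuple[list[Line], list[Fraction]]:
    """Envelope lines from left to right, plus the breakpoints between them."""
    # From left to right, the lowest line has strictly decreasing slope.
    ordered = sorted(set(lines), key=lambda l: (-l[0], l[1]))
    hull: list[Line] = []
    for line in ordered:
        if hull and hull[-1][0] == line[0]:
            continue  # parallel to the last kept line but not below it
        # Pop hull[-1] while it is never strictly lowest between hull[-2] and line.
        while len(hull) >= 2 and cross_x(hull[-2], line) <= cross_x(hull[-2], hull[-1]):
            hull.pop()
        hull.append(line)
    breaks = [cross_x(hull[i], hull[i + 1]) for i in range(len(hull) - 1)]
    return hull, breaks


def vertical_ray_shoot(hull: list[Line], breaks: list[Fraction], x) -> Line:
    """Line first hit by an upward ray from (x, -infinity): the envelope line over x."""
    return hull[bisect_right(breaks, x)]
```

Building the envelope costs O(n log n) for the sort, and each query costs O(log n). A structure like the one described above must additionally keep the envelope correct under insertions and deletions.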