40,881 research outputs found
Splaying Preorders and Postorders
Let be a binary search tree. We prove two results about the behavior of
the Splay algorithm (Sleator and Tarjan 1985). Our first result is that
inserting keys into an empty binary search tree via splaying in the order of
either 's preorder or 's postorder takes linear time. Our proof uses the
fact that preorders and postorders are pattern-avoiding: i.e. they contain no
subsequences that are order-isomorphic to and ,
respectively. Pattern-avoidance implies certain constraints on the manner in
which items are inserted. We exploit this structure with a simple potential
function that counts inserted nodes lying on access paths to uninserted nodes.
Our methods can likely be extended to permutations that avoid more general
patterns. Second, if is any other binary search tree with the same keys as
and is weight-balanced (Nievergelt and Reingold 1973), then splaying
's preorder sequence or 's postorder sequence starting from takes
linear time. To prove this, we demonstrate that preorders and postorders of
balanced search trees do not contain many large "jumps" in symmetric order, and
exploit this fact by using the dynamic finger theorem (Cole et al. 2000). Both
of our results provide further evidence in favor of the elusive "dynamic
optimality conjecture.
Optimization with pattern-avoiding input
Permutation pattern-avoidance is a central concept of both enumerative and
extremal combinatorics. In this paper we study the effect of permutation
pattern-avoidance on the complexity of optimization problems.
In the context of the dynamic optimality conjecture (Sleator, Tarjan, STOC
1983), Chalermsook, Goswami, Kozma, Mehlhorn, and Saranurak (FOCS 2015)
conjectured that the amortized access cost of an optimal binary search tree
(BST) is whenever the access sequence avoids some fixed pattern. They
showed a bound of , which was recently improved to
by Chalermsook, Pettie, and Yingchareonthawornchai
(2023); here is the BST size and the inverse-Ackermann
function. In this paper we resolve the conjecture, showing a tight
bound. This indicates a barrier to dynamic optimality: any candidate online BST
(e.g., splay trees or greedy trees) must match this optimum, but current
analysis techniques only give superconstant bounds.
More broadly, we argue that the easiness of pattern-avoiding input is a
general phenomenon, not limited to BSTs or even to data structures. To
illustrate this, we show that when the input avoids an arbitrary, fixed, a
priori unknown pattern, one can efficiently compute a -server solution of
requests from a unit interval, with total cost , in
contrast to the worst-case bound; and a traveling salesman tour
of points from a unit box, of length , in contrast to the
worst-case bound; similar results hold for the euclidean
minimum spanning tree, Steiner tree, and nearest-neighbor graphs.
We show both results to be tight. Our techniques build on the Marcus-Tardos
proof of the Stanley-Wilf conjecture, and on the recently emerging concept of
twin-width; we believe our techniques to be more generally applicable
Pattern Avoidance in k-ary Heaps
In this paper, we consider pattern avoidance in k-ary heaps, where the permutation associated with the heap is found by recording the nodes as they are encountered in a breadth-first search. We enumerate heaps that avoid patterns of length 3 and collections of patterns of length 3, first with binary heaps and then more generally with k-ary heaps
Smooth heaps and a dual view of self-adjusting data structures
We present a new connection between self-adjusting binary search trees (BSTs)
and heaps, two fundamental, extensively studied, and practically relevant
families of data structures. Roughly speaking, we map an arbitrary heap
algorithm within a natural model, to a corresponding BST algorithm with the
same cost on a dual sequence of operations (i.e. the same sequence with the
roles of time and key-space switched). This is the first general transformation
between the two families of data structures.
There is a rich theory of dynamic optimality for BSTs (i.e. the theory of
competitiveness between BST algorithms). The lack of an analogous theory for
heaps has been noted in the literature. Through our connection, we transfer all
instance-specific lower bounds known for BSTs to a general model of heaps,
initiating a theory of dynamic optimality for heaps.
On the algorithmic side, we obtain a new, simple and efficient heap
algorithm, which we call the smooth heap. We show the smooth heap to be the
heap-counterpart of Greedy, the BST algorithm with the strongest proven and
conjectured properties from the literature, widely believed to be
instance-optimal. Assuming the optimality of Greedy, the smooth heap is also
optimal within our model of heap algorithms. As corollaries of results known
for Greedy, we obtain instance-specific upper bounds for the smooth heap, with
applications in adaptive sorting.
Intriguingly, the smooth heap, although derived from a non-practical BST
algorithm, is simple and easy to implement (e.g. it stores no auxiliary data
besides the keys and tree pointers). It can be seen as a variation on the
popular pairing heap data structure, extending it with a "power-of-two-choices"
type of heuristic.Comment: Presented at STOC 2018, light revision, additional figure
Effective retrieval and new indexing method for case based reasoning: Application in chemical process design
In this paper we try to improve the retrieval step for case based reasoning for preliminary design. This improvement deals with three major parts of our CBR system. First, in the preliminary design step, some uncertainties like imprecise or unknown values remain in the description of the problem, because they need a deeper analysis to be withdrawn. To deal with this issue, the faced problem description is soften with the fuzzy sets theory. Features are described with a central value, a percentage of imprecision and a relation with respect to the central value. These additional data allow us to build a domain of possible values for each attributes. With this representation, the calculation of the similarity function is impacted, thus the characteristic function is used to calculate the local similarity between two features. Second, we focus our attention on the main goal of the retrieve step in CBR to find relevant cases for adaptation. In this second part, we discuss the assumption of similarity to find the more appropriated case. We put in highlight that in some situations this classical similarity must be improved with further knowledge to facilitate case adaptation. To avoid failure during the adaptation step, we implement a method that couples similarity measurement with adaptability one, in order to approximate the cases utility more accurately. The latter gives deeper information for the reusing of cases. In a last part, we present a generic indexing technique for the base, and a new algorithm for the research of relevant cases in the memory. The sphere indexing algorithm is a domain independent index that has performances equivalent to the decision tree ones. But its main strength is that it puts the current problem in the center of the research area avoiding boundaries issues. All these points are discussed and exemplified through the preliminary design of a chemical engineering unit operation
Random Access to Grammar Compressed Strings
Grammar based compression, where one replaces a long string by a small
context-free grammar that generates the string, is a simple and powerful
paradigm that captures many popular compression schemes. In this paper, we
present a novel grammar representation that allows efficient random access to
any character or substring without decompressing the string.
Let be a string of length compressed into a context-free grammar
of size . We present two representations of
achieving random access time, and either
construction time and space on the pointer machine model, or
construction time and space on the RAM. Here, is the inverse of
the row of Ackermann's function. Our representations also efficiently
support decompression of any substring in : we can decompress any substring
of length in the same complexity as a single random access query and
additional time. Combining these results with fast algorithms for
uncompressed approximate string matching leads to several efficient algorithms
for approximate string matching on grammar-compressed strings without
decompression. For instance, we can find all approximate occurrences of a
pattern with at most errors in time , where is the number of occurrences of in . Finally, we
generalize our results to navigation and other operations on grammar-compressed
ordered trees.
All of the above bounds significantly improve the currently best known
results. To achieve these bounds, we introduce several new techniques and data
structures of independent interest, including a predecessor data structure, two
"biased" weighted ancestor data structures, and a compact representation of
heavy paths in grammars.Comment: Preliminary version in SODA 201
Perspects in astrophysical databases
Astrophysics has become a domain extremely rich of scientific data. Data
mining tools are needed for information extraction from such large datasets.
This asks for an approach to data management emphasizing the efficiency and
simplicity of data access; efficiency is obtained using multidimensional access
methods and simplicity is achieved by properly handling metadata. Moreover,
clustering and classification techniques on large datasets pose additional
requirements in terms of computation and memory scalability and
interpretability of results. In this study we review some possible solutions
Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing
This paper presents a general technique for optimally transforming any
dynamic data structure that operates on atomic and indivisible keys by
constant-time comparisons, into a data structure that handles unbounded-length
keys whose comparison cost is not a constant. Examples of these keys are
strings, multi-dimensional points, multiple-precision numbers, multi-key data
(e.g.~records), XML paths, URL addresses, etc. The technique is more general
than what has been done in previous work as no particular exploitation of the
underlying structure of is required. The only requirement is that the insertion
of a key must identify its predecessor or its successor.
Using the proposed technique, online suffix tree can be constructed in worst
case time per input symbol (as opposed to amortized
time per symbol, achieved by previously known algorithms). To our knowledge,
our algorithm is the first that achieves worst case time per input
symbol. Searching for a pattern of length in the resulting suffix tree
takes time, where is the
number of occurrences of the pattern. The paper also describes more
applications and show how to obtain alternative methods for dealing with suffix
sorting, dynamic lowest common ancestors and order maintenance
- …