Fast unsupervised online drift detection using incremental Kolmogorov-Smirnov test

Author: Batista Gustavo
Dos Reis Denis
Flach Peter
Matwin Stan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/08/2016

Crossref

Explore Bristol Research

Smooth heaps and a dual view of self-adjusting data structures

Author: Kozma László
Saranurak Thatchaphol
Publication date: 20/06/2018

We present a new connection between self-adjusting binary search trees (BSTs) and heaps, two fundamental, extensively studied, and practically relevant families of data structures. Roughly speaking, we map an arbitrary heap algorithm within a natural model to a corresponding BST algorithm with the same cost on a dual sequence of operations (i.e. the same sequence with the roles of time and key-space switched). This is the first general transformation between the two families of data structures. There is a rich theory of dynamic optimality for BSTs (i.e. the theory of competitiveness between BST algorithms). The lack of an analogous theory for heaps has been noted in the literature. Through our connection, we transfer all instance-specific lower bounds known for BSTs to a general model of heaps, initiating a theory of dynamic optimality for heaps. On the algorithmic side, we obtain a new, simple and efficient heap algorithm, which we call the smooth heap. We show the smooth heap to be the heap-counterpart of Greedy, the BST algorithm with the strongest proven and conjectured properties from the literature, widely believed to be instance-optimal. Assuming the optimality of Greedy, the smooth heap is also optimal within our model of heap algorithms. As corollaries of results known for Greedy, we obtain instance-specific upper bounds for the smooth heap, with applications in adaptive sorting. Intriguingly, the smooth heap, although derived from a non-practical BST algorithm, is simple and easy to implement (e.g. it stores no auxiliary data besides the keys and tree pointers). It can be seen as a variation on the popular pairing heap data structure, extending it with a "power-of-two-choices" type of heuristic.
Comment: Presented at STOC 2018, light revision, additional figure

arXiv.org e-Print Archive

Pure OAI Repository

New Paths from Splay to Dynamic Optimality

Author: Levy Caleb C.
Tarjan Robert E.
Publication date: 14/07/2019

Consider the task of performing a sequence of searches in a binary search tree. After each search, an algorithm is allowed to arbitrarily restructure the tree, at a cost proportional to the amount of restructuring performed. The cost of an execution is the sum of the time spent searching and the time spent optimizing those searches with restructuring operations. This notion was introduced by Sleator and Tarjan (JACM, 1985), along with an algorithm and a conjecture. The algorithm, Splay, is an elegant procedure for performing adjustments while moving searched items to the top of the tree. The conjecture, called "dynamic optimality," is that the cost of splaying is always within a constant factor of the optimal algorithm for performing searches. The conjecture stands to this day. In this work, we attempt to lay the foundations for a proof of the dynamic optimality conjecture.
Comment: An earlier version of this work appeared in the Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. arXiv admin note: text overlap with arXiv:1907.0630

arXiv.org e-Print Archive

The PGM-index: a multicriteria, compressed and learned approach to data indexing

Author: Ferragina Paolo
Vinciguerra Giorgio
Publication venue: 'VLDB Endowment'
Publication date: 14/10/2019

The recent introduction of learned indexes has shaken the foundations of the decades-old field of indexing data structures. Combining, or even replacing, classic design elements such as B-tree nodes with machine learning models has proven to give outstanding improvements in the space footprint and time efficiency of data systems. However, these novel approaches are based on heuristics and thus lack guarantees on both their time and space requirements. We propose the Piecewise Geometric Model index (PGM-index for short), which achieves guaranteed I/O-optimality in query operations, learns an optimal number of linear models, and has a distinctive recursive construction that makes it a purely learned data structure rather than a hybrid of traditional and learned indexes (such as RMI and FITing-tree). We show that the PGM-index improves the space of the FITing-tree by 63.3% and of the B-tree by more than four orders of magnitude, while achieving the same or even better query time efficiency. We complement this result by proposing three variants of the PGM-index. First, we design a compressed PGM-index that further reduces its space footprint by exploiting the repetitiveness at the level of the learned linear models it is composed of. Second, we design a PGM-index that adapts itself to the distribution of the queries, resulting in the first known distribution-aware learned index to date. Finally, given its flexibility in the offered space-time trade-offs, we propose the multicriteria PGM-index, which efficiently auto-tunes itself in a few seconds over hundreds of millions of keys to the possibly evolving space-time constraints imposed by the application of use.
We remark to the reader that this paper is an extended and improved version of our previous paper titled "Superseding traditional indexes by orchestrating learning and geometry" (arXiv:1903.00507).
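The idea of a learned index with an error bound can be illustrated with a minimal sketch. It assumes sorted, distinct numeric keys and an integer error bound `eps`, and it uses a simple greedy "shrinking cone" segmentation rather than the optimal piecewise linear approximation or the recursive construction described in the abstract. All names (`build_segments`, `SegmentIndex`, `find`) are hypothetical and not taken from the paper or its library.

```python
from bisect import bisect_left, bisect_right


def build_segments(keys, eps):
    """Greedily split sorted distinct keys into linear segments.

    Each segment (first_key, slope, first_pos) predicts the position of
    every key it covers to within +/- eps. This is a greedy heuristic,
    not the optimal segmentation used by the PGM-index.
    """
    if not keys:
        return []
    segments = []
    start = 0
    lo, hi = float("-inf"), float("inf")  # feasible slope interval
    for i in range(1, len(keys)):
        dx = keys[i] - keys[start]
        dy = i - start
        new_lo = max(lo, (dy - eps) / dx)
        new_hi = min(hi, (dy + eps) / dx)
        if new_lo > new_hi:
            # No single slope fits the new point: close the segment here.
            segments.append((keys[start], _pick_slope(lo, hi), start))
            start = i
            lo, hi = float("-inf"), float("inf")
        else:
            lo, hi = new_lo, new_hi
    segments.append((keys[start], _pick_slope(lo, hi), start))
    return segments


def _pick_slope(lo, hi):
    # A segment holding a single key has an unconstrained slope.
    return 0.0 if lo == float("-inf") else (lo + hi) / 2


class SegmentIndex:
    """Single-level learned index sketch: model prediction plus a bounded search."""

    def __init__(self, keys, eps):
        self.keys = keys
        self.eps = eps
        self.segments = build_segments(keys, eps)
        self.firsts = [seg[0] for seg in self.segments]

    def find(self, key):
        """Return the position of key in keys, or None if it is absent."""
        if not self.keys or key < self.keys[0]:
            return None
        first_key, slope, first_pos = self.segments[bisect_right(self.firsts, key) - 1]
        pred = int(first_pos + slope * (key - first_key))
        # The true position lies within eps of the prediction (plus rounding slack).
        lo = max(0, pred - self.eps - 1)
        hi = min(len(self.keys), pred + self.eps + 2)
        i = bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None
```

With `eps = 4`, every segment in this sketch covers at least five keys, so the number of stored models is a fraction of the number of keys, and each lookup searches a window of at most `2 * eps + 3` positions.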

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

SpK: A fast atomic and microphysics code for the high-energy-density regime

Author: Chapman DA
Chittenden JP
Crilly AJ
Fraser AR
McLean KW
Niasse NPL
Rose SJ
Publication venue: 'Elsevier BV'
Publication date: 29/05/2023

SpK is part of the numerical codebase at Imperial College London used to model high energy density physics (HEDP) experiments. SpK is an efficient atomic and microphysics code used to perform detailed configuration accounting calculations of electronic and ionic stage populations, opacities and emissivities for use in post-processing and radiation hydrodynamics simulations. This is done using screened hydrogenic atomic data supplemented by the NIST energy level database. An extended Saha model solves for chemical equilibrium with extensions for non-ideal physics, such as ionisation potential depression, and non-thermal equilibrium corrections. A tree-heap (treap) data structure is used to store spectral data, such as opacity; because the treap is dynamic, points can easily be inserted around spectral lines without a priori knowledge of the ion stage populations. Results from SpK are compared to other codes, and descriptions of radiation transport solutions which use SpK data are given. The treap data structure and SpK’s computational efficiency allow inline post-processing of 3D hydrodynamics simulations with a dynamically evolving spectrum.
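A treap of the kind mentioned above can be sketched as follows. This is a generic textbook treap keyed on a numeric value (for example a photon energy) with an attached payload (for example an opacity); it is not SpK's implementation, and the names `Node`, `insert`, `in_order`, and `is_heap_ordered` are hypothetical.

```python
import random


class Node:
    """Treap node: BST-ordered by key, max-heap-ordered by random priority."""

    __slots__ = ("key", "value", "priority", "left", "right")

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.priority = random.random()
        self.left = None
        self.right = None


def _rotate_right(node):
    left = node.left
    node.left = left.right
    left.right = node
    return left


def _rotate_left(node):
    right = node.right
    node.right = right.left
    right.left = node
    return right


def insert(node, key, value):
    """Insert (key, value) and return the new subtree root.

    An existing key has its value overwritten.
    """
    if node is None:
        return Node(key, value)
    if key == node.key:
        node.value = value
    elif key < node.key:
        node.left = insert(node.left, key, value)
        if node.left.priority > node.priority:
            node = _rotate_right(node)  # restore heap order
    else:
        node.right = insert(node.right, key, value)
        if node.right.priority > node.priority:
            node = _rotate_left(node)  # restore heap order
    return node


def in_order(node):
    """Yield (key, value) pairs in increasing key order."""
    if node is not None:
        yield from in_order(node.left)
        yield node.key, node.value
        yield from in_order(node.right)


def is_heap_ordered(node):
    """Check that no child has a higher priority than its parent."""
    if node is None:
        return True
    for child in (node.left, node.right):
        if child is not None and child.priority > node.priority:
            return False
    return is_heap_ordered(node.left) and is_heap_ordered(node.right)
```

Because priorities are random, the expected depth stays logarithmic regardless of insertion order, which is what makes inserting extra points around a spectral line cheap in a structure like this.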

Oxford University Research Archive

Spiral - Imperial College Digital Repository

Analysis and solution of different algorithmic problems

Author: Martínez García Albert
Publication venue: Universitat Politècnica de Catalunya
Publication date: 20/04/2017

The goal of competitive programming is to find abstract solutions to given algorithmic problems and to turn those ideas into efficient and correct computer programs. Performing this activity at a high level requires a bit of natural ability, (at least) hundreds of training hours, and a wide range of knowledge, including many algorithms and data structures, some of them far from trivial. This project constitutes a compilation of problems from several relevant topics in competitive programming, with an explanation and analysis of their solutions. Most of these problems were solved while training with the UPC programming teams, which have dominated their regional competition for more than a decade. The author hopes that this collection may eventually increase the interest of some readers in competitive programming.

UPCommons. Portal del coneixement obert de la UPC

Ranked Queries in Index Data Structures

Author: BIALYNICKA BIRULA IWONA
Publication venue: 'Pisa University Press'
Publication date: 15/11/2008

A ranked query is a query which returns the top-ranking elements of a set, sorted by rank, where the rank corresponds to some sort of preference function defined on the items of the set. This thesis investigates the problem of adding ranked query capabilities to several index data structures on top of their existing functionality. First, we introduce the concept of rank-sensitive data structures, based on the existing concept of output-sensitive data structures. Rank-sensitive data structures are output-sensitive data structures which are additionally given a ranking of the items stored and, as the result of a query, return only the k best-ranking items satisfying the given query, sorted according to rank, where k is specified at query time. We explore several ways of adding rank-sensitivity to different data structures and the different trade-offs which this incurs. The second part of the work deals with the first efficient dynamic version of the Cartesian tree, a data structure intrinsically related to rank queries.
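To make the definition of a ranked query concrete, here is a naive baseline for a one-dimensional range query that returns the k best-ranking matches. It is not one of the rank-sensitive structures studied in the thesis: it scans every item in the range, so its cost grows with the number of matches rather than with k. The function name `top_k_in_range` and the `(key, rank)` item layout are assumptions made for illustration.

```python
import heapq
from bisect import bisect_left, bisect_right


def top_k_in_range(items, lo, hi, k):
    """Return the k highest-rank items with lo <= key <= hi, best first.

    `items` is a list of (key, rank) pairs sorted by key.
    """
    keys = [key for key, _ in items]
    start = bisect_left(keys, lo)
    end = bisect_right(keys, hi)
    # heapq.nlargest returns its result sorted from largest to smallest rank.
    return heapq.nlargest(k, items[start:end], key=lambda item: item[1])


items = [(1, 0.2), (3, 0.9), (4, 0.5), (7, 0.7), (9, 0.95)]
best_two = top_k_in_range(items, 2, 8, 2)  # keys 3, 4, 7 match the range
```

A rank-sensitive structure in the sense of the abstract would aim to answer the same query without touching every matching item.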

Electronic Thesis and Dissertation Archive - Università di Pisa