    Cache-Oblivious Persistence

    Partial persistence is a general transformation that takes a data structure and allows queries to be executed on any past state of the structure. The cache-oblivious model is the leading model of a modern multi-level memory hierarchy. We present the first general transformation for making cache-oblivious model data structures partially persistent.
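
    To make the notion of partial persistence concrete, the following is a minimal sketch of a partially persistent array built with per-cell version histories (the classic "fat node" idea). It is not the cache-oblivious transformation the abstract refers to, and the class and method names are hypothetical; it only illustrates that updates go to the newest version while queries may target any past version.

```python
from bisect import bisect_right


class PartiallyPersistentArray:
    """Fat-node array: every cell keeps its full (version, value) history.

    Illustrative only; names are hypothetical and nothing here is
    cache-oblivious.
    """

    def __init__(self, size, default=None):
        # Version 0 is the initial state; each cell starts with one entry.
        self._versions = [[0] for _ in range(size)]
        self._values = [[default] for _ in range(size)]
        self.latest = 0

    def set(self, index, value):
        """Update the newest version only and return the new version number."""
        self.latest += 1
        self._versions[index].append(self.latest)
        self._values[index].append(value)
        return self.latest

    def get(self, index, version=None):
        """Read a cell as it was at `version` (defaults to the newest)."""
        if version is None:
            version = self.latest
        if not 0 <= version <= self.latest:
            raise ValueError("unknown version")
        # Find the last history entry whose version is <= the requested one.
        pos = bisect_right(self._versions[index], version) - 1
        return self._values[index][pos]


arr = PartiallyPersistentArray(3, default=0)
v1 = arr.set(0, 10)
v2 = arr.set(1, 20)
v3 = arr.set(0, 30)
print(arr.get(0, v2), arr.get(0, v3))  # prints "10 30"
```

    Each read costs a binary search over that cell's history, which is the kind of overhead a general persistence transformation aims to keep small.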

    Blame Trees

    We consider the problem of merging individual text documents, motivated by the single-file merge algorithms of document-based version control systems. Abstracting away the merging of conflicting edits to an external conflict resolution function (possibly implemented by a human), we consider the efficient identification of conflicting regions. We show how to implement a tree-based document representation that quickly answers a query inspired by the “blame” query of some version control systems. A “blame” query associates every line of a document with the revision in which it was last edited. Our tree uses this idea to quickly identify conflicting edits. We show how to perform a merge operation in time proportional to the sum of the logarithms of the shared regions of the documents, plus the cost of conflict resolution. Our data structure is functional and therefore confluently persistent, allowing arbitrary version DAGs as in real version-control systems. Our results rely on concurrent traversal of two trees with short circuiting when shared subtrees are encountered. Funding: United States. Defense Advanced Research Projects Agency (Clean-Slate Design of Resilient, Adaptive, Secure Hosts (CRASH) program, BAA10-70); United States. Defense Advanced Research Projects Agency (contract #N66001-10-2-4088 (Bridging the Security Gap with Decentralized Information Flow Control)); Danish National Research Foundation (Center for Massive Data Algorithmics (MADALGO)).
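
    The idea of traversing two trees concurrently and short-circuiting on shared subtrees can be sketched as follows. This is not the paper's data structure: it assumes, for simplicity, that both versions were derived from a common base by line replacements only, so the trees have the same shape. All names (`Leaf`, `Node`, `build`, `replace_line`, `changed_lines`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    text: str


@dataclass(frozen=True)
class Node:
    left: "Tree"
    right: "Tree"
    size: int  # number of leaves (document lines) below this node


Tree = Union[Leaf, Node]


def size(t):
    return 1 if isinstance(t, Leaf) else t.size


def build(lines):
    """Build a balanced immutable tree over a non-empty list of lines."""
    if not lines:
        raise ValueError("need at least one line")
    if len(lines) == 1:
        return Leaf(lines[0])
    mid = len(lines) // 2
    left, right = build(lines[:mid]), build(lines[mid:])
    return Node(left, right, size(left) + size(right))


def replace_line(t, index, text):
    """Functional update: copy the root-to-leaf path, share everything else."""
    if isinstance(t, Leaf):
        return Leaf(text)
    left_size = size(t.left)
    if index < left_size:
        return Node(replace_line(t.left, index, text), t.right, t.size)
    return Node(t.left, replace_line(t.right, index - left_size, text), t.size)


def changed_lines(a, b, offset=0):
    """Indices of lines that differ, assuming a and b have the same shape."""
    if a is b:
        return []  # shared subtree: skip it without looking inside
    if isinstance(a, Leaf) and isinstance(b, Leaf):
        return [] if a.text == b.text else [offset]
    left_size = size(a.left)
    return (changed_lines(a.left, b.left, offset)
            + changed_lines(a.right, b.right, offset + left_size))


base = build([f"line {i}" for i in range(8)])
ours = replace_line(base, 2, "our edit")
theirs = replace_line(replace_line(base, 2, "their edit"), 6, "another edit")

# Lines touched by both sides are candidates for conflict resolution.
conflicts = sorted(set(changed_lines(base, ours)) & set(changed_lines(base, theirs)))
print(conflicts)  # prints "[2]"
```

    Because unchanged subtrees are the same object in both versions, the identity check `a is b` prunes them in constant time, so the work concentrates on the edited paths.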

    Improved Algorithms for White-Box Adversarial Streams

    We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank. Comment: ICML 202

    A Practical Implementation of Parallel Ordered Maps and Sets with just Join

    Persistent Data Structures for Incremental Join Indices

    Join indices are used in relational databases to make join operations faster. Join indices essentially materialise the results of join operations and so accrue maintenance cost, which makes them more suitable for use cases where modifications are rare and joins are performed frequently. To lower the maintenance cost, incrementally updating existing indices is preferred. This thesis explores the use of persistent data structures for join indices. The motivation for this research was the ability of persistent data structures to construct multiple partially different versions of the same data structure memory-efficiently. This is useful because different versions of join indices can exist simultaneously when a database uses multi-version concurrency control (MVCC). The techniques used in the Relaxed Radix Balanced Tree (RRB-Tree) persistent data structure were found promising, but none of the popular implementations were found directly suitable for the use case. The exploration was done in the context of a particular proprietary embedded in-memory columnar multidimensional database called FastormDB, developed by RELEX Solutions. This focused the research on Java Virtual Machine (JVM) based data structures, as FastormDB is implemented in Java. Multiple persistent data structures made for the thesis, along with ones from Scala, Clojure and Paguro, were evaluated with Java Microbenchmark Harness (JMH) and Java Object Layout (JOL) based benchmarks, and the results were analysed via visualisations.
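
    As a rough illustration of what incrementally maintaining a materialised join means, the sketch below keeps the result pairs of an equi-join up to date as rows are inserted, matching only the new row instead of recomputing the join. It is an assumption-laden toy, not the design of FastormDB or of the thesis: the class `IncrementalJoinIndex` and its methods are hypothetical, it is not persistent, and deletions are left out.

```python
from collections import defaultdict


class IncrementalJoinIndex:
    """Materialised equi-join of two tables, maintained one row at a time.

    Hypothetical sketch: rows are identified by ids and joined on one key.
    """

    def __init__(self):
        self._left = defaultdict(list)   # join key -> left row ids
        self._right = defaultdict(list)  # join key -> right row ids
        self.pairs = []                  # materialised (left_id, right_id) results

    def insert_left(self, row_id, key):
        self._left[key].append(row_id)
        # Only the new row is matched; existing result pairs stay untouched.
        self.pairs.extend((row_id, r) for r in self._right.get(key, ()))

    def insert_right(self, row_id, key):
        self._right[key].append(row_id)
        self.pairs.extend((l, row_id) for l in self._left.get(key, ()))


index = IncrementalJoinIndex()
index.insert_left(1, "x")
index.insert_left(2, "y")
index.insert_right(10, "x")
index.insert_right(11, "x")
print(index.pairs)  # prints "[(1, 10), (1, 11)]"
```

    In an MVCC setting, several such index versions would need to coexist, which is where a persistent structure that shares unchanged parts between versions becomes attractive.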

    The White-Box Adversarial Data Stream Model

    We study streaming algorithms in the white-box adversarial model, where the stream is chosen adaptively by an adversary who observes the entire internal state of the algorithm at each time step. We show that nontrivial algorithms are still possible. We first give a randomized algorithm for the $L_1$-heavy hitters problem that outperforms the optimal deterministic Misra-Gries algorithm on long streams. If the white-box adversary is computationally bounded, we use cryptographic techniques to reduce the memory of our $L_1$-heavy hitters algorithm even further and to design a number of additional algorithms for graph, string, and linear algebra problems. The existence of such algorithms is surprising, as the streaming algorithm does not even have a secret key in this model, i.e., its state is entirely known to the adversary. One algorithm we design is for estimating the number of distinct elements in a stream with insertions and deletions achieving a multiplicative approximation and sublinear space; such an algorithm is impossible for deterministic algorithms. We also give a general technique that translates any two-player deterministic communication lower bound to a lower bound for randomized algorithms robust to a white-box adversary. In particular, our results show that for all $p \ge 0$, there exists a constant $C_p > 1$ such that any $C_p$-approximation algorithm for $F_p$ moment estimation in insertion-only streams with a white-box adversary requires $\Omega(n)$ space for a universe of size $n$. Similarly, there is a constant $C > 1$ such that any $C$-approximation algorithm in an insertion-only stream for matrix rank requires $\Omega(n)$ space with a white-box adversary. Our algorithmic results based on cryptography thus show a separation between computationally bounded and unbounded adversaries. (Abstract shortened to meet arXiv limits.) Comment: PODS 202
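
    For reference, the deterministic Misra-Gries algorithm that the abstract uses as its baseline can be sketched as below. This is the standard textbook version, not the paper's randomized algorithm; the function name and the parameter `k` are chosen here for illustration. With at most `k - 1` counters on a stream of length `n`, every reported count undercounts the true frequency by at most `n / k`, so any item occurring more than `n / k` times is guaranteed to survive.

```python
def misra_gries(stream, k):
    """Deterministic heavy-hitters summary using at most k - 1 counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


summary = misra_gries("aabacadaae", k=3)
print(summary)  # prints "{'a': 4}"
```

    Since the algorithm is deterministic, its state reveals nothing an adversary could not already compute, which is why it remains valid as a baseline in the white-box setting.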

    Novel approaches for constructing persistent Delaunay triangulations by applying different equations and different methods

    “Delaunay triangulation and data structures are an essential field of study and research in computer science, for this reason, the correct choices, and an adequate design are essential for the development of algorithms for the efficient storage and/or retrieval of information. However, most structures are usually ephemeral, which means keeping all versions, in different copies, of the same data structure is expensive. The problem arises of developing data structures that are capable of maintaining different versions of themselves, minimizing the cost of memory, and keeping the performance of operations as close as possible to the original structure. Therefore, this research aims to examine the feasibility concepts of Spatio-temporal structures such as persistence, to design a Delaunay triangulation algorithm so that it is possible to make queries and modifications at a certain time t, minimizing spatial and temporal complexity. Four new persistent data structures for Delaunay triangulation (Bowyer-Watson, Walk, Hybrid, and Graph) were proposed and developed. The results of using random images and vertex databases with different data (DAG and CGAL), proved that the data structure in its partial version is better than the other data structures that do not have persistence. Also, the full version data structures show an advance in the state of the technique. All the results will allow the algorithms to minimize the cost of memory”--Abstract, page iii

    Tracing the Compositional Process. Sound art that rewrites its own past: formation, praxis and a computer framework

    The domain of this thesis is electroacoustic computer-based music and sound art. It investigates a facet of composition which is often neglected or ill-defined: the process of composing itself and its embedding in time. Previous research mostly focused on instrumental composition or, when electronic music was included, the computer was treated as a tool which would eventually be subtracted from the equation. The aim was either to explain a resultant piece of music by reconstructing the intention of the composer, or to explain human creativity by building a model of the mind. Our aim instead is to understand composition as an irreducible unfolding of material traces which takes place in its own temporality. This understanding is formalised as a software framework that traces creation time as a version graph of transactions. The instantiation and manipulation of any musical structure implemented within this framework is thereby automatically stored in a database. Not only can it be queried ex post by an external researcher, providing a new quality for the empirical analysis of the activity of composing, but it is an integral part of the composition environment. Therefore it can recursively become a source for the ongoing composition and introduce new ways of aesthetic expression. The framework aims to unify creation and performance time, fixed and generative composition, human and algorithmic “writing”, a writing that includes indeterminate elements which condense as concurrent vertices in the version graph. The second major contribution is a critical epistemological discourse on the question of observability and the function of observation. Our goal is to explore a new direction of artistic research which is characterised by a mixed methodology of theoretical writing, technological development and artistic practice.
    The form of the thesis is an exercise in becoming process-like itself, wherein the epistemic thing is generated by translating the gaps between these three levels. This is my idea of the new aesthetics: that through the operation of a re-entry one may establish a sort of process “form”, yielding works which go beyond a categorical either “sound-in-itself” or “conceptualism”. Exemplary processes are revealed by deconstructing a series of existing pieces, as well as through the successful application of the new framework in the creation of new pieces.