13 research outputs found
Cache-Oblivious Persistence
Partial persistence is a general transformation that takes a data structure
and allows queries to be executed on any past state of the structure. The
cache-oblivious model is the leading model of a modern multi-level memory
hierarchy. We present the first general transformation for making
cache-oblivious data structures partially persistent.
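The flavor of partial persistence can be illustrated with the classic "fat node" method (a textbook technique, not the paper's cache-oblivious construction): each cell keeps a version-stamped history, so reads can target any past version while writes only extend the newest one. The class and method names below are illustrative.

```python
import bisect

class FatNodeCell:
    """A single mutable cell made partially persistent via the fat-node method."""
    def __init__(self, initial):
        self.versions = [0]        # sorted version stamps
        self.values = [initial]    # value written at each stamp

    def write(self, version, value):
        # Partial persistence: only the latest version may be modified.
        assert version >= self.versions[-1]
        if version == self.versions[-1]:
            self.values[-1] = value
        else:
            self.versions.append(version)
            self.values.append(value)

    def read(self, version):
        # Binary search for the last write at or before `version`.
        i = bisect.bisect_right(self.versions, version) - 1
        return self.values[i]

cell = FatNodeCell("a")
cell.write(1, "b")
cell.write(3, "c")
print(cell.read(0))  # a
print(cell.read(2))  # b  (version 2 still sees the write at version 1)
print(cell.read(3))  # c
```

Each read costs one binary search over the cell's history, which is what a general transformation must keep cheap across a memory hierarchy.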
Blame Trees
We consider the problem of merging individual text documents, motivated by the single-file merge algorithms of document-based version control systems. Abstracting away the merging of conflicting edits to an external conflict resolution function (possibly implemented by a human), we consider the efficient identification of conflicting regions. We show how to augment a tree-based document representation with a data structure inspired by the “blame” query of some version control systems. A “blame” query associates every line of a document with the revision in which it was last edited. Our tree uses this idea to quickly identify conflicting edits. We show how to perform a merge operation in time proportional to the sum of the logarithms of the shared regions of the documents, plus the cost of conflict resolution. Our data structure is functional and therefore confluently persistent, allowing arbitrary version DAGs as in real version-control systems. Our results rely on concurrent traversal of two trees with short circuiting when shared subtrees are encountered.
Funding: United States. Defense Advanced Research Projects Agency (Clean-Slate Design of Resilient, Adaptive, Secure Hosts (CRASH) program, BAA10-70); United States. Defense Advanced Research Projects Agency (contract #N66001-10-2-4088, Bridging the Security Gap with Decentralized Information Flow Control); Danish National Research Foundation (Center for Massive Data Algorithmics (MADALGO)).
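The short-circuiting idea can be sketched in a few lines: two functional trees that share subtrees can be compared in time proportional to their *unshared* parts, because a shared subtree is the same object in memory and can be skipped by an identity check. The structure below is a deliberately simplified illustration (assuming both versions have the same shape), not the paper's actual data structure.

```python
class Node:
    """A node of a functional (immutable, structurally shared) document tree."""
    __slots__ = ("left", "right", "line", "rev")
    def __init__(self, left=None, right=None, line=None, rev=0):
        self.left, self.right, self.line, self.rev = left, right, line, rev

def edit(node, path, new_line, rev):
    """Functional update: copy only the nodes on the edited path ('L'/'R' string)."""
    if not path:
        return Node(node.left, node.right, new_line, rev)
    if path[0] == "L":
        return Node(edit(node.left, path[1:], new_line, rev),
                    node.right, node.line, node.rev)
    return Node(node.left, edit(node.right, path[1:], new_line, rev),
                node.line, node.rev)

def conflicts(a, b, out):
    """Collect lines edited differently in the two versions."""
    if a is b or a is None or b is None:
        return                      # shared subtree: short-circuit, skip entirely
    if a.line != b.line:
        out.append((a.line, b.line))
    conflicts(a.left, b.left, out)
    conflicts(a.right, b.right, out)

base = Node(Node(None, None, "line1"), Node(None, None, "line2"), None, "line0")
va = edit(base, "L", "alice", 1)    # Alice edits the left leaf
vb = edit(base, "L", "bob", 2)      # Bob edits the same leaf concurrently
out = []
conflicts(va, vb, out)
print(out)                          # [('alice', 'bob')]
print(va.right is vb.right)         # True: the untouched subtree is shared
```

Because `edit` copies only the root-to-leaf path, the unedited right subtree stays physically shared, and `conflicts` never descends into it.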
Improved Algorithms for White-Box Adversarial Streams
We study streaming algorithms in the white-box adversarial stream model,
where the internal state of the streaming algorithm is revealed to an adversary
who adaptively generates the stream updates, but the algorithm obtains fresh
randomness unknown to the adversary at each time step. We incorporate
cryptographic assumptions to construct robust algorithms against such
adversaries. We propose efficient algorithms for sparse recovery of vectors,
low rank recovery of matrices and tensors, as well as low rank plus sparse
recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our
algorithms can report when the input is not sparse or low rank even in the
presence of such an adversary. We use these recovery algorithms to improve upon
and solve new problems in numerical linear algebra and combinatorial
optimization on white-box adversarial streams. For example, we give the first
efficient algorithm for outputting a matching in a graph with insertions and
deletions to its edges provided the matching size is small, and otherwise we
declare the matching size is large. We also improve the approximation versus
memory tradeoff of previous work for estimating the number of non-zero elements
in a vector and computing the matrix rank.
Comment: ICML 202
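A classic streaming primitive related to the sparse-recovery theme above is exact 1-sparse recovery with verification: maintain S0 = Σ x_i and S1 = Σ i·x_i under insertions and deletions; if the final vector has exactly one nonzero entry, its index is S1/S0, and a random polynomial fingerprint Σ x_i·r^i (mod p) detects, with high probability, that the vector is *not* 1-sparse, mirroring the paper's emphasis on algorithms that can report failure. This is textbook material, a sketch only, not the paper's white-box construction.

```python
import random

P = (1 << 61) - 1          # a Mersenne prime for fingerprint arithmetic

class OneSparseRecovery:
    def __init__(self, universe_size):
        self.n = universe_size
        self.r = random.randrange(2, P)   # fresh randomness per instance
        self.s0 = 0                       # running sum of values
        self.s1 = 0                       # running sum of index * value
        self.fp = 0                       # fingerprint sum of value * r^i mod P

    def update(self, i, delta):
        """Apply x[i] += delta (delta may be negative, i.e. a deletion)."""
        self.s0 += delta
        self.s1 += delta * i
        self.fp = (self.fp + delta * pow(self.r, i, P)) % P

    def query(self):
        """Return (index, value) if the vector is 1-sparse, else None."""
        if self.s0 == 0 and self.fp == 0:
            return None                   # zero vector
        if self.s0 != 0 and self.s1 % self.s0 == 0:
            i, v = self.s1 // self.s0, self.s0
            if 0 <= i < self.n and (v * pow(self.r, i, P)) % P == self.fp:
                return (i, v)
        return None                       # not 1-sparse (w.h.p.)

rec = OneSparseRecovery(100)
rec.update(7, 5)
rec.update(3, 2)
rec.update(3, -2)          # the update to index 3 is cancelled
print(rec.query())         # (7, 5)
```

The fingerprint check is what lets the algorithm *reject* non-sparse inputs rather than silently return garbage, the same guarantee the abstract highlights against adversarial streams.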
Persistent Data Structures for Incremental Join Indices
Join indices are used in relational databases to make join operations faster. Join indices essentially materialise the results of join operations and therefore accrue maintenance costs, which makes them more suitable for use cases where modifications are rare and joins are performed frequently. To keep the maintenance cost low, incrementally updating existing indices is preferable.
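The incremental-maintenance idea can be sketched as follows: for an equi-join R ⋈ S on a key column, the index materialises matching row-id pairs, and a single-row insert touches only the rows sharing that key rather than recomputing the whole join. The class below is an illustrative toy, not FastormDB's implementation.

```python
from collections import defaultdict

class JoinIndex:
    """Materialised equi-join index with incremental single-row maintenance."""
    def __init__(self):
        self.r_rows = defaultdict(set)   # join key -> row ids in table R
        self.s_rows = defaultdict(set)   # join key -> row ids in table S
        self.pairs = set()               # materialised (r_id, s_id) join result

    def insert_r(self, r_id, key):
        self.r_rows[key].add(r_id)
        # Incremental step: join the new row against matching S rows only.
        self.pairs.update((r_id, s) for s in self.s_rows[key])

    def insert_s(self, s_id, key):
        self.s_rows[key].add(s_id)
        self.pairs.update((r, s_id) for r in self.r_rows[key])

idx = JoinIndex()
idx.insert_r(1, "a")
idx.insert_s(10, "a")
idx.insert_s(11, "b")
idx.insert_r(2, "b")
print(sorted(idx.pairs))   # [(1, 10), (2, 11)]
```

Each insert costs time proportional to the number of matching rows for that one key, which is the maintenance-cost advantage the paragraph describes.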
The use of persistent data structures for join indices was explored. The motivation for this research was the ability of persistent data structures to construct multiple, partially different versions of the same data structure memory-efficiently. This is useful because different versions of join indices can exist simultaneously due to the use of multi-version concurrency control (MVCC) in a database. The techniques used in the Relaxed Radix Balanced Tree (RRB-Tree) persistent data structure were found promising, but none of the popular implementations were found directly suitable for the use case.
This exploration was done in the context of a particular proprietary embedded in-memory columnar multidimensional database called FastormDB, developed by RELEX Solutions. This focused the research on Java Virtual Machine (JVM) based data structures, as FastormDB is implemented in Java. Multiple persistent data structures made for the thesis, as well as ones from Scala, Clojure and Paguro, were evaluated with Java Microbenchmark Harness (JMH) and Java Object Layout (JOL) based benchmarks, and their results were analysed via visualisations.
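The structural-sharing idea behind RRB-Trees and Clojure-style persistent vectors can be sketched briefly: an update copies only the O(log n) nodes on the path to the changed slot, so many versions share most of their memory. Real RRB-Trees additionally relax node sizes to support fast concatenation; this toy (in Python rather than a JVM language, for brevity) uses fixed 4-way branching.

```python
BITS = 2
WIDTH = 1 << BITS        # 4-way branching here (32-way in production vectors)

def vec_update(node, depth, index, value):
    """Return a new root with slot `index` set; untouched subtrees are shared."""
    copy = list(node)                       # copy exactly one node per level
    if depth == 0:
        copy[index & (WIDTH - 1)] = value
    else:
        slot = (index >> (depth * BITS)) & (WIDTH - 1)
        copy[slot] = vec_update(node[slot], depth - 1, index, value)
    return copy

# Two levels of branching: capacity 16 slots.
v1 = [[0] * 4 for _ in range(4)]
v2 = vec_update(v1, 1, 5, "x")     # set slot 5 in a new version
print(v1[1][1], v2[1][1])          # 0 x   -- the old version is unchanged
print(v1[2] is v2[2])              # True  -- untouched subtree is shared
```

The identity check on the last line is precisely the memory-efficiency property that makes such structures attractive for coexisting MVCC versions of an index.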
The White-Box Adversarial Data Stream Model
We study streaming algorithms in the white-box adversarial model, where the
stream is chosen adaptively by an adversary who observes the entire internal
state of the algorithm at each time step. We show that nontrivial algorithms
are still possible. We first give a randomized algorithm for the $L_1$-heavy
hitters problem that outperforms the optimal deterministic Misra-Gries
algorithm on long streams. If the white-box adversary is computationally
bounded, we use cryptographic techniques to reduce the memory of our
$L_1$-heavy hitters algorithm even further and to design a number of additional
algorithms for graph, string, and linear algebra problems. The existence of
such algorithms is surprising, as the streaming algorithm does not even have a
secret key in this model, i.e., its state is entirely known to the adversary.
One algorithm we design is for estimating the number of distinct elements in a
stream with insertions and deletions achieving a multiplicative approximation
and sublinear space; such an algorithm is impossible for deterministic
algorithms.
We also give a general technique that translates any two-player deterministic
communication lower bound to a lower bound for randomized algorithms
robust to a white-box adversary. In particular, our results show that for all
$p \ge 0$, there exists a constant $C_p > 1$ such that any $C_p$-approximation
algorithm for $F_p$ moment estimation in insertion-only streams with a
white-box adversary requires $\Omega(n)$ space for a universe of size $n$.
Similarly, there is a constant $C > 1$ such that any $C$-approximation algorithm
in an insertion-only stream for matrix rank requires $\Omega(n)$ space with a
white-box adversary. Our algorithmic results based on cryptography thus show a
separation between computationally bounded and unbounded adversaries.
(Abstract shortened to meet arXiv limits.)
Comment: PODS 202
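For reference, the deterministic baseline the abstract compares against is the classic Misra-Gries algorithm, which finds every item occurring more than m/k times in a stream of length m using at most k-1 counters. The sketch below is the textbook algorithm only, not the paper's randomized white-box variant.

```python
def misra_gries(stream, k):
    """Return candidate counts: a superset of all items with frequency > m/k."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Stream element plus all counters decrement together; this can
            # happen at most m/k times, so true heavy hitters survive.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters

stream = ["a", "b", "a", "c", "a", "b", "a", "d"]
print(misra_gries(stream, 3))   # {'a': 2}: only 'a' occurs more than 8/3 times
```

Note the returned counts are lower bounds on true frequencies; a second pass (or the paper's randomized machinery) is needed to confirm exact counts.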
Novel approaches for constructing persistent Delaunay triangulations by applying different equations and different methods
“Delaunay triangulation and data structures are an essential field of study and research in computer science; for this reason, correct choices and an adequate design are essential for the development of algorithms for the efficient storage and/or retrieval of information. However, most structures are usually ephemeral, which means keeping all versions of the same data structure in different copies is expensive. The problem arises of developing data structures that are capable of maintaining different versions of themselves, minimizing the cost of memory, and keeping the performance of operations as close as possible to the original structure. Therefore, this research aims to examine the feasibility of spatio-temporal concepts such as persistence in order to design a Delaunay triangulation algorithm that supports queries and modifications at a given time t, minimizing spatial and temporal complexity. Four new persistent data structures for Delaunay triangulation (Bowyer-Watson, Walk, Hybrid, and Graph) were proposed and developed. The results, obtained using random images and vertex databases with different data structures (DAG and CGAL), showed that the partially persistent data structure outperforms the data structures without persistence. The fully persistent data structures also advance the state of the art. All the results will allow the algorithms to minimize the cost of memory”--Abstract, page iii
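The core geometric predicate behind Bowyer-Watson-style incremental Delaunay construction (one of the variants named above) is the "incircle" test: a point d lies strictly inside the circumcircle of a counter-clockwise triangle (a, b, c) iff a 3x3 determinant is positive. A persistent variant would version the triangulation around calls to this test; only the predicate itself is sketched here, with exact arithmetic left aside.

```python
def in_circumcircle(a, b, c, d):
    """True iff d is strictly inside the circumcircle of CCW triangle (a, b, c)."""
    # Translate so d is at the origin, then evaluate the incircle determinant.
    ax, ay = a[0] - d[0], a[1] - d[1]
    bx, by = b[0] - d[0], b[1] - d[1]
    cx, cy = c[0] - d[0], c[1] - d[1]
    det = ((ax * ax + ay * ay) * (bx * cy - by * cx)
           - (bx * bx + by * by) * (ax * cy - ay * cx)
           + (cx * cx + cy * cy) * (ax * by - ay * bx))
    return det > 0

# CCW right triangle; its circumcircle is centred at (2, 2) with radius ~2.83.
tri = ((0.0, 0.0), (4.0, 0.0), (0.0, 4.0))
print(in_circumcircle(*tri, (2.0, 2.0)))   # True  (the centre is inside)
print(in_circumcircle(*tri, (5.0, 5.0)))   # False (well outside)
```

Bowyer-Watson inserts a point by removing every triangle whose circumcircle contains it and re-triangulating the resulting cavity; in production code this predicate needs robust (exact or adaptive-precision) arithmetic, as CGAL provides.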
Tracing the Compositional Process. Sound art that rewrites its own past: formation, praxis and a computer framework
The domain of this thesis is electroacoustic computer-based music and sound art. It investigates
a facet of composition which is often neglected or ill-defined: the process of composing itself
and its embedding in time. Previous research mostly focused on instrumental composition or,
when electronic music was included, the computer was treated as a tool which would eventually
be subtracted from the equation. The aim was either to explain a resultant piece of music by
reconstructing the intention of the composer, or to explain human creativity by building a model
of the mind.
Our aim instead is to understand composition as an irreducible unfolding of material traces which
takes place in its own temporality. This understanding is formalised as a software framework
that traces creation time as a version graph of transactions. The instantiation and manipulation
of any musical structure implemented within this framework is thereby automatically stored
in a database. Not only can it be queried ex post by an external researcher—providing a new
quality for the empirical analysis of the activity of composing—but it is an integral part of
the composition environment. Therefore it can recursively become a source for the ongoing
composition and introduce new ways of aesthetic expression. The framework aims to unify
creation and performance time, fixed and generative composition, human and algorithmic
“writing”, a writing that includes indeterminate elements which condense as concurrent vertices
in the version graph.
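The central mechanism described above, recording every manipulation as a transaction in a version graph whose concurrent branches can later merge, can be sketched minimally as follows. All names are illustrative; the actual framework is far richer.

```python
import itertools

class VersionGraph:
    """A DAG of transactions: each version records its operation and parents."""
    def __init__(self):
        self._ids = itertools.count()
        self.parents = {}   # version id -> tuple of parent version ids
        self.ops = {}       # version id -> operation payload

    def commit(self, op, *parents):
        """Record one transaction; multiple parents model merged branches."""
        v = next(self._ids)
        self.parents[v] = parents
        self.ops[v] = op
        return v

    def history(self, v):
        """All operations reachable from version v, ancestors first."""
        seen, order = set(), []
        def visit(u):
            if u in seen:
                return
            seen.add(u)
            for p in self.parents[u]:
                visit(p)
            order.append(self.ops[u])
        visit(v)
        return order

g = VersionGraph()
root = g.commit("create score")
a = g.commit("add layer A", root)
b = g.commit("add layer B", root)    # a concurrent branch from the same root
m = g.commit("merge layers", a, b)   # concurrent vertices condense in a merge
print(g.history(m))
```

Because every version remains queryable, the graph can serve both the ex-post researcher and, recursively, the ongoing composition itself, as the paragraph above describes.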
The second major contribution is a critical epistemological discourse on the question of
observability and the function of observation. Our goal is to explore a new direction of artistic
research which is characterised by a mixed methodology of theoretical writing, technological
development and artistic practice. The form of the thesis is an exercise in becoming process-like
itself, wherein the epistemic thing is generated by translating the gaps between these three levels.
This is my idea of the new aesthetics: That through the operation of a re-entry one may establish
a sort of process “form”, yielding works which go beyond a categorical either “sound-in-itself”
or “conceptualism”.
Exemplary processes are revealed by deconstructing a series of existing pieces, as well as
through the successful application of the new framework in the creation of new pieces.