516,264 research outputs found

The Case for Learned Index Structures

Author: Abadi M.
Armbrust M.
Böhm M.
Chang F.
Goodfellow I.
Grossi R.
Lehman T. J.
Litwin W.
Magdon-Ismail M.
Miller D. J.
Moerkotte G.
Sutskever I.
You S.
Publication date: 30/04/2018

Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly, though, we believe that the idea of replacing core components of a data management system with learned models has far-reaching implications for future system designs and that this work provides just a glimpse of what might be possible.
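To make the "index as a model" idea concrete, here is a minimal sketch in Python. It is not the paper's method: it swaps the neural nets mentioned in the abstract for a single least-squares line, and the names `fit_linear` and `LearnedIndex` are hypothetical. It assumes numeric keys and a non-empty key set. The model predicts a position in the sorted array, and the worst prediction error seen at build time bounds the window that a final binary search has to inspect.

```python
import bisect


def fit_linear(keys):
    """Fit position ~ slope * key + intercept by least squares (keys sorted)."""
    n = len(keys)
    mean_key = sum(keys) / n
    mean_pos = (n - 1) / 2
    var = sum((k - mean_key) ** 2 for k in keys)
    if var == 0:  # all keys identical: a constant prediction is the best fit
        return 0.0, mean_pos
    cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
    slope = cov / var
    return slope, mean_pos - slope * mean_key


class LearnedIndex:
    """Hypothetical one-model learned index over a sorted array of numbers."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("this sketch assumes at least one key")
        self.keys = sorted(keys)
        self.slope, self.intercept = fit_linear(self.keys)
        # Worst-case distance between predicted and true position of any
        # stored key; it bounds the search window used in lookup().
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None
```

The window is exact for stored keys because `max_err` is measured on those same keys. For a key that is not stored, the final equality check rejects whatever position the search lands on.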

arXiv.org e-Print Archive

Crossref

SoK: Cryptographically Protected Database Search

Author: Cunningham Robert K.
Fuller Benjamin
Gadepally Vijay
Hamlin Ariel
Mitchell John Darby
Shay Richard
Shen Emily
Varia Mayank
Yerukhimovich Arkady
Publication date: 01/01/2017

Protected database search systems cryptographically isolate the roles of reading from, writing to, and administering the database. This separation limits unnecessary administrator access and protects data in the case of system breaches. Since protected search was introduced in 2000, the area has grown rapidly; systems are offered by academia, start-ups, and established companies. However, there is no best protected search system or set of techniques. Design of such systems is a balancing act between security, functionality, performance, and usability. This challenge is made more difficult by ongoing database specialization, as some users will want the functionality of SQL, NoSQL, or NewSQL databases. This database evolution will continue, and the protected search community should be able to quickly provide functionality consistent with newly invented databases. At the same time, the community must accurately and clearly characterize the tradeoffs between different approaches. To address these challenges, we provide the following contributions: 1) An identification of the important primitive operations across database paradigms. We find there are a small number of base operations that can be used and combined to support a large number of database paradigms. 2) An evaluation of the current state of protected search systems in implementing these base operations. This evaluation describes the main approaches and tradeoffs for each base operation. Furthermore, it puts protected search in the context of unprotected search, identifying key gaps in functionality. 3) An analysis of attacks against protected search for different base queries. 4) A roadmap and tools for transforming a protected search system into a protected database, including an open-source performance evaluation platform and initial user opinions of protected search.

Comment: 20 pages, to appear in IEEE Security and Privacy

arXiv.org e-Print Archive

Crossref

Boston University Institutional Repository (OpenBU)

Prospects and limitations of full-text index structures in genome analysis

Author: Dawyndt Peter
De Baets Bernard
Fack Veerle
Vyverman Michaël
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared.

Ghent University Academic Bibliography

PubMed Central

Range Queries on Uncertain Data

Author: B Chazelle
G Frederickson
J Driscoll
J Mitchell
M Yiu
P Agarwal
Publication date: 09/01/2015

Given a set $P$ of $n$ uncertain points on the real line, each represented by its one-dimensional probability density function, we consider the problem of building data structures on $P$ to answer range queries of the following three types for any query interval $I$: (1) top-$1$ query: find the point in $P$ that lies in $I$ with the highest probability, (2) top-$k$ query: given any integer $k \leq n$ as part of the query, return the $k$ points in $P$ that lie in $I$ with the highest probabilities, and (3) threshold query: given any threshold $\tau$ as part of the query, return all points of $P$ that lie in $I$ with probabilities at least $\tau$. We present data structures for these range queries with linear or nearly linear space and efficient query time.

Comment: 26 pages. A preliminary version of this paper appeared in ISAAC 2014. In this full version, we also present solutions to the most general case of the problem (i.e., the histogram bounded case), which were left as open problems in the preliminary version.
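The three query types can be pinned down with a naive linear-scan baseline. The sketch below is not one of the paper's data structures: it scans all of $P$ on every query, and it assumes for illustration that each uncertain point is uniformly distributed on an interval `(a, b)` with `a < b`. The function names are hypothetical.

```python
import heapq


def prob_in_interval(point, lo, hi):
    """Probability that a point uniform on (a, b) falls inside [lo, hi]."""
    a, b = point
    overlap = min(b, hi) - max(a, lo)
    return max(overlap, 0.0) / (b - a)


def top_k(points, lo, hi, k):
    """Indices of the k points most likely to lie in [lo, hi], best first."""
    scored = [(prob_in_interval(p, lo, hi), i) for i, p in enumerate(points)]
    return [i for _, i in heapq.nlargest(k, scored)]


def top_1(points, lo, hi):
    """Index of the single most likely point (the top-k query with k = 1)."""
    return top_k(points, lo, hi, 1)[0]


def threshold(points, lo, hi, tau):
    """Indices of all points lying in [lo, hi] with probability >= tau."""
    return [
        i for i, p in enumerate(points)
        if prob_in_interval(p, lo, hi) >= tau
    ]
```

Each query here costs time linear in $n$ (plus the heap work for top-$k$), which is the cost that data structures of the kind the abstract describes aim to avoid.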

arXiv.org e-Print Archive

Crossref

Non-linear Pattern Matching with Backtracking for Non-free Data Types

Author: B Braßel
DA Turner
Don Syme
F McBride
M Erwig
M Hanus
M Tullsen
Martin Erwig
R Hinze
S Antoy
S Thompson
Sebastian Fischer
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/05/2019

Non-free data types are data types whose data have no canonical forms. For example, multisets are non-free data types because the multiset $\{a,b,b\}$ has two other equivalent but literally different forms $\{b,a,b\}$ and $\{b,b,a\}$. Pattern matching is known to provide a handy tool set to treat such data types. Although many studies on pattern matching and implementations for practical programming languages have been proposed so far, we observe that none of these studies satisfy all the criteria of practical pattern matching, which are as follows: i) efficiency of the backtracking algorithm for non-linear patterns, ii) extensibility of matching process, and iii) polymorphism in patterns. This paper aims to design a new pattern-matching-oriented programming language that satisfies all the above three criteria. The proposed language features clean Scheme-like syntax and efficient and extensible pattern matching semantics. This programming language is especially useful for the processing of complex non-free data types that not only include multisets and sets but also graphs and symbolic mathematical expressions. We discuss the importance of our criteria of practical pattern matching and how our language design naturally arises from the criteria. The proposed language has been already implemented and open-sourced as the Egison programming language.
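As a rough illustration of what a non-linear pattern over a multiset involves, the Python sketch below enumerates every way to split a multiset into one element and the rest, then backtracks until the same value has been picked twice. It is not Egison syntax or Egison's algorithm. The helper names are hypothetical, and elements are assumed to be hashable.

```python
def cons_matches(multiset):
    """Yield each distinct (element, rest) decomposition of a multiset.

    The multiset is stored as a list. Because it has no canonical order,
    any element may play the role of the "head".
    """
    seen = set()
    for i, x in enumerate(multiset):
        if x not in seen:  # skip decompositions that differ only by position
            seen.add(x)
            yield x, multiset[:i] + multiset[i + 1:]


def pairs_of_equal(multiset):
    """Match the non-linear pattern (x, x, rest): x must occur twice."""
    for x, rest in cons_matches(multiset):
        for y, remaining in cons_matches(rest):
            if y == x:  # non-linear constraint; otherwise backtrack
                yield x, remaining
```

The three literal forms of the multiset from the abstract give the same match, which is the order independence expected of a non-free data type.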

arXiv.org e-Print Archive

Crossref

Particle-based and Meshless Methods with Aboria

Author: Bruna Maria
Robinson Martin
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Aboria is a powerful and flexible C++ library for the implementation of particle-based numerical methods. The particles in such methods can represent actual particles (e.g. Molecular Dynamics) or abstract particles used to discretise a continuous function over a domain (e.g. Radial Basis Functions). Aboria provides a particle container, compatible with the Standard Template Library, spatial search data structures, and a Domain Specific Language to specify non-linear operators on the particle set. This paper gives an overview of Aboria's design, an example of use, and a performance benchmark.

arXiv.org e-Print Archive

Directory of Open Access Journals

Oxford University Research Archive