
    Benchmarking Learned Indexes

    Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art "traditional" baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array. We also investigate the impact of caching, pipelining, dataset size, and key size. We study the performance profile of learned index structures and develop an explanation for why learned models achieve such good performance. Finally, we investigate other important properties of learned index structures, such as their performance in multi-threaded systems and their build times.
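
    A minimal sketch of the kind of read-only lookup benchmark described above, assuming synthetic integer keys and a single linear model with error-bounded correction (the paper's actual datasets, index implementations, and measurement harness are not reproduced here):

```python
import bisect
import random
import time

# Hypothetical setup: a dense sorted array of integer keys standing in for the
# real-world datasets used in the benchmark.
keys = sorted(random.sample(range(10**8), 1_000_000))
queries = [random.choice(keys) for _ in range(100_000)]

def lookup_binary(key):
    # Baseline: classic binary search over the sorted array.
    i = bisect.bisect_left(keys, key)
    return i if i < len(keys) and keys[i] == key else -1

# Toy "learned" index: one linear approximation of the key -> position mapping,
# plus its worst-case prediction error observed at build time.
n = len(keys)
slope = (n - 1) / (keys[-1] - keys[0])
err = int(max(abs(i - (k - keys[0]) * slope) for i, k in enumerate(keys))) + 1

def lookup_learned(key):
    # Predict a position, then correct with binary search inside the error window.
    pred = int((key - keys[0]) * slope)
    lo, hi = max(0, pred - err), min(n, pred + err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < n and keys[i] == key else -1

for name, fn in [("binary search", lookup_binary), ("learned (linear)", lookup_learned)]:
    start = time.perf_counter()
    for q in queries:
        fn(q)
    print(f"{name}: {time.perf_counter() - start:.3f}s for {len(queries)} lookups")
```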

    Learned Sorted Table Search and Static Indexes in Small-Space Data Models

    Machine-learning techniques, properly combined with data structures, have resulted in Learned Static Indexes: innovative and powerful tools that speed up Binary Search at the cost of additional space, relative to the table being searched, devoted to the machine-learning models. Although in their infancy, these are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor, and a major open question in this area is to assess to what extent one can enjoy the speed-up of Binary Search achieved by Learned Indexes while using constant or nearly constant-space models. In this paper, we investigate this question by (a) introducing two new models, i.e., the Learned k-ary Search Model and the Synoptic Recursive Model Index; and (b) systematically exploring the time–space trade-offs of a hierarchy of existing models, i.e., the ones in the reference software platform Searching on Sorted Data, together with the new ones proposed here. We document a novel and rather complex time–space trade-off picture, which is informative for users as well as designers of Learned Indexing data structures. By adhering to and extending the current benchmarking methodology, we experimentally show that the Learned k-ary Search Model is competitive in time with Binary Search while using constant additional space. Our second model, together with the bi-criteria Piecewise Geometric Model Index, can speed up Binary Search with a model space of (Formula presented.) more than the one taken by the table, thereby being competitive in terms of the time–space trade-off with existing proposals. The Synoptic Recursive Model Index and the bi-criteria Piecewise Geometric Model complement each other quite well across the various levels of the internal memory hierarchy. Finally, our findings stimulate research in this area, since they highlight the need for further studies regarding the time–space relation in Learned Indexes.
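
    As a rough illustration of combining a constant-space model with a k-ary search, the sketch below fits one linear key-to-position model (constant extra space) and finishes the lookup with a k-ary search inside the model's error window. It is an assumed reconstruction of the general idea, not the paper's actual Learned k-ary Search Model.

```python
def build_constant_space_model(keys):
    """Fit one linear key -> position mapping plus its worst-case error.
    This is the only state kept besides the table itself (constant space)."""
    n = len(keys)
    slope = (n - 1) / (keys[-1] - keys[0]) if keys[-1] != keys[0] else 0.0
    err = max(abs(i - (k - keys[0]) * slope) for i, k in enumerate(keys))
    return keys[0], slope, int(err) + 1

def kary_search(keys, key, lo, hi, k=3):
    """Plain k-ary search: split [lo, hi) into k parts with k-1 probes per round."""
    while hi - lo > k:
        step = (hi - lo) // k
        nxt_lo, nxt_hi = lo, hi
        for j in range(1, k):
            p = lo + j * step
            if keys[p] <= key:
                nxt_lo = p
            else:
                nxt_hi = p
                break
        lo, hi = nxt_lo, nxt_hi
    # Finish with a linear scan over the remaining (at most k) slots.
    for i in range(lo, hi):
        if keys[i] == key:
            return i
    return -1

def learned_kary_lookup(keys, model, key, k=3):
    base, slope, err = model
    pred = int((key - base) * slope)
    lo = max(0, pred - err)
    hi = min(len(keys), pred + err + 1)
    # The error bound guarantees the key, if present, lies inside [lo, hi).
    return kary_search(keys, key, lo, hi, k)

keys = [3, 7, 9, 14, 21, 22, 35, 40, 41, 57, 60, 72]
model = build_constant_space_model(keys)
print(learned_kary_lookup(keys, model, 35))   # -> 6
print(learned_kary_lookup(keys, model, 10))   # -> -1 (absent)
```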

    The Case for Learned Index Structures

    Indexes are models: a B-Tree index can be seen as a model that maps a key to the position of a record within a sorted array, a Hash index as a model that maps a key to the position of a record within an unsorted array, and a BitMap index as a model that indicates whether a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that, by using neural nets, we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly, we believe that the idea of replacing core components of a data management system with learned models has far-reaching implications for future system designs, and that this work provides just a glimpse of what might be possible.
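
    A compact sketch of the underlying idea, in which the index is a model of the key-to-position mapping (essentially the cumulative distribution of the keys). The two-stage structure below is only loosely inspired by the recursive-model-index idea; the linear models, fan-out, and error handling are illustrative assumptions rather than the paper's design.

```python
import bisect

def fit_linear(xs, ys):
    """Least-squares fit of ys ~ a*xs + b (closed form, no libraries)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs) or 1.0
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

class TwoStageLearnedIndex:
    """Root model routes a key to one of `fanout` leaf models; each leaf
    predicts a position and stores its own maximum prediction error."""
    def __init__(self, keys, fanout=16):
        self.keys = keys
        n = len(keys)
        # Root model: approximate key -> bucket number in [0, fanout).
        self.root = fit_linear(keys, [i * fanout / n for i in range(n)])
        buckets = [[] for _ in range(fanout)]
        for i, k in enumerate(keys):
            buckets[self._route(k, fanout)].append((k, i))
        self.leaves = []
        for b in buckets:
            if not b:
                self.leaves.append((0.0, 0.0, 0))
                continue
            a, c = fit_linear([k for k, _ in b], [i for _, i in b])
            err = max(abs(i - (a * k + c)) for k, i in b)
            self.leaves.append((a, c, int(err) + 1))
        self.fanout = fanout

    def _route(self, key, fanout):
        a, b = self.root
        return min(fanout - 1, max(0, int(a * key + b)))

    def lookup(self, key):
        a, c, err = self.leaves[self._route(key, self.fanout)]
        pred = int(a * key + c)
        lo, hi = max(0, pred - err), min(len(self.keys), pred + err + 1)
        # Correct the prediction with a binary search inside the error window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1

keys = sorted({(x * x) % 100003 for x in range(20000)})
idx = TwoStageLearnedIndex(keys)
print(idx.lookup(keys[500]))   # -> 500
```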

    Quality assurance of rectal cancer diagnosis and treatment - phase 3: statistical methods to benchmark centres on a set of quality indicators

    In 2004, the Belgian Section for Colorectal Surgery, a section of the Royal Belgian Society for Surgery, decided to start PROCARE (PROject on CAncer of the REctum), a multidisciplinary, profession-driven and decentralized project whose main objectives are the reduction of diagnostic and therapeutic variability and the improvement of outcome in patients with rectal cancer. All medical specialties involved in the care of rectal cancer established a multidisciplinary steering group in 2005. They agreed to approach the stated goal by means of treatment standardization through guidelines, implementation of these guidelines, and quality assurance through registration and feedback. In 2007, the PROCARE guidelines were updated (PROCARE Phase I, KCE report 69). In 2008, a set of 40 process and outcome quality of care indicators (QCIs) was developed and organized into 8 domains of care: general, diagnosis/staging, neoadjuvant treatment, surgery, adjuvant treatment, palliative treatment, follow-up, and histopathologic examination. These QCIs were tested on the prospective PROCARE database and on an administrative (claims) database (PROCARE Phase II, KCE report 81). Afterwards, 4 QCIs were added by the PROCARE group. Centres have been receiving feedback from the PROCARE registry on these QCIs, with a description of the distribution of the unadjusted centre-averaged observed measures and the centre's position therein. To optimize this feedback, centres should ideally be informed of their risk-adjusted outcomes and be given benchmarks. The PROCARE Phase III study is devoted to developing a methodology to achieve this feedback.
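
    The report itself specifies the statistical methodology; purely as an illustration of what risk-adjusted centre feedback can look like (the indicator, case-mix model, and data below are invented), a common formulation is an observed-versus-expected ratio per centre, with the expected count taken from a patient-level risk model:

```python
# Illustrative sketch only: a hypothetical quality indicator benchmarked per
# centre as an observed/expected (O/E) ratio. The patient-level predicted
# risks would normally come from a case-mix adjustment model.
patients = [
    # (centre, observed_event, predicted_risk)
    ("A", 1, 0.30), ("A", 0, 0.10), ("A", 0, 0.20),
    ("B", 1, 0.60), ("B", 1, 0.50), ("B", 0, 0.40),
    ("C", 0, 0.05), ("C", 1, 0.15), ("C", 0, 0.10),
]

centres = {}
for centre, event, risk in patients:
    obs, exp = centres.get(centre, (0, 0.0))
    centres[centre] = (obs + event, exp + risk)

overall_rate = sum(e for _, e, _ in patients) / len(patients)
for centre, (obs, exp) in sorted(centres.items()):
    oe = obs / exp if exp else float("nan")
    # Risk-adjusted rate: O/E ratio rescaled by the overall event rate.
    print(f"centre {centre}: observed={obs}, expected={exp:.2f}, "
          f"O/E={oe:.2f}, adjusted rate={oe * overall_rate:.2f}")
```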

    Data analytics for modeling and visualizing attack behaviors: A case study on SSH brute force attacks

    In this research, we explore a data-analytics-based approach for modeling and visualizing attack behaviors. To this end, we employ Self-Organizing Map and Association Rule Mining algorithms to analyze and interpret the behaviors of SSH brute-force attacks and normal SSH traffic as a case study. The experimental results, based on four different data sets, show that the patterns extracted and interpreted from the SSH brute-force attack data sets are similar to each other but significantly different from those extracted from the normal SSH traffic data sets. The analysis of the attack traffic provides insight into behavior modeling for SSH brute-force attacks. Furthermore, it sheds light on how data analytics could help in modeling and visualizing attack behaviors in general, in terms of data acquisition and feature extraction.
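
    A minimal sketch of one half of this approach, a Self-Organizing Map trained on per-flow features; the features, grid size, and learning schedule below are invented for illustration and are not those used in the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy features per SSH flow: [log packet count, mean inter-arrival time].
# Brute-force bursts tend to occupy a different region than interactive sessions.
normal = rng.normal([3.0, 1.0], [0.5, 0.3], size=(200, 2))
brute  = rng.normal([6.0, 0.1], [0.5, 0.05], size=(200, 2))
data = np.vstack([normal, brute])

grid_w, grid_h, dim = 8, 8, 2
weights = rng.random((grid_w, grid_h, dim)) * data.max(axis=0)
coords = np.dstack(np.meshgrid(np.arange(grid_w), np.arange(grid_h), indexing="ij"))

def train_som(weights, data, epochs=20, lr0=0.5, sigma0=3.0):
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)
        sigma = sigma0 * (1 - t / epochs) + 0.5
        for x in rng.permutation(data):
            # Best-matching unit: grid cell whose weight vector is closest to x.
            d = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Pull the BMU and its grid neighbours toward x.
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=2) / (2 * sigma**2))
            weights += lr * g[..., None] * (x - weights)
    return weights

weights = train_som(weights, data)
# Map each flow to its BMU; attack and normal flows should land in different cells.
bmus = [np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=2)), (grid_w, grid_h))
        for x in data]
print("cells used only by normal traffic:", sorted(set(bmus[:200]) - set(bmus[200:])))
print("cells used only by attack traffic:", sorted(set(bmus[200:]) - set(bmus[:200])))
```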

    XWeB: the XML Warehouse Benchmark

    With the emergence of XML as a standard for representing business data, new decision-support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure the feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision-support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision-support benchmark TPC-H. It is mainly composed of a test data warehouse, based on a unified reference model for XML warehouses and featuring XML-specific structures, together with its associated XQuery decision-support workload. XWeB's usage is illustrated by experiments on several XML database management systems.
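
    As a toy illustration of the kind of decision-support aggregation such a benchmark exercises (the schema and query below are invented and are not XWeB's TPC-H-derived warehouse or its XQuery workload), here is an OLAP-style roll-up over an XML document using only the Python standard library:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented miniature XML warehouse: sale facts referencing a customer dimension.
doc = ET.fromstring("""
<warehouse>
  <customer id="c1" nation="FR"/>
  <customer id="c2" nation="DE"/>
  <sale customer="c1" amount="120.0" year="2010"/>
  <sale customer="c2" amount="80.5"  year="2010"/>
  <sale customer="c1" amount="42.0"  year="2011"/>
</warehouse>
""")

# OLAP-style roll-up: total sales amount per customer nation and year,
# i.e. a join between facts and a dimension followed by a group-by.
nation_of = {c.get("id"): c.get("nation") for c in doc.findall("customer")}
totals = defaultdict(float)
for s in doc.findall("sale"):
    totals[(nation_of[s.get("customer")], s.get("year"))] += float(s.get("amount"))

for (nation, year), amount in sorted(totals.items()):
    print(f"{nation} {year}: {amount:.1f}")
```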

    Regulatory governance and sector performance: methodology and evaluation for electricity distribution in Latin America

    This paper contributes to the literature that explores the link between regulatory governance and sector performance. The paper develops an index of regulatory governance and estimates its impact on sector performance, showing that regulation and its governance indeed matter. The authors use two unique databases: (i) the World Bank Performance Database, which contains detailed annual data for 250 private and public electricity companies in Latin America and the Caribbean; and (ii) the Electricity Regulatory Governance Database, which contains data on several aspects of the governance of electricity agencies in the region. The authors run different models to explain the impacts of changes in ownership and of different characteristics of the regulatory agency on the performance of the utilities. The results suggest that the mere existence of a regulatory agency, regardless of the utilities' ownership, has a significant impact on performance. Furthermore, after controlling for the existence of a regulatory agency, the ownership dummies are still significant and have the expected signs. The authors propose an experience measure in order to identify the gradual impact of the regulatory agency on utility performance; the results confirm this hypothesis. In addition, the paper explores two different measures of governance: an aggregate measure of regulatory governance, and an index based on principal components, including autonomy, transparency, and accountability. The findings show that the governance of regulatory agencies matters and has significant effects on performance.
    Keywords: National Governance, Infrastructure Regulation, Governance Indicators, Banks & Banking Reform, Emerging Markets
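
    A hedged sketch of the general approach (the variable names, data, and model form are assumptions, not the paper's specification): build a governance index as the first principal component of autonomy, transparency, and accountability scores, then regress a performance measure on the index and an ownership dummy.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120  # hypothetical utility-year observations

# Invented governance sub-scores in [0, 1] and an ownership dummy (1 = private).
autonomy = rng.uniform(0, 1, n)
transparency = 0.6 * autonomy + 0.4 * rng.uniform(0, 1, n)
accountability = 0.5 * autonomy + 0.5 * rng.uniform(0, 1, n)
private = rng.integers(0, 2, n)

# Governance index = first principal component of the standardized sub-scores.
G = np.column_stack([autonomy, transparency, accountability])
G = (G - G.mean(axis=0)) / G.std(axis=0)
_, _, vt = np.linalg.svd(G, full_matrices=False)
gov_index = G @ vt[0]

# Invented performance measure driven by governance and ownership plus noise.
performance = 0.8 * gov_index + 0.5 * private + rng.normal(0, 1, n)

# OLS of performance on the governance index and the ownership dummy.
X = np.column_stack([np.ones(n), gov_index, private])
beta, *_ = np.linalg.lstsq(X, performance, rcond=None)
print(dict(zip(["const", "gov_index", "private"], np.round(beta, 3))))
```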