
    Benchmarking Learned Indexes

    Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art "traditional" baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array. We also investigate the impact of caching, pipelining, dataset size, and key size. We study the performance profile of learned index structures and develop an explanation for why learned models achieve such good performance. Finally, we investigate other important properties of learned index structures, such as their performance in multi-threaded systems and their build times.
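
    A minimal sketch of the kind of read-only lookup benchmark described above, assuming synthetic integer keys and a single linear model with error-bounded correction (the paper's actual datasets, index implementations, and measurement harness are not reproduced here):

```python
import bisect
import random
import time

# Hypothetical setup: a dense sorted array of integer keys standing in for the
# real-world datasets used in the benchmark.
keys = sorted(random.sample(range(10**8), 1_000_000))
queries = [random.choice(keys) for _ in range(100_000)]

def lookup_binary(key):
    # Baseline: classic binary search over the sorted array.
    i = bisect.bisect_left(keys, key)
    return i if i < len(keys) and keys[i] == key else -1

# Toy "learned" index: one linear approximation of the key -> position mapping,
# plus its worst-case prediction error observed at build time.
n = len(keys)
slope = (n - 1) / (keys[-1] - keys[0])
err = int(max(abs(i - (k - keys[0]) * slope) for i, k in enumerate(keys))) + 1

def lookup_learned(key):
    # Predict a position, then correct with binary search inside the error window.
    pred = int((key - keys[0]) * slope)
    lo, hi = max(0, pred - err), min(n, pred + err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < n and keys[i] == key else -1

for name, fn in [("binary search", lookup_binary), ("learned (linear)", lookup_learned)]:
    start = time.perf_counter()
    for q in queries:
        fn(q)
    print(f"{name}: {time.perf_counter() - start:.3f}s for {len(queries)} lookups")
```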

    Learned Sorted Table Search and Static Indexes in Small-Space Data Models

    Machine-learning techniques, properly combined with data structures, have resulted in Learned Static Indexes: innovative and powerful tools that speed up Binary Search at the cost of additional space, relative to the table being searched, devoted to the machine-learning models. Although in their infancy, these are methodologically and practically important, due to the pervasiveness of Sorted Table Search procedures. In modern applications, model space is a key factor, and a major open question in this area is to assess to what extent one can enjoy the speed-up of Binary Search achieved by Learned Indexes while using constant or nearly constant-space models. In this paper, we investigate this question by (a) introducing two new models, i.e., the Learned k-ary Search Model and the Synoptic Recursive Model Index; and (b) systematically exploring the time–space trade-offs of a hierarchy of existing models, i.e., the ones in the reference software platform Searching on Sorted Data, together with the new ones proposed here. We document a novel and rather complex time–space trade-off picture, which is informative for users as well as designers of Learned Indexing data structures. By adhering to and extending the current benchmarking methodology, we experimentally show that the Learned k-ary Search Model is competitive in time with Binary Search while using constant additional space. Our second model, together with the bi-criteria Piecewise Geometric Model Index, can speed up Binary Search with a model space of (Formula presented.) more than the one taken by the table, thereby being competitive in terms of the time–space trade-off with existing proposals. The Synoptic Recursive Model Index and the bi-criteria Piecewise Geometric Model complement each other quite well across the various levels of the internal memory hierarchy. Finally, our findings stimulate research in this area, since they highlight the need for further studies regarding the time–space relation in Learned Indexes.
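
    As a rough illustration of combining a constant-space model with a k-ary search, the sketch below fits one linear key-to-position model (constant extra space) and finishes the lookup with a k-ary search inside the model's error window. It is an assumed reconstruction of the general idea, not the paper's actual Learned k-ary Search Model.

```python
def build_constant_space_model(keys):
    """Fit one linear key -> position mapping plus its worst-case error.
    This is the only state kept besides the table itself (constant space)."""
    n = len(keys)
    slope = (n - 1) / (keys[-1] - keys[0]) if keys[-1] != keys[0] else 0.0
    err = max(abs(i - (k - keys[0]) * slope) for i, k in enumerate(keys))
    return keys[0], slope, int(err) + 1

def kary_search(keys, key, lo, hi, k=3):
    """Plain k-ary search: split [lo, hi) into k parts with k-1 probes per round."""
    while hi - lo > k:
        step = (hi - lo) // k
        nxt_lo, nxt_hi = lo, hi
        for j in range(1, k):
            p = lo + j * step
            if keys[p] <= key:
                nxt_lo = p
            else:
                nxt_hi = p
                break
        lo, hi = nxt_lo, nxt_hi
    # Finish with a linear scan over the remaining (at most k) slots.
    for i in range(lo, hi):
        if keys[i] == key:
            return i
    return -1

def learned_kary_lookup(keys, model, key, k=3):
    base, slope, err = model
    pred = int((key - base) * slope)
    lo = max(0, pred - err)
    hi = min(len(keys), pred + err + 1)
    # The error bound guarantees the key, if present, lies inside [lo, hi).
    return kary_search(keys, key, lo, hi, k)

keys = [3, 7, 9, 14, 21, 22, 35, 40, 41, 57, 60, 72]
model = build_constant_space_model(keys)
print(learned_kary_lookup(keys, model, 35))   # -> 6
print(learned_kary_lookup(keys, model, 10))   # -> -1 (absent)
```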

    The Case for Learned Index Structures

    Indexes are models: a B-Tree index can be seen as a model that maps a key to the position of a record within a sorted array, a Hash index as a model that maps a key to the position of a record within an unsorted array, and a BitMap index as a model that indicates whether a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that, by using neural nets, we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order of magnitude in memory over several real-world data sets. More importantly, we believe that the idea of replacing core components of a data management system with learned models has far-reaching implications for future system designs, and that this work provides just a glimpse of what might be possible.
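
    A compact sketch of the underlying idea, in which the index is a model of the key-to-position mapping (essentially the cumulative distribution of the keys). The two-stage structure below is only loosely inspired by the recursive-model-index idea; the linear models, fan-out, and error handling are illustrative assumptions rather than the paper's design.

```python
import bisect

def fit_linear(xs, ys):
    """Least-squares fit of ys ~ a*xs + b (closed form, no libraries)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs) or 1.0
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

class TwoStageLearnedIndex:
    """Root model routes a key to one of `fanout` leaf models; each leaf
    predicts a position and stores its own maximum prediction error."""
    def __init__(self, keys, fanout=16):
        self.keys = keys
        n = len(keys)
        # Root model: approximate key -> bucket number in [0, fanout).
        self.root = fit_linear(keys, [i * fanout / n for i in range(n)])
        buckets = [[] for _ in range(fanout)]
        for i, k in enumerate(keys):
            buckets[self._route(k, fanout)].append((k, i))
        self.leaves = []
        for b in buckets:
            if not b:
                self.leaves.append((0.0, 0.0, 0))
                continue
            a, c = fit_linear([k for k, _ in b], [i for _, i in b])
            err = max(abs(i - (a * k + c)) for k, i in b)
            self.leaves.append((a, c, int(err) + 1))
        self.fanout = fanout

    def _route(self, key, fanout):
        a, b = self.root
        return min(fanout - 1, max(0, int(a * key + b)))

    def lookup(self, key):
        a, c, err = self.leaves[self._route(key, self.fanout)]
        pred = int(a * key + c)
        lo, hi = max(0, pred - err), min(len(self.keys), pred + err + 1)
        # Correct the prediction with a binary search inside the error window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < len(self.keys) and self.keys[i] == key else -1

keys = sorted({(x * x) % 100003 for x in range(20000)})
idx = TwoStageLearnedIndex(keys)
print(idx.lookup(keys[500]))   # -> 500
```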

    Quality assurance of rectal cancer diagnosis and treatment - phase 3: statistical methods to benchmark centres on a set of quality indicators

    In 2004, the Belgian Section for Colorectal Surgery, a section of the Royal Belgian Society for Surgery, decided to start PROCARE (PROject on CAncer of the REctum), a multidisciplinary, profession-driven and decentralized project whose main objectives are the reduction of diagnostic and therapeutic variability and the improvement of outcome in patients with rectal cancer. All medical specialties involved in the care of rectal cancer established a multidisciplinary steering group in 2005. They agreed to approach the stated goal by means of treatment standardization through guidelines, implementation of these guidelines, and quality assurance through registration and feedback. In 2007, the PROCARE guidelines were updated (PROCARE Phase I, KCE report 69). In 2008, a set of 40 process and outcome quality of care indicators (QCIs) was developed and organized into 8 domains of care: general, diagnosis/staging, neoadjuvant treatment, surgery, adjuvant treatment, palliative treatment, follow-up, and histopathologic examination. These QCIs were tested on the prospective PROCARE database and on an administrative (claims) database (PROCARE Phase II, KCE report 81). Afterwards, 4 QCIs were added by the PROCARE group. Centres have been receiving feedback from the PROCARE registry on these QCIs, with a description of the distribution of the unadjusted centre-averaged observed measures and the centre's position therein. To optimize this feedback, centres should ideally be informed of their risk-adjusted outcomes and be given benchmarks. The PROCARE Phase III study is devoted to developing a methodology to achieve this feedback.
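
    The report itself specifies the statistical methodology; purely as an illustration of what risk-adjusted centre feedback can look like (the indicator, case-mix model, and data below are invented), a common formulation is an observed-versus-expected ratio per centre, with the expected count taken from a patient-level risk model:

```python
# Illustrative sketch only: a hypothetical quality indicator benchmarked per
# centre as an observed/expected (O/E) ratio. The patient-level predicted
# risks would normally come from a case-mix adjustment model.
patients = [
    # (centre, observed_event, predicted_risk)
    ("A", 1, 0.30), ("A", 0, 0.10), ("A", 0, 0.20),
    ("B", 1, 0.60), ("B", 1, 0.50), ("B", 0, 0.40),
    ("C", 0, 0.05), ("C", 1, 0.15), ("C", 0, 0.10),
]

centres = {}
for centre, event, risk in patients:
    obs, exp = centres.get(centre, (0, 0.0))
    centres[centre] = (obs + event, exp + risk)

overall_rate = sum(e for _, e, _ in patients) / len(patients)
for centre, (obs, exp) in sorted(centres.items()):
    oe = obs / exp if exp else float("nan")
    # Risk-adjusted rate: O/E ratio rescaled by the overall event rate.
    print(f"centre {centre}: observed={obs}, expected={exp:.2f}, "
          f"O/E={oe:.2f}, adjusted rate={oe * overall_rate:.2f}")
```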

    Data analytics for modeling and visualizing attack behaviors: A case study on SSH brute force attacks

    In this research, we explore a data-analytics-based approach for modeling and visualizing attack behaviors. To this end, we employ Self-Organizing Map and Association Rule Mining algorithms to analyze and interpret the behaviors of SSH brute-force attacks and normal SSH traffic as a case study. The experimental results, based on four different data sets, show that the patterns extracted and interpreted from the SSH brute-force attack data sets are similar to each other but significantly different from those extracted from the normal SSH traffic data sets. The analysis of the attack traffic provides insight into behavior modeling for SSH brute-force attacks. Furthermore, it sheds light on how data analytics could help in modeling and visualizing attack behaviors in general, in terms of data acquisition and feature extraction.
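
    A minimal sketch of one half of this approach, a Self-Organizing Map trained on per-flow features; the features, grid size, and learning schedule below are invented for illustration and are not those used in the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented toy features per SSH flow: [log packet count, mean inter-arrival time].
# Brute-force bursts tend to occupy a different region than interactive sessions.
normal = rng.normal([3.0, 1.0], [0.5, 0.3], size=(200, 2))
brute  = rng.normal([6.0, 0.1], [0.5, 0.05], size=(200, 2))
data = np.vstack([normal, brute])

grid_w, grid_h, dim = 8, 8, 2
weights = rng.random((grid_w, grid_h, dim)) * data.max(axis=0)
coords = np.dstack(np.meshgrid(np.arange(grid_w), np.arange(grid_h), indexing="ij"))

def train_som(weights, data, epochs=20, lr0=0.5, sigma0=3.0):
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)
        sigma = sigma0 * (1 - t / epochs) + 0.5
        for x in rng.permutation(data):
            # Best-matching unit: grid cell whose weight vector is closest to x.
            d = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(d), d.shape)
            # Pull the BMU and its grid neighbours toward x.
            g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=2) / (2 * sigma**2))
            weights += lr * g[..., None] * (x - weights)
    return weights

weights = train_som(weights, data)
# Map each flow to its BMU; attack and normal flows should land in different cells.
bmus = [np.unravel_index(np.argmin(np.linalg.norm(weights - x, axis=2)), (grid_w, grid_h))
        for x in data]
print("cells used only by normal traffic:", sorted(set(bmus[:200]) - set(bmus[200:])))
print("cells used only by attack traffic:", sorted(set(bmus[200:]) - set(bmus[:200])))
```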

    XWeB: the XML Warehouse Benchmark

    With the emergence of XML as a standard for representing business data, new decision-support applications are being developed. These XML data warehouses aim at supporting On-Line Analytical Processing (OLAP) operations that manipulate irregular XML data. To ensure the feasibility of these new tools, important performance issues must be addressed. Performance is customarily assessed with the help of benchmarks. However, decision-support benchmarks do not currently support XML features. In this paper, we introduce the XML Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from the relational decision-support benchmark TPC-H. It is mainly composed of a test data warehouse, based on a unified reference model for XML warehouses and featuring XML-specific structures, together with its associated XQuery decision-support workload. XWeB's usage is illustrated by experiments on several XML database management systems.
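
    As a toy illustration of the kind of decision-support aggregation such a benchmark exercises (the schema and query below are invented and are not XWeB's TPC-H-derived warehouse or its XQuery workload), here is an OLAP-style roll-up over an XML document using only the Python standard library:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented miniature XML warehouse: sale facts referencing a customer dimension.
doc = ET.fromstring("""
<warehouse>
  <customer id="c1" nation="FR"/>
  <customer id="c2" nation="DE"/>
  <sale customer="c1" amount="120.0" year="2010"/>
  <sale customer="c2" amount="80.5"  year="2010"/>
  <sale customer="c1" amount="42.0"  year="2011"/>
</warehouse>
""")

# OLAP-style roll-up: total sales amount per customer nation and year,
# i.e. a join between facts and a dimension followed by a group-by.
nation_of = {c.get("id"): c.get("nation") for c in doc.findall("customer")}
totals = defaultdict(float)
for s in doc.findall("sale"):
    totals[(nation_of[s.get("customer")], s.get("year"))] += float(s.get("amount"))

for (nation, year), amount in sorted(totals.items()):
    print(f"{nation} {year}: {amount:.1f}")
```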

    Regulatory governance and sector performance: methodology and evaluation for electricity distribution in Latin America

    This paper contributes to the literature that explores the link between regulatory governance and sector performance. The paper develops an index of regulatory governance and estimates its impact on sector performance, showing that regulation and its governance indeed matter. The authors use two unique databases: (i) the World Bank Performance Database, which contains detailed annual data for 250 private and public electricity companies in Latin America and the Caribbean; and (ii) the Electricity Regulatory Governance Database, which contains data on several aspects of the governance of electricity agencies in the region. The authors run different models to explain the impacts of changes in ownership and of different characteristics of the regulatory agency on the performance of the utilities. The results suggest that the mere existence of a regulatory agency, regardless of the utilities' ownership, has a significant impact on performance. Furthermore, after controlling for the existence of a regulatory agency, the ownership dummies are still significant and have the expected signs. The authors propose an experience measure in order to identify the gradual impact of the regulatory agency on utility performance; the results confirm this hypothesis. In addition, the paper explores two different measures of governance: an aggregate measure of regulatory governance, and an index based on principal components, including autonomy, transparency, and accountability. The findings show that the governance of regulatory agencies matters and has significant effects on performance.
    Keywords: National Governance, Infrastructure Regulation, Governance Indicators, Banks & Banking Reform, Emerging Markets
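
    A hedged sketch of the general approach (the variable names, data, and model form are assumptions, not the paper's specification): build a governance index as the first principal component of autonomy, transparency, and accountability scores, then regress a performance measure on the index and an ownership dummy.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120  # hypothetical utility-year observations

# Invented governance sub-scores in [0, 1] and an ownership dummy (1 = private).
autonomy = rng.uniform(0, 1, n)
transparency = 0.6 * autonomy + 0.4 * rng.uniform(0, 1, n)
accountability = 0.5 * autonomy + 0.5 * rng.uniform(0, 1, n)
private = rng.integers(0, 2, n)

# Governance index = first principal component of the standardized sub-scores.
G = np.column_stack([autonomy, transparency, accountability])
G = (G - G.mean(axis=0)) / G.std(axis=0)
_, _, vt = np.linalg.svd(G, full_matrices=False)
gov_index = G @ vt[0]

# Invented performance measure driven by governance and ownership plus noise.
performance = 0.8 * gov_index + 0.5 * private + rng.normal(0, 1, n)

# OLS of performance on the governance index and the ownership dummy.
X = np.column_stack([np.ones(n), gov_index, private])
beta, *_ = np.linalg.lstsq(X, performance, rcond=None)
print(dict(zip(["const", "gov_index", "private"], np.round(beta, 3))))
```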