Search CORE

3,052 research outputs found

The Case for Learned Index Structures

Author: Abadi M.
Armbrust M.
Böhm M.
Chang F.
Goodfellow I.
Grossi R.
Lehman T. J.
Litwin W.
Magdon-Ismail M.
Miller D. J.
Moerkotte G.
Sutskever I.
You S.
Publication venue
Publication date: 30/04/2018
Field of study

Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show, that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order-of-magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs and that this work just provides a glimpse of what might be possible

arXiv.org e-Print Archive

Crossref

Digital provenance - models, systems, and applications

Author: Sultana Salmin
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2000
Field of study

Data provenance refers to the history of creation and manipulation of a data object and is being widely used in various application domains including scientific experiments, grid computing, file and storage system, streaming data etc. However, existing provenance systems operate at a single layer of abstraction (workflow/process/OS) at which they record and store provenance whereas the provenance captured from different layers provide the highest benefit when integrated through a unified provenance framework. To build such a framework, a comprehensive provenance model able to represent the provenance of data objects with various semantics and granularity is the first step. In this thesis, we propose a such a comprehensive provenance model and present an abstract schema of the model. ^ We further explore the secure provenance solutions for distributed systems, namely streaming data, wireless sensor networks (WSNs) and virtualized environments. We design a customizable file provenance system with an application to the provenance infrastructure for virtualized environments. The system supports automatic collection and management of file provenance metadata, characterized by our provenance model. Based on the proposed provenance framework, we devise a mechanism for detecting data exfiltration attack in a file system. We then move to the direction of secure provenance communication in streaming environment and propose two secure provenance schemes focusing on WSNs. The basic provenance scheme is extended in order to detect packet dropping adversaries on the data flow path over a period of time. We also consider the issue of attack recovery and present an extensive incident response and prevention system specifically designed for WSNs

Purdue E-Pubs

HAL-Rennes 1

Self-Learning Hot Data Prediction: Where Echo State Network Meets NAND Flash Memories

Author: Ai Jiaqiu
Fang Xiaoxin
Luo Qiwu
Sun Yichuang
Yang Chunhua
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/01/2020
Field of study

© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.Well understanding the access behavior of hot data is significant for NAND flash memory due to its crucial impact on the efficiency of garbage collection (GC) and wear leveling (WL), which respectively dominate the performance and life span of SSD. Generally, both GC and WL rely greatly on the recognition accuracy of hot data identification (HDI). However, in this paper, the first time we propose a novel concept of hot data prediction (HDP), where the conventional HDI becomes unnecessary. First, we develop a hybrid optimized echo state network (HOESN), where sufficiently unbiased and continuously shrunk output weights are learnt by a sparse regression based on L2 and L1/2 regularization. Second, quantum-behaved particle swarm optimization (QPSO) is employed to compute reservoir parameters (i.e., global scaling factor, reservoir size, scaling coefficient and sparsity degree) for further improving prediction accuracy and reliability. Third, in the test on a chaotic benchmark (Rossler), the HOESN performs better than those of six recent state-of-the-art methods. Finally, simulation results about six typical metrics tested on five real disk workloads and on-chip experiment outcomes verified from an actual SSD prototype indicate that our HOESN-based HDP can reliably promote the access performance and endurance of NAND flash memories.Peer reviewe

University of Hertfordshire Research Archive

Digital provenance - models, systems, and applications

Author: Sultana Salmin
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2014
Field of study

Purdue E-Pubs

Scalable and Reliable Middlebox Deployment

Author: Ghaznavi Milad
Publication venue: 'University of Waterloo'
Publication date: 25/05/2020
Field of study

Middleboxes are pervasive in modern computer networks providing functionalities beyond mere packet forwarding. Load balancers, intrusion detection systems, and network address translators are typical examples of middleboxes. Despite their benefits, middleboxes come with several challenges with respect to their scalability and reliability. The goal of this thesis is to devise middlebox deployment solutions that are cost effective, scalable, and fault tolerant. The thesis includes three main contributions: First, distributed service function chaining with multiple instances of a middlebox deployed on different physical servers to optimize resource usage; Second, Constellation, a geo-distributed middlebox framework enabling a middlebox application to operate with high performance across wide area networks; Third, a fault tolerant service function chaining system

University of Waterloo's Institutional Repository