961 research outputs found

    ๊ฐ„๊ฒฐํ•œ ์ž๋ฃŒ๊ตฌ์กฐ๋ฅผ ํ™œ์šฉํ•œ ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ ํ˜•์‹๋“ค์˜ ๊ณต๊ฐ„ ํšจ์œจ์  ํ‘œํ˜„๋ฒ•

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2021. 2. Srinivasa Rao Satti.Numerous big data are generated from a plethora of sources. Most of the data stored as files contain a non-fixed type of schema, so that the files are suitable to be maintained as semi-structured document formats. A number of those formats, such as XML (eXtensible Markup Language), JSON (JavaScript Object Notation), and YAML (YAML Ain't Markup Language) are suggested to sustain hierarchy in the original corpora of data. Several data models structuring the gathered data - including RDF (Resource Description Framework) - depend on the semi-structured document formats to be serialized and transferred for future processing. Since the semi-structured document formats focus on readability and verbosity, redundant space is required to organize and maintain the document. Even though general-purpose compression schemes are widely used to compact the documents, applying those algorithms hinder future handling of the corpora, owing to loss of internal structures. The area of succinct data structures is widely investigated and researched in theory, to provide answers to the queries while the encoded data occupy space close to the information-theoretic lower bound. Bit vectors and trees are the notable succinct data structures. Nevertheless, there were few attempts to apply the idea of succinct data structures to represent the semi-structured documents in space-efficient manner. In this dissertation we propose a unified, space-efficient representation of various semi-structured document formats. The core functionality of this representation is its compactness and query-ability derived from enriched functions of succinct data structures. Incorporation of (a) bit indexed arrays, (b) succinct ordinal trees, and (c) compression techniques engineers the compact representation. We implement this representation in practice, and show by experiments that construction of this representation decreases the disk usage by up to 60% while occupying 90% less RAM. We also allow processing a document in partial manner, to allow processing of larger corpus of big data even in the constrained environment. In parallel to establishing the aforementioned compact semi-structured document representation, we provide and reinforce some of the existing compression schemes in this dissertation. We first suggest an idea to encode an array of integers that is not necessarily sorted. This compaction scheme improves upon the existing universal code systems, by assistance of succinct bit vector structure. We show that our suggested algorithm reduces space usage by up to 44% while consuming 15% less time than the original code system, while the algorithm additionally supports random access of elements upon the encoded array. We also reinforce the SBH bitmap index compression algorithm. The main strength of this scheme is the use of intermediate super-bucket during operations, giving better performance on querying through a combination of compressed bitmap indexes. Inspired from splits done during the intermediate process of the SBH algorithm, we give an improved compression mechanism supporting parallelism that could be utilized in both CPUs and GPUs. We show by experiments that this CPU parallel processing optimization diminishes compression and decompression times by up to 38% in a 4-core machine without modifying the bitmap compressed form. For GPUs, the new algorithm gives 48% faster query processing time in the experiments, compared to the previous existing bitmap index compression schemes.์…€ ์ˆ˜ ์—†๋Š” ๋น… ๋ฐ์ดํ„ฐ๊ฐ€ ๋‹ค์–‘ํ•œ ์›๋ณธ๋กœ๋ถ€ํ„ฐ ์ƒ์„ฑ๋˜๊ณ  ์žˆ๋‹ค. ์ด๋“ค ๋ฐ์ดํ„ฐ์˜ ๋Œ€๋ถ€๋ถ„์€ ๊ณ ์ •๋˜์ง€ ์•Š์€ ์ข…๋ฅ˜์˜ ์Šคํ‚ค๋งˆ๋ฅผ ํฌํ•จํ•œ ํŒŒ์ผ ํ˜•ํƒœ๋กœ ์ €์žฅ๋˜๋Š”๋ฐ, ์ด๋กœ ์ธํ•˜์—ฌ ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ ํ˜•์‹์„ ์ด์šฉํ•˜์—ฌ ํŒŒ์ผ์„ ์œ ์ง€ํ•˜๋Š” ๊ฒƒ์ด ์ ํ•ฉํ•˜๋‹ค. XML, JSON ๋ฐ YAML๊ณผ ๊ฐ™์€ ์ข…๋ฅ˜์˜ ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ ํ˜•์‹์ด ๋ฐ์ดํ„ฐ์— ๋‚ด์žฌํ•˜๋Š” ๊ตฌ์กฐ๋ฅผ ์œ ์ง€ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์ œ์•ˆ๋˜์—ˆ๋‹ค. ์ˆ˜์ง‘๋œ ๋ฐ์ดํ„ฐ๋ฅผ ๊ตฌ์กฐํ™”ํ•˜๋Š” RDF์™€ ๊ฐ™์€ ์—ฌ๋Ÿฌ ๋ฐ์ดํ„ฐ ๋ชจ๋ธ๋“ค์€ ์‚ฌํ›„ ์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ ์ €์žฅ ๋ฐ ์ „์†ก์„ ์œ„ํ•˜์—ฌ ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ ํ˜•์‹์— ์˜์กดํ•œ๋‹ค. ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ ํ˜•์‹์€ ๊ฐ€๋…์„ฑ๊ณผ ๋‹ค๋ณ€์„ฑ์— ์ง‘์ค‘ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ๋ฌธ์„œ๋ฅผ ๊ตฌ์กฐํ™”ํ•˜๊ณ  ์œ ์ง€ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์ถ”๊ฐ€์ ์ธ ๊ณต๊ฐ„์„ ํ•„์š”๋กœ ํ•œ๋‹ค. ๋ฌธ์„œ๋ฅผ ์••์ถ•์‹œํ‚ค๊ธฐ ์œ„ํ•˜์—ฌ ์ผ๋ฐ˜์ ์ธ ์••์ถ• ๊ธฐ๋ฒ•๋“ค์ด ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ์œผ๋‚˜, ์ด๋“ค ๊ธฐ๋ฒ•๋“ค์„ ์ ์šฉํ•˜๊ฒŒ ๋˜๋ฉด ๋ฌธ์„œ์˜ ๋‚ด๋ถ€ ๊ตฌ์กฐ์˜ ์†์‹ค๋กœ ์ธํ•˜์—ฌ ๋ฐ์ดํ„ฐ์˜ ์‚ฌํ›„ ์ฒ˜๋ฆฌ๊ฐ€ ์–ด๋ ต๊ฒŒ ๋œ๋‹ค. ๋ฐ์ดํ„ฐ๋ฅผ ์ •๋ณด์ด๋ก ์  ํ•˜ํ•œ์— ๊ฐ€๊นŒ์šด ๊ณต๊ฐ„๋งŒ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ €์žฅ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•˜๋ฉด์„œ ์งˆ์˜์— ๋Œ€ํ•œ ์‘๋‹ต์„ ์ œ๊ณตํ•˜๋Š” ๊ฐ„๊ฒฐํ•œ ์ž๋ฃŒ๊ตฌ์กฐ๋Š” ์ด๋ก ์ ์œผ๋กœ ๋„๋ฆฌ ์—ฐ๊ตฌ๋˜๊ณ  ์žˆ๋Š” ๋ถ„์•ผ์ด๋‹ค. ๋น„ํŠธ์—ด๊ณผ ํŠธ๋ฆฌ๊ฐ€ ๋„๋ฆฌ ์•Œ๋ ค์ง„ ๊ฐ„๊ฒฐํ•œ ์ž๋ฃŒ๊ตฌ์กฐ๋“ค์ด๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ๋“ค์„ ์ €์žฅํ•˜๋Š” ๋ฐ ๊ฐ„๊ฒฐํ•œ ์ž๋ฃŒ๊ตฌ์กฐ์˜ ์•„์ด๋””์–ด๋ฅผ ์ ์šฉํ•œ ์—ฐ๊ตฌ๋Š” ๊ฑฐ์˜ ์ง„ํ–‰๋˜์ง€ ์•Š์•˜๋‹ค. ๋ณธ ํ•™์œ„๋…ผ๋ฌธ์„ ํ†ตํ•ด ์šฐ๋ฆฌ๋Š” ๋‹ค์–‘ํ•œ ์ข…๋ฅ˜์˜ ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ ํ˜•์‹์„ ํ†ต์ผ๋˜๊ฒŒ ํ‘œํ˜„ํ•˜๋Š” ๊ณต๊ฐ„ ํšจ์œจ์  ํ‘œํ˜„๋ฒ•์„ ์ œ์‹œํ•œ๋‹ค. ์ด ๊ธฐ๋ฒ•์˜ ์ฃผ์š”ํ•œ ๊ธฐ๋Šฅ์€ ๊ฐ„๊ฒฐํ•œ ์ž๋ฃŒ๊ตฌ์กฐ๊ฐ€ ๊ฐ•์ ์œผ๋กœ ๊ฐ€์ง€๋Š” ํŠน์„ฑ์— ๊ธฐ๋ฐ˜ํ•œ ๊ฐ„๊ฒฐ์„ฑ๊ณผ ์งˆ์˜ ๊ฐ€๋Šฅ์„ฑ์ด๋‹ค. ๋น„ํŠธ์—ด๋กœ ์ธ๋ฑ์‹ฑ๋œ ๋ฐฐ์—ด, ๊ฐ„๊ฒฐํ•œ ์ˆœ์„œ ์žˆ๋Š” ํŠธ๋ฆฌ ๋ฐ ๋‹ค์–‘ํ•œ ์••์ถ• ๊ธฐ๋ฒ•์„ ํ†ตํ•ฉํ•˜์—ฌ ํ•ด๋‹น ํ‘œํ˜„๋ฒ•์„ ๊ณ ์•ˆํ•˜์˜€๋‹ค. ์ด ๊ธฐ๋ฒ•์€ ์‹ค์žฌ์ ์œผ๋กœ ๊ตฌํ˜„๋˜์—ˆ๊ณ , ์‹คํ—˜์„ ํ†ตํ•˜์—ฌ ์ด ๊ธฐ๋ฒ•์„ ์ ์šฉํ•œ ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ๋“ค์€ ์ตœ๋Œ€ 60% ์ ์€ ๋””์Šคํฌ ๊ณต๊ฐ„๊ณผ 90% ์ ์€ ๋ฉ”๋ชจ๋ฆฌ ๊ณต๊ฐ„์„ ํ†ตํ•ด ํ‘œํ˜„๋  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์ธ๋‹ค. ๋”๋ถˆ์–ด ๋ณธ ํ•™์œ„๋…ผ๋ฌธ์—์„œ ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ๋“ค์€ ๋ถ„ํ• ์ ์œผ๋กœ ํ‘œํ˜„์ด ๊ฐ€๋Šฅํ•จ์„ ๋ณด์ด๊ณ , ์ด๋ฅผ ํ†ตํ•˜์—ฌ ์ œํ•œ๋œ ํ™˜๊ฒฝ์—์„œ๋„ ๋น… ๋ฐ์ดํ„ฐ๋ฅผ ํ‘œํ˜„ํ•œ ๋ฌธ์„œ๋“ค์„ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์ธ๋‹ค. ์•ž์„œ ์–ธ๊ธ‰ํ•œ ๊ณต๊ฐ„ ํšจ์œจ์  ๋ฐ˜๊ตฌ์กฐํ™”๋œ ๋ฌธ์„œ ํ‘œํ˜„๋ฒ•์„ ๊ตฌ์ถ•ํ•จ๊ณผ ๋™์‹œ์—, ๋ณธ ํ•™์œ„๋…ผ๋ฌธ์—์„œ ์ด๋ฏธ ์กด์žฌํ•˜๋Š” ์••์ถ• ๊ธฐ๋ฒ• ์ค‘ ์ผ๋ถ€๋ฅผ ์ถ”๊ฐ€์ ์œผ๋กœ ๊ฐœ์„ ํ•œ๋‹ค. ์ฒซ์งธ๋กœ, ๋ณธ ํ•™์œ„๋…ผ๋ฌธ์—์„œ๋Š” ์ •๋ ฌ ์—ฌ๋ถ€์— ๊ด€๊ณ„์—†๋Š” ์ •์ˆ˜ ๋ฐฐ์—ด์„ ๋ถ€ํ˜ธํ™”ํ•˜๋Š” ์•„์ด๋””์–ด๋ฅผ ์ œ์‹œํ•œ๋‹ค. ์ด ๊ธฐ๋ฒ•์€ ์ด๋ฏธ ์กด์žฌํ•˜๋Š” ๋ฒ”์šฉ ์ฝ”๋“œ ์‹œ์Šคํ…œ์„ ๊ฐœ์„ ํ•œ ํ˜•ํƒœ๋กœ, ๊ฐ„๊ฒฐํ•œ ๋น„ํŠธ์—ด ์ž๋ฃŒ๊ตฌ์กฐ๋ฅผ ์ด์šฉํ•œ๋‹ค. ์ œ์•ˆ๋œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๊ธฐ์กด ๋ฒ”์šฉ ์ฝ”๋“œ ์‹œ์Šคํ…œ์— ๋น„ํ•ด ์ตœ๋Œ€ 44\% ์ ์€ ๊ณต๊ฐ„์„ ์‚ฌ์šฉํ•  ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ 15\% ์ ์€ ๋ถ€ํ˜ธํ™” ์‹œ๊ฐ„์„ ํ•„์š”๋กœ ํ•˜๋ฉฐ, ๊ธฐ์กด ์‹œ์Šคํ…œ์—์„œ ์ œ๊ณตํ•˜์ง€ ์•Š๋Š” ๋ถ€ํ˜ธํ™”๋œ ๋ฐฐ์—ด์—์„œ์˜ ์ž„์˜ ์ ‘๊ทผ์„ ์ง€์›ํ•œ๋‹ค. ๋˜ํ•œ ๋ณธ ํ•™์œ„๋…ผ๋ฌธ์—์„œ๋Š” ๋น„ํŠธ๋งต ์ธ๋ฑ์Šค ์••์ถ•์— ์‚ฌ์šฉ๋˜๋Š” SBH ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ฐœ์„ ์‹œํ‚จ๋‹ค. ํ•ด๋‹น ๊ธฐ๋ฒ•์˜ ์ฃผ๋œ ๊ฐ•์ ์€ ๋ถ€ํ˜ธํ™”์™€ ๋ณตํ˜ธํ™” ์ง„ํ–‰ ์‹œ ์ค‘๊ฐ„ ๋งค๊ฐœ์ธ ์Šˆํผ๋ฒ„์ผ“์„ ์‚ฌ์šฉํ•จ์œผ๋กœ์จ ์—ฌ๋Ÿฌ ์••์ถ•๋œ ๋น„ํŠธ๋งต ์ธ๋ฑ์Šค์— ๋Œ€ํ•œ ์งˆ์˜ ์„ฑ๋Šฅ์„ ๊ฐœ์„ ์‹œํ‚ค๋Š” ๊ฒƒ์ด๋‹ค. ์œ„ ์••์ถ• ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์ค‘๊ฐ„ ๊ณผ์ •์—์„œ ์ง„ํ–‰๋˜๋Š” ๋ถ„ํ• ์—์„œ ์˜๊ฐ์„ ์–ป์–ด, ๋ณธ ํ•™์œ„๋…ผ๋ฌธ์—์„œ CPU ๋ฐ GPU์— ์ ์šฉ ๊ฐ€๋Šฅํ•œ ๊ฐœ์„ ๋œ ๋ณ‘๋ ฌํ™” ์••์ถ• ๋งค์ปค๋‹ˆ์ฆ˜์„ ์ œ์‹œํ•œ๋‹ค. ์‹คํ—˜์„ ํ†ตํ•ด CPU ๋ณ‘๋ ฌ ์ตœ์ ํ™”๊ฐ€ ์ด๋ฃจ์–ด์ง„ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์••์ถ•๋œ ํ˜•ํƒœ์˜ ๋ณ€ํ˜• ์—†์ด 4์ฝ”์–ด ์ปดํ“จํ„ฐ์—์„œ ์ตœ๋Œ€ 38\%์˜ ์••์ถ• ๋ฐ ํ•ด์ œ ์‹œ๊ฐ„์„ ๊ฐ์†Œ์‹œํ‚จ๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์ธ๋‹ค. GPU ๋ณ‘๋ ฌ ์ตœ์ ํ™”๋Š” ๊ธฐ์กด์— ์กด์žฌํ•˜๋Š” GPU ๋น„ํŠธ๋งต ์••์ถ• ๊ธฐ๋ฒ•์— ๋น„ํ•ด 48\% ๋น ๋ฅธ ์งˆ์˜ ์ฒ˜๋ฆฌ ์‹œ๊ฐ„์„ ํ•„์š”๋กœ ํ•จ์„ ํ™•์ธํ•œ๋‹ค.Chapter 1 Introduction 1 1.1 Contribution 3 1.2 Organization 5 Chapter 2 Background 6 2.1 Model of Computation 6 2.2 Succinct Data Structures 7 Chapter 3 Space-efficient Representation of Integer Arrays 9 3.1 Introduction 9 3.2 Preliminaries 10 3.2.1 Universal Code System 10 3.2.2 Bit Vector 13 3.3 Algorithm Description 13 3.3.1 Main Principle 14 3.3.2 Optimization in the Implementation 16 3.4 Experimental Results 16 Chapter 4 Space-efficient Parallel Compressed Bitmap Index Processing 19 4.1 Introduction 19 4.2 Related Work 23 4.2.1 Byte-aligned Bitmap Code (BBC) 24 4.2.2 Word-Aligned Hybrid (WAH) 27 4.2.3 WAH-derived Algorithms 28 4.2.4 GPU-based WAH Algorithms 31 4.2.5 Super Byte-aligned Hybrid (SBH) 33 4.3 Parallelizing SBH 38 4.3.1 CPU Parallelism 38 4.3.2 GPU Parallelism 39 4.4 Experimental Results 40 4.4.1 Plain Version 41 4.4.2 Parallelized Version 46 4.4.3 Summary 49 Chapter 5 Space-efficient Representation of Semi-structured Document Formats 50 5.1 Preliminaries 50 5.1.1 Semi-structured Document Formats 50 5.1.2 Resource Description Framework 57 5.1.3 Succinct Ordinal Tree Representations 60 5.1.4 String Compression Schemes 64 5.2 Representation 66 5.2.1 Bit String Indexed Array 67 5.2.2 Main Structure 68 5.2.3 Single Document as a Collection of Chunks 72 5.2.4 Supporting Queries 73 5.3 Experimental Results 75 5.3.1 Datasets 76 5.3.2 Construction Time 78 5.3.3 RAM Usage during Construction 80 5.3.4 Disk Usage and Serialization Time 83 5.3.5 Chunk Division 83 5.3.6 String Compression 88 5.3.7 Query Time 89 Chapter 6 Conclusion 94 Bibliography 96 ์š”์•ฝ 109 Acknowledgements 111Docto

    Adaptive Lightweight Compression Acceleration on Hybrid CPU-FPGA System

    Get PDF

    An ontology enhanced parallel SVM for scalable spam filter training

    Get PDF
    This is the post-print version of the final paper published in Neurocomputing. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.Spam, under a variety of shapes and forms, continues to inflict increased damage. Varying approaches including Support Vector Machine (SVM) techniques have been proposed for spam filter training and classification. However, SVM training is a computationally intensive process. This paper presents a MapReduce based parallel SVM algorithm for scalable spam filter training. By distributing, processing and optimizing the subsets of the training data across multiple participating computer nodes, the parallel SVM reduces the training time significantly. Ontology semantics are employed to minimize the impact of accuracy degradation when distributing the training data among a number of SVM classifiers. Experimental results show that ontology based augmentation improves the accuracy level of the parallel SVM beyond the original sequential counterpart

    Mining a Small Medical Data Set by Integrating the Decision Tree and t-test

    Get PDF
    [[abstract]]Although several researchers have used statistical methods to prove that aspiration followed by the injection of 95% ethanol left in situ (retention) is an effective treatment for ovarian endometriomas, very few discuss the different conditions that could generate different recovery rates for the patients. Therefore, this study adopts the statistical method and decision tree techniques together to analyze the postoperative status of ovarian endometriosis patients under different conditions. Since our collected data set is small, containing only 212 records, we use all of these data as the training data. Therefore, instead of using a resultant tree to generate rules directly, we use the value of each node as a cut point to generate all possible rules from the tree first. Then, using t-test, we verify the rules to discover some useful description rules after all possible rules from the tree have been generated. Experimental results show that our approach can find some new interesting knowledge about recurrent ovarian endometriomas under different conditions.[[journaltype]]ๅœ‹ๅค–[[incitationindex]]EI[[booktype]]็ด™ๆœฌ[[countrycodes]]FI

    Just-in-time Analytics Over Heterogeneous Data and Hardware

    Get PDF
    Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in multiple formats, such as the binary tabular data of a DBMS, raw textual files, and domain-specific formats. Second, different datasets follow different data models, such as the relational and the hierarchical one. Data location also varies: Some datasets reside in a central "data lake", whereas others lie in remote data sources. In addition, users execute widely different analysis tasks over all these data types. Finally, the process of gathering and integrating diverse datasets introduces several inconsistencies and redundancies in the data, such as duplicate entries for the same real-world concept. In summary, heterogeneity significantly affects the way data analysis is performed. In this thesis, we aim for data virtualization: Abstracting data out of its original form and manipulating it regardless of the way it is stored or structured, without a performance penalty. To achieve data virtualization, we design and implement systems that i) mask heterogeneity through the use of heterogeneity-aware, high-level building blocks and ii) offer fast responses through on-demand adaptation techniques. Regarding the high-level building blocks, we use a query language and algebra to handle multiple collection types, such as relations and hierarchies, express transformations between these collection types, as well as express complex data cleaning tasks over them. In addition, we design a location-aware compiler and optimizer that masks away the complexity of accessing multiple remote data sources. Regarding on-demand adaptation, we present a design to produce a new system per query. The design uses customization mechanisms that trigger runtime code generation to mimic the system most appropriate to answer a query fast: Query operators are thus created based on the query workload and the underlying data models; the data access layer is created based on the underlying data formats. In addition, we exploit emerging hardware by customizing the system implementation based on the available heterogeneous processors รข CPUs and GPGPUs. We thus pair each workload with its ideal processor type. The end result is a just-in-time database system that is specific to the query, data, workload, and hardware instance. This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit hardware heterogeneity. Instead of centralizing all relevant datasets, converting them to a single representation, and loading them in a monolithic, static, suboptimal system, our design embraces heterogeneity. Overall, our design decouples the type of performed analysis from the original data layout; users can perform their analysis across data stores, data models, and data formats, but at the same time experience the performance offered by a custom system that has been built on demand to serve their specific use case
    • โ€ฆ
    corecore