
    Write-limited sorts and joins for persistent memory

    To mitigate the impact of the widening gap between the memory needs of CPUs and what standard memory technology can deliver, system architects have introduced a new class of memory technology termed persistent memory. Persistent memory is byte-addressable, but exhibits asymmetric I/O: writes are typically an order of magnitude more expensive than reads. Byte addressability combined with I/O asymmetry renders the performance profile of persistent memory unique. Thus, it becomes imperative to find new ways to seamlessly incorporate it into database systems. We do so in the context of query processing, focusing on the fundamental operations of sort and join processing. We introduce the notion of write-limited algorithms that effectively minimize the I/O cost. We give a high-level API that enables the system to dynamically optimize the workflow of the algorithms or, alternatively, allows the developer to tune the write profile of the algorithms. We present four different techniques to incorporate persistent memory into the database processing stack in light of this API. We have implemented and extensively evaluated all our proposals. Our results show that the algorithms deliver on their promise of I/O-minimality and tunable performance. We showcase the merits and deficiencies of each implementation technique, thus taking a solid first step towards incorporating persistent memory into query processing.
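    The abstract does not spell out the algorithms, but the core trade it describes, spending extra reads to save expensive writes, can be sketched with a selection-style sort that writes each record to the write-expensive output exactly once. This is an illustrative sketch only, not the paper's algorithms; the function names are our own.

    ```python
    def write_limited_sort(read, emit):
        """Sort `read` by repeated scanning: O(d*n) reads for d distinct
        keys, but exactly n calls to emit(), i.e., n output writes."""
        n = len(read)
        emitted = 0
        prev = None
        while emitted < n:
            # One read pass: find the smallest key strictly greater than
            # the last distinct key emitted, then count its duplicates.
            nxt = min(read) if prev is None else min(x for x in read if x > prev)
            dups = sum(1 for x in read if x == nxt)
            for _ in range(dups):
                emit(nxt)            # the only writes the algorithm performs
            emitted += dups
            prev = nxt

    out = []
    write_limited_sort([3, 1, 2, 1], out.append)
    print(out)                       # [1, 1, 2, 3]
    ```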

    Low energy cache memory implementation with data compression

    Energy consumption in CPUs has reached a point where the heat is becoming hard to dissipate, and the resulting temperature hinders the overall performance of the processor. In addition, the device that feeds data to the processor, the cache memory, is getting larger, and the larger it is, the more power it uses. To address this energy problem, this thesis proposes a new compressing-cache design, the 'Ghost Cache'. The proposal extends existing cache-compression algorithms to compress three possible values (0, 1, and -1) into 2 bits, reducing the regular 32-bit data banks of caches to 8-bit data banks. We analyzed 5 different algorithms and tested them on our cache, implemented in Verilog, to show which algorithms work better on the cache and which work worse.
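    The 2-bit encoding itself is straightforward to illustrate: a 32-bit word whose value is 0, 1, or -1 (0xFFFFFFFF) collapses to a 2-bit code, so four such words fit in a single byte. The code assignments below are our own illustration, not necessarily the thesis's exact format.

    ```python
    CODES = {0: 0b00, 1: 0b01, -1: 0b10}   # 0b11 left free, e.g., "incompressible"
    DECODE = {v: k for k, v in CODES.items()}

    def pack4(words):
        """Pack four words into one byte, or return None if any word is
        outside {0, 1, -1} (that line would be stored uncompressed)."""
        byte = 0
        for i, w in enumerate(words):
            if w not in CODES:
                return None
            byte |= CODES[w] << (2 * i)
        return byte

    def unpack4(byte):
        return [DECODE[(byte >> (2 * i)) & 0b11] for i in range(4)]

    assert unpack4(pack4([0, 1, -1, 0])) == [0, 1, -1, 0]
    ```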

    WiscSort: External Sorting For Byte-Addressable Storage

    We present WiscSort, a new approach to high-performance concurrent sorting for existing and future byte-addressable storage (BAS) devices. WiscSort carefully reduces writes, exploits random reads by splitting keys and values during sorting, and performs interference-aware scheduling with thread pool sizing to avoid I/O bandwidth degradation. We introduce the BRAID model, which encompasses the unique characteristics of BAS devices. Many state-of-the-art sorting systems do not comply with the BRAID model and deliver sub-optimal performance, whereas WiscSort demonstrates the effectiveness of complying with BRAID. We show that WiscSort is 2-7x faster than competing approaches on a standard sort benchmark. We evaluate the effectiveness of key-value separation on different key-value sizes and compare our concurrency optimizations with various other concurrency models. Finally, we emulate generic BAS devices and show how our techniques perform well with various combinations of hardware properties.
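    The key-value separation it mentions can be sketched compactly: sort small (key, position) pairs so only keys move through the sort, then gather the values afterwards with random reads, which byte-addressable storage serves cheaply. This is a toy illustration of the idea, not WiscSort's implementation.

    ```python
    def sort_kv_separated(records):
        """records: list of (key, value) pairs. Only compact (key, position)
        pairs flow through the sort; values are fetched afterwards."""
        index = sorted((key, pos) for pos, (key, _) in enumerate(records))
        # One random read into the value log per record, in key order.
        return [(key, records[pos][1]) for key, pos in index]

    print(sort_kv_separated([(3, "c"), (1, "a"), (2, "b")]))
    # [(1, 'a'), (2, 'b'), (3, 'c')]
    ```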

    Flashing up the storage hierarchy

    The focus of this thesis is on systems that employ both flash and magnetic disks as storage media. Considering the widely disparate I/O costs of flash disks currently on the market, our approach is a cost-aware one: we explore techniques that exploit the I/O costs of the underlying storage devices to improve I/O performance. We also study the asymmetric I/O properties of magnetic and flash disks and propose algorithms that take advantage of this asymmetry. Our work is geared towards database systems; however, most of the ideas presented in this thesis can be generalised to any data-intensive application. For the case of low-end, inexpensive flash devices with large capacities, we propose using them at the same level of the memory hierarchy as magnetic disks. In such setups, we study the problem of data placement, that is, on which type of storage medium each data page should be stored. We present a family of online algorithms that can be used to dynamically decide the optimal placement of each page. Our algorithms adapt to changing workloads for maximum I/O efficiency. We found that substantial performance benefits can be gained with such a design, especially for queries touching large sets of pages with read-intensive workloads. Moving one level higher in the storage hierarchy, we study the problem of buffer allocation in databases that store data across multiple storage devices. We present our novel approach to per-device memory allocation, under which both the I/O costs of the storage devices and the cache behaviour of the data stored on each medium determine the size of the main memory buffers that will be allocated to each device. Towards informed decisions, we found that the ability to predict the cache behaviour of devices under various cache sizes is of paramount importance. In light of this, we study the problem of efficiently tracking the hit ratio curve for each device and introduce a low-overhead technique that provides high accuracy. The price and performance characteristics of high-end flash disks make them perfectly suitable for use as caches between the main memory and the magnetic disk(s) of a storage system. In this context, we primarily focus on the problem of deciding which data should be placed in the flash cache of a system: how the data flows from one level of the memory hierarchy to the others is crucial for the performance of such a system. Considering such decisions, we found that the I/O costs of the flash cache play a major role. We also study several implementation issues such as the optimal size of flash pages and the properties of the page directory of a flash cache. Finally, we explore sorting in external memory using external merge-sort, as the latter employs access patterns that can take full advantage of the I/O characteristics of flash memory. We study the problem of sorting hierarchical data, as such is necessary for a wide variety of applications including archiving scientific data and dealing with large XML datasets. The proposed algorithm efficiently exploits the hierarchical structure in order to minimize the number of disk accesses and optimize the utilization of available memory. Our proposals are not specific to sorting over flash memory: the presented techniques are highly efficient over magnetic disks as well.
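    The cost-aware placement decision at the heart of the first contribution can be illustrated with a simple greedy rule: estimate each candidate device's expected I/O cost for a page's observed access mix and place the page on the cheaper device. The cost figures and the rule below are illustrative assumptions, not the thesis's online algorithms.

    ```python
    DEVICES = {
        "flash":    {"read": 0.1, "write": 1.0},   # cheap reads, expensive writes
        "magnetic": {"read": 0.5, "write": 0.5},   # roughly symmetric I/O
    }

    def place(reads, writes):
        """Pick the device minimizing expected cost for this access mix."""
        def cost(dev):
            c = DEVICES[dev]
            return reads * c["read"] + writes * c["write"]
        return min(DEVICES, key=cost)

    print(place(reads=100, writes=5))   # read-heavy page -> 'flash'
    print(place(reads=10, writes=50))   # write-heavy page -> 'magnetic'
    ```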

    Effective Use of SSDs in Database Systems

    With the advent of solid state drives (SSDs), the storage industry has experienced a revolutionary improvement in I/O performance. Compared to traditional hard disk drives (HDDs), SSDs benefit from shorter I/O latency, better power efficiency, and cheaper random I/Os. Because of these superior properties, SSDs are gradually replacing HDDs. For decades, database management systems have been designed, architected, and optimized based on the performance characteristics of HDDs. In order to utilize the superior performance of SSDs, new methods should be developed, some database components should be redesigned, and architectural decisions should be revisited. In this thesis, novel methods are proposed to exploit the new capabilities of modern SSDs to improve the performance of database systems. The first is a new method for using SSDs as a fully persistent second-level memory buffer pool. This method uses SSDs as a supplementary storage device to improve transactional throughput and to reduce the checkpoint and recovery times. A prototype of the proposed method is compared with its closest existing competitor. The second considers the impact of the parallel I/O capability of modern SSDs on the database query optimizer. It is shown that a query optimizer that is unaware of the parallel I/O capability of SSDs can make significantly sub-optimal decisions. In addition, a practical method for making the query optimizer parallel-I/O-aware is introduced and evaluated empirically. The third technique is an SSD-friendly external merge sort. This sorting technique has better performance than other common external sorting techniques. It also improves the SSD's lifespan by reducing the number of write operations required during sorting.
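    As a rough sketch of the first method's shape, an SSD can sit between the DRAM buffer pool and the HDD: pages evicted from DRAM spill to the SSD, and misses probe the SSD before the disk. The class below is our own minimal illustration (clean pages only, so dropped SSD entries can always be re-read from the HDD), not the thesis's design.

    ```python
    from collections import OrderedDict

    class TwoLevelPool:
        def __init__(self, dram_cap, ssd_cap, hdd):
            self.dram = OrderedDict()   # page_id -> page, LRU order
            self.ssd = OrderedDict()    # second-level (SSD) cache
            self.dram_cap, self.ssd_cap = dram_cap, ssd_cap
            self.hdd = hdd              # dict standing in for disk storage

        def get(self, pid):
            if pid in self.dram:        # DRAM hit
                self.dram.move_to_end(pid)
                return self.dram[pid]
            # DRAM miss: try the SSD cache before falling back to the HDD.
            page = self.ssd.pop(pid) if pid in self.ssd else self.hdd[pid]
            self._admit(pid, page)
            return page

        def _admit(self, pid, page):
            if len(self.dram) >= self.dram_cap:
                old_pid, old_page = self.dram.popitem(last=False)
                self.ssd[old_pid] = old_page       # spill DRAM eviction to SSD
                if len(self.ssd) > self.ssd_cap:
                    self.ssd.popitem(last=False)   # drop; HDD still has a copy
            self.dram[pid] = page
    ```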

    Data Structures & Algorithm Analysis in C++

    This is the textbook for CSIS 215 at Liberty University.

    Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools

    This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external-memory suffix sorting using the induced sorting principle, and the third part distributed external-memory suffix sorting with a new distributed algorithmic big data framework named Thrill. (396 pages, dissertation, Karlsruher Institut für Technologie, 2018. arXiv admin note: text overlap with arXiv:1101.3448 by another author.)
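    For orientation on the suffix-sorting problem itself, a naive construction simply sorts all suffixes directly. Induced-sorting algorithms such as those this dissertation builds on reach linear time; the deliberately simple version below only pins down what is being computed.

    ```python
    def suffix_array(s):
        """Return the starting indices of the suffixes of s in sorted order.
        Naive O(n^2 log n) construction, for illustration only."""
        return sorted(range(len(s)), key=lambda i: s[i:])

    print(suffix_array("banana"))   # [5, 3, 1, 0, 4, 2]
    ```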

    How Downwards Causation Occurs in Digital Computers

    Digital computers carry out algorithms coded in high-level programs. These abstract entities determine what happens at the physical level: they control whether electrons flow through specific transistors at specific times or not, entailing downward causation in both the logical and implementation hierarchies. This paper explores how this is possible in light of the alleged causal completeness of physics at the bottom level, and highlights the mechanism that enables strong emergence (the manifest causal effectiveness of application programs) to occur. Although synchronic emergence of higher levels from lower levels is manifestly true, diachronic emergence is generically not the case; indeed, we give specific examples where it cannot occur because of the causal effectiveness of higher-level variables.