25 research outputs found

    Letter from the Special Issue Editor

    Get PDF
    Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies

    Ssd Flash Drives Used to Improve Performance with Clarity Data Warehouse

    Get PDF
    Since the introduction of solid-state devices (SSD), both storage area network (SAN) administrators and database administrators (DBA) have imagined the performance gains promised by replacing hard disk drives (HDD). The initial testing in the laboratory did not promise those gains in the real world. The SSD vendors worked between 2007 and 2010 to improve performance, which in industry standard tests showed steady progress. Despite the gains in the laboratory, there were few examples of real world usage particularly in the field of data warehousing. The process of extracting, transforming and loading (ETL) places extreme loads on the ability of the storage device to update data. This paper studies the effect on one such data warehouse

    HetFS: A heterogeneous file system for everyone

    Get PDF
    Storage devices have been getting more and more diverse during the last decade. The advent of SSDs made it painfully clear that rotating devices, such as HDDs or magnetic tapes, were lacking in regards to response time. However, SSDs currently have a limited number of write cycles and a significantly larger price per capacity, which has prevented rotational technologies from begin abandoned. Additionally, Non-Volatile Memories (NVMs) have been lately gaining traction, offering devices that typically outperform NAND-based SSDs but exhibit a full new set of idiosyncrasies. Therefore, in order to appropriately support this diversity, intelligent mechanisms will be needed in the near-future to balance the benefits and drawbacks of each storage technology available to a system. In this paper, we present a first step towards such a mechanism called HetFS, an extension to the ZFS file system that is capable of choosing the storage device a file should be kept in according to preprogrammed filters. We introduce the prototype and show some preliminary results of the effects obtained when placing specific files into different devices.The research leading to these results has received funding from the European Community under the BIGStorage ETN (Project 642963 of the H2020-MSCA-ITN-2014), by the Spanish Ministry of Economy and Competitiveness under the TIN2015-65316 grant and by the Catalan Government under the 2014-SGR- 1051 grant. To learn more about the BigStorage project, please visit http: //bigstorage-project.eu/.Peer ReviewedPostprint (author's final draft

    Efficient Cross-Device Query Processing

    Get PDF
    The increasing diversity of hardware within a single system promises large performance gains but also poses a challenge for data management systems. Strategies for the efficient use of hardware with large performance differences are still lacking. For example, existing research on GPU supported data management largely handles the GPU in isolation from the system’s CPU — The GPU is considered the central processor and the CPU used only to mitigate the GPU’s weaknesses where necessary. To make efficient use of all available devices, we developed a processing strategy that lets unequal devices like GPU and CPU combine their strengths rather than work in isolation. To this end, we decompose relational data into individual bits and place the resulting partitions on the appropriate devices. Operations are processed in phases, each phase executed on one device. This way, we achieve significant performance gains and good load distribution among the available devices in a limited real-life use case. To grow this idea into a generic system, we identify challenges as well as potential hardware configurations and applications that can benefit from this approach

    Second-tier Cache Management to Support DBMS Workloads

    Get PDF
    Enterprise Database Management Systems (DBMS) often run on computers with dedicated storage systems. Their data access requests need to go through two tiers of cache, i.e., a database bufferpool and a storage server cache, before reaching the storage media, e.g., disk platters. A tremendous amount of work has been done to improve the performance of the first-tier cache, i.e., the database bufferpool. However, the amount of work focusing on second-tier cache management to support DBMS workloads is comparably small. In this thesis we propose several novel techniques for managing second-tier caches to boost DBMS performance in terms of query throughput and query response time. The main purpose of second-tier cache management is to reduce the I/O latency endured by database query executions. This goal can be achieved by minimizing the number of reads and writes issued from second-tier caches to storage devices. The rst part of our research focuses on reducing the number of read I/Os issued by second-tier caches. We observe that DBMSs issue I/O requests for various reasons. The rationales behind these I/O requests provide useful information to second-tier caches because they can be used to estimate the temporal locality of the data blocks being requested. A second-tier cache can exploit this information when making replacement decisions. In this thesis we propose a technique to pass this information from DBMSs to second-tier caches and to use it in guiding cache replacements. The second part of this thesis focuses on reducing the number of writes issued by second-tier caches. Our work is two fold. First, we observe that although there are second-tier caches within computer systems, today's DBMS cannot take full advantage of them. For example, most commercial DBMSs use forced writes to propagate bufferpool updates to permanent storage for data durability reasons. We notice that enforcing such a practice is more conservative than necessary. Some of the writes can be issued as unforced requests and can be cached in the second-tier cache without immediate synchronization. This will give the second-tier cache opportunities to cache and consolidate multiple writes into one request. However, unfortunately, the current POSIX compliant le system interfaces provided by mainstream operating systems e.g., Unix and Windows) are not flexible enough to support such dynamic synchronization. We propose to extend such interfaces to let DBMSs take advantage of using unforced writes whenever possible. Additionally, we observe that the existing cache replacement algorithms are designed solely to maximize read cache hits (i.e., to minimize read I/Os). The purpose is to minimize the read latency, which is on the critical path of query executions. We argue that minimizing read requests is not the only objective of cache replacement. When I/O bandwidth becomes a bottleneck the objective should be to minimize the total number of I/Os, including both reads and writes, to achieve the best performance. We propose to associate a new type of replacement cost, i.e., the total number of I/Os caused by the replacement, with each cache page; and we also present a partial characterization of an optimal algorithm which minimizes the total number of I/Os generated by caches. Based on this knowledge, we extend several existing replacement algorithms, which are write-oblivious (focus only on reducing reads), to be write-aware and observe promising performance gains in the evaluations
    corecore