
    Understanding and Optimizing Flash-based Key-value Systems in Data Centers

    Flash-based key-value systems are widely deployed in today's data centers to provide high-speed data processing services. These systems deploy flash-friendly data structures, such as slabs and Log-Structured Merge (LSM) trees, on flash-based Solid State Drives (SSDs) and provide efficient solutions in caching and storage scenarios. As data centers rapidly evolve, many challenges and opportunities for further optimization arise. In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspectives of workloads, software, and hardware as data centers evolve. We first propose an online compression scheme, called SlimCache, which exploits the unique characteristics of key-value workloads to virtually enlarge the cache space, increase the hit ratio, and improve cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters in addition to hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, 99th percentile tail latency, convergence time, real-time system throughput, and the iteration process. Finally, we conduct an in-depth, comprehensive measurement study of flash-optimized key-value stores on recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in current key-value store designs and present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also provides system implications for auto-tuning key-value systems on flash-based SSDs and optimizing them on revolutionary 3D XPoint based SSDs.
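
    To make the idea of "virtually enlarging the cache space" concrete, the minimal Python sketch below compresses values on insertion so that more objects fit within a fixed byte budget. The class name, the FIFO-style eviction, and the use of zlib are illustrative assumptions, not SlimCache's actual design.

        import zlib

        class CompressedCache:
            """Sketch of an online-compression cache: values are compressed
            before insertion so more objects fit in the same byte budget."""

            def __init__(self, capacity_bytes):
                self.capacity = capacity_bytes
                self.used = 0
                self.store = {}   # key -> compressed bytes, in insertion order

            def put(self, key, value: bytes):
                blob = zlib.compress(value)
                # Evict oldest entries until the new blob fits (FIFO for brevity).
                while self.store and self.used + len(blob) > self.capacity:
                    old_key, old_blob = next(iter(self.store.items()))
                    del self.store[old_key]
                    self.used -= len(old_blob)
                self.store[key] = blob
                self.used += len(blob)

            def get(self, key):
                blob = self.store.get(key)
                return zlib.decompress(blob) if blob is not None else None

    Whether compression pays off depends on how compressible the values are; a real design would also have to weigh the CPU cost of compression against the hit-ratio gain.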

    WLFC: Write Less in Flash-based Cache

    Flash-based disk caches, for example Bcache and Flashcache, have gained tremendous popularity in industry over the last decade because of their low energy consumption, non-volatile nature, and high I/O speed. However, these cache systems deliver worse write performance than read performance because of asymmetric I/O costs and the internal garbage collection (GC) mechanism. In addition to the performance issues, since NAND flash is a type of EEPROM device, its lifespan is also limited by Program/Erase (P/E) cycles. Improving the performance and lifespan of flash-based caches in write-intensive scenarios has therefore long been an active issue. Benefiting from Open-Channel SSDs (OCSSDs), we propose a write-friendly flash-based disk cache system called WLFC (Write Less in the Flash-based Cache). In WLFC, a strictly sequential writing method is used to minimize write amplification, a new replacement algorithm for the write buffer is designed to minimize the erase count caused by eviction, and a new data layout strategy is designed to minimize the metadata size persisted to SSDs. As a result, the Over-Provisioned (OP) space is completely removed, the erase count of the flash is greatly reduced, and the metadata size is one tenth or less of that in Bcache. Even with a small amount of metadata, data consistency after a crash is still guaranteed. Compared with the existing mechanism, WLFC brings a 7%-80% reduction in write latency, a 1.07×-4.5× improvement in write throughput, and a 50%-88.9% reduction in erase count, with a moderate overhead in read performance.
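
    The sketch below illustrates the general strictly-sequential-write idea: updates are only appended to the tail of a segmented log, and space is reclaimed by erasing whole segments, so no in-place overwrites are needed. Segment size, data structures, and the reclamation rule are simplified assumptions for illustration, not WLFC's actual algorithms.

        SEGMENT_SIZE = 4        # entries per erase unit (illustrative)

        class SequentialLogCache:
            """Sketch of a strictly sequential write buffer: writes only append
            to the tail segment; space is reclaimed by erasing whole segments."""

            def __init__(self, num_segments):
                self.segments = [[] for _ in range(num_segments)]
                self.tail = 0
                self.index = {}          # key -> (segment, slot), newest wins
                self.erase_count = 0

            def write(self, key, value):
                if len(self.segments[self.tail]) == SEGMENT_SIZE:
                    self.tail = (self.tail + 1) % len(self.segments)
                    self._erase(self.tail)        # reclaim the next segment
                seg = self.segments[self.tail]
                seg.append((key, value))
                self.index[key] = (self.tail, len(seg) - 1)

            def _erase(self, seg_id):
                for key, _ in self.segments[seg_id]:
                    if self.index.get(key, (None,))[0] == seg_id:
                        del self.index[key]       # drop mappings into this segment
                self.segments[seg_id] = []
                self.erase_count += 1

            def read(self, key):
                loc = self.index.get(key)
                if loc is None:
                    return None
                seg_id, slot = loc
                return self.segments[seg_id][slot][1]

    Because all writes land at the log tail and erases always cover a full segment, the erase count grows with the write volume rather than with random overwrites, which is the effect the abstract attributes to sequential writing.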

    Memory Subsystem Optimization for Efficient System Resource Utilization in Data-Intensive Applications

    Ph.D. dissertation, Seoul National University, Department of Electrical and Computer Engineering, August 2020 (advisor: Heon Young Yeom). With explosive data growth, data-intensive applications, such as relational databases and key-value storage, have become increasingly popular in a variety of domains in recent years. To meet the growing performance demands of data-intensive applications, it is crucial to efficiently and fully utilize memory resources for the best possible performance. However, general-purpose operating systems (OSs) are designed to provide system resources to the applications running on a system in a fair manner at the system level. A single application may find it difficult to fully exploit the system's best performance due to this system-level fairness. For performance reasons, many data-intensive applications implement their own versions of mechanisms that OSs already provide, under the assumption that they know the data better than the OS does. They can be greedily optimized for performance, but this may result in inefficient use of system resources. In this dissertation, we claim that simple OS support with minor application modifications can yield even higher application performance without sacrificing system-level resource utilization. We optimize and extend the OS memory subsystem to better support applications while addressing three memory-related issues in data-intensive applications. First, we introduce a memory-efficient cooperative caching approach between the application and the kernel buffer to address the double caching problem, where the same data resides in multiple layers. Second, we present a memory-efficient, transparent zero-copy read I/O scheme to avoid the performance interference caused by memory copying during I/O. Third, we propose a memory-efficient fork-based checkpointing mechanism for in-memory database systems to mitigate the memory footprint problem of the existing fork-based checkpointing scheme, in which memory usage can grow incrementally (up to 2x) during checkpointing for update-intensive workloads. To show the effectiveness of our approach, we implement and evaluate our schemes on real multi-core systems. The experimental results demonstrate that our cooperative approach addresses the above issues more effectively than existing non-cooperative approaches while delivering better performance (in terms of transaction processing speed, I/O throughput, or memory footprint).
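
    As background for the third contribution, the following minimal, Unix-only Python sketch shows the generic fork-based checkpointing pattern it improves on: the child inherits a copy-on-write snapshot and persists it while the parent keeps mutating state. The JSON format, file path, and in-memory dict stand in for a real in-memory database and are purely illustrative.

        import json
        import os

        def checkpoint(db: dict, path: str) -> None:
            """Fork-based checkpoint sketch: the child sees a copy-on-write
            snapshot of the state at fork time and writes it out; pages the
            parent later modifies are copied by the kernel, which is the
            memory-footprint growth discussed in the abstract."""
            pid = os.fork()
            if pid == 0:                      # child: persist the frozen snapshot
                with open(path, "w") as f:
                    json.dump(db, f)
                os._exit(0)
            # parent: continue serving updates; wait only to keep the example simple
            db["counter"] = db.get("counter", 0) + 1
            os.waitpid(pid, 0)

        if __name__ == "__main__":
            state = {"a": 1, "b": 2}
            checkpoint(state, "/tmp/snapshot.json")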

    Overview of Caching Mechanisms to Improve Hadoop Performance

    In today's distributed computing environments, large amounts of data are generated from different sources at high velocity, rendering the data difficult to capture, manage, and process with existing relational databases. Hadoop is a tool for storing and processing large datasets in parallel across a cluster of machines in a distributed environment. Hadoop brings many benefits such as flexibility, scalability, and high fault tolerance; however, it faces challenges in terms of data access time, I/O operations, and duplicate computations, resulting in extra overhead, resource wastage, and poor performance. Many researchers have utilized caching mechanisms to tackle these challenges; for example, they have presented approaches to improve data access time, enhance the data locality rate, remove repetitive calculations, reduce the number of I/O operations, decrease job execution time, and increase resource efficiency. In this study, we provide a comprehensive overview of caching strategies for improving Hadoop performance. Additionally, we introduce a novel classification based on cache utilization. Using this classification, we analyze the impact on Hadoop performance and discuss the advantages and disadvantages of each group. Finally, we present a novel hybrid approach called Hybrid Intelligent Cache (HIC) that combines the benefits of two methods from different groups, H-SVM-LRU and CLQLMRS. Experimental results show that our hybrid method achieves an average improvement of 31.2% in job execution time.
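
    As a rough illustration of the kind of result caching used to remove repetitive calculations, the sketch below memoizes intermediate outputs with LRU eviction. It is a generic example only and does not reproduce H-SVM-LRU, CLQLMRS, or the HIC hybrid described above.

        from collections import OrderedDict

        class LRUResultCache:
            """Sketch of result caching for duplicate computations: outputs are
            keyed (e.g., by task and input split) and the least recently used
            entry is evicted when the cache is full."""

            def __init__(self, capacity):
                self.capacity = capacity
                self.entries = OrderedDict()

            def get_or_compute(self, key, compute):
                if key in self.entries:
                    self.entries.move_to_end(key)     # hit: refresh recency
                    return self.entries[key]
                value = compute()                     # miss: run the computation
                self.entries[key] = value
                if len(self.entries) > self.capacity:
                    self.entries.popitem(last=False)  # evict least recently used
                return value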

    FlashX: Massive Data Analysis Using Fast I/O

    With the explosion of data and the increasing complexity of data analysis, large-scale data analysis poses significant challenges for systems design. While current research focuses on scaling out to large clusters, these scale-out solutions introduce a significant amount of overhead. This thesis is motivated by the advance of new I/O technologies such as flash memory. Instead of scaling out, we explore efficient system designs on a single commodity machine with non-uniform memory access (NUMA) architecture and scale to large datasets by utilizing commodity solid-state drives (SSDs). This thesis explores the impact of these new I/O technologies on large-scale data analysis. Instead of implementing individual data analysis algorithms for SSDs, we develop a data analysis ecosystem called FlashX that targets a wide range of data analysis tasks. FlashX includes three subsystems: SAFS, FlashGraph, and FlashMatrix. SAFS is a user-space filesystem optimized for a large SSD array to deliver maximal I/O throughput from SSDs. FlashGraph is a general-purpose graph analysis framework that processes graphs in a semi-external-memory fashion, i.e., keeping vertex state in memory and edges on SSDs, and scales to graphs with billions of vertices by utilizing SSDs through SAFS. FlashMatrix is a matrix-oriented programming framework that supports both sparse and dense matrices for general data analysis. Similar to FlashGraph, it scales matrix operations beyond memory capacity by utilizing SSDs. We demonstrate that, with current I/O technologies, FlashGraph and FlashMatrix in (semi-)external memory meet or even exceed the performance of state-of-the-art in-memory data analysis frameworks while scaling to massive datasets for a wide variety of data analysis tasks.
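
    To illustrate the semi-external-memory idea behind FlashGraph (per-vertex state in memory, edge lists on SSDs), here is a minimal BFS sketch. The binary edge-file layout, the index structure, and the function name are assumptions made for illustration, not FlashGraph's actual on-disk format or API.

        import struct

        def semi_external_bfs(index, edge_file, source, num_vertices):
            """Semi-external-memory BFS sketch: per-vertex state (BFS levels)
            stays in memory, while adjacency lists are read on demand from a
            file. `index` maps a vertex to (offset, degree) of its edge list,
            stored as consecutive little-endian uint32 neighbor IDs."""
            level = [-1] * num_vertices           # in-memory vertex state
            level[source] = 0
            frontier = [source]
            with open(edge_file, "rb") as f:
                while frontier:
                    next_frontier = []
                    for v in frontier:
                        offset, degree = index[v]
                        f.seek(offset)            # fetch v's edges from storage
                        data = f.read(4 * degree)
                        for (u,) in struct.iter_unpack("<I", data):
                            if level[u] == -1:
                                level[u] = level[v] + 1
                                next_frontier.append(u)
                    frontier = next_frontier
            return level

    The memory footprint is proportional to the number of vertices rather than edges, which is what lets this style of processing scale to billion-vertex graphs whose edge lists live on SSDs.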