Understanding and Optimizing Flash-based Key-value Systems in Data Centers
Flash-based key-value systems are widely deployed in today's data centers to provide high-speed data processing services. These systems deploy flash-friendly data structures, such as slabs and Log-Structured Merge (LSM) trees, on flash-based Solid State Drives (SSDs) and provide efficient solutions for caching and storage scenarios. As data centers rapidly evolve, plenty of challenges and opportunities for future optimization arise.
In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspectives of workloads, software, and hardware as data centers evolve. We first propose an on-line compression scheme, called SlimCache, which exploits the unique characteristics of key-value workloads to virtually enlarge the cache space, increase the hit ratio, and improve cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters plus additional hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, 99th-percentile tail latency, convergence time, real-time system throughput, and the iteration process. Last but not least, we conduct an in-depth, comprehensive measurement study of flash-optimized key-value stores on recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in current key-value store designs and present three exemplary case studies that showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also provides system implications for auto-tuning key-value systems on flash-based SSDs and optimizing them on revolutionary 3D XPoint based SSDs.
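As a rough illustration of the idea behind on-line cache compression (not SlimCache's actual slab-based, flash-resident design), the following sketch compresses values so that more entries fit into a fixed byte budget; the class name and eviction policy are invented for illustration:

```python
import zlib


class CompressedCache:
    """Toy in-memory cache that compresses values to fit more entries
    into a fixed byte budget. Illustrative only: SlimCache operates on
    flash-backed slabs with workload-aware policies, not a Python dict."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.store = {}  # key -> compressed value

    def put(self, key, value):
        blob = zlib.compress(value)
        # Evict entries until the new blob fits (a real system would use
        # a slab-aware policy such as per-class LRU, not arbitrary pops).
        while self.store and self.used + len(blob) > self.capacity:
            _, old = self.store.popitem()
            self.used -= len(old)
        if self.used + len(blob) <= self.capacity:
            self.store[key] = blob
            self.used += len(blob)

    def get(self, key):
        blob = self.store.get(key)
        return None if blob is None else zlib.decompress(blob)
```

Because compressible values occupy far fewer bytes, the same budget holds more objects, which is the mechanism behind the "virtually enlarged" cache and higher hit ratio.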
WLFC: Write Less in Flash-based Cache
Flash-based disk caches, such as Bcache and Flashcache, have gained
tremendous popularity in industry over the last decade because of their low
energy consumption, non-volatile nature, and high I/O speed. However, these
cache systems deliver worse write performance than read performance because
of asymmetric I/O costs and the internal garbage collection (GC) mechanism.
Beyond performance, since NAND flash is a type of EEPROM device, its
lifespan is also limited by Program/Erase (P/E) cycles. How to improve
the performance and lifespan of flash-based caches in write-intensive
scenarios has therefore long been a hot issue. Benefiting from Open-Channel
SSDs (OCSSDs), we propose a write-friendly flash-based disk cache system
called WLFC (Write Less in the Flash-based Cache). In WLFC, a strictly
sequential writing method minimizes write amplification, a new replacement
algorithm for the write buffer minimizes the erase count caused by eviction,
and a new data layout strategy minimizes the metadata size persisted in
SSDs. As a result, the Over-Provisioned (OP) space is completely removed,
the erase count of the flash is greatly reduced, and the metadata size is
1/10 or less of that in Bcache. Even with a small amount of metadata, data
consistency after a crash is still guaranteed. Compared with the existing
mechanism, WLFC brings a 7%-80% reduction in write latency, a 1.07x-4.5x
increase in write throughput, and a 50%-88.9% reduction in erase count,
with only a moderate overhead in read performance.
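The effect of strictly sequential writing on erase counts can be sketched with a toy model: pages are only appended in order within a segment, and a segment is erased as a whole when reclaimed. The class below is a hypothetical simulation for illustration, not WLFC's implementation:

```python
class SequentialLog:
    """Minimal model of strictly sequential flash writing: pages are
    appended in order within a segment, never overwritten in place, and
    a segment is erased only as a whole. Names are illustrative."""

    def __init__(self, pages_per_segment):
        self.pages_per_segment = pages_per_segment
        self.segments = [[]]          # list of segments, each a list of pages
        self.erase_count = 0

    def append(self, page):
        # Strictly sequential: when the current segment fills up,
        # open a new one instead of rewriting old pages.
        if len(self.segments[-1]) == self.pages_per_segment:
            self.segments.append([])
        self.segments[-1].append(page)

    def reclaim(self, index):
        # Whole-segment erase: one erase operation regardless of how
        # many pages it held, which is what sequential layouts exploit
        # to keep the erase count low.
        self.segments[index] = []
        self.erase_count += 1
```

Random in-place updates would instead force read-modify-erase-write cycles on individual blocks, which is the amplification the sequential layout avoids.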
Memory Subsystem Optimization for Efficient System Resource Utilization in Data-Intensive Applications
Thesis (Ph.D.) -- Graduate School of Seoul National University: Department of Electrical and Computer Engineering, College of Engineering, August 2020. Advisor: Heon Y. Yeom. With explosive data growth, data-intensive applications, such as relational databases and key-value storage, have become increasingly popular in a variety of domains in recent years. To meet the growing performance demands of data-intensive applications, it is crucial to efficiently and fully utilize memory resources for the best possible performance.
However, general-purpose operating systems (OSs) are designed to provide system resources fairly, at system level, to all applications running on a system. A single application may find it difficult to fully exploit the system's best performance due to this system-level fairness. For performance reasons, many data-intensive applications implement their own versions of mechanisms that OSs already provide, under the assumption that they know their data better than the OS does. They can be greedily optimized for performance, but this may result in inefficient use of system resources.
In this dissertation, we claim that simple OS support combined with minor application modifications can yield even higher application performance without sacrificing system-level resource utilization. We optimize and extend the OS memory subsystem to better support applications while addressing three memory-related issues in data-intensive applications. First, we introduce a memory-efficient cooperative caching approach between the application and the kernel buffer to address the double-caching problem, where the same data resides in multiple layers. Second, we present a memory-efficient, transparent zero-copy read I/O scheme to avoid the performance interference caused by memory-copy behavior during I/O. Third, we propose a memory-efficient fork-based checkpointing mechanism for in-memory database systems to mitigate the memory footprint problem of the existing fork-based checkpointing scheme, whose memory usage increases incrementally (up to 2x) during checkpointing for update-intensive workloads.
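The fork-based checkpointing idea can be sketched in a few lines on a POSIX system: the forked child holds a copy-on-write snapshot of the parent's memory and persists it, while the parent keeps mutating its own copy. The function name and file path below are illustrative, not the dissertation's API:

```python
import os

def checkpoint(state, path):
    """Sketch of fork-based checkpointing. After fork(), the child sees
    the state exactly as of the fork, even if the parent modifies it
    concurrently: copy-on-write isolates the pages. That per-page copying
    is also the source of the up-to-2x memory footprint for
    update-intensive workloads. POSIX-only; illustrative names."""
    pid = os.fork()
    if pid == 0:
        # Child: persist the snapshot and exit without running cleanup.
        with open(path, "w") as f:
            f.write(repr(state))
        os._exit(0)
    return pid  # parent continues serving requests

state = {"balance": 100}
pid = checkpoint(state, "/tmp/ckpt.txt")
state["balance"] = 0          # parent mutates after the snapshot
os.waitpid(pid, 0)            # snapshot on disk still shows 100
```

The parent's post-fork writes trigger page copies, so the checkpoint stays consistent without any locking; the cost is the duplicated dirty pages, which the dissertation's mechanism aims to reduce.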
To show the effectiveness of our approach, we implement and evaluate our schemes on real multi-core systems. The experimental results demonstrate that our cooperative approach can more effectively address the above issues related to data-intensive applications than existing non-cooperative approaches while delivering better performance (in terms of transaction processing speed, I/O throughput, or memory footprint).
Chapter 1 Introduction
1.1 Motivation
1.1.1 Importance of Memory Resources
1.1.2 Problems
1.2 Contributions
1.3 Outline
Chapter 2 Background
2.1 Linux Kernel Memory Management
2.1.1 Page Cache
2.1.2 Page Reclamation
2.1.3 Page Table and TLB Shootdown
2.1.4 Copy-on-Write
2.2 Linux Support for Applications
2.2.1 fork
2.2.2 madvise
2.2.3 Direct I/O
2.2.4 mmap
Chapter 3 Memory Efficient Cooperative Caching
3.1 Motivation
3.1.1 Problems of Existing Datastore Architecture
3.1.2 Proposed Architecture
3.2 Related Work
3.3 Design and Implementation
3.3.1 Overview
3.3.2 Kernel Support
3.3.3 Migration to DBIO
3.4 Evaluation
3.4.1 System Configuration
3.4.2 Methodology
3.4.3 TPC-C Benchmarks
3.4.4 YCSB Benchmarks
3.5 Summary
Chapter 4 Memory Efficient Zero-copy I/O
4.1 Motivation
4.1.1 The Problems of Copy-Based I/O
4.2 Related Work
4.2.1 Zero Copy I/O
4.2.2 TLB Shootdown
4.2.3 Copy-on-Write
4.3 Design and Implementation
4.3.1 Prerequisites for z-READ
4.3.2 Overview of z-READ
4.3.3 TLB Shootdown Optimization
4.3.4 Copy-on-Write Optimization
4.3.5 Implementation
4.4 Evaluation
4.4.1 System Configurations
4.4.2 Effectiveness of the TLB Shootdown Optimization
4.4.3 Effectiveness of CoW Optimization
4.4.4 Analysis of the Performance Improvement
4.4.5 Performance Interference Intensity
4.4.6 Effectiveness of z-READ in Macrobenchmarks
4.5 Summary
Chapter 5 Memory Efficient Fork-based Checkpointing
5.1 Motivation
5.1.1 Fork-based Checkpointing
5.1.2 Approach
5.2 Related Work
5.3 Design and Implementation
5.3.1 Overview
5.3.2 OS Support
5.3.3 Implementation
5.4 Evaluation
5.4.1 Experimental Setup
5.4.2 Performance
5.5 Summary
Chapter 6 Conclusion
Abstract (in Korean)
Overview of Caching Mechanisms to Improve Hadoop Performance
In today's distributed computing environments, large amounts of data are
generated from different resources at high velocity, rendering the data
difficult to capture, manage, and process within existing relational databases.
Hadoop is a tool to store and process large datasets in a parallel manner
across a cluster of machines in a distributed environment. Hadoop brings many
benefits like flexibility, scalability, and high fault tolerance; however, it
faces some challenges in terms of data access time, I/O operation, and
duplicate computations resulting in extra overhead, resource wastage, and poor
performance. Many researchers have utilized caching mechanisms to tackle these
challenges. For example, they have presented approaches to improve data access
time, enhance data locality rate, remove repetitive calculations, reduce the
number of I/O operations, decrease the job execution time, and increase
resource efficiency. In the current study, we provide a comprehensive overview
of caching strategies to improve Hadoop performance. Additionally, a novel
classification is introduced based on cache utilization. Using this
classification, we analyze the impact on Hadoop performance and discuss the
advantages and disadvantages of each group. Finally, a novel hybrid approach
called Hybrid Intelligent Cache (HIC) that combines the benefits of two methods
from different groups, H-SVM-LRU and CLQLMRS, is presented. Experimental
results show that our hybrid method achieves an average improvement of 31.2% in
job execution time.
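The caching strategies surveyed above build on classic replacement policies. As a baseline, plain LRU, the generic policy that schemes such as H-SVM-LRU extend with smarter victim selection, can be sketched as follows (this is the textbook algorithm, not that paper's method):

```python
from collections import OrderedDict


class LRUCache:
    """Plain least-recently-used replacement. OrderedDict keeps keys in
    insertion/access order, so the least recently used entry is always
    at the front and can be evicted in O(1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used
```

Hadoop-oriented variants keep the same eviction skeleton but replace the "least recently used" victim choice with learned or workload-aware criteria.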
FlashX: Massive Data Analysis Using Fast I/O
With the explosion of data and the increasing complexity of data analysis, large-scale
data analysis imposes significant challenges in systems design. While current
research focuses on scaling out to large clusters, these scale-out solutions introduce
a significant amount of overhead. This thesis is motivated by the advance of new
I/O technologies such as flash memory. Instead of scaling out, we explore efficient
system designs in a single commodity machine with non-uniform memory architecture
(NUMA) and scale to large datasets by utilizing commodity solid-state drives
(SSDs). This thesis explores the impact of the new I/O technologies on large-scale
data analysis. Instead of implementing individual data analysis algorithms for SSDs,
we develop a data analysis ecosystem called FlashX to target a large range of data
analysis tasks. FlashX includes three subsystems: SAFS, FlashGraph and FlashMatrix.
SAFS is a user-space filesystem optimized for a large SSD array to deliver
maximal I/O throughput from SSDs. FlashGraph is a general-purpose graph analysis
framework that processes graphs in a semi-external memory fashion, i.e., keeping
vertex state in memory and edges on SSDs, and scales to graphs with billions of
vertices by utilizing SSDs through SAFS. FlashMatrix is a matrix-oriented programming
framework that supports both sparse matrices and dense matrices for general
data analysis. Similar to FlashGraph, it scales matrix operations beyond memory
capacity by utilizing SSDs. We demonstrate that, with current I/O technologies,
FlashGraph and FlashMatrix in (semi-)external memory meet or even exceed
state-of-the-art in-memory data analysis frameworks while scaling to massive
datasets for a large variety of data analysis tasks.
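The semi-external-memory model can be illustrated with a toy breadth-first search: per-vertex state stays in RAM, while adjacency lists are fetched on demand through a callback that stands in for SSD reads. This is an assumption made for illustration; FlashGraph's real engine issues asynchronous I/O through SAFS and is far more elaborate:

```python
def semi_external_bfs(num_vertices, edge_fetch, root):
    """Toy semi-external-memory BFS: the level array (vertex state)
    lives in memory, while edges arrive via edge_fetch(v), which models
    reading vertex v's adjacency list from SSD-backed storage."""
    level = [-1] * num_vertices   # in-memory vertex state only
    level[root] = 0
    frontier = [root]
    while frontier:
        nxt = []
        for v in frontier:
            for u in edge_fetch(v):       # "I/O": edges come from storage
                if level[u] == -1:
                    level[u] = level[v] + 1
                    nxt.append(u)
        frontier = nxt
    return level

# Adjacency lists kept in a dict here; FlashGraph would stream them
# from an SSD array instead, keeping only O(V) state in memory.
edges = {0: [1, 2], 1: [3], 2: [3], 3: []}
levels = semi_external_bfs(4, lambda v: edges[v], 0)
```

Because only O(V) state is memory-resident while the O(E) edge data stays on SSDs, graphs with billions of vertices fit on a single machine, which is the scaling argument the thesis makes.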
- …