26 research outputs found

    LSM-tree based Database System Optimization using Application-Driven Flash Management

    ํ•™์œ„๋…ผ๋ฌธ(์„์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€,2019. 8. ์—ผํ—Œ์˜.Modern data centers aim to take advantage of high parallelism in storage de- vices for I/O intensive applications such as storage servers, cache systems, and key-value stores. Key-value stores are the most typical applications that should provide a highly reliable service with high-performance. To increase the I/O performance of key-value stores, many data centers have actively adopted next- generation storage devices such as Non-Volatile Memory Express (NVMe) based Solid State Devices (SSDs). NVMe SSDs and its protocol are characterized to provide a high degree of parallelism. However, they may not guarantee pre- dictable performance while providing high performance and parallelism. For example, heavily mixed read and write requests can result in performance degra- dation of throughput and response time due to the interference between the requests and internal operations (e.g., Garbage Collection (GC)). To minimize the interference and provide higher performance, this paper presents IsoKV, an isolation scheme for key-value stores by exploiting internal parallelism in SSDs. IsoKV manages the level of parallelism of SSD directly by running application-driven flash management scheme. By storing data with dif- ferent characteristics in each dedicated internal parallel units of SSD, IsoKV re- duces interference between I/O requests. We implement IsoKV on RocksDB and evaluate it using Open-Channel SSD. Our extensive experiments have shown that IsoKV improves overall throughput and response time on average 1.20ร— and 43% compared with the existing scheme, respectively.์ตœ์‹  ๋ฐ์ดํ„ฐ ์„ผํ„ฐ๋Š” ์Šคํ† ๋ฆฌ์ง€ ์„œ๋ฒ„, ์บ์‹œ ์‹œ์Šคํ…œ ๋ฐ Key-Value stores์™€ ๊ฐ™์€ I/O ์ง‘์•ฝ์ ์ธ ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์„ ์œ„ํ•œ ์Šคํ† ๋ฆฌ์ง€ ์žฅ์น˜์˜ ๋†’์€ ๋ณ‘๋ ฌ์„ฑ์„ ํ™œ์šฉํ•˜๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•œ๋‹ค. Key-value stores๋Š” ๊ณ ์„ฑ๋Šฅ์˜ ๊ณ ์‹ ๋ขฐ ์„œ๋น„์Šค๋ฅผ ์ œ๊ณตํ•ด์•ผ ํ•˜๋Š” ๊ฐ€์žฅ ๋Œ€ํ‘œ์ ์ธ ์‘์šฉํ”„๋กœ๊ทธ๋žจ์ด๋‹ค. Key-value stores์˜ I/O ์„ฑ๋Šฅ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด ๋งŽ์€ ๋ฐ ์ดํ„ฐ ์„ผํ„ฐ๊ฐ€ ๋น„ํœ˜๋ฐœ์„ฑ ๋ฉ”๋ชจ๋ฆฌ ์ต์Šคํ”„๋ ˆ์Šค(NVMe) ๊ธฐ๋ฐ˜ SSD(Solid State Devices) ์™€ ๊ฐ™์€ ์ฐจ์„ธ๋Œ€ ์Šคํ† ๋ฆฌ์ง€ ์žฅ์น˜๋ฅผ ์ ๊ทน์ ์œผ๋กœ ์ฑ„ํƒํ•˜๊ณ  ์žˆ๋‹ค. NVMe SSD์™€ ๊ทธ ํ”„ ๋กœํ† ์ฝœ์€ ๋†’์€ ์ˆ˜์ค€์˜ ๋ณ‘๋ ฌ์„ฑ์„ ์ œ๊ณตํ•˜๋Š” ๊ฒƒ์ด ํŠน์ง•์ด๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ NVMe SSD๊ฐ€ ๋ณ‘๋ ฌ์„ฑ์„ ์ œ๊ณตํ•˜๋ฉด์„œ๋„ ์˜ˆ์ธก ๊ฐ€๋Šฅํ•œ ์„ฑ๋Šฅ์„ ๋ณด์žฅํ•˜์ง€๋Š” ๋ชปํ•  ์ˆ˜ ์žˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ์ฝ๊ธฐ ๋ฐ ์“ฐ๊ธฐ ์š”์ฒญ์ด ๋งŽ์ด ํ˜ผํ•ฉ๋˜๋ฉด ์š”์ฒญ๊ณผ ๋‚ด๋ถ€ ์ž‘์—…(์˜ˆ: GC) ์‚ฌ์ด์˜ ๊ฐ„์„ญ์œผ๋กœ ์ธํ•ด ์ฒ˜๋ฆฌ๋Ÿ‰ ๋ฐ ์‘๋‹ต ์‹œ๊ฐ„์˜ ์„ฑ๋Šฅ ์ €ํ•˜๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ฐ„์„ญ์„ ์ตœ์†Œํ™”ํ•˜๊ณ  ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•ด ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” Key-value stores๋ฅผ ์œ„ํ•œ ๊ฒฉ๋ฆฌ ๋ฐฉ์‹์ธ IsoKV๋ฅผ ์ œ์‹œํ•œ๋‹ค. IsoKV๋Š” ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜ ์ค‘์‹ฌ ํ”Œ๋ž˜์‹œ ์ €์žฅ์žฅ ์น˜ ๊ด€๋ฆฌ ๋ฐฉ์‹์„ ํ†ตํ•ด SSD์˜ ๋ณ‘๋ ฌํ™” ์ˆ˜์ค€์„ ์ง์ ‘ ๊ด€๋ฆฌํ•œ๋‹ค. IsoKV๋Š” SSD์˜ ๊ฐ ์ „์šฉ ๋‚ด๋ถ€ ๋ณ‘๋ ฌ ์žฅ์น˜์— ์„œ๋กœ ๋‹ค๋ฅธ ํŠน์„ฑ์„ ๊ฐ€์ง„ ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•จ์œผ๋กœ์จ I/O ์š”์ฒญ ๊ฐ„์˜ ๊ฐ„์„ญ์„ ์ค„์ธ๋‹ค. ๋˜ํ•œ IsoKV๋Š” SSD์˜ LSM ํŠธ๋ฆฌ ๋กœ์ง๊ณผ ๋ฐ์ดํ„ฐ ๊ด€๋ฆฌ๋ฅผ ๋™๊ธฐํ™”ํ•˜ ์—ฌ GC๋ฅผ ์ œ๊ฑฐํ•œ๋‹ค. ๋ณธ ์—ฐ๊ตฌ์—์„œ๋Š” RocksDB๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ IsoKV๋ฅผ ๊ตฌํ˜„ํ•˜์˜€์œผ๋ฉฐ, Open-Channel SSD๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์„ฑ๋Šฅํ‰๊ฐ€ํ•˜์˜€๋‹ค.. ๋ณธ ์—ฐ๊ตฌ์˜ ์‹คํ—˜ ๊ฒฐ๊ณผ์— ๋”ฐ๋ฅด๋ฉด IsoKV๋Š” ๊ธฐ์กด์˜ ๋ฐ์ดํ„ฐ ์ €์žฅ ๋ฐฉ์‹๊ณผ ๋น„๊ตํ•˜์—ฌ ํ‰๊ท  1.20ร— ๋น ๋ฅด๊ณ  ๋ฐ 43% ๊ฐ์†Œ๋œ ์ฒ˜๋ฆฌ๋Ÿ‰๊ณผ ์‘๋‹ต์‹œ๊ฐ„ ์„ฑ๋Šฅ ๊ฐœ์„  ๊ฒฐ๊ณผ๋ฅผ ์–ป์—ˆ๋‹ค. 
๊ด€์ ์—์„œ 43% ๊ฐ์†Œํ•˜์˜€๋‹ค.Abstract Introduction 1 Background 8 Log-Structured Merge tree based Database 8 Open-Channel SSDs 9 Preliminary Experimental Evaluation using oc bench 10 Design and Implementation 14 Overview of IsoKV 14 GC-free flash storage management synchronized with LSM-tree logic 15 I/O type Isolation through Application-Driven Flash Management 17 Dynamic Arrangement of NAND-Flash Parallelism 19 Implementation 21 Evaluation 23 Experimental Setup 23 Performance Evaluation 25 Related Work 31 Conclusion 34 Bibliography 35 ์ดˆ๋ก 40Maste

    Performance Characterization of NVMe Flash Devices with Zoned Namespaces (ZNS)

    The recent emergence of NVMe flash devices with Zoned Namespace support, ZNS SSDs, represents a significant advance in flash storage. ZNS SSDs introduce a new storage abstraction of append-only zones, together with new I/O (i.e., append) and management (zone state machine transition) commands. With this abstraction and these commands, ZNS SSDs give the host software stack more control over flash management than non-zoned SSDs; flash management is known to be complex (garbage collection, scheduling, block allocation, parallelism management, overprovisioning). ZNS SSDs are, consequently, gaining adoption in a variety of applications (e.g., file systems, key-value stores, and databases), particularly latency-sensitive big-data applications. Despite this enthusiasm, there has yet to be a systematic characterization of ZNS SSD performance with its zoned storage model abstractions and I/O operations. This work addresses that shortcoming. We report on the performance features of a commercially available ZNS SSD (13 key observations), explain how these features can be incorporated into publicly available state-of-the-art ZNS emulators, and recommend guidelines for ZNS SSD application developers. All artifacts (code and data sets) of this study are publicly available at https://github.com/stonet-research/NVMeBenchmarks
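
    The zoned abstraction described above (append-only zones, a write pointer, and a zone state machine) can be summarized in a small model. The zone size, state names, and error handling below are simplified assumptions; a real ZNS device exposes these semantics through NVMe zone-management and zone-append commands.

```python
# Simplified model of an append-only zone: a write pointer plus a small
# state machine (EMPTY -> OPEN -> FULL, RESET back to EMPTY).

from enum import Enum

class ZoneState(Enum):
    EMPTY = "empty"
    OPEN = "open"
    FULL = "full"

class Zone:
    def __init__(self, start_lba, capacity_blocks):
        self.start_lba = start_lba
        self.capacity = capacity_blocks
        self.write_pointer = start_lba      # next writable LBA
        self.state = ZoneState.EMPTY

    def append(self, nblocks):
        """Append nblocks at the write pointer and return the LBA where the
        data landed (the device, not the host, chooses the location)."""
        if self.state == ZoneState.FULL:
            raise IOError("zone is full")
        if self.write_pointer + nblocks > self.start_lba + self.capacity:
            raise IOError("append exceeds zone capacity")
        lba = self.write_pointer
        self.write_pointer += nblocks
        self.state = (ZoneState.FULL
                      if self.write_pointer == self.start_lba + self.capacity
                      else ZoneState.OPEN)
        return lba

    def reset(self):
        """Zone reset: discard contents and rewind the write pointer."""
        self.write_pointer = self.start_lba
        self.state = ZoneState.EMPTY

if __name__ == "__main__":
    z = Zone(start_lba=0, capacity_blocks=8)
    print(z.append(4), z.state)   # 0 ZoneState.OPEN
    print(z.append(4), z.state)   # 4 ZoneState.FULL
    z.reset()
```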

    Operating System Support for High-Performance Solid State Drives


    TACKLING PERFORMANCE AND SECURITY ISSUES FOR CLOUD STORAGE SYSTEMS

    Building data-intensive applications and emerging computing paradigms (e.g., Machine Learning (ML), Artificial Intelligence (AI), and the Internet of Things (IoT)) in cloud computing environments is becoming the norm, given the many advantages in scalability, reliability, security, and performance. However, under rapid changes in applications, system middleware, and the underlying storage devices, service providers face new challenges in delivering performance and security isolation when resources are shared among multiple tenants. The gap between the decades-old storage abstraction and modern storage devices keeps widening, calling for software/hardware co-designs to achieve more effective performance and security protocols. This dissertation rethinks the storage subsystem from the device level to the system level and proposes new designs at different levels to tackle performance and security issues for cloud storage systems. In the first part, we present an event-based SSD (Solid State Drive) simulator that models modern protocols, firmware, and the storage backend in detail. The proposed simulator can capture the nuances of SSD internal state under various I/O workloads, which helps researchers understand the impact of various SSD designs and workload characteristics on end-to-end performance. In the second part, we study the security challenges of shared in-storage computing infrastructures. Many cloud providers offer isolation at multiple levels to secure data and instances; however, security measures in emerging in-storage computing infrastructures have not been studied. We first investigate the attacks that could be conducted by offloaded in-storage programs in a multi-tenant cloud environment. To defend against these attacks, we build a lightweight Trusted Execution Environment, IceClave, to enable security isolation between in-storage programs and internal flash management functions. We show that while enforcing security isolation in the SSD controller with minimal hardware cost, IceClave still keeps the performance benefit of in-storage computing by delivering up to 2.4x better performance than the conventional host-based trusted computing approach. In the third part, we investigate the performance interference caused by other tenants' I/O flows. We demonstrate that I/O resource sharing can often lead to performance degradation and instability. The block device abstraction fails to expose SSD parallelism and to pass application requirements to the device. To this end, we propose a software/hardware co-design that enforces performance isolation by bridging this semantic gap. Our design significantly improves QoS (Quality of Service) by reducing throughput penalties and tail latency spikes. Lastly, we explore more effective I/O control to address contention in the storage software stack. We show that the state-of-the-art resource control mechanism, Linux cgroups, is insufficient for controlling I/O resources; inappropriate cgroup configurations may even hurt the performance of co-located workloads under memory-intensive scenarios. We add kernel support for limiting page cache usage per cgroup and achieving I/O proportionality.
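
    For the last part, the existing cgroup-v2 knobs the abstract argues are insufficient look roughly like the following sketch, which throttles a tenant's I/O with io.max and applies memory pressure with memory.high. The cgroup path and device major:minor numbers are assumptions; the per-cgroup page-cache limit added by the dissertation is a custom kernel extension and is not shown here.

```python
# Sketch of stock cgroup-v2 I/O and memory controls (requires root and a
# cgroup-v2 mount); paths and device numbers below are assumed for illustration.

import os

CGROUP = "/sys/fs/cgroup/tenant-a"      # assumed cgroup-v2 directory
DEVICE = "259:0"                        # assumed MAJ:MIN of the NVMe device

def write_knob(filename, value):
    with open(os.path.join(CGROUP, filename), "w") as f:
        f.write(value)

def main():
    os.makedirs(CGROUP, exist_ok=True)
    # Cap this tenant at 200 MB/s of writes and 4000 write IOPS on the device.
    write_knob("io.max", f"{DEVICE} wbps={200 * 1024 * 1024} wiops=4000")
    # Apply memory pressure above 2 GiB so the kernel reclaims page cache --
    # the indirect (and, per the dissertation, insufficient) way to bound cache use.
    write_knob("memory.high", str(2 * 1024 ** 3))
    # Move the current process into the cgroup.
    write_knob("cgroup.procs", str(os.getpid()))

if __name__ == "__main__":
    main()
```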

    Exploiting solid state drive parallelism for real-time flash storage

    The increased volume of sensor data generated by emerging applications in areas such as autonomous vehicles requires new technologies for storage and retrieval. NAND flash memory has desirable characteristics for real-time information storage and retrieval, such as non-volatility, shock resistance, low power consumption, and fast access time. However, NAND flash memory management suffers from high tail latency during storage space reclamation. This is unacceptable in a real-time system, where missed deadlines can have potentially catastrophic consequences. Current methods to ensure timing guarantees in flash storage do not explicitly exploit the internal parallelism in Solid State Drives (SSDs). Modern SSDs are able to support massive amounts of parallelism, as evidenced by the shift from the Advanced Host Controller Interface (AHCI) to the Non-Volatile Memory Host Controller Interface (NVMe), a multi-queue interface. This thesis focuses on providing predictable, low-latency guarantees for read and write requests in NAND flash memory by exploiting the internal parallelism in SSDs. The first part of the thesis presents a partitioned flash design that dynamically assigns each parallel flash unit to perform either reads or writes. To access data from a flash unit that is busy servicing a write request or performing garbage collection, the device rebuilds the data using encoding. Consequently, reads are never blocked by writes or storage space reclamation. In this design, however, low read latency is achieved at the expense of write throughput. The second part of the thesis explores how to predictably improve performance by minimizing the garbage collection cost in flash storage. The root cause of this extra cost is the SSD's inability to accurately determine data lifetime and group together data that expires before space needs to be reclaimed. This is exacerbated by the narrow block I/O interface, which prevents optimizations from either the device or the application above. By sharing application-specific knowledge of data lifetime with the device, the SSD is able to lay out data efficiently so that garbage collection cost is minimized.
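
    The first design described above, serving reads from a busy flash unit by rebuilding data with encoding, can be sketched with simple XOR parity across parallel units. The stripe width and the busy/idle model are illustrative assumptions rather than the thesis's actual design.

```python
# Sketch: data is striped with XOR parity across parallel flash units, so a
# read whose target unit is busy (writing or collecting garbage) is rebuilt
# from the remaining units instead of waiting.

from functools import reduce

STRIPE_WIDTH = 4  # 3 data units + 1 parity unit (illustrative)

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

class PartitionedFlash:
    def __init__(self, block_size=4096):
        self.block_size = block_size
        # units[i][stripe] -> block; the last unit of each stripe holds parity.
        self.units = [dict() for _ in range(STRIPE_WIDTH)]
        self.busy = [False] * STRIPE_WIDTH   # True while writing / doing GC

    def write_stripe(self, stripe_no, data_blocks):
        assert len(data_blocks) == STRIPE_WIDTH - 1
        assert all(len(b) == self.block_size for b in data_blocks)
        for i, blk in enumerate(data_blocks):
            self.units[i][stripe_no] = blk
        self.units[STRIPE_WIDTH - 1][stripe_no] = xor_blocks(data_blocks)

    def read(self, stripe_no, unit_no):
        if not self.busy[unit_no]:
            return self.units[unit_no][stripe_no]
        # Unit is busy: rebuild the block from the other units' data + parity.
        others = [self.units[i][stripe_no]
                  for i in range(STRIPE_WIDTH) if i != unit_no]
        return xor_blocks(others)

if __name__ == "__main__":
    pf = PartitionedFlash(block_size=4)
    pf.write_stripe(0, [b"aaaa", b"bbbb", b"cccc"])
    pf.busy[1] = True                      # unit 1 is busy with GC
    assert pf.read(0, 1) == b"bbbb"        # still served, via reconstruction
```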

    Synergistically Coupling Of Solid State Drives And Hard Disks For Qos-Aware Virtual Memory

    With significant advantages in capacity, power consumption, and price, the solid state disk (SSD) has good potential to be employed as an extension of dynamic random-access memory, such that applications with large working sets can run efficiently on a modestly configured system. While initial results reported in recent works show promising prospects for this use of SSDs by incorporating them into the management of virtual memory, frequent writes from write-intensive programs could quickly wear out the SSD, making the idea less practical. This thesis makes four contributions towards solving this issue. First, we propose a scheme, HybridSwap, that integrates a hard disk with an SSD for virtual memory management, synergistically achieving the advantages of both. In addition, HybridSwap can constrain the performance loss caused by swapping according to user-specified QoS requirements. Second, we develop an efficient algorithm to record memory access history, identify page access sequences, and evaluate their locality. Using a history of page access patterns, HybridSwap dynamically creates an out-of-memory virtual memory page layout on the swap space spanning the SSD and hard disk, such that random reads are served by the SSD and sequential reads are asynchronously served by the hard disk with high efficiency. Third, we build a QoS-assurance mechanism into HybridSwap to demonstrate the flexibility of the system in bounding the performance penalty due to swapping. It allows users to specify a bound on the program stall time due to page faults as a percentage of the program's total run time. Fourth, we have implemented HybridSwap in a recent Linux kernel, version 2.6.35.7. Our evaluation with representative benchmarks, such as Memcached for key-value storage and scientific programs from the ALGLIB cross-platform numerical analysis and data processing library, shows that the number of writes to the SSD can be reduced by 40% while the system's performance remains comparable to that with pure SSD swapping, and that HybridSwap can satisfy a swapping-related QoS requirement as long as the I/O resource is sufficient.
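
    The page-placement policy described above, serving random reads from the SSD and sequential runs from the hard disk, can be sketched as a simple classification of the page-fault history. The run-length threshold below is an illustrative assumption, not HybridSwap's actual locality heuristic.

```python
# Sketch: scan an ordered history of faulted page numbers, send pages that
# were accessed as long sequential runs to the hard disk (cheap sequential
# reads, fewer SSD writes), and keep randomly accessed pages on the SSD.

SEQ_RUN_THRESHOLD = 8   # pages; runs at least this long count as sequential

def split_by_locality(page_history):
    """Return (ssd_pages, hdd_pages) from an ordered list of faulted pages."""
    ssd, hdd = set(), set()
    run = [page_history[0]] if page_history else []
    for page in page_history[1:]:
        if page == run[-1] + 1:             # extends the current sequential run
            run.append(page)
            continue
        target = hdd if len(run) >= SEQ_RUN_THRESHOLD else ssd
        target.update(run)
        run = [page]
    if run:
        (hdd if len(run) >= SEQ_RUN_THRESHOLD else ssd).update(run)
    return ssd, hdd

if __name__ == "__main__":
    history = [5, 9, 100, 101, 102, 103, 104, 105, 106, 107, 42, 7]
    ssd_pages, hdd_pages = split_by_locality(history)
    print(sorted(hdd_pages))   # the long run 100..107 goes to the hard disk
    print(sorted(ssd_pages))   # scattered pages 5, 7, 9, 42 stay on the SSD
```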

    Understanding and Improving the Performance of Read Operations Across the Storage Stack

    We live in a data-driven era: large amounts of data are generated and collected every day. Storage systems are the backbone of this era, as they store and retrieve data. To cope with increasing data demands (e.g., diversity, scalability), storage systems are experiencing changes across the stack. Like other computer systems, storage systems rely on layering and modularity to allow rapid development. Unfortunately, this can hinder performance clarity and introduce degradations (e.g., tail latency) due to unexpected interactions between components of the stack. In this thesis, we first perform a study to understand the behavior across different layers of the storage stack. We focus on sequential read workloads, a common I/O pattern in distributed file systems (e.g., HDFS, GFS). We analyze the interaction between read workloads, local file systems (i.e., ext4), and storage media (i.e., SSDs). We perform the same experiment over different periods of time (e.g., file lifetime). We uncover three slowdowns, all of which occur in the lower layers. When combined, these slowdowns can degrade throughput by 30%. We find that increased parallelism in the local file system mitigates these slowdowns, showing the need for adaptability in storage stacks. Given that performance instabilities can occur at any layer of the stack, it is important that upper-layer systems are able to react. We propose smart hedging, a novel technique to manage high-percentile (tail) latency variations in read operations. Smart hedging considers production challenges such as massive scalability, heterogeneity, and ease of deployment and maintainability. Our technique establishes a dynamic threshold by tracking latencies on the client side. If a read operation exceeds the threshold, a new hedged request is issued, in an exponential back-off manner. We implement our technique in HDFS and evaluate it on 70k servers in 3 datacenters. Our technique reduces average tail latency without generating excessive system load.
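
    The smart-hedging technique described above can be sketched as follows: track recent read latencies on the client, derive a dynamic high-percentile threshold, and issue hedged duplicates with exponential back-off when a read exceeds it. The percentile, sample window, and back-off factor are illustrative assumptions.

```python
# Sketch of client-side hedged reads with a dynamic latency threshold; the
# slow_read() stand-in and all tuning constants are assumptions.

import concurrent.futures as cf
import random
import statistics
import time
from collections import deque

LATENCIES = deque(maxlen=1000)     # recent completed-read latencies (seconds)

def dynamic_threshold(default=0.200, pct=0.95):
    """95th percentile of recent latencies, or a default with few samples."""
    if len(LATENCIES) < 50:
        return default
    return statistics.quantiles(LATENCIES, n=100)[round(pct * 100) - 1]

def slow_read(block_id):
    """Stand-in for a real read; occasionally exhibits tail latency."""
    time.sleep(random.choice([0.01] * 9 + [0.5]))
    return f"data-{block_id}"

def hedged_read(block_id, max_hedges=2):
    threshold = dynamic_threshold()
    start = time.monotonic()
    pool = cf.ThreadPoolExecutor(max_workers=max_hedges + 1)
    futures = [pool.submit(slow_read, block_id)]
    wait = threshold
    for _ in range(max_hedges):
        done, _ = cf.wait(futures, timeout=wait, return_when=cf.FIRST_COMPLETED)
        if done:
            break
        futures.append(pool.submit(slow_read, block_id))   # issue a hedged copy
        wait *= 2                                           # exponential back-off
    done, _ = cf.wait(futures, return_when=cf.FIRST_COMPLETED)
    result = next(iter(done)).result()
    LATENCIES.append(time.monotonic() - start)
    pool.shutdown(wait=False)          # don't block on the slower copies
    return result

if __name__ == "__main__":
    print(hedged_read("blk-0001"))
```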