6 research outputs found

    Optimizing Virtual Machine I/O Performance in Cloud Environments

    Maintaining closeness between data sources and data consumers is crucial for workload I/O performance. In cloud environments, this closeness can be broken by administrative events and by storage-architecture barriers. VM migration events are frequent in clouds; migration changes a VM's runtime interconnections and cache contexts, significantly degrading its I/O performance. Virtualization is the backbone of cloud platforms, but I/O virtualization adds extra hops to the workload's data access path, prolonging I/O latency. Its overheads cap the throughput of high-speed storage devices and impose high CPU utilization and energy consumption on cloud infrastructures. To maintain the closeness between data sources and workloads during VM migration, we propose Clique, an affinity-aware migration scheduling policy that minimizes the aggregate wide-area communication traffic during storage migration in virtual cluster contexts. For host-side caching, we propose Successor, which recognizes warm pages and prefetches them into the caches of destination hosts before migration completes. To bypass the I/O virtualization barriers, we propose VIP, an adaptive I/O prefetching framework that uses a virtual I/O front-end buffer for prefetching, avoiding on-demand involvement of the I/O virtualization stack and accelerating I/O responses. Analysis of the traffic trace of a virtual cluster containing 68 VMs shows that Clique can reduce inter-cloud traffic by up to 40%. Tests with the MPI Reduce_scatter benchmark show that Clique keeps VM performance during migration at up to 75% of the non-migration level, more than 3 times that of a random VM-selection policy. In host-side caching environments, Successor outperforms existing cache warm-up solutions and achieves zero VM-perceived cache warm-up time with low resource cost. At the system level, we conducted a comprehensive quantitative analysis of I/O virtualization overheads, and our trace-replay-based simulation demonstrates the effectiveness of VIP for data prefetching with negligible additional cache resource cost.
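    The abstract names Clique, Successor, and VIP without spelling out their mechanics. As a rough illustration of the affinity-aware scheduling idea behind Clique, the hypothetical sketch below groups VMs by pairwise traffic volume and migrates each group together so heavily-communicating VMs stay co-located; the traffic matrix, the greedy heuristic, and every name in the code are assumptions made for illustration, not the paper's algorithm.

```python
# Illustrative sketch (not Clique's actual algorithm): schedule VM storage
# migration in affinity groups so chatty VMs move together, keeping their
# traffic local instead of crossing the wide-area link.
from collections import defaultdict

def build_affinity(traffic_pairs):
    """traffic_pairs: iterable of (vm_a, vm_b, bytes) observed between VMs."""
    affinity = defaultdict(float)
    for a, b, volume in traffic_pairs:
        affinity[frozenset((a, b))] += volume
    return affinity

def schedule_migration(vms, traffic_pairs, batch_size):
    """Greedy grouping: seed a batch with an unscheduled VM, then repeatedly
    add the VM that exchanges the most traffic with the batch so far."""
    affinity = build_affinity(traffic_pairs)
    remaining = set(vms)
    batches = []
    while remaining:
        batch = [remaining.pop()]
        while remaining and len(batch) < batch_size:
            best = max(remaining, key=lambda v: sum(
                affinity.get(frozenset((v, m)), 0.0) for m in batch))
            batch.append(best)
            remaining.remove(best)
        batches.append(batch)
    return batches  # migrate one batch at a time

if __name__ == "__main__":
    pairs = [("vm1", "vm2", 9e9), ("vm1", "vm3", 1e8),
             ("vm3", "vm4", 7e9), ("vm2", "vm4", 5e7)]
    print(schedule_migration(["vm1", "vm2", "vm3", "vm4"], pairs, batch_size=2))
```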

    Optimization of a data buffer for a hardwired prefetcher in a PCRAM controller buffer

    Master's thesis (석사), Seoul National University Graduate School, College of Engineering, Department of Electrical and Information Engineering, February 2021. Advisor: 이혁재.
    In this thesis, we study a prefetcher structure that improves the cache buffer performance of PCRAM-based storage. We optimize the NVM storage cache buffer by proposing a lightweight hardware prefetching structure for a hard-wired PCRAM controller, rather than a general software prefetcher of the kind used for HDDs and NAND SSDs. To reduce the history buffer's area overhead and the prefetching algorithm's hardware complexity, the two main drawbacks of implementing a hardware prefetcher, the history buffer is slimmed down with a filter built from application-ID polling and a sequential address boundary detector. The application-ID poller selects the most populated application ID in each polling interval and forwards only that application's cache-miss addresses to the sequential boundary detector. The sequential boundary detector detects the sequentiality of the miss addresses, records it in the history buffer, and generates prefetch requests for each access type based on that history. Measuring the controller's average latency on real-life storage workloads shows a read-latency improvement of about 14%, which lets the cache buffer deliver the same performance at only 50% of its original size.
    Table of contents: 1 Introduction (research background and goals; thesis organization); 2 Related work (prefetching: hardware prefetching, prefetching for storage; previous prefetching work: ReadAhead, SPAN prefetching for NVM); 3 Proposed hardware prefetching scheme using application ID polling and a sequential boundary detector (limitations of prior work; observations; proposed prefetching structure: application ID polling, sequential boundary detector, history buffer, overall implementation); 4 Experimental results and analysis (experimental setup; results of the proposed scheme); 5 Conclusion; References; Abstract.
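    To make the two filtering stages concrete, here is a small software simulation of what application-ID polling followed by sequential boundary detection might look like. The thesis implements this in controller hardware; the window contents, run threshold, and function names below are assumptions, not taken from the thesis.

```python
# Software simulation of the two-stage miss filter described above.
# Hardware details (window size, run threshold, buffer depth) are assumptions.
from collections import Counter

def poll_dominant_app(window):
    """Pick the most populated application ID in a polling window of
    (app_id, miss_block) tuples."""
    return Counter(app for app, _ in window).most_common(1)[0][0]

def detect_sequential_runs(miss_blocks, min_run=3):
    """Sequential boundary detector: record runs of consecutive block
    addresses; each long-enough run yields one prefetch request that
    continues the run past its current boundary."""
    runs, start, prev = [], None, None
    for blk in miss_blocks:
        if prev is not None and blk == prev + 1:
            prev = blk            # run continues
        else:
            if start is not None and prev - start + 1 >= min_run:
                runs.append((start, prev))
            start = prev = blk    # run broken: start a new one
    if start is not None and prev - start + 1 >= min_run:
        runs.append((start, prev))
    # prefetch the blocks just beyond each detected boundary
    return [(end + 1, end + (end - begin + 1)) for begin, end in runs]

if __name__ == "__main__":
    window = [(7, 100), (7, 101), (3, 9000), (7, 102), (7, 103), (3, 42)]
    app = poll_dominant_app(window)
    misses = [blk for a, blk in window if a == app]
    print("dominant app:", app, "prefetch ranges:", detect_sequential_runs(misses))
```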

    Mitigating Interference During Virtual Machine Live Migration through Storage Offloading

    Today's cloud landscape has evolved computing infrastructure into a dynamic, high-utilization, service-oriented paradigm. This shift has enabled the commoditization of large-scale storage and distributed computation, allowing engineers to tackle previously untenable problems without large upfront investment. A key enabler of flexibility in the cloud is the ability to transfer running virtual machines across subnets or even datacenters using live migration. However, live migration can be a costly process, one that has the potential to interfere with other applications not involved in the migration. This work investigates storage interference through experimentation with real-world systems and well-established benchmarks. To address migration interference in general, a buffering technique is presented that offloads the migration's reads, eliminating interference in the majority of scenarios.
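    The abstract does not detail the buffering technique, so the following hypothetical sketch only illustrates the general read-offloading idea: the blocks a migration will read are staged into a side buffer beforehand, so the migration's reads no longer compete with other tenants on the shared storage backend. The class, its methods, and the staging policy are assumptions, not the system described in the work.

```python
# Hypothetical illustration of read offloading during live migration:
# disk blocks are copied into a side buffer ahead of time so the migration's
# own reads do not contend with other tenants on the shared storage backend.
class MigrationReadOffloader:
    def __init__(self, backend_read):
        self.backend_read = backend_read   # function(block_id) -> bytes
        self.buffer = {}                   # block_id -> data staged locally

    def stage(self, block_ids):
        """Pre-read the blocks the migration will need into the local buffer
        during an idle period, before the transfer begins."""
        for blk in block_ids:
            self.buffer[blk] = self.backend_read(blk)

    def read_for_migration(self, block_id):
        """Serve migration reads from the buffer when possible; fall back to
        the shared backend only for blocks that were not staged."""
        if block_id in self.buffer:
            return self.buffer.pop(block_id)  # free memory once transferred
        return self.backend_read(block_id)

if __name__ == "__main__":
    disk = {i: bytes([i]) * 4096 for i in range(8)}
    off = MigrationReadOffloader(lambda b: disk[b])
    off.stage(range(4))                       # staged ahead of the migration
    data = [off.read_for_migration(b) for b in range(8)]
    print(len(data), "blocks transferred; 4 served without touching the backend")
```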

    Design and Implementation of a Data Prefetching Scheme for Hybrid Storage

    A hybrid drive combines a conventional hard disk with a NAND flash SSD to capture the advantages of both devices: the hard disk provides large capacity, while the faster SSD serves as a cache in front of it, improving overall I/O performance. Linux readahead was designed for conventional hard disks; it cannot tell which device the data comes from, so it is a poor fit for hybrid drives and offers little help for reads. This work therefore identifies the data source and designs a prefetching algorithm tailored to the characteristics of the device being read in order to improve I/O performance. We build a hybrid drive with Flashcache on Linux Ubuntu 12.04 and then design an adaptive prefetch algorithm for hybrid storage: it analyzes the sequentiality of accesses, sets an appropriate prefetch size based on that degree of sequentiality, and adds a handling mechanism for interleaved requests to improve overall I/O performance. Experiments with various request sizes and different mixes of sequential and random access validate the proposed approach. The results show that the proposed adaptive prefetch mechanism clearly outperforms both the original Flashcache and Linux readahead, increasing I/O throughput and shortening average response time.
    Table of contents: 1 Introduction (overview; motivation; contributions; thesis organization); 2 Background and related work (the block I/O layer: the bio structure, bio_vec; hybrid drive access architecture: Device-Mapper kernel architecture, I/O flow, target driver; Flashcache: set-associative hash, device-mapper layer, metadata overhead, replacement policy, read request handler, metadata, superblock, dirty block, non-cacheable; prefetching in Linux: readahead windows, call convention, sequentiality, readahead sequence, drawbacks of readahead; related prefetching work: replacement of prefetched data, the AMP algorithm, Flashy Prefetching for High-Performance Flash Drives with its challenges, architecture, trace collection, pattern recognition, block prefetching, and feedback monitor); 3 System design and implementation (system architecture; prefetching handlers: Prefetch_D, Prefetch_DS, Prefetch_S; hard-disk prefetching with Prefetch_D: bio size smaller than the cache block size, bio size equal to the cache block size, prefetch stop; SSD prefetching with Prefetch_S; interleaved issue; overall flow); 4 Experimental results and discussion (environment setup; results: prefetch performance under different request sizes; different read/write ratios: throughput and average response time at 50% and 20% writes); 5 Conclusion and future work; References.
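    As a rough illustration of the adaptive prefetch idea summarized in the abstract above (the actual work modifies Flashcache inside the Linux device-mapper layer), the sketch below scores the sequentiality of a sliding window of recent read offsets and scales the prefetch size with that score; the window length, size bounds, and scaling rule are assumptions rather than the thesis's parameters.

```python
# Illustrative sketch: choose a prefetch size from the measured sequentiality
# of recent reads, mirroring the "analyze sequentiality, set prefetch size"
# step described above. All parameters are assumptions, not the thesis's values.
from collections import deque

class AdaptivePrefetcher:
    def __init__(self, window=8, min_size=4, max_size=64):
        self.recent = deque(maxlen=window)   # recent read offsets (in blocks)
        self.min_size = min_size
        self.max_size = max_size

    def _sequentiality(self):
        """Fraction of recent requests that directly follow the previous one."""
        offsets = list(self.recent)
        pairs = list(zip(offsets, offsets[1:]))
        if not pairs:
            return 0.0
        return sum(1 for a, b in pairs if b == a + 1) / len(pairs)

    def on_read(self, offset):
        """Record a read and return (prefetch_start, prefetch_size)."""
        self.recent.append(offset)
        seq = self._sequentiality()
        size = int(self.min_size + seq * (self.max_size - self.min_size))
        return offset + 1, size

if __name__ == "__main__":
    p = AdaptivePrefetcher()
    for off in [10, 11, 12, 13, 14]:                     # sequential stream
        start, size = p.on_read(off)
    print("sequential stream -> prefetch", size, "blocks starting at", start)
    for off in [500, 42, 9000, 7, 123, 77, 4096, 3]:     # random stream
        start, size = p.on_read(off)
    print("random stream -> prefetch", size, "blocks starting at", start)
```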