
    Enabling virtualization technologies for enhanced cloud computing

    Cloud Computing is a ubiquitous technology that offers various services for individual users, small businesses, and large-scale organizations. Data-center owners maintain clusters of thousands of machines and lease out resources such as CPU, memory, network bandwidth, and storage to clients. For organizations, cloud computing provides the means to offload server infrastructure and obtain resources on demand, which reduces setup costs as well as maintenance overheads. For individuals, cloud computing offers platforms, resources, and services that would otherwise be unavailable to them. At the core of cloud computing are various virtualization technologies and the resulting Virtual Machines (VMs). Virtualization enables cloud providers to host multiple VMs on a single Physical Machine (PM). The hallmark of VMs is that the end-user cannot distinguish them from actual PMs. VMs also give cloud owners essential capabilities such as live migration, the process of moving a VM from one PM to another while the VM is running. Cloud features such as fault tolerance, geographical server placement, energy management, resource management, big data processing, and parallel computing depend heavily on virtualization technologies, and improvements and breakthroughs in these technologies directly introduce new possibilities in the cloud. This thesis identifies and proposes innovations for such underlying VM technologies and tests their performance on a cluster of 16 machines with real-world benchmarks. Specifically, the issues of server load prediction, VM consolidation, live migration, and memory sharing are addressed. First, a unique VM resource load prediction mechanism based on Chaos Theory is introduced that predicts server workloads with high accuracy. Based on these predictions, VMs are dynamically and autonomously relocated to different PMs in the cluster in an attempt to conserve energy.
Experimental evaluations with a prototype on real-world data-center load traces show that up to 80% of the unused PMs can be freed up and repurposed, with Service Level Objective (SLO) violations as low as 3%. Second, issues in live migration of VMs are analyzed, based on which a new distributed approach is presented that allows network-efficient live migration of VMs. The approach amortizes the transfer of memory pages over the life of the VM, thus reducing network traffic during critical live migration. The prototype reduces network usage by up to 45% and lowers the required time by up to 40% for live migration on various real-world loads. Finally, a memory sharing and management approach called ACE-M is demonstrated that enables VMs to share and utilize all the memory available in the cluster remotely. Along with predictions on network and memory, this approach allows VMs to run applications with memory requirements much higher than what is physically available locally. It is experimentally shown that ACE-M reduces memory performance degradation by about 75% and achieves a 40% lower network response time for memory-intensive VMs. A combination of these innovations to the virtualization technologies can minimize performance degradation of various VM attributes, which will ultimately lead to a better end-user experience.
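The consolidation idea described above — use predicted loads to pack VMs onto fewer PMs so idle machines can be freed — can be sketched as a simple bin-packing pass. The function names, threshold, and first-fit-decreasing heuristic below are illustrative assumptions, not the thesis's actual algorithm:

```python
# Illustrative sketch (not the thesis's actual algorithm): given predicted
# CPU loads per VM, pack VMs onto as few PMs as possible with first-fit
# decreasing, so the remaining PMs can be freed up and repurposed.

def consolidate(predicted_load, n_pms, capacity=1.0):
    """predicted_load: dict vm -> predicted CPU demand (fraction of one PM)."""
    pms = [[] for _ in range(n_pms)]   # VMs assigned to each PM
    used = [0.0] * n_pms               # predicted load placed on each PM
    # Place the most demanding VMs first (first-fit decreasing).
    for vm, load in sorted(predicted_load.items(), key=lambda kv: -kv[1]):
        for i in range(n_pms):
            if used[i] + load <= capacity:
                pms[i].append(vm)
                used[i] += load
                break
        else:
            raise RuntimeError(f"no PM can host {vm}")
    freed = [i for i in range(n_pms) if not pms[i]]  # PMs left with no VMs
    return pms, freed

loads = {"vm1": 0.6, "vm2": 0.3, "vm3": 0.5, "vm4": 0.2}
placement, freed = consolidate(loads, n_pms=4)  # two of four PMs end up empty
```

A real system would re-run such a placement periodically as predictions change, and trigger live migration only for VMs whose assigned PM differs from their current one.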

    ARM Wrestling with Big Data: A Study of Commodity ARM64 Server for Big Data Workloads

    ARM processors have dominated the mobile device market in the last decade due to their favorable computing-to-energy ratio. In this age of Cloud data centers and Big Data analytics, the focus is increasingly on power-efficient processing, rather than just high-throughput computing. ARM's first commodity server-grade processor is the recent AMD A1100-series processor, based on a 64-bit ARM Cortex A57 architecture. In this paper, we study the performance and energy efficiency of a server based on this ARM64 CPU, relative to a comparable server running an AMD Opteron 3300-series x64 CPU, for Big Data workloads. Specifically, we study these for Intel's HiBench suite of web, query, and machine learning benchmarks on Apache Hadoop v2.7 in a pseudo-distributed setup, for data sizes up to 20 GB files, 5M web pages, and 500M tuples. Our results show that the ARM64 server's runtime performance is comparable to the x64 server for integer-based workloads like Sort and Hive queries, and only lags behind for floating-point-intensive benchmarks like PageRank, when they do not exploit data parallelism adequately. We also see that the ARM64 server takes one-third the energy, and has an Energy Delay Product (EDP) that is 50-71% lower than the x64 server. These results hold promise for ARM64 data centers hosting Big Data workloads to reduce their operational costs, while opening up opportunities for further analysis. Comment: Accepted for publication in the Proceedings of the 24th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 201
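The Energy Delay Product cited above is simply energy multiplied by runtime. A quick back-of-the-envelope check, with made-up power and runtime figures rather than the paper's measurements, shows how a server drawing one-third the power at comparable runtime ends up with a roughly two-thirds lower EDP:

```python
# Energy Delay Product (EDP) = energy consumed * execution time.
# The power/runtime figures below are illustrative, not the paper's measurements.

def edp(power_watts, runtime_s):
    energy_j = power_watts * runtime_s  # energy = average power * time
    return energy_j * runtime_s         # EDP in joule-seconds

x64 = edp(power_watts=90.0, runtime_s=100.0)    # hypothetical x64 server
arm64 = edp(power_watts=30.0, runtime_s=100.0)  # one-third the power, same runtime
reduction = 1.0 - arm64 / x64                   # fraction by which EDP is lower
```

With equal runtimes, EDP reduction tracks the power ratio directly; when the ARM64 server is also slower, the delay term is squared, which is why EDP penalizes slow-but-frugal designs more than a plain energy comparison would.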

    Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study

    The number and diversity of consumer devices are growing rapidly, alongside their target applications' memory consumption. Unfortunately, DRAM scalability is becoming a limiting factor to the available memory capacity in consumer devices. As a potential solution, manufacturers have introduced emerging non-volatile memories (NVMs) into the market, which can be used to increase the memory capacity of consumer devices by augmenting or replacing DRAM. Since entirely replacing DRAM with NVM in consumer devices imposes large system integration and design challenges, recent works propose extending the total main memory space available to applications by using NVM as swap space for DRAM. However, no prior work analyzes the implications of enabling a real NVM-based swap space in real consumer devices. In this work, we provide the first analysis of the impact of extending the main memory space of consumer devices using off-the-shelf NVMs. We extensively examine system performance and energy consumption when the NVM device is used as swap space for DRAM main memory to effectively extend the main memory capacity. For our analyses, we equip real web-based Chromebook computers with the Intel Optane SSD, a state-of-the-art low-latency NVM-based SSD device. We compare the performance and energy consumption of interactive workloads running on our Chromebook with NVM-based swap space, where the Intel Optane SSD capacity is used as swap space to extend main memory capacity, against two state-of-the-art systems: (i) a baseline system with double the amount of DRAM of the system with the NVM-based swap space; and (ii) a system where the Intel Optane SSD is naively replaced with a state-of-the-art (yet slower) off-the-shelf NAND-flash-based SSD, which we use as a swap space of equivalent size to the NVM-based swap space.
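The performance trade-off this study examines largely reduces to how often accesses fall through DRAM into the swap device. A simple average-access-latency model, with illustrative round-number latencies rather than the paper's measured values, makes the gap between the low-latency NVM swap and the NAND-flash swap concrete:

```python
# Average memory access latency when part of the working set lives in swap.
# All latencies are illustrative round numbers, not measured values.

DRAM_NS = 100             # DRAM access, ~100 ns
OPTANE_FAULT_NS = 10_000  # page fault served from a low-latency NVM SSD
NAND_FAULT_NS = 100_000   # page fault served from a NAND-flash SSD

def avg_latency_ns(swap_miss_rate, fault_ns):
    """swap_miss_rate: fraction of accesses that miss DRAM and fault to swap."""
    return (1 - swap_miss_rate) * DRAM_NS + swap_miss_rate * fault_ns

miss = 0.01  # suppose 1% of accesses fault to swap
optane = avg_latency_ns(miss, OPTANE_FAULT_NS)  # ~2x DRAM-only latency
nand = avg_latency_ns(miss, NAND_FAULT_NS)      # an order of magnitude worse
```

Even a 1% fault rate dominates average latency when the swap device is slow, which is why the choice between an NVM-based and a NAND-flash-based swap backend matters so much for interactive workloads.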

    RapidSwap: An Efficient Tiered Far Memory

    Master's thesis -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2021. Bernhard Egger. As computation responsibilities are transferred and migrated to cloud computing environments, cloud operators face growing challenges in accommodating the workloads provided by their customers. Modern applications typically require a massive amount of main memory. DRAM allows the robust delivery of data to processing entities in conventional node-centric architectures; however, physically expanding DRAM is impracticable due to hardware limits and cost. In this thesis, we present RapidSwap, an efficient hierarchical far memory that exploits phase-change memory (persistent memory) in data centers to deliver near-DRAM performance at a significantly lower total cost of ownership (TCO). RapidSwap migrates cold memory contents to slower and cheaper storage devices by tracking the memory access frequency of applications. Evaluated with several real-world cloud benchmark scenarios, RapidSwap achieves a 20% reduction in operating cost at minimal performance degradation and is 30% more cost-effective than pure DRAM solutions. RapidSwap demonstrates that sophisticated use of novel storage technologies can deliver significant TCO savings in cloud data centers.
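RapidSwap's core policy — track per-page access frequency and demote cold contents to progressively slower, cheaper tiers — can be sketched as a frequency-threshold placement. The tier names and thresholds below are invented for illustration; the thesis's actual storage frontend/backend design differs:

```python
# Rough sketch of frequency-based tiering (illustrative, not RapidSwap's
# actual implementation): pages with few recent accesses are demoted to
# slower, cheaper tiers; hot pages stay in (or return to) DRAM.

TIERS = ["dram", "pmem", "ssd"]  # fastest/most expensive -> slowest/cheapest

def place(access_count, hot=64, warm=8):
    """Pick a tier from the page's access count over the last epoch."""
    if access_count >= hot:
        return "dram"
    if access_count >= warm:
        return "pmem"  # phase-change / persistent memory tier
    return "ssd"

def rebalance(page_access_counts):
    """page_access_counts: dict page_id -> accesses in the last epoch."""
    return {page: place(count) for page, count in page_access_counts.items()}

tiers = rebalance({"p0": 120, "p1": 10, "p2": 1})
# p0 stays in DRAM, p1 is demoted to persistent memory, p2 to SSD
```

The cost saving comes from the skew in real access patterns: most pages are cold most of the time, so the bulk of capacity can live on the cheap tiers while the hot set keeps near-DRAM latency.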

    Composable architecture for rack scale big data computing

    The rapid growth of cloud computing, both in the spectrum and the volume of cloud workloads, necessitates revisiting the traditional datacenter design based on rack-mountable servers. Next-generation datacenters need to offer enhanced support for: (i) fast-changing system configuration requirements due to workload constraints, (ii) timely adoption of emerging hardware technologies, and (iii) maximal sharing of systems and subsystems in order to lower costs. Disaggregated datacenters, constructed as a collection of individual resources such as CPU, memory, and disks, and composed into workload execution units on demand, are an interesting new trend that can address the above challenges. In this paper, we demonstrate the feasibility of composable systems by building a rack-scale composable system prototype using a PCIe switch. Through empirical approaches, we develop an assessment of the opportunities and challenges of leveraging the composable architecture for rack-scale cloud datacenters, with a focus on big data and NoSQL workloads. In particular, we compare and contrast the programming models that can be used to access the composable resources, and develop the implications for network and resource provisioning and management for the rack-scale architecture.