
    Real-Time Virtualization and Cloud Computing

    In recent years, we have observed three major trends in the development of complex real-time embedded systems. First, to reduce cost and enhance flexibility, multiple systems share common computing platforms via virtualization technology, instead of being deployed separately on physically isolated hosts. Second, multi-core processors are increasingly being used in real-time systems. Third, developers are exploring the possibility of deploying real-time applications as virtual machines in a public cloud. Integrating real-time systems as virtual machines (VMs) atop common multi-core platforms in a public cloud raises significant new research challenges in meeting the real-time latency requirements of applications. To address the challenges of running real-time VMs in the cloud, we first present RT-Xen, a novel real-time scheduling framework within the popular Xen hypervisor. We start with single-core scheduling in RT-Xen and present the first work that empirically studies and compares different real-time scheduling schemes on the same platform. We then introduce RT-Xen 2.0, which focuses on multi-core scheduling and spans multiple design dimensions, including priority schemes, server schemes, and scheduling policies. Experimental results demonstrate that, when combined with compositional scheduling theory, RT-Xen can deliver real-time performance to an application running in a VM, while the default credit scheduler cannot. We then present RT-OpenStack, a cloud management system designed to co-host real-time and non-real-time VMs in a public cloud. Leveraging the resource interface and real-time scheduling provided by RT-Xen, RT-OpenStack provides real-time performance guarantees to real-time VMs while achieving high resource utilization, by allowing non-real-time VMs to share the remaining CPU resources through a novel VM-to-host mapping scheme. Finally, we present RTCA, a real-time communication architecture for VMs sharing the same host, which maintains low latency for high-priority inter-domain communication (IDC) traffic in the face of low-priority IDC traffic.
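
    The abstract does not spell out RT-Xen's schedulers, so the following is only a minimal sketch of the general pattern it builds on: each VCPU is treated as a periodic server with a budget and a period, and a global earliest-deadline-first (EDF) policy runs whichever server still has budget. All names and numbers here are illustrative assumptions, not RT-Xen's code.

        class VCPU:
            """A periodic server: 'budget' units of CPU time every 'period' units."""
            def __init__(self, name, budget, period):
                self.name, self.budget, self.period = name, budget, period
                self.remaining = budget          # budget left in the current period
                self.deadline = period           # end of the current period

            def replenish(self, now):
                # At each period boundary, refill the budget and push the deadline.
                while now >= self.deadline:
                    self.deadline += self.period
                    self.remaining = self.budget

        def edf_pick(vcpus, now):
            """Global EDF over servers that still have budget; None means idle."""
            for v in vcpus:
                v.replenish(now)
            runnable = [v for v in vcpus if v.remaining > 0]
            return min(runnable, key=lambda v: v.deadline) if runnable else None

        # Two real-time VMs: vm1 gets 4 of every 10 ticks, vm2 gets 2 of every 5.
        vcpus = [VCPU("vm1", 4, 10), VCPU("vm2", 2, 5)]
        for t in range(20):
            v = edf_pick(vcpus, t)
            if v:
                v.remaining -= 1
            print(t, v.name if v else "idle")

    Compositional scheduling theory then tells each guest what budget/period interface its own task set requires from such a server, which is how the combination can deliver end-to-end real-time guarantees.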

    Enhanced Cuckoo Search Algorithm for Virtual Machine Placement in Cloud Data Centers

    To enhance resource utilisation and power efficiency in cloud data centres, it is important to perform Virtual Machine (VM) placement, the mapping of virtual machines to physical machines (PMs), in an optimal manner. Cloud computing researchers have recently introduced various meta-heuristic algorithms for energy-optimised VM placement; however, these algorithms still fall short of optimal energy consumption. This paper proposes an Enhanced Cuckoo Search (ECS) algorithm that addresses VM placement with a focus on energy consumption. The performance of the proposed algorithm is evaluated using three different workloads in the CloudSim toolkit, comparing it against the existing Genetic Algorithm (GA), Optimised Firefly Search (OFS) algorithm, and Ant Colony (AC) algorithm. The comparison results illustrate that the proposed ECS algorithm consumes less energy than the competing algorithms while maintaining steady SLA and VM-migration performance: around 25% less energy than GA, 27% less than OFS, and 26% less than AC.
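
    The specific enhancements in ECS are not described in this abstract; the sketch below shows only the baseline cuckoo-search loop such work builds on, a Lévy-flight perturbation of a VM-to-PM mapping scored by a toy energy model. The power figures and parameters are illustrative assumptions.

        import random, math

        def levy_step(beta=1.5):
            # Mantegna's algorithm for a Lévy-distributed step length.
            sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
                     (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
            u, v = random.gauss(0, sigma), random.gauss(0, 1)
            return u / abs(v) ** (1 / beta)

        def energy(placement, vm_load, pm_count, idle=100.0, peak=250.0):
            # Toy model: an active PM draws idle power plus load-proportional power.
            load = [0.0] * pm_count
            for vm, pm in enumerate(placement):
                load[pm] += vm_load[vm]
            return sum(idle + (peak - idle) * min(l, 1.0) for l in load if l > 0)

        def cuckoo_search(vm_load, pm_count, nests=15, iters=200, pa=0.25):
            new_nest = lambda: [random.randrange(pm_count) for _ in vm_load]
            pop = [new_nest() for _ in range(nests)]
            best = min(pop, key=lambda p: energy(p, vm_load, pm_count))
            for _ in range(iters):
                for i, nest in enumerate(pop):
                    cand = nest[:]
                    # Lévy flight: re-place a step-sized handful of VMs.
                    for _ in range(min(len(cand), max(1, int(abs(levy_step()))))):
                        cand[random.randrange(len(cand))] = random.randrange(pm_count)
                    if energy(cand, vm_load, pm_count) < energy(nest, vm_load, pm_count):
                        pop[i] = cand
                # Abandon a fraction pa of the worst nests (highest energy).
                pop.sort(key=lambda p: energy(p, vm_load, pm_count))
                pop[-int(pa * nests):] = [new_nest() for _ in range(int(pa * nests))]
                best = min([best] + pop, key=lambda p: energy(p, vm_load, pm_count))
            return best

        print(cuckoo_search(vm_load=[0.3, 0.5, 0.2, 0.4, 0.6], pm_count=3))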

    Dynamic Voltage Scaling for Energy-Constrained Real-Time Systems

    Reducing energy consumption is a dominant concern in the design of many real-time systems. The Dynamic Voltage Scaling (DVS) technique, provided by most modern microprocessors, allows computational speed to be traded against energy consumption. We present novel energy-aware scheduling algorithms that exploit this technique while meeting real-time constraints. In particular, we present the GRUB-PA algorithm which, unlike most existing algorithms, can reduce energy consumption in real-time systems consisting of any kind of task. We also present a working implementation of the algorithm on Linux.
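
    As a hedged illustration of why DVS saves energy: dynamic power scales roughly as P ∝ C·V²·f, and attainable frequency scales roughly with V, so dynamic power falls roughly with the cube of the normalized speed, while under EDF a task set stays schedulable at any speed no lower than its utilization. A minimal sketch under those textbook assumptions (this is not GRUB-PA itself, whose bandwidth-reclaiming rules are more involved):

        def min_speed_edf(tasks):
            """tasks: list of (wcet_at_full_speed, period).
            Under EDF, the set is schedulable at normalized speed s iff
            sum(C_i / T_i) <= s, so the lowest safe speed is the utilization."""
            return sum(c / t for c, t in tasks)

        def dynamic_power_ratio(s):
            # P ~ C * V^2 * f and V ~ f  =>  P ~ s^3 at normalized speed s.
            return s ** 3

        tasks = [(2, 10), (3, 15), (1, 20)]   # utilization = 0.45
        s = min_speed_edf(tasks)
        print(f"run at {s:.2f} of f_max; dynamic power falls to "
              f"{dynamic_power_ratio(s):.2%} of full-speed power")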

    Polymorphic computing abstraction for heterogeneous architectures

    The integration of multiple computing paradigms onto a system on chip (SoC) has pushed the boundaries of design space exploration for hardware architectures and the computing system software stack. This heterogeneity of computing styles has created a new class of architectures referred to as heterogeneous architectures, and the novel applications developed to exploit the different computing styles are user-centric for embedded SoCs. Software and hardware designers face several challenges in harnessing the full potential of heterogeneous architectures. Applications must execute across more than one compute style to increase overall SoC resource utilization, which implies that application threads need to be polymorphic. The operating system layer is thus faced with the problem of scheduling polymorphic threads; resource allocation is an equally important problem for the OS, since the morphism evolution of application threads is constrained by the availability of heterogeneous computing resources. Traditional design optimization goals, such as computational power and lower energy per computation, are inadequate to satisfy user-centric application resource needs, and resource allocation decisions at the application layer need to permeate to the architectural layer to avoid conflicting demands that may affect the energy-delay characteristics of application threads. We propose the polymorphic computing abstraction as a unified computing model for heterogeneous architectures to address these issues. A simulation environment for polymorphic applications is developed and evaluated under various scheduling strategies to determine the effectiveness of the polymorphism abstraction for resource allocation, and a user satisfaction model is developed to complement polymorphism and used to optimize resource utilization at the application and network layers of embedded systems.
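
    As a rough illustration of what scheduling polymorphic threads entails, the sketch below dispatches a thread to whichever free compute style minimizes an energy-delay estimate. The styles, cost table, and API are hypothetical illustrations, not the dissertation's design.

        from dataclasses import dataclass

        # Hypothetical energy-delay estimates per (kernel, compute style):
        # (delay in ms, energy in mJ). A real system would profile these online.
        COST = {
            ("fft",  "cpu"): (9.0, 40.0),  ("fft",  "dsp"): (3.0, 12.0),
            ("conv", "cpu"): (20.0, 90.0), ("conv", "gpu"): (4.0, 30.0),
        }

        @dataclass
        class PolymorphicThread:
            kernel: str      # the computation this thread performs
            morphs: tuple    # compute styles it has implementations for

        def dispatch(thread, free_units):
            """Send the thread to the free compute style with the lowest
            energy-delay product; return None if every suitable unit is busy."""
            candidates = [m for m in thread.morphs if free_units.get(m, 0) > 0]
            if not candidates:
                return None
            def edp(style):
                delay, energy = COST[(thread.kernel, style)]
                return delay * energy
            best = min(candidates, key=edp)
            free_units[best] -= 1
            return best

        free = {"cpu": 2, "gpu": 1, "dsp": 1}
        print(dispatch(PolymorphicThread("conv", ("cpu", "gpu")), free))  # -> gpu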

    Deadline-Aware Reservation-Based Scheduling

    The ever-growing need to improve return on investment (ROI) for cluster infrastructure that processes data generated at a higher rate than ever before introduces new challenges for big-data processing frameworks. The highly complex mixed workloads arriving at modern clusters, along with a growing number of time-sensitive critical production jobs, require cluster management systems to evolve. Most big-data systems must not only guarantee that production jobs complete before their deadlines, but also minimize the latency of best-effort jobs to increase ROI. This research presents DARSS, a deadline-aware reservation-based scheduling system. DARSS addresses the above problem with a reservation-based approach to scheduling that supports the temporal requirements of production jobs while keeping the latency of best-effort jobs low. Fine-grained resource allocation enables DARSS to schedule more tasks than a coarser-grained approach would. Furthermore, DARSS schedules production jobs as close to their deadlines as possible, a policy that maximizes the number of low-priority tasks that can be scheduled opportunistically. DARSS is a scalable system that can be integrated with YARN. It is evaluated on a simulated cluster of 300 nodes against a workload derived from Google's Borg trace, and compared with Microsoft's Rayon and YARN's built-in scheduler. DARSS achieves a better production-job acceptance rate than both YARN and Rayon, and the experiments show that all of the production jobs accepted by DARSS complete before their deadlines. Furthermore, DARSS services more best-effort jobs than Rayon, and does so with lower latency.
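
    The abstract states the policy but not the mechanism; the sketch below illustrates one way to reserve a production job as late as its deadline allows, leaving the earlier slack free for best-effort work. The unit-time slots and single aggregate capacity are simplifying assumptions, not DARSS's actual placement algorithm.

        def reserve_latest(capacity, demand, duration, deadline):
            """capacity: list of free slots per unit time step (mutated on success).
            Scan start times from latest feasible to earliest, and book the latest
            window [start, start + duration) with 'demand' free slots throughout."""
            for start in range(deadline - duration, -1, -1):
                window = capacity[start:start + duration]
                if all(free >= demand for free in window):
                    for t in range(start, start + duration):
                        capacity[t] -= demand
                    return start
            return None   # reject: the job cannot meet its deadline

        capacity = [10] * 24               # 10 slots/hour over a day
        print(reserve_latest(capacity, demand=4, duration=3, deadline=12))  # -> 9
        print(reserve_latest(capacity, demand=8, duration=2, deadline=6))   # -> 4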

    Memory resource balancing for virtualized computing

    Virtualization has become a common abstraction layer in modern data centers. By multiplexing hardware resources into multiple virtual machines (VMs), and thus enabling several operating systems to run on the same physical platform simultaneously, it can effectively reduce power consumption and building size, or improve security by isolating VMs. In a virtualized system, memory resource management plays a critical role in achieving high resource utilization and performance. Insufficient memory allocation to a VM degrades its performance dramatically; conversely, over-allocation wastes memory resources. Meanwhile, a VM's memory demand may vary significantly. Effective memory resource management therefore calls for a dynamic memory balancer which, ideally, can adjust memory allocation in a timely manner for each VM based on its current demand, thus achieving the best memory utilization and optimal overall performance. To estimate the memory demand of each VM and to arbitrate possible memory resource contention, a widely proposed approach is to construct an LRU-based miss ratio curve (MRC), which provides not only the current working set size (WSS) but also the correlation between performance and the target memory allocation size. Unfortunately, the cost of constructing an MRC is nontrivial. In this dissertation, we first present a low-overhead LRU-based memory demand tracking scheme, which includes three orthogonal optimizations: AVL-based LRU organization, dynamic hot set sizing, and intermittent memory tracking. Our evaluation results show that, for the whole SPEC CPU 2006 benchmark suite, applying the three optimizing techniques lowers the mean overhead of MRC construction from 173% to only 2%. Based on the current WSS, we then predict its trend in the near future and take different strategies for different predictions. When there is a sufficient amount of physical memory on the host, the balancer reallocates memory locally among its VMs. Once local memory is insufficient and the memory pressure is predicted to persist for a sufficiently long time, a relatively expensive solution, VM live migration, is used to move one or more VMs from the hot host to other host(s). Finally, for transient memory pressure, a remote cache is used to alleviate the temporary performance penalty. Our experimental results show that this design achieves a 49% center-wide speedup.
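
    The three optimizations above are what make MRC tracking cheap; what they accelerate is, at its core, Mattson's stack algorithm, sketched below with a plain list standing in for the AVL tree (so each access costs O(N) here rather than O(log N)).

        from collections import Counter

        def miss_ratio_curve(trace):
            """Return mrc[m] = miss ratio with m pages of memory under LRU."""
            stack, dists = [], Counter()     # LRU stack; stack-distance histogram
            for page in trace:
                if page in stack:
                    depth = stack.index(page) + 1   # stack distance (1-based)
                    dists[depth] += 1
                    stack.remove(page)
                # A first touch is a cold miss at every memory size.
                stack.insert(0, page)               # move to the MRU position
            total = len(trace)
            mrc = []
            for m in range(len(stack) + 1):
                # With m pages, any access whose stack distance exceeds m misses.
                hits = sum(c for d, c in dists.items() if d <= m)
                mrc.append((total - hits) / total)
            return mrc

        print(miss_ratio_curve("ABCABDABC"))   # [1.0, 1.0, 1.0, 0.56, 0.44]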

    Cooperative framework for open real-time systems

    Embedded devices are reaching a point where society no longer notices their presence; however, if they were suddenly taken away, everyone would notice their absence. The new, small, embedded devices used in consumer electronics, telecommunications, industrial automation, and automotive systems are the reason for their massive spread. Influenced by this growth and pervasiveness, the scientific community faces new challenges in several domains; important among these are the management of the quality of the provided services and the management of the underlying resources, both interconnected to solve the problem of optimally allocating physical resources (namely CPU, memory, and network) while providing the best possible quality to users. Although several models have been presented in the literature, a recent proposal handles resource management by using coalitions of nodes in open real-time cooperative environments, as a solution to guarantee that applications' non-functional requirements are met and to provide the best possible quality of service to users. This proposal, the CooperatES framework, provides models and mechanisms to handle resource management in open real-time systems, allowing resource-constrained devices to collectively execute services with their neighbours in order to fulfil the complex quality-of-service constraints imposed by users and applications. Within the context of the CooperatES framework, the work presented in this thesis evaluates the feasibility of implementing the framework's quality-of-service concept within current embedded Java platforms, and proposes a solution and architecture for a specific platform: the Android operating system. To this purpose, the work provides an evaluation of the suitability of Java solutions for real-time and embedded systems, an evaluation of the Android platform for open real-time systems, and a discussion of the extensions required for Android to be usable within real-time systems. Furthermore, this thesis presents a prototype implementation of the CooperatES framework within the Android platform, which allows determining the suitability of the proposed platform extensions for open real-time systems applications.

    GPU System Optimization for Efficient System Resource Utilization of General-Purpose GPU Applications in Multitasking Environments

    Thesis (Ph.D.) -- Graduate School of Seoul National University, Department of Electrical and Computer Engineering, College of Engineering, August 2020. Advisor: 염헌영.
    Recently, General-Purpose GPU (GPGPU) applications have come to play key roles in many research fields, such as high-performance computing (HPC) and deep learning (DL). Their common feature is a demand for massive computational power, which matches the highly parallel architecture of the graphics processing unit (GPU). However, because the resource usage pattern of each GPGPU application varies, a single application cannot fully exploit the GPU system's resources to achieve the GPU's best performance, since the GPU system is designed to provide system-level fairness to all applications rather than to optimize for a specific type. GPU multitasking can address this issue by co-locating multiple kernels with diverse resource usage patterns so that they share GPU resources in parallel. However, current GPU multitasking schemes focus merely on co-launching kernels rather than making them execute more efficiently; they are also not open source, which makes them harder to optimize, since the GPGPU applications and the GPU system remain unaware of each other's characteristics. In this dissertation, we claim that support from a framework placed between the GPU system and the GPGPU applications, requiring no application modification, can yield better performance. We design and implement this framework while addressing two issues in GPGPU applications. First, we introduce a GPU memory checkpointing approach between host memory and device memory to address the problem that GPU memory cannot be oversubscribed in a multitasking environment. Second, we present a fine-grained GPU kernel management scheme to avoid GPU resource under-utilization in a multitasking environment. We implement and evaluate our schemes on a real GPU system. The experimental results show that our proposed approaches solve these GPGPU application problems better than existing approaches while delivering higher performance.
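
    As a rough illustration of the first scheme, the sketch below evicts a suspended task's device arrays to host memory so that a co-located kernel can use the GPU, then restores them later. It uses CuPy for brevity; the class and workflow are illustrative assumptions, not the dissertation's implementation.

        import cupy as cp

        class GpuCheckpoint:
            """Evict named device arrays to host RAM and restore them later,
            freeing GPU memory for a co-located task in the meantime."""
            def __init__(self):
                self._saved = {}

            def checkpoint(self, arrays):
                # Copy each device array to host memory, then drop the device copy.
                for name in list(arrays):
                    self._saved[name] = cp.asnumpy(arrays.pop(name))
                cp.get_default_memory_pool().free_all_blocks()  # return memory to the driver

            def restore(self):
                # Re-allocate on the device and copy the data back.
                return {name: cp.asarray(host) for name, host in self._saved.items()}

        state = {"weights": cp.random.rand(4096, 4096)}   # ~128 MB on the GPU
        ckpt = GpuCheckpoint()
        ckpt.checkpoint(state)     # GPU memory released; another kernel can run now
        state = ckpt.restore()     # ...and the task resumes where it left off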

    Extending Memory Capacity in Consumer Devices with Emerging Non-Volatile Memory: An Experimental Study

    The number and diversity of consumer devices are growing rapidly, alongside their target applications' memory consumption. Unfortunately, DRAM scalability is becoming a limiting factor to the available memory capacity in consumer devices. As a potential solution, manufacturers have introduced emerging non-volatile memories (NVMs) into the market, which can be used to increase the memory capacity of consumer devices by augmenting or replacing DRAM. Since entirely replacing DRAM with NVM in consumer devices imposes large system integration and design challenges, recent works propose extending the total main memory space available to applications by using NVM as swap space for DRAM. However, no prior work analyzes the implications of enabling a real NVM-based swap space in real consumer devices. In this work, we provide the first analysis of the impact of extending the main memory space of consumer devices using off-the-shelf NVMs. We extensively examine system performance and energy consumption when the NVM device is used as swap space for DRAM main memory to effectively extend the main memory capacity. For our analyses, we equip real web-based Chromebook computers with the Intel Optane SSD, a state-of-the-art low-latency NVM-based SSD. We compare the performance and energy consumption of interactive workloads running on our Chromebook with NVM-based swap space against two state-of-the-art systems: (i) a baseline system with twice the DRAM of the NVM-equipped system; and (ii) a system where the Intel Optane SSD is naively replaced with a state-of-the-art (yet slower) off-the-shelf NAND-flash-based SSD, used as a swap space of the same size.
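
    As a toy illustration of the mechanism under study (anonymous memory spilling to whichever device backs swap), the sketch below maps a large region and times a first pass, which faults fresh pages in, against a second pass, which re-faults pages the kernel may have swapped out. The size is a placeholder; to reproduce the effect, set it above the machine's free DRAM and ensure swap lives on the device of interest (Optane versus NAND flash in the paper's comparison).

        import mmap, time

        SIZE = 2 * 1024 ** 3        # placeholder: choose > free DRAM to force swapping
        PAGE = 4096

        buf = mmap.mmap(-1, SIZE)   # anonymous memory, swap-backed under pressure

        def touch_all():
            t0 = time.perf_counter()
            for off in range(0, SIZE, PAGE):
                buf[off] = 1         # one write per page => at most one fault per page
            return time.perf_counter() - t0

        first = touch_all()          # faults fresh pages in (evicting under pressure)
        second = touch_all()         # re-faults pages that were swapped out
        print(f"first pass {first:.1f}s, second pass {second:.1f}s")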