434 research outputs found

    On Energy-efficient Checkpointing in High-throughput Cycle-stealing Distributed Systems

    Get PDF
    Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments to allow the execution of long-running computational tasks on compute resources subject to hardware and software failures and interruptions from resource owners. With increasing scrutiny of the energy consumption of IT infrastructures, it is important to understand the impact of checkpointing on the energy consumption of HTC environments. In this paper we demonstrate through trace-driven simulation on real-world datasets that existing checkpointing strategies are inadequate at maintaining an acceptable level of energy consumption whilst reducing the makespan of tasks. Furthermore, we identify factors important in deciding whether to employ checkpointing within an HTC environment, and propose novel strategies to curtail the energy consumption of checkpointing approaches

    ๋ฉ€ํ‹ฐ ํƒœ์Šคํ‚น ํ™˜๊ฒฝ์—์„œ GPU๋ฅผ ์‚ฌ์šฉํ•œ ๋ฒ”์šฉ์  ๊ณ„์‚ฐ ์‘์šฉ์˜ ํšจ์œจ์ ์ธ ์‹œ์Šคํ…œ ์ž์› ํ™œ์šฉ์„ ์œ„ํ•œ GPU ์‹œ์Šคํ…œ ์ตœ์ ํ™”

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ (๋ฐ•์‚ฌ) -- ์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› : ๊ณต๊ณผ๋Œ€ํ•™ ์ „๊ธฐยท์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€, 2020. 8. ์—ผํ—Œ์˜.Recently, General Purpose GPU (GPGPU) applications are playing key roles in many different research fields, such as high-performance computing (HPC) and deep learning (DL). The common feature exists in these applications is that all of them require massive computation power, which follows the high parallelism characteristics of the graphics processing unit (GPU). However, because of the resource usage pattern of each GPGPU application varies, a single application cannot fully exploit the GPU systems resources to achieve the best performance of the GPU since the GPU system is designed to provide system-level fairness to all applications instead of optimizing for a specific type. GPU multitasking can address the issue by co-locating multiple kernels with diverse resource usage patterns to share the GPU resource in parallel. However, the current GPU mul- titasking scheme focuses just on co-launching the kernels rather than making them execute more efficiently. Besides, the current GPU multitasking scheme is not open-sourced, which makes it more difficult to be optimized, since the GPGPU applications and the GPU system are unaware of the feature of each other. In this dissertation, we claim that using the support from framework between the GPU system and the GPGPU applications without modifying the application can yield better performance. We design and implement the frame- work while addressing two issues in GPGPU applications. First, we introduce a GPU memory checkpointing approach between the host memory and the device memory to address the problem that GPU memory cannot be over-subscripted in a multitasking environment. Second, we present a fine-grained GPU kernel management scheme to avoid the GPU resource under-utilization problem in a i multitasking environment. We implement and evaluate our schemes on a real GPU system. The experimental results show that our proposed approaches can solve the problems related to GPGPU applications than the existing approaches while delivering better performance.์ตœ๊ทผ ๋ฒ”์šฉ GPU (GPGPU) ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์€ ๊ณ ์„ฑ๋Šฅ ์ปดํ“จํŒ… (HPC) ๋ฐ ๋”ฅ ๋Ÿฌ๋‹ (DL)๊ณผ ๊ฐ™์€ ๋‹ค์–‘ํ•œ ์—ฐ๊ตฌ ๋ถ„์•ผ์—์„œ ํ•ต์‹ฌ์ ์ธ ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์‘ ์šฉ ๋ถ„์•ผ์˜ ๊ณตํ†ต์ ์ธ ํŠน์„ฑ์€ ๊ฑฐ๋Œ€ํ•œ ๊ณ„์‚ฐ ์„ฑ๋Šฅ์ด ํ•„์š”ํ•œ ๊ฒƒ์ด๋ฉฐ ๊ทธ๋ž˜ํ”ฝ ์ฒ˜๋ฆฌ ์žฅ์น˜ (GPU)์˜ ๋†’์€ ๋ณ‘๋ ฌ ์ฒ˜๋ฆฌ ํŠน์„ฑ๊ณผ ๋งค์šฐ ์ ํ•ฉํ•˜๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ GPU ์‹œ์Šคํ…œ์€ ํŠน์ • ์œ  ํ˜•์˜ ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์— ์ตœ์ €ํ™”ํ•˜๋Š” ๋Œ€์‹  ๋ชจ๋“  ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์— ์‹œ์Šคํ…œ ์ˆ˜์ค€์˜ ๊ณต์ • ์„ฑ์„ ์ œ๊ณตํ•˜๋„๋ก ์„ค๊ณ„๋˜์–ด ์žˆ์œผ๋ฉฐ ๊ฐ GPGPU ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์˜ ์ž์› ์‚ฌ์šฉ ํŒจํ„ด์ด ๋‹ค์–‘ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋‹จ์ผ ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์ด GPU ์‹œ์Šคํ…œ์˜ ๋ฆฌ์†Œ์Šค๋ฅผ ์™„์ „ํžˆ ํ™œ์šฉํ•˜์—ฌ GPU์˜ ์ตœ๊ณ  ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑ ํ•  ์ˆ˜๋Š” ์—†๋‹ค. ๋”ฐ๋ผ์„œ GPU ๋ฉ€ํ‹ฐ ํƒœ์Šคํ‚น์€ ๋‹ค์–‘ํ•œ ๋ฆฌ์†Œ์Šค ์‚ฌ์šฉ ํŒจํ„ด์„ ๊ฐ€์ง„ ์—ฌ๋Ÿฌ ์‘์šฉ ํ”„๋กœ๊ทธ ๋žจ์„ ํ•จ๊ป˜ ๋ฐฐ์น˜ํ•˜์—ฌ GPU ๋ฆฌ์†Œ์Šค๋ฅผ ๊ณต์œ ํ•จ์œผ๋กœ์จ GPU ์ž์› ์‚ฌ์šฉ๋ฅ  ์ €ํ•˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๊ธฐ์กด GPU ๋ฉ€ํ‹ฐ ํƒœ์Šคํ‚น ๊ธฐ์ˆ ์€ ์ž์› ์‚ฌ์šฉ๋ฅ  ๊ด€์ ์—์„œ ์‘ ์šฉ ํ”„๋กœ๊ทธ๋žจ์˜ ํšจ์œจ์ ์ธ ์‹คํ–‰๋ณด๋‹ค ๊ณต๋™์œผ๋กœ ์‹คํ–‰ํ•˜๋Š” ๋ฐ ์ค‘์ ์„ ๋‘”๋‹ค. ๋˜ํ•œ ํ˜„์žฌ GPU ๋ฉ€ํ‹ฐ ํƒœ์Šคํ‚น ๊ธฐ์ˆ ์€ ์˜คํ”ˆ ์†Œ์Šค๊ฐ€ ์•„๋‹ˆ๋ฏ€๋กœ ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ๊ณผ GPU ์‹œ์Šคํ…œ์ด ์„œ๋กœ์˜ ๊ธฐ๋Šฅ์„ ์ธ์‹ํ•˜์ง€ ๋ชปํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ตœ์ ํ™”ํ•˜๊ธฐ๊ฐ€ ๋” ์–ด๋ ค์šธ ์ˆ˜๋„ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ์„ ์ˆ˜์ • ์—†์ด GPU ์‹œ์Šคํ…œ๊ณผ GPGPU ์‘์šฉ ์‚ฌ ์ด์˜ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ํ†ตํ•ด ์‚ฌ์šฉํ•˜๋ฉด ๋ณด๋‹ค ๋†’์€ ์‘์šฉ์„ฑ๋Šฅ๊ณผ ์ž์› ์‚ฌ์šฉ์„ ๋ณด์ผ ์ˆ˜ ์žˆ์Œ์„ ์ฆ๋ช…ํ•˜๊ณ ์ž ํ•œ๋‹ค. ๊ทธ๋Ÿฌ๊ธฐ ์œ„ํ•ด GPU ํƒœ์Šคํฌ ๊ด€๋ฆฌ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ๊ฐœ๋ฐœํ•˜์—ฌ GPU ๋ฉ€ํ‹ฐ ํƒœ์Šคํ‚น ํ™˜๊ฒฝ์—์„œ ๋ฐœ์ƒํ•˜๋Š” ๋‘ ๊ฐ€์ง€ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์˜€๋‹ค. ์ฒซ์งธ, ๋ฉ€ํ‹ฐ ํƒœ ์Šคํ‚น ํ™˜๊ฒฝ์—์„œ GPU ๋ฉ”๋ชจ๋ฆฌ ์ดˆ๊ณผ ํ• ๋‹นํ•  ์ˆ˜ ์—†๋Š” ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ํ˜ธ์ŠคํŠธ ๋ฉ”๋ชจ๋ฆฌ์™€ ๋””๋ฐ”์ด์Šค ๋ฉ”๋ชจ๋ฆฌ์— ์ฒดํฌํฌ์ธํŠธ ๋ฐฉ์‹์„ ๋„์ž…ํ•˜์˜€๋‹ค. ๋‘˜์งธ, ๋ฉ€ํ‹ฐ ํƒœ์Šคํ‚น ํ™˜ ๊ฒฝ์—์„œ GPU ์ž์› ์‚ฌ์šฉ์œจ ์ €ํ•˜ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋”์šฑ ์„ธ๋ถ„ํ™” ๋œ GPU ์ปค๋„ ๊ด€๋ฆฌ ์‹œ์Šคํ…œ์„ ์ œ์‹œํ•˜์˜€๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•๋“ค์˜ ํšจ๊ณผ๋ฅผ ์ฆ๋ช…ํ•˜๊ธฐ ์œ„ํ•ด ์‹ค์ œ GPU ์‹œ์Šคํ…œ์— 92 ๊ตฌํ˜„ํ•˜๊ณ  ๊ทธ ์„ฑ๋Šฅ์„ ํ‰๊ฐ€ํ•˜์˜€๋‹ค. ์ œ์•ˆํ•œ ์ ‘๊ทผ๋ฐฉ์‹์ด ๊ธฐ์กด ์ ‘๊ทผ ๋ฐฉ์‹๋ณด๋‹ค GPGPU ์‘์šฉ ํ”„๋กœ๊ทธ๋žจ๊ณผ ๊ด€๋ จ๋œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ ๋” ๋†’์€ ์„ฑ๋Šฅ์„ ์ œ๊ณตํ•  ์ˆ˜ ์žˆ์Œ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์—ˆ๋‹ค.Chapter 1 Introduction 1 1.1 Motivation 2 1.2 Contribution . 7 1.3 Outline 8 Chapter 2 Background 10 2.1 GraphicsProcessingUnit(GPU) and CUDA 10 2.2 CheckpointandRestart . 11 2.3 ResourceSharingModel. 11 2.4 CUDAContext 12 2.5 GPUThreadBlockScheduling . 13 2.6 Multi-ProcessServicewithHyper-Q 13 Chapter 3 Checkpoint based solution for GPU memory over- subscription problem 16 3.1 Motivation 16 3.2 RelatedWork. 18 3.3 DesignandImplementation . 20 3.3.1 System Design 21 3.3.2 CUDAAPIwrappingmodule 22 3.3.3 Scheduler . 28 3.4 Evaluation. 31 3.4.1 Evaluationsetup . 31 3.4.2 OverheadofFlexGPU 32 3.4.3 Performance with GPU Benchmark Suits 34 3.4.4 Performance with Real-world Workloads 36 3.4.5 Performance of workloads composed of multiple applications 39 3.5 Summary 42 Chapter 4 A Workload-aware Fine-grained Resource Manage- ment Framework for GPGPUs 43 4.1 Motivation 43 4.2 RelatedWork. 45 4.2.1 GPUresourcesharing 45 4.2.2 GPUscheduling . 46 4.3 DesignandImplementation . 47 4.3.1 SystemArchitecture . 47 4.3.2 CUDAAPIWrappingModule . 49 4.3.3 smCompactorRuntime . 50 4.3.4 ImplementationDetails . 57 4.4 Analysis on the relation between performance and workload usage pattern 60 4.4.1 WorkloadDefinition . 60 4.4.2 Analysisonperformancesaturation 60 4.4.3 Predict the necessary SMs and thread blocks for best performance . 64 4.5 Evaluation. 69 4.5.1 EvaluationMethodology. 70 4.5.2 OverheadofsmCompactor . 71 4.5.3 Performance with Different Thread Block Counts on Dif- ferentNumberofSMs 72 4.5.4 Performance with Concurrent Kernel and Resource Sharing 74 4.6 Summary . 79 Chapter 5 Conclusion. 81 ์š”์•ฝ. 92Docto

    Energy-efficient checkpointing in high-throughput cycle-stealing distributed systems

    Get PDF
    Checkpointing is a fault-tolerance mechanism commonly used in High Throughput Computing (HTC) environments to allow the execution of long-running computational tasks on compute resources subject to hardware or software failures as well as interruptions from resource owners and more important tasks. Until recently many researchers have focused on the performance gains achieved through checkpointing, but now with growing scrutiny of the energy consumption of IT infrastructures it is increasingly important to understand the energy impact of checkpointing within an HTC environment. In this paper we demonstrate through trace-driven simulation of real-world datasets that existing checkpointing strategies are inadequate at maintaining an acceptable level of energy consumption whilst maintaing the performance gains expected with checkpointing. Furthermore, we identify factors important in deciding whether to exploit checkpointing within an HTC environment, and propose novel strategies to curtail the energy consumption of checkpointing approaches whist maintaining the performance benefits

    Fail Over Strategy for Fault Tolerance in Cloud Computing Environment

    Get PDF
    YesCloud fault tolerance is an important issue in cloud computing platforms and applications. In the event of an unexpected system failure or malfunction, a robust fault-tolerant design may allow the cloud to continue functioning correctly possibly at a reduced level instead of failing completely. To ensure high availability of critical cloud services, the application execution and hardware performance, various fault tolerant techniques exist for building self-autonomous cloud systems. In comparison to current approaches, this paper proposes a more robust and reliable architecture using optimal checkpointing strategy to ensure high system availability and reduced system task service finish time. Using pass rates and virtualised mechanisms, the proposed Smart Failover Strategy (SFS) scheme uses components such as Cloud fault manager, Cloud controller, Cloud load balancer and a selection mechanism, providing fault tolerance via redundancy, optimized selection and checkpointing. In our approach, the Cloud fault manager repairs faults generated before the task time deadline is reached, blocking unrecoverable faulty nodes as well as their virtual nodes. This scheme is also able to remove temporary software faults from recoverable faulty nodes, thereby making them available for future request. We argue that the proposed SFS algorithm makes the system highly fault tolerant by considering forward and backward recovery using diverse software tools. Compared to existing approaches, preliminary experiment of the SFS algorithm indicate an increase in pass rates and a consequent decrease in failure rates, showing an overall good performance in task allocations. We present these results using experimental validation tools with comparison to other techniques, laying a foundation for a fully fault tolerant IaaS Cloud environment

    Computing at massive scale: Scalability and dependability challenges

    Get PDF
    Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase of data volume, together with the heterogeneity of workloads and resources, and the dynamic nature of massive user requests, the uncertainties and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, vulnerable system dependability, and user-perceived performance degradations. In this paper we report our latest understanding of the current and future challenges in this particular area, and discuss both existing and potential solutions to the problems, especially those concerned with system efficiency, scalability and dependability. We first introduce a data-driven analysis methodology for characterizing the resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine and analyze several fundamental challenges and the solutions we are developing to tackle them, including for example incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology in order to establish an engineering approach that facilitates the optimization, tuning and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications

    DeSyRe: on-Demand System Reliability

    No full text
    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints

    Failure Mitigation in Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement

    Full text link
    A new roll-forward technique is proposed that recovers from any single fail-stop failure in MM integer data streams (Mโ‰ฅ3M\geq3) when undergoing linear, sesquilinear or bijective (LSB) operations, such as: scaling, additions/subtractions, inner or outer vector products and permutations. In the proposed approach, the MM input integer data streams are linearly superimposed to form MM numerically entangled integer data streams that are stored in-place of the original inputs. A series of LSB operations can then be performed directly using these entangled data streams. The output results can be extracted from any Mโˆ’1M-1 entangled output streams by additions and arithmetic shifts, thereby guaranteeing robustness to a fail-stop failure in any single stream computation. Importantly, unlike other methods, the number of operations required for the entanglement, extraction and recovery of the results is linearly related to the number of the inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal in an Intel processor (Haswell architecture with AVX2 support) via convolution operations. Our analysis and experiments reveal that the proposed approach incurs only 1.8%1.8\% to 2.8%2.8\% reduction in processing throughput in comparison to the failure-intolerant approach. This overhead is 9 to 14 times smaller than that of the equivalent checksum-based method. Thus, our proposal can be used in distributed systems and unreliable processor hardware, or safety-critical applications, where robustness against fail-stop failures becomes a necessity.Comment: Proc. 21st IEEE International On-Line Testing Symposium (IOLTS 2015), July 2015, Halkidiki, Greec

    Operating policies for energy efficient large scale computing

    Get PDF
    PhD ThesisEnergy costs now dominate IT infrastructure total cost of ownership, with datacentre operators predicted to spend more on energy than hardware infrastructure in the next five years. With Western European datacentre power consumption estimated at 56 TWh/year in 2007 and projected to double by 2020, improvements in energy efficiency of IT operations is imperative. The issue is further compounded by social and political factors and strict environmental legislation governing organisations. One such example of large IT systems includes high-throughput cycle stealing distributed systems such as HTCondor and BOINC, which allow organisations to leverage spare capacity on existing infrastructure to undertake valuable computation. As a consequence of increased scrutiny of the energy impact of these systems, aggressive power management policies are often employed to reduce the energy impact of institutional clusters, but in doing so these policies severely restrict the computational resources available for high-throughput systems. These policies are often configured to quickly transition servers and end-user cluster machines into low power states after only short idle periods, further compounding the issue of reliability. In this thesis, we evaluate operating policies for energy efficiency in large-scale computing environments by means of trace-driven discrete event simulation, leveraging real-world workload traces collected within Newcastle University. The major contributions of this thesis are as follows: i) Evaluation of novel energy efficient management policies for a decentralised peer-to-peer (P2P) BitTorrent environment. ii) Introduce a novel simulation environment for the evaluation of energy efficiency of large scale high-throughput computing systems, and propose a generalisable model of energy consumption in high-throughput computing systems. iii iii) Proposal and evaluation of resource allocation strategies for energy consumption in high-throughput computing systems for a real workload. iv) Proposal and evaluation for a realworkload ofmechanisms to reduce wasted task execution within high-throughput computing systems to reduce energy consumption. v) Evaluation of the impact of fault tolerance mechanisms on energy consumption
    • โ€ฆ
    corecore