117,097 research outputs found

    Pegasus: Performance Engineering for Software Applications Targeting HPC Systems

    Developing and optimizing software applications for high performance and energy efficiency is a very challenging task, even when considering a single target machine. For instance, optimizing for multicore-based computing systems requires in-depth knowledge about programming languages, application programming interfaces, compilers, performance tuning tools, and computer architecture and organization. Many of the tasks of performance engineering methodologies require manual effort and the use of different tools that are not always part of an integrated toolchain. This paper presents Pegasus, a performance engineering approach supported by a framework that consists of a source-to-source compiler, controlled and guided by strategies programmed in a Domain-Specific Language, and an autotuner. Pegasus is a holistic and versatile approach spanning various decision layers composing the software stack, and exploiting the system capabilities and workloads effectively through the use of runtime autotuning. The Pegasus approach helps developers by automating tasks regarding the efficient implementation of software applications in multicore computing systems. These tasks focus on application analysis, profiling, code transformations, and the integration of runtime autotuning. Pegasus allows developers to program their strategies or to automatically apply existing strategies to software applications in order to ensure compliance with non-functional requirements, such as performance and energy efficiency. We show how to apply Pegasus and demonstrate its applicability and effectiveness in a complex case study, which includes tasks from a smart navigation system.

    Energy Efficient Data-Intensive Computing With Mapreduce

    Power and energy consumption are critical constraints in data center design and operation. In data centers, MapReduce data-intensive applications demand significant resources and energy. Recognizing the importance and urgency of optimizing energy usage of MapReduce applications, this work aims to provide instrumental tools to measure and evaluate MapReduce energy efficiency and techniques to conserve energy without impacting performance. Energy conservation for data-intensive computing requires enabling technology to provide detailed and systemic energy information and to identify in the underlying system hardware and software. To address this need, we present eTune, a fine-grained, scalable energy profiling framework for data-intensive computing on large-scale distributed systems. eTune leverages performance monitoring counters (PMCs) on modern computer components and statistically builds power-performance correlation models. Using learned models, eTune augments direct measurement with a software-based power estimator that runs on compute nodes and reports power at multiple levels including node, core, memory, and disks with high accuracy. Data-intensive computing differs from traditional high performance computing as most execution time is spent in moving data between storage devices, nodes, and components. Since data movements are potential performance and energy bottlenecks, we propose an analysis framework with methods and metrics for evaluating and characterizing costly built-in MapReduce data movements. The revealed data movement energy characteristics can be exploited in system design and resource allocation to improve data-intensive computing energy efficiency. Finally, we present an optimization technique that targets inefficient built-in MapReduce data movements to conserve energy without impacting performance. 
The optimization technique allocates the optimal number of compute nodes to applications and dynamically schedules processor frequency during their execution based on data movement characteristics. Experimental results show significant energy savings, though improvements depend on both workload characteristics and policies of resource and dynamic voltage and frequency scheduling. As data volume doubles every two years and more data centers are put into production, energy consumption is expected to grow further. We expect these studies to provide direction and insight in building more energy-efficient data-intensive systems and applications, and we hope the tools and techniques will be adopted by other researchers for their energy-efficiency studies.

    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increases data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers, to tap into the performance potential offered by specialized hardware architectures using HLS

    Adaptive Power and Resource Management Techniques for Multithreaded Workloads

    As today's computing trends are moving towards the cloud, meeting the increasing computational demand while minimizing the energy costs in data centers has become essential. This work introduces two adaptive techniques to reduce the energy consumption of computing clusters through power and resource management on multi-core processors. We first present a novel power capping technique to constrain the power consumption of computing nodes. Our technique combines Dynamic Voltage-Frequency Scaling (DVFS) and thread allocation on multi-core systems. By utilizing machine learning techniques, our power capping method is able to meet the power budgets 82% of the time without requiring any power measurement device and reduces the energy consumption by 51.6% on average in comparison to the state-of-the-art techniques. We then introduce an autonomous resource management technique for consolidated multi-threaded workloads running on multi-core servers. Our technique first classifies applications according to their energy efficiency measure, then proportionally allocates resources for co-scheduled applications to improve the energy efficiency. The proposed technique improves the energy efficiency by 17% in comparison to state-of-the-art co-scheduling policies.
    I. INTRODUCTION Energy-related costs are among the major contributors to the total cost of ownership of today's data centers and high performance computing (HPC) clusters. Therefore, future computing clusters are required to be energy-efficient in order to be able to meet the continuously increasing computational demand. Moreover, administration and management of data center resources has become significantly complex, due to the increasing number of servers installed in data centers. Therefore, designing autonomous techniques to optimally manage the limited data center resources is essential to achieve sustainability in the cloud era.
The achievable maximum performance of a computing cluster is determined by (1) infrastructural/cost limitations (e.g., power delivery, cooling capacity, electricity cost) and/or (2) available hardware resources (e.g., CPU, disk size). Optimizing the performance under such constraints (i.e., power, resource) is critically important to improve the energy efficiency and therefore to reduce the cost of computing. Moreover, the emergence of multi-threaded applications on cloud resources brings additional challenges for optimizing the performance-energy tradeoffs under resource constraints, due to their complex characteristics such as performance scalability and inter-core communication. In this work, we present two adaptive management techniques for multi-threaded workloads to improve the energ

    Edge Offloading in Smart Grid

    The energy transition supports the shift towards more sustainable energy alternatives, paving the way towards decentralized smart grids, where the energy is generated closer to the point of use. The decentralized smart grids foresee novel data-driven low-latency applications for improving resilience and responsiveness, such as peer-to-peer energy trading, microgrid control, fault detection, or demand response. However, the traditional cloud-based smart grid architectures are unable to meet the requirements of the new emerging applications, such as low latency and high reliability; thus, alternative architectures such as edge, fog, or hybrid need to be adopted. Moreover, edge offloading can play a pivotal role for the next-generation smart grid AI applications because it enables the efficient utilization of computing resources and addresses the challenges of increasing data generated by IoT devices, optimizing the response time, energy consumption, and network performance. However, a comprehensive overview of the current state of research is needed to support sound decisions regarding energy-related applications offloading from cloud to fog or edge, focusing on smart grid open challenges and potential impacts. In this paper, we delve into smart grid and computational distribution architectures, including edge-fog-cloud models, orchestration architecture, and serverless computing, and analyze the decision-making variables and optimization algorithms to assess the efficiency of edge offloading. Finally, the work contributes to a comprehensive understanding of edge offloading in the smart grid, providing a SWOT analysis to support decision making. Comment: to be submitted to journal

    AIR: Adaptive Dynamic Precision Iterative Refinement

    In high performance computing, applications often require very accurate solutions while minimizing runtimes and power consumption. Improving the ratio of the number of logic gates implementing floating point arithmetic operations to the total number of logic gates enables greater efficiency, potentially with higher performance and lower power consumption. Software executing on the fixed hardware in von Neumann architectures faces limitations on improving this ratio, since processors require extensive supporting logic to fetch and decode instructions while employing arithmetic units with statically defined precision. This dissertation explores novel approaches to improve computing architectures for linear system applications not only by designing application-specific hardware but also by optimizing precision by applying adaptive dynamic precision iterative refinement (AIR). This dissertation shows that AIR is numerically stable and well behaved. Theoretically, AIR can produce up to a 3 times speedup over mixed precision iterative refinement on FPGAs. Implementing an AIR prototype for the refinement procedure on a Xilinx XC6VSX475T FPGA results in estimated improvements of around 0.5, 8, and 2 times for the time-, clock-, and energy-based performance per iteration compared to mixed precision iterative refinement on the Nvidia Tesla C2075 GPU, when a user requires a prescribed accuracy between single and double precision. AIR using FPGAs can produce beyond double precision accuracy effectively, while CPUs or GPUs need software help, causing substantial overhead.
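
For readers unfamiliar with the baseline named in this abstract, the following is a minimal sketch of classic mixed precision iterative refinement, not of AIR itself and not of the dissertation's implementation. It assumes a small dense system, emulates single precision by rounding through the standard `struct` module, and uses hypothetical helper names (`f32`, `solve_low`, `refine`) chosen only for illustration.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_low(A, b):
    """Gaussian elimination with partial pivoting; every result is rounded to binary32."""
    n = len(b)
    # Augmented matrix [A | b], with all entries rounded to single precision.
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x

def refine(A, b, tol=1e-13, max_iter=20):
    """Low-precision solves corrected by residuals computed in double precision."""
    x = solve_low(A, b)
    for _ in range(max_iter):
        # Residual r = b - A x in binary64 (native Python floats).
        r = [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]
        if max(abs(v) for v in r) <= tol * max(abs(v) for v in b):
            break
        d = solve_low(A, r)  # correction from the cheap low-precision solver
        x = [xi + di for xi, di in zip(x, d)]
    return x

# Illustrative, well-conditioned test system (values are made up for the demo).
A = [[4.0, 1.0, 2.0], [1.0, 5.0, 1.0], [2.0, 1.0, 6.0]]
x_true = [0.1, 0.2, 0.3]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
x_low = solve_low(A, b)
x_ref = refine(A, b)
err_low = max(abs(u - t) for u, t in zip(x_low, x_true))
err_ref = max(abs(u - t) for u, t in zip(x_ref, x_true))
```

In this sketch the expensive solve always runs in low precision, and only the residual and the running solution are kept in double precision. The abstract's AIR approach instead adapts the precision dynamically, which this example does not attempt to model.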

    Carbon Containers: A System-level Facility for Managing Application-level Carbon Emissions

    To reduce their environmental impact, cloud datacenters are increasingly focused on optimizing applications' carbon-efficiency, or work done per mass of carbon emitted. To facilitate such optimizations, we present Carbon Containers, a simple system-level facility, which extends prior work on power containers, that automatically regulates applications' carbon emissions in response to variations in both their workload's intensity and their energy's carbon-intensity. Specifically, Carbon Containers enable applications to specify a maximum carbon emissions rate (in g·CO2e/hr), and then transparently enforce this rate via a combination of vertical scaling, container migration, and suspend/resume while maximizing either energy-efficiency or performance. Carbon Containers are especially useful for applications that i) must continue running even during high-carbon periods, and ii) execute in regions with few variations in carbon-intensity. These low-variability regions also tend to have high average carbon-intensity, which increases the importance of regulating carbon emissions. We implement a Carbon Containers prototype by extending Linux Containers to incorporate the mechanisms above and evaluate it using real workload traces and carbon-intensity data from multiple regions. We compare Carbon Containers with prior work that regulates carbon emissions by suspending/resuming applications during high/low carbon periods. We show that Carbon Containers are more carbon-efficient and improve performance while maintaining similar carbon emissions. Comment: ACM Symposium on Cloud Computing (SoCC