43,163 research outputs found

    Iso-energy-efficiency: An approach to power-constrained parallel computation

    Get PDF
    Future large scale high performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate and predict energy-performance of data intensive parallel applications with various execution patterns running on large scale power-aware clusters. Our analytical model can help users explore the effects of machine and application dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g. processor count, CPU power/frequency, workload size and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making

    Dynamic Energy Management for Chip Multi-processors under Performance Constraints

    Get PDF
    We introduce a novel algorithm for dynamic energy management (DEM) under performance constraints in chip multi-processors (CMPs). Using the novel concept of delayed instructions count, performance loss estimations are calculated at the end of each control period for each core. In addition, a Kalman filtering based approach is employed to predict workload in the next control period for which voltage-frequency pairs must be selected. This selection is done with a novel dynamic voltage and frequency scaling (DVFS) algorithm whose objective is to reduce energy consumption but without degrading performance beyond the user set threshold. Using our customized Sniper based CMP system simulation framework, we demonstrate the effectiveness of the proposed algorithm for a variety of benchmarks for 16 core and 64 core network-on-chip based CMP architectures. Simulation results show consistent energy savings across the board. We present our work as an investigation of the tradeoff between the achievable energy reduction via DVFS when predictions are done using the effective Kalman filter for different performance penalty thresholds

    An intelligent processing environment for real-time simulation

    Get PDF
    The development of a highly efficient and thus truly intelligent processing environment for real-time general purpose simulation of continuous systems is described. Such an environment can be created by mapping the simulation process directly onto the University of Alamba's OPERA architecture. To facilitate this effort, the field of continuous simulation is explored, highlighting areas in which efficiency can be improved. Areas in which parallel processing can be applied are also identified, and several general OPERA type hardware configurations that support improved simulation are investigated. Three direct execution parallel processing environments are introduced, each of which greatly improves efficiency by exploiting distinct areas of the simulation process. These suggested environments are candidate architectures around which a highly intelligent real-time simulation configuration can be developed

    Distributed computing methodology for training neural networks in an image-guided diagnostic application

    Get PDF
    Distributed computing is a process through which a set of computers connected by a network is used collectively to solve a single problem. In this paper, we propose a distributed computing methodology for training neural networks for the detection of lesions in colonoscopy. Our approach is based on partitioning the training set across multiple processors using a parallel virtual machine. In this way, interconnected computers of varied architectures can be used for the distributed evaluation of the error function and gradient values, and, thus, training neural networks utilizing various learning methods. The proposed methodology has large granularity and low synchronization, and has been implemented and tested. Our results indicate that the parallel virtual machine implementation of the training algorithms developed leads to considerable speedup, especially when large network architectures and training sets are used

    A First Step Towards Automatically Building Network Representations

    Get PDF
    To fully harness Grids, users or middlewares must have some knowledge on the topology of the platform interconnection network. As such knowledge is usually not available, one must uses tools which automatically build a topological network model through some measurements. In this article, we define a methodology to assess the quality of these network model building tools, and we apply this methodology to representatives of the main classes of model builders and to two new algorithms. We show that none of the main existing techniques build models that enable to accurately predict the running time of simple application kernels for actual platforms. However some of the new algorithms we propose give excellent results in a wide range of situations

    Dynamic Voltage Scaling Techniques for Energy Efficient Synchronized Sensor Network Design

    Get PDF
    Building energy-efficient systems is one of the principal challenges in wireless sensor networks. Dynamic voltage scaling (DVS), a technique to reduce energy consumption by varying the CPU frequency on the fly, has been widely used in other settings to accomplish this goal. In this paper, we show that changing the CPU frequency can affect timekeeping functionality of some sensor platforms. This phenomenon can cause an unacceptable loss of time synchronization in networks that require tight synchrony over extended periods, thus preventing all existing DVS techniques from being applied. We present a method for reducing energy consumption in sensor networks via DVS, while minimizing the impact of CPU frequency switching on time synchronization. The system is implemented and evaluated on a network of 11 Imote2 sensors mounted on a truss bridge and running a high-fidelity continuous structural health monitoring application. Experimental measurements confirm that the algorithm significantly reduces network energy consumption over the same network that does not use DVS, while requiring significantly fewer re-synchronization actions than a classic DVS algorithm.unpublishedis peer reviewe

    Real-time and fault tolerance in distributed control software

    Get PDF
    Closed loop control systems typically contain multitude of spatially distributed sensors and actuators operated simultaneously. So those systems are parallel and distributed in their essence. But mapping this parallelism onto the given distributed hardware architecture, brings in some additional requirements: safe multithreading, optimal process allocation, real-time scheduling of bus and network resources. Nowadays, fault tolerance methods and fast even online reconfiguration are becoming increasingly important. All those often conflicting requirements, make design and implementation of real-time distributed control systems an extremely difficult task, that requires substantial knowledge in several areas of control and computer science. Although many design methods have been proposed so far, none of them had succeeded to cover all important aspects of the problem at hand. [1] Continuous increase of production in embedded market, makes a simple and natural design methodology for real-time systems needed more then ever
    corecore