
    Iso-energy-efficiency: An approach to power-constrained parallel computation

    Future large-scale high-performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate, and predict the energy-performance of data-intensive parallel applications with various execution patterns running on large-scale power-aware clusters. Our analytical model can help users explore the effects of machine- and application-dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g., processor count, CPU power/frequency, workload size, and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model in various application contexts and in scalability decision-making.
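    The kind of trade-off this abstract describes can be illustrated with a toy system-level energy model. Everything below is an illustrative assumption (the speedup model, the power constants, the communication term), not the paper's actual iso-energy-efficiency model:

    ```python
    # Hedged sketch: a toy energy-efficiency scan over processor count.
    # All constants and the runtime model are hypothetical.

    def runtime(p, w, t1=100.0, serial_frac=0.05, comm_cost=0.1):
        """Runtime on p processors for workload scale w (assumed model):
        a serial part, a parallel part, and a simple communication term."""
        return t1 * w * (serial_frac + (1 - serial_frac) / p) + comm_cost * p

    def energy(p, w, p_dynamic=50.0, p_static=20.0):
        """Total energy: every processor pays static + dynamic power for
        the full run (a simplifying assumption)."""
        return p * (p_dynamic + p_static) * runtime(p, w)

    def energy_efficiency(p, w):
        """Useful work per joule (workload units per joule)."""
        return w / energy(p, w)

    # The iso-efficiency question: how must w grow with p to keep
    # efficiency constant? Here we simply scan and compare.
    for p in (1, 4, 16, 64):
        print(p, energy_efficiency(p, w=1.0))
    ```

    With fixed workload, efficiency drops as p grows (static power and communication dominate), which is why the paper scales workload and frequency together with processor count.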

    Amdahl's law for predicting the future of multicores considered harmful

    Several recent works predict the future of multicore systems or identify scalability bottlenecks based on Amdahl's law. Amdahl's law implicitly assumes, however, that the problem size stays constant, whereas in most cases more cores are used to solve larger and more complex problems. A related law, Gustafson's law, assumes that the runtime, not the problem size, is constant: the runtime on p cores is the same as the runtime on one core, and the parallel part of an application scales linearly with the number of cores. We apply Gustafson's law to symmetric, asymmetric, and dynamic multicores and show that this leads to fundamentally different results than when Amdahl's law is applied. We also generalize Amdahl's and Gustafson's laws and study how this quantitatively affects the dimensioning of future multicore systems.
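    The two standard laws contrasted in the abstract can be stated directly (this shows only the textbook forms, not the paper's symmetric/asymmetric/dynamic multicore extensions):

    ```python
    # Amdahl's law (fixed problem size) vs. Gustafson's law (fixed runtime).
    # f is the parallelizable fraction, p the number of cores.

    def amdahl_speedup(f, p):
        """Fixed-workload speedup: the serial part (1 - f) caps scaling."""
        return 1.0 / ((1.0 - f) + f / p)

    def gustafson_speedup(f, p):
        """Scaled speedup: the parallel part of the work grows with p."""
        return (1.0 - f) + f * p

    for p in (16, 64, 256):
        print(p, amdahl_speedup(0.95, p), gustafson_speedup(0.95, p))
    ```

    Even with 95% parallel code, Amdahl's law bounds speedup at 20 regardless of p, while Gustafson's scaled speedup keeps growing nearly linearly, which is exactly why the two laws lead to different conclusions about future multicores.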

    An optimization scheduler in the intranet grid

    Process scheduling is a fundamental task in grid computing: it is responsible for allocating time on computational agents, which can span a wide range of devices based on various types of computer systems. We ask whether a grid infrastructure can be built efficiently in a company environment. Such a grid can be used for scientific and technical computing, as well as for better load distribution across individual computing systems and services. The scheduler is a major component of grid computing; its main task is to distribute the system load effectively and allocate tasks to resources that are underutilized at a given moment. The article also examines the relation between conflicting parameters that affect the quality of the planning process. The time spent by the optimization algorithm influences the quality of the draft plan, and thus has a direct impact on the total job-processing time. In any scheduling strategy there is a point beyond which additional planning time no longer improves the draft plan but worsens the overall runtime of the job. Our aim was to compare common metaheuristic algorithms and, from the measured values, to propose a methodology for determining the optimal planning time. © Springer International Publishing Switzerland 2016
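    As a point of reference for the planning-time/plan-quality trade-off discussed above, here is a minimal greedy list scheduler (longest-processing-time-first): a cheap baseline that metaheuristics must beat. It is a sketch, not one of the article's algorithms, and the task/agent numbers are illustrative:

    ```python
    import heapq

    # Baseline sketch: LPT greedy scheduling. Each task goes to the
    # currently least-loaded agent; tasks are handled longest-first.

    def greedy_schedule(task_times, n_agents):
        """Return the makespan (finish time of the most loaded agent)."""
        loads = [0.0] * n_agents            # min-heap of agent loads
        heapq.heapify(loads)
        for t in sorted(task_times, reverse=True):
            lightest = heapq.heappop(loads)  # least-loaded agent
            heapq.heappush(loads, lightest + t)
        return max(loads)

    print(greedy_schedule([4, 3, 3, 2, 2, 2], 2))  # prints 8.0
    ```

    This runs in O(n log m) time, so its "planning time" is negligible; a metaheuristic is only worthwhile when its extra optimization time is repaid by a shorter makespan, which is the crossover point the article measures.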

    Amdahl's Reliability Law: A Simple Quantification of the Weakest-Link Phenomenon


    A time-and-space parallelized algorithm for the cable equation

    Electrical propagation in excitable tissue, such as nerve fibers and heart muscle, is described by a nonlinear diffusion-reaction parabolic partial differential equation for the transmembrane voltage V(x,t), known as the cable equation. This equation involves a highly nonlinear source term, representing the total ionic current across the membrane, governed by a Hodgkin-Huxley-type ionic model, and requires the solution of a system of ordinary differential equations. Thus, the model consists of a PDE (in one, two, or three dimensions) coupled to a system of ODEs, and it is very expensive to solve, especially in two and three dimensions. To solve this equation numerically, we develop an algorithm, extended from the Parareal algorithm, that efficiently incorporates space-parallelized solvers into the Parareal framework to achieve time-and-space parallelization. Numerical results and a comparison of the performance of several serial, space-parallelized, and time-and-space-parallelized time-stepping numerical schemes in one and two dimensions are also presented.
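    The plain Parareal iteration that the abstract's algorithm extends can be sketched on a scalar test ODE y' = λy (not the cable equation, and with no space parallelism); the coarse/fine propagators below are illustrative Euler steps:

    ```python
    import math

    # Hedged sketch of the basic Parareal iteration:
    #   U[n+1] <- G(U[n], new) + F(U[n], old) - G(U[n], old)
    # where G is a cheap coarse propagator and F an expensive fine one.

    def coarse(y, lam, dt):
        return y + dt * lam * y                # one explicit Euler step

    def fine(y, lam, dt, m=100):
        h = dt / m
        for _ in range(m):                     # m small Euler steps:
            y = y + h * lam * y                # the "expensive" solver
        return y

    def parareal(y0, lam, T, n_slices, iters):
        dt = T / n_slices
        U = [y0] * (n_slices + 1)
        for n in range(n_slices):              # initial coarse sweep
            U[n + 1] = coarse(U[n], lam, dt)
        for _ in range(iters):
            F = [fine(U[n], lam, dt) for n in range(n_slices)]    # parallelizable
            G_old = [coarse(U[n], lam, dt) for n in range(n_slices)]
            for n in range(n_slices):          # cheap serial correction
                U[n + 1] = coarse(U[n], lam, dt) + F[n] - G_old[n]
        return U[-1]

    approx = parareal(1.0, -1.0, 1.0, n_slices=10, iters=5)
    print(abs(approx - math.exp(-1.0)))   # small: iterates converge to the fine solution
    ```

    The fine solves within one iteration are independent across time slices, which is what makes them parallel in time; the paper's contribution is additionally parallelizing each fine solve in space.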

    Vector coprocessor sharing techniques for multicores: performance and energy gains

    Vector Processors (VPs) created the breakthroughs needed for the emergence of computational science many years ago. All commercial computing architectures on the market today contain some form of vector or SIMD processing. Many high-performance and embedded applications, often dealing with streams of data, cannot efficiently utilize dedicated vector processors for various reasons: a limited percentage of sustained vector code due to substantial flow control; inherently small parallelism or the frequent involvement of operating-system tasks; varying vector length across applications or within a single application; and data dependencies within short sequences of instructions, a problem further exacerbated without loop unrolling or other compiler optimization techniques. Additionally, existing rigid SIMD architectures cannot efficiently tolerate dynamic application environments with many cores that may require runtime adjustment of the assigned vector resources in order to operate at desired energy/performance levels. To simultaneously alleviate these drawbacks of rigid lane-based VP architectures, while also releasing on-chip real estate for other important design choices, the first part of this research proposes three architectural contexts for the implementation of a shared vector coprocessor in multicore processors. Sharing an expensive resource among multiple cores increases the efficiency of the functional units and the overall system throughput. The second part of the dissertation concerns the evaluation and characterization of the three proposed shared vector architectures from the performance and power perspectives on an FPGA (Field-Programmable Gate Array) prototype. The third part of this work introduces performance and power estimation models based on observations deduced from the experimental results.
    The results show the opportunity to adaptively adjust the number of vector lanes assigned to individual cores or processing threads in order to minimize various energy-performance metrics on modern vector-capable multicore processors that run applications with dynamic workloads. Therefore, the fourth part of this research focuses on the development of a fine-to-coarse-grain power management technique and a relevant adaptive hardware/software infrastructure which dynamically adjusts the assigned VP resources (number of vector lanes) in order to minimize the energy consumption for applications with dynamic workloads. In order to remove the inherent limitations imposed by FPGA technologies, the fifth part of this work consists of implementing an ASIC (Application Specific Integrated Circuit) version of the shared VP, enabling precise performance-energy studies involving high-performance vector processing in multicore environments.
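    The "minimize an energy-performance metric by choosing a lane count" idea can be illustrated with the energy-delay product (EDP), a common metric of this kind. The runtime and power models below are illustrative assumptions, not measurements from the dissertation's FPGA or ASIC prototypes:

    ```python
    # Hedged sketch: pick the vector-lane count minimizing the
    # energy-delay product, EDP = power * time^2. All constants assumed.

    def runtime(lanes, vector_frac=0.8, t1=1.0):
        """Amdahl-style model: only the vectorizable fraction of the
        work speeds up with more lanes."""
        return t1 * ((1 - vector_frac) + vector_frac / lanes)

    def power(lanes, p_base=1.0, p_lane=0.3):
        """Power grows with the number of active lanes (assumption)."""
        return p_base + p_lane * lanes

    def edp(lanes):
        t = runtime(lanes)
        return power(lanes) * t * t

    best = min((1, 2, 4, 8, 16), key=edp)
    print(best, edp(best))   # an interior optimum: more lanes stop paying off
    ```

    Because runtime improvements flatten while lane power keeps rising, the best EDP sits at an intermediate lane count, which is why adjusting lanes at runtime to the workload's vector fraction can save energy.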

    Web-based front-end design and scientific computing for material stress simulation software

    A precise simulation requires a large amount of input data, such as geometrical descriptions of the crystal structure, the external forces and loads, and quantitative properties of the material. Although some powerful applications already exist for research purposes, they are not widely used in education due to their complex structure and unintuitive operation. To cater to a generic user base, a front-end application for material simulation software is introduced. With a graphical interface, it provides a more efficient way to conduct simulations and to educate students who want to expand their knowledge in relevant fields. We first discuss how we explored the solution for the front-end application and how we developed it on top of the material simulation software created by the mechanical engineering lab at Georgia Tech Lorraine. The user interface design, the functionality, and the overall user experience are primary factors determining a product's success or failure. This material simulation software helps researchers resolve the motion and interactions of a large ensemble of dislocations in single- or multi-layered 3D materials. However, the algorithm it utilizes is not well optimized or parallelized, so its speedup does not scale when using more CPUs in the cluster. This problem leads to the second topic, scientific computing; in this thesis we offer different approaches that attempt to improve the parallelization and optimize the scalability.