
    Iso-energy-efficiency: An approach to power-constrained parallel computation

    Future large-scale high-performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate, and predict the energy-performance of data-intensive parallel applications with various execution patterns running on large-scale power-aware clusters. Our analytical model can help users explore the effects of machine- and application-dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g., processor count, CPU power/frequency, workload size, and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model in various application contexts and in scalability decision-making.
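    The kind of trade-off this abstract describes can be illustrated with a toy system-level energy model. Everything below is an illustrative assumption (the speedup model, the power constants, the communication term), not the paper's actual iso-energy-efficiency model:

    ```python
    # Hedged sketch: a toy energy-efficiency scan over processor count.
    # All constants and the runtime model are hypothetical.

    def runtime(p, w, t1=100.0, serial_frac=0.05, comm_cost=0.1):
        """Runtime on p processors for workload scale w (assumed model):
        a serial part, a parallel part, and a simple communication term."""
        return t1 * w * (serial_frac + (1 - serial_frac) / p) + comm_cost * p

    def energy(p, w, p_dynamic=50.0, p_static=20.0):
        """Total energy: every processor pays static + dynamic power for
        the full run (a simplifying assumption)."""
        return p * (p_dynamic + p_static) * runtime(p, w)

    def energy_efficiency(p, w):
        """Useful work per joule (workload units per joule)."""
        return w / energy(p, w)

    # The iso-efficiency question: how must w grow with p to keep
    # efficiency constant? Here we simply scan and compare.
    for p in (1, 4, 16, 64):
        print(p, energy_efficiency(p, w=1.0))
    ```

    With fixed workload, efficiency drops as p grows (static power and communication dominate), which is why the paper scales workload and frequency together with processor count.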

    Amdahl's law for predicting the future of multicores considered harmful

    Several recent works predict the future of multicore systems or identify scalability bottlenecks based on Amdahl's law. Amdahl's law implicitly assumes, however, that the problem size stays constant, whereas in most cases more cores are used to solve larger and more complex problems. A related law, Gustafson's law, assumes that the runtime, not the problem size, is constant: the runtime on p cores is the same as the runtime on one core, and the parallel part of an application scales linearly with the number of cores. We apply Gustafson's law to symmetric, asymmetric, and dynamic multicores and show that this leads to fundamentally different results than when Amdahl's law is applied. We also generalize Amdahl's and Gustafson's laws and study how this quantitatively affects the dimensioning of future multicore systems.
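    The two standard laws contrasted in the abstract can be stated directly (this shows only the textbook forms, not the paper's symmetric/asymmetric/dynamic multicore extensions):

    ```python
    # Amdahl's law (fixed problem size) vs. Gustafson's law (fixed runtime).
    # f is the parallelizable fraction, p the number of cores.

    def amdahl_speedup(f, p):
        """Fixed-workload speedup: the serial part (1 - f) caps scaling."""
        return 1.0 / ((1.0 - f) + f / p)

    def gustafson_speedup(f, p):
        """Scaled speedup: the parallel part of the work grows with p."""
        return (1.0 - f) + f * p

    for p in (16, 64, 256):
        print(p, amdahl_speedup(0.95, p), gustafson_speedup(0.95, p))
    ```

    Even with 95% parallel code, Amdahl's law bounds speedup at 20 regardless of p, while Gustafson's scaled speedup keeps growing nearly linearly, which is exactly why the two laws lead to different conclusions about future multicores.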

    An optimization scheduler in the intranet grid

    Process scheduling is a fundamental task in grid computing: it is responsible for allocating time on computational agents, which can span a wide range of devices based on various types of computer systems. We ask whether a grid infrastructure can be built efficiently in a company environment. Such a grid can be used for scientific and technical computing, as well as for better load distribution across individual computing systems and services. The scheduler is a major component of grid computing; its main task is to distribute the system load effectively and allocate tasks to resources that are underutilized at a given moment. The article also examines the relation between conflicting parameters that affect the quality of the planning process. The time spent by the optimization algorithm influences the quality of the draft plan, and thus has a direct impact on the total job-processing time. In any scheduling strategy there is a point beyond which additional planning time no longer improves the draft plan but worsens the overall runtime of the job. Our aim was to compare common metaheuristic algorithms and, from the measured values, to propose a methodology for determining the optimal planning time. © Springer International Publishing Switzerland 2016
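    As a point of reference for the planning-time/plan-quality trade-off discussed above, here is a minimal greedy list scheduler (longest-processing-time-first): a cheap baseline that metaheuristics must beat. It is a sketch, not one of the article's algorithms, and the task/agent numbers are illustrative:

    ```python
    import heapq

    # Baseline sketch: LPT greedy scheduling. Each task goes to the
    # currently least-loaded agent; tasks are handled longest-first.

    def greedy_schedule(task_times, n_agents):
        """Return the makespan (finish time of the most loaded agent)."""
        loads = [0.0] * n_agents            # min-heap of agent loads
        heapq.heapify(loads)
        for t in sorted(task_times, reverse=True):
            lightest = heapq.heappop(loads)  # least-loaded agent
            heapq.heappush(loads, lightest + t)
        return max(loads)

    print(greedy_schedule([4, 3, 3, 2, 2, 2], 2))  # prints 8.0
    ```

    This runs in O(n log m) time, so its "planning time" is negligible; a metaheuristic is only worthwhile when its extra optimization time is repaid by a shorter makespan, which is the crossover point the article measures.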

    Amdahl's Reliability Law: A Simple Quantification of the Weakest-Link Phenomenon


    A time-and-space parallelized algorithm for the cable equation

    Electrical propagation in excitable tissue, such as nerve fibers and heart muscle, is described by a nonlinear diffusion-reaction parabolic partial differential equation for the transmembrane voltage V(x,t), known as the cable equation. This equation involves a highly nonlinear source term, representing the total ionic current across the membrane, governed by a Hodgkin-Huxley-type ionic model, and requires the solution of a system of ordinary differential equations. Thus, the model consists of a PDE (in one, two, or three dimensions) coupled to a system of ODEs, and it is very expensive to solve, especially in two and three dimensions. To solve this equation numerically, we develop an algorithm, extended from the Parareal algorithm, that efficiently incorporates space-parallelized solvers into the Parareal framework to achieve time-and-space parallelization. Numerical results and a comparison of the performance of several serial, space-parallelized, and time-and-space-parallelized time-stepping numerical schemes in one and two dimensions are also presented.
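    The plain Parareal iteration that the abstract's algorithm extends can be sketched on a scalar test ODE y' = λy (not the cable equation, and with no space parallelism); the coarse/fine propagators below are illustrative Euler steps:

    ```python
    import math

    # Hedged sketch of the basic Parareal iteration:
    #   U[n+1] <- G(U[n], new) + F(U[n], old) - G(U[n], old)
    # where G is a cheap coarse propagator and F an expensive fine one.

    def coarse(y, lam, dt):
        return y + dt * lam * y                # one explicit Euler step

    def fine(y, lam, dt, m=100):
        h = dt / m
        for _ in range(m):                     # m small Euler steps:
            y = y + h * lam * y                # the "expensive" solver
        return y

    def parareal(y0, lam, T, n_slices, iters):
        dt = T / n_slices
        U = [y0] * (n_slices + 1)
        for n in range(n_slices):              # initial coarse sweep
            U[n + 1] = coarse(U[n], lam, dt)
        for _ in range(iters):
            F = [fine(U[n], lam, dt) for n in range(n_slices)]    # parallelizable
            G_old = [coarse(U[n], lam, dt) for n in range(n_slices)]
            for n in range(n_slices):          # cheap serial correction
                U[n + 1] = coarse(U[n], lam, dt) + F[n] - G_old[n]
        return U[-1]

    approx = parareal(1.0, -1.0, 1.0, n_slices=10, iters=5)
    print(abs(approx - math.exp(-1.0)))   # small: iterates converge to the fine solution
    ```

    The fine solves within one iteration are independent across time slices, which is what makes them parallel in time; the paper's contribution is additionally parallelizing each fine solve in space.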

    Vector coprocessor sharing techniques for multicores: performance and energy gains

    Vector Processors (VPs) created the breakthroughs needed for the emergence of computational science many years ago. All commercial computing architectures on the market today contain some form of vector or SIMD processing. Many high-performance and embedded applications, often dealing with streams of data, cannot efficiently utilize dedicated vector processors for various reasons: a limited percentage of sustained vector code due to substantial flow control; inherently small parallelism or the frequent involvement of operating-system tasks; varying vector length across applications or within a single application; and data dependencies within short sequences of instructions, a problem further exacerbated without loop unrolling or other compiler optimization techniques. Additionally, existing rigid SIMD architectures cannot efficiently tolerate dynamic application environments with many cores that may require runtime adjustment of the assigned vector resources in order to operate at desired energy/performance levels. To simultaneously alleviate these drawbacks of rigid lane-based VP architectures, while also releasing on-chip real estate for other important design choices, the first part of this research proposes three architectural contexts for the implementation of a shared vector coprocessor in multicore processors. Sharing an expensive resource among multiple cores increases the efficiency of the functional units and the overall system throughput. The second part of the dissertation concerns the evaluation and characterization of the three proposed shared vector architectures from the performance and power perspectives on an FPGA (Field-Programmable Gate Array) prototype. The third part of this work introduces performance and power estimation models based on observations deduced from the experimental results.
    The results show the opportunity to adaptively adjust the number of vector lanes assigned to individual cores or processing threads in order to minimize various energy-performance metrics on modern vector-capable multicore processors that run applications with dynamic workloads. Therefore, the fourth part of this research focuses on the development of a fine-to-coarse-grain power management technique and a relevant adaptive hardware/software infrastructure which dynamically adjusts the assigned VP resources (number of vector lanes) in order to minimize the energy consumption for applications with dynamic workloads. In order to remove the inherent limitations imposed by FPGA technologies, the fifth part of this work consists of implementing an ASIC (Application Specific Integrated Circuit) version of the shared VP, enabling precise performance-energy studies involving high-performance vector processing in multicore environments.
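    The "minimize an energy-performance metric by choosing a lane count" idea can be illustrated with the energy-delay product (EDP), a common metric of this kind. The runtime and power models below are illustrative assumptions, not measurements from the dissertation's FPGA or ASIC prototypes:

    ```python
    # Hedged sketch: pick the vector-lane count minimizing the
    # energy-delay product, EDP = power * time^2. All constants assumed.

    def runtime(lanes, vector_frac=0.8, t1=1.0):
        """Amdahl-style model: only the vectorizable fraction of the
        work speeds up with more lanes."""
        return t1 * ((1 - vector_frac) + vector_frac / lanes)

    def power(lanes, p_base=1.0, p_lane=0.3):
        """Power grows with the number of active lanes (assumption)."""
        return p_base + p_lane * lanes

    def edp(lanes):
        t = runtime(lanes)
        return power(lanes) * t * t

    best = min((1, 2, 4, 8, 16), key=edp)
    print(best, edp(best))   # an interior optimum: more lanes stop paying off
    ```

    Because runtime improvements flatten while lane power keeps rising, the best EDP sits at an intermediate lane count, which is why adjusting lanes at runtime to the workload's vector fraction can save energy.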

    Web-based front-end design and scientific computing for material stress simulation software

    A precise simulation requires a large amount of input data, such as geometrical descriptions of the crystal structure, the external forces and loads, and quantitative properties of the material. Although some powerful applications already exist for research purposes, they are not widely used in education due to their complex structure and unintuitive operation. To cater to a generic user base, a front-end application for material simulation software is introduced. With a graphical interface, it provides a more efficient way to conduct simulations and to educate students who want to expand their knowledge in relevant fields. We first discuss how we explored the solution for the front-end application and how we developed it on top of the material simulation software created by the mechanical engineering lab at Georgia Tech Lorraine. The user interface design, the functionality, and the overall user experience are primary factors determining a product's success or failure. This material simulation software helps researchers resolve the motion and interactions of a large ensemble of dislocations in single- or multi-layered 3D materials. However, the algorithm it utilizes is not well optimized or parallelized, so its speedup does not scale when using more CPUs in the cluster. This problem leads to the second topic, scientific computing; in this thesis we offer different approaches that attempt to improve the parallelization and optimize the scalability.