26 research outputs found

High-performance and hardware-aware computing: proceedings of the first International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'08)

Author: Buchty Rainer
Weiß Jan-Philipp
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2008

The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach

KITopen

High-performance and hardware-aware computing: proceedings of the second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC'11), San Antonio, Texas, USA, February 2011 (in conjunction with HPCA-17)

Author: Buchty Rainer
Weiß Jan-Philipp
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2011

High-performance system architectures are increasingly exploiting heterogeneity. The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach

KITopen

A Survey on Hardware-aware and Heterogeneous Computing on Multicore Processors and Accelerators

Author: Buchty Rainer
Heuveline Vincent
Karl Wolfgang
Weiß Jan-Philipp
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2009

KITopen

HiFlow3 - A Flexible and Hardware-Aware Parallel Finite Element Package

Author: Anzt Hartwig
Augustin Werner
Baumann Martin
Bockelmann Hendryk
Gengenbach Thomas
Hahn Tobias
Heuveline Vincent
Ketelaer Eva
Lukarski Dimitar
Otzen Andrea
Ritterbusch Sebastian
Rocker Björn
Ronnas Staffan
Schick Michael
Subramanian Chandramowli
Weiss Jan-Philipp
Wilhelm Florian
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2010

KITopen

Energy Efficiency of Mixed Precision Iterative Refinement Methods using Hybrid Hardware Platforms: An Evaluation of different Solver and Hardware Configurations

Author: Anzt Hartwig
Heuveline Vincent
Rocker Björn
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2010

KITopen

Traffic Prediction for NoCs using Fuzzy Logic

Author: Juurlink Ben
Thomas Gervin
Tutsch Dietmar
Publication date: 01/01/2011

Proceedings DOI: 10.5445/KSP/1000021732 (https://doi.org/10.5445/KSP/1000021732)

Networks on Chip (NoCs) provide faster communication and higher throughput for chip multiprocessor systems than conventional bus systems. Having multiple processing elements on one chip, however, leads to a large number of message transfers in the NoC. As a consequence, more blocking occurs, and time and power are wasted waiting until the congestion dissolves. With knowledge of future communication patterns, blocking could be avoided. Therefore, this paper introduces a model that predicts future communication patterns to avoid network congestion. The model uses a fuzzy-based algorithm to predict end-to-end communication. It makes accurate predictions for up to 10 time intervals for continuous patterns. Communication patterns with non-continuous behavior, such as fast changes from peak to zero, can also be predicted accurately for the next 1 to 2 time intervals. The model is a first step toward predicting future communication patterns. In addition, some limitations are identified that must be resolved in order to improve the model

DepositOnce

A generic implementation of a quantified predictor on FPGAs

Author: Elhossini Ahmed
Juurlink Ben
Thomas Gervin
Publication date: 01/01/2014

Predictors are used in many fields of computer architecture to enhance performance. With good estimations of future system behaviour, policies can be developed to improve system performance or reduce power consumption. These policies become more effective if the predictors are implemented in hardware and can provide quantified forecasts rather than only binary ones. In this paper, we present and evaluate a generic predictor, implemented in VHDL and running on an FPGA, which produces quantified forecasts. Moreover, a complete scalability analysis is presented which shows that our implementation has a maximum device utilization of less than 5%. Furthermore, we analyse the power consumption of the predictor running on an FPGA. Additionally, we show that this implementation can be clocked at over 210 MHz. Finally, we evaluate a power-saving policy based on our hardware predictor. Based on predicted idle periods, this power-saving policy uses power-saving modes and is able to reduce memory power consumption by 14.3%

DepositOnce

Crossref

Power Consumption of Mixed Precision in the Iterative Solution of Sparse Linear Systems

Author: Anzt Hartwig
Castillo Maribel
Fernández Juan C.
Heuveline Vincent
Mayo Rafael
Quintana-Orti Enrique S.
Rocker Björn
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2011

KITopen

A predictor-based power-saving policy for DRAM memories

Author: Chandrasekar Karthik
Goossens Kees
Juurlink Ben
Thomas Gervin
Åkesson Benny
Publication date: 01/01/2012

Reducing power/energy consumption is an important goal for all computer systems, from servers to battery-driven hand-held devices. To achieve this goal, the energy consumption of all system components needs to be reduced. One of the most power-hungry components is the off-chip DRAM, even when it is idle. DRAMs support different power-saving modes, such as self-refresh and power-down, but employing them every time the DRAM is idle reduces performance due to their power-up latencies. The self-refresh mode offers large power savings, but incurs a long power-up latency. The power-down mode, on the other hand, has a shorter power-up latency, but provides lower power savings. In this paper, we propose and evaluate a novel power-saving policy that combines the best of both power-saving modes in order to achieve significant power reductions with a marginal performance penalty. To accomplish this, we use a history-based predictor to forecast the duration of an idle period and then employ self-refresh, power-down, or a combination of both power-saving modes. Significant refinements are made to the predictor to maximize the energy savings and minimize the performance penalty. The presented policy is evaluated using several applications from the multimedia domain, and the experimental results show that it reduces the total DRAM energy consumption by between 68.8% and 79.9% at a negligible performance penalty of between 0.3% and 2.2%
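
As an illustration only, the idea of picking a power-saving mode from a predicted idle duration might be sketched as follows. The predictor here is a plain moving average, and the class name, function name, and threshold values are all hypothetical; none of them are taken from the paper, and the paper's combined use of both modes is not modelled.

```python
from collections import deque

# Hypothetical break-even thresholds in clock cycles (assumed, not from the paper).
POWER_DOWN_MIN_IDLE = 20      # idle time above which power-down is assumed to pay off
SELF_REFRESH_MIN_IDLE = 500   # idle time above which self-refresh is assumed to pay off


class IdlePredictor:
    """Predicts the next idle period as the mean of the last few observed ones."""

    def __init__(self, history_len=4):
        # A bounded deque drops the oldest observation automatically.
        self.history = deque(maxlen=history_len)

    def record(self, idle_cycles):
        """Store the length of an idle period that has just ended."""
        self.history.append(idle_cycles)

    def predict(self):
        """Return the forecast idle length, or 0 when no history exists yet."""
        if not self.history:
            return 0
        return sum(self.history) / len(self.history)


def choose_mode(predicted_idle):
    """Map a predicted idle duration to a power-saving mode."""
    if predicted_idle >= SELF_REFRESH_MIN_IDLE:
        return "self-refresh"   # long idle: accept the long power-up latency
    if predicted_idle >= POWER_DOWN_MIN_IDLE:
        return "power-down"     # medium idle: shorter power-up latency
    return "active"             # short idle: entering a mode is not worthwhile
```

With a history length of 2, recording idle periods of 30 and 50 cycles yields a forecast of 40 cycles, which this sketch maps to power-down.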

DepositOnce

Crossref

Indexed dependence metadata and its applications in software performance optimisation

Author: Howes Lee William
Publication venue: Computing, Imperial College London
Publication date: 01/04/2010

To achieve continued performance improvements, modern microprocessor design is tending to concentrate an increasing proportion of hardware on computation units with less automatic management of data movement and extraction of parallelism. As a result, architectures increasingly include multiple computation cores and complicated, software-managed memory hierarchies. Compilers have difficulty characterizing the behaviour of a kernel in a general enough manner to enable automatic generation of efficient code in any but the most straightforward of cases. We propose the concept of indexed dependence metadata to improve application development and mapping onto such architectures. The metadata represent both the iteration space of a kernel and the mapping of that iteration space from a given index to the set of data elements that iteration might use: thus the dependence metadata is indexed by the kernel’s iteration space. This explicit mapping allows the compiler or runtime to optimise the program more efficiently, and improves the program structure for the developer. We argue that this form of explicit interface specification reduces the need for premature, architecture-specific optimisation. It improves program portability, supports intercomponent optimisation and enables generation of efficient data movement code. We offer the following contributions: an introduction to the concept of indexed dependence metadata as a generalisation of stream programming, a demonstration of its advantages in a component programming system, the decoupled access/execute model for C++ programs, and how indexed dependence metadata might be used to improve the programming model for GPU-based designs. 
Our experimental results with prototype implementations show that indexed dependence metadata supports automatic synthesis of double-buffered data movement for the Cell processor and enables aggressive loop fusion optimisations in image processing, linear algebra and multigrid application case studies
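
A minimal sketch of the general idea, assuming a one-dimensional three-point stencil: the access pattern is declared separately from the computation, as a mapping from an iteration index to the data elements that iteration may read. The function names below are hypothetical, and the thesis itself targets C++, the Cell processor, and GPUs rather than Python.

```python
def stencil_reads(i, n):
    """Hypothetical access descriptor: the input indices that iteration i reads."""
    return [j for j in (i - 1, i, i + 1) if 0 <= j < n]


def run_kernel(data, reads, kernel):
    """Run `kernel` once per index, fetching operands via the metadata."""
    n = len(data)
    out = []
    for i in range(n):
        # The runtime, not the kernel, gathers the operands. A real system
        # could use the same metadata to schedule data movement in advance.
        operands = [data[j] for j in reads(i, n)]
        out.append(kernel(operands))
    return out


def mean(values):
    """Execute part of the kernel: it sees only the operands it was handed."""
    return sum(values) / len(values)
```

For example, `run_kernel([3, 6, 9], stencil_reads, mean)` averages each element with its in-range neighbours, and the kernel never indexes into `data` itself.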

Spiral - Imperial College Digital Repository