
    Empowering parallel computing with field programmable gate arrays

    After more than 30 years, reconfigurable computing has grown from a concept into a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array (FPGA), a building block that enables the configuration of a custom hardware architecture. The departure from static von Neumann-like architectures opens the way to eliminating instruction overhead and to optimizing execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools that enable software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis (HLS) support, as well as better integration with processor and memory systems. On the other hand, long compile times and complex design space exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, and examine high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements.
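
    The abstract flags long compile times and complex design space exploration as open problems. The toy sketch below, not tied to any particular HLS tool, shows one common shape such a design-space exploration loop can take: enumerate unroll/pipeline options against a crude analytical latency and resource model. The trip count, DSP budget, and cost formulas are all illustrative assumptions.

```python
# Toy design-space exploration over hypothetical HLS knobs.
# The latency/resource model is a made-up analytical approximation,
# not the output of any real HLS tool.
from itertools import product

TRIP_COUNT = 1024      # assumed loop trip count
DSP_BUDGET = 64        # assumed resource budget (DSP slices)

def estimate(unroll, pipeline):
    """Very rough latency/resource estimate for one configuration."""
    ii = 1 if pipeline else 4                    # assumed initiation interval
    latency = (TRIP_COUNT // unroll) * ii + 10   # +10: assumed pipeline depth
    dsps = unroll * 2                            # assumed 2 DSPs per unrolled lane
    return latency, dsps

best = None
for unroll, pipeline in product([1, 2, 4, 8, 16], [False, True]):
    latency, dsps = estimate(unroll, pipeline)
    if dsps <= DSP_BUDGET and (best is None or latency < best[0]):
        best = (latency, unroll, pipeline)

print(f"best config: latency={best[0]}, unroll={best[1]}, pipeline={best[2]}")
```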

    Symbolic Exploration of Transition Hierarchies

    In formal design verification, successful model checking is typically preceded by a laborious manual process of constructing design abstractions. We present a methodology for partially, and in some cases fully, bypassing the abstraction process. For this purpose, we provide the designer with abstraction operators which, if used judiciously in the description of a design, structure the corresponding state space hierarchically. This structure can then be exploited by verification tools, making possible the automatic and exhaustive exploration of state spaces that would otherwise be out of scope for existing model checkers. Specifically, we present the following contributions:
    - A temporal abstraction operator that aggregates transitions and hides intermediate steps. Mathematically, our abstraction operator is a function that maps a flat transition system into a two-level hierarchy where each atomic upper-level transition expands into an entire lower-level transition system. For example, an arithmetic operation may expand into a sequence of bit operations.
    - A BDD-based algorithm for the symbolic exploration of multi-level hierarchies of transition systems. The algorithm traverses a level-n transition by expanding the corresponding level-(n-1) transition system on the fly. The level-n successors of a state are determined by computing a level-(n-1) reach set, which is then immediately released from memory. In this fashion, we can exhaustively explore hierarchically structured state spaces whose flat counterparts cause memory overflows.
    - An experimental demonstration of the efficiency of our method with three examples: a multiplier, a cache coherence protocol, and a multiprocessor system. In the first two examples, we obtain significant improvements in run times and peak BDD sizes over traditional state-space search. The third example cannot be model checked at all with conventional methods (without manual abstractions), but can be analyzed fully automatically using transition hierarchies.
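
    To make the on-the-fly expansion idea concrete, here is a minimal explicit-state Python sketch: plain sets stand in for symbolic BDD reach sets, and an invented "+4 implemented as four +1 micro-steps" toy system plays the role of the two-level hierarchy. It illustrates the traversal scheme described above, not the authors' symbolic algorithm.

```python
# Minimal explicit-state sketch of two-level hierarchical exploration.
# Sets replace BDDs; the transition relations below are invented toys.

def lower_level_reach(entry_state, lower_successors):
    """Expand one upper-level transition by exploring the lower-level
    system from entry_state; the reach set is discarded on return."""
    frontier, reached = {entry_state}, {entry_state}
    while frontier:
        frontier = {t for s in frontier for t in lower_successors(s)} - reached
        reached |= frontier
    # States with no lower-level successors are the exits, i.e. the
    # upper-level successors; everything else is released here.
    return {s for s in reached if not lower_successors(s)}

def upper_level_reach(initial, upper_transitions, lower_successors):
    """Explore the upper level, expanding each transition on the fly."""
    frontier, reached = {initial}, {initial}
    while frontier:
        nxt = set()
        for state in frontier:
            for entry in upper_transitions(state):
                nxt |= lower_level_reach(entry, lower_successors)
        frontier = nxt - reached
        reached |= frontier
    return reached

def lower_successors(state):
    """Toy lower-level relation: four '+1' micro-steps implement one
    upper-level '+4' transition; plain ints are terminal exit states."""
    if isinstance(state, tuple):                 # ("micro", base, k)
        _, base, k = state
        return {("micro", base, k + 1)} if k < 3 else {base + 4}
    return set()

def upper_transitions(state):
    """Toy upper-level relation: one '+4' transition while state < 8."""
    return {("micro", state, 0)} if state < 8 else set()

print(upper_level_reach(0, upper_transitions, lower_successors))  # {0, 4, 8}
```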

    Neural Network-Based Thermal Simulation of Integrated Circuits on GPUs

    With the rising challenges of heat removal in integrated circuits (ICs), the development of thermal-aware computing architectures and run-time management systems has become indispensable to the continuation of IC design scaling. These thermal-aware design technologies of the future strongly depend on the availability of efficient and accurate means of thermal modeling and analysis. Such thermal models must not only have sufficient accuracy to capture the complex mechanisms that regulate thermal diffusion in ICs, but also a level of abstraction that allows their fast execution for design space exploration. In this paper, we propose an innovative full-chip thermal modeling approach that can handle the scalability problem of transient heat flow simulation in large 2D/3D multi-processor ICs. This is achieved by parallelizing the computation-intensive task of transient temperature tracking using neural networks and exploiting the computational power of massively parallel graphics processing units (GPUs). Our results show up to 35x run-time speed-up compared to state-of-the-art IC thermal simulation tools while keeping the error below 1 °C. Speed-ups scale with the size of the 3D multi-processor ICs, making our proposed method a valuable design space exploration tool.
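
    As a rough illustration of replacing a numerical solver step with a learned one, the sketch below uses a small convolutional network that maps the current temperature and power maps to the next-step temperature map, running on a GPU when available. The grid size, architecture, and dummy data are assumptions made for illustration; this is not the model proposed in the paper.

```python
# Hedged sketch: learn one transient time step T[k] -> T[k+1] from the
# current temperature map and power map. Architecture, grid size, and
# training data are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

GRID = 32  # assumed thermal grid resolution (GRID x GRID cells)

class ThermalStep(nn.Module):
    def __init__(self):
        super().__init__()
        # 2 input channels: temperature map and power-density map
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),
        )

    def forward(self, temp, power):
        x = torch.stack([temp, power], dim=1)   # (batch, 2, GRID, GRID)
        return temp + self.net(x).squeeze(1)    # predict the temperature delta

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ThermalStep().to(device)

# Dummy batch standing in for simulator-generated training data.
temp  = torch.rand(8, GRID, GRID, device=device) * 40 + 40   # 40-80 degrees C
power = torch.rand(8, GRID, GRID, device=device)             # normalized power
next_temp = model(temp, power)                                # one simulated step
print(next_temp.shape)  # torch.Size([8, 32, 32])
```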

    Exploring Multiple Levels of Performance Modeling for Heterogeneous Systems

    The current trend in High-Performance Computing (HPC) is to extract concurrency from clusters that include heterogeneous resources such as General-Purpose Graphics Processing Units (GPGPUs) and Field Programmable Gate Arrays (FPGAs). Although these heterogeneous systems can provide substantial performance for massively parallel applications, the available computing resources are often under-utilized due to inefficient application mapping, load balancing, and tuning. While several performance prediction models exist for tuning applications efficiently, they often require significant computer architecture knowledge for reliable prediction. In addition, they do not address multiple levels of design space abstraction, and it is often difficult to choose a reliable prediction model for a given design.

    In this research, we develop a multi-level suite of performance prediction models for heterogeneous systems that primarily targets Synchronous Iterative Algorithms (SIAs). The modeling suite aims to produce accurate and straightforward application runtime predictions prior to the actual large-scale implementation. The suite addresses two levels of system abstraction: 1) a low level, where partial knowledge of the application implementation is available along with the system specifications, and 2) a high level, where implementation details are minimal and only high-level computing system specifications are given. The performance prediction modeling suite is developed using our proposed Synchronous Iterative GPGPU Execution (SIGE) model for GPGPU clusters, motivated by the RC Amenability Test for Scalable Systems (RATSS) model for FPGA clusters. The low-level abstraction for GPGPU clusters consists of a regression-based performance prediction framework that statistically abstracts system architecture characteristics, enabling performance prediction without detailed architecture knowledge. In this framework, the overall execution time of an application is predicted using regression models developed for the host-device computations and network-level communications performed in the algorithm. We have used a family of Spiking Neural Network (SNN) models and an Anisotropic Diffusion Filter (ADF) algorithm as SIA case studies to verify the regression-based framework, achieving over 90% prediction accuracy compared to the actual implementations for several GPGPU cluster configurations tested. The results establish the adequacy of the low-level abstraction model for advanced, fine-grained performance prediction and design space exploration (DSE).

    The high-level abstraction consists of two primary modeling approaches: qualitative modeling, which uses existing subjective-analytical models for computation and communication; and quantitative modeling, which predicts computation and communication performance by measuring hardware events associated with objective-analytical models using micro-benchmarks. The performance prediction provided by the high-level abstraction approaches, albeit coarse-grained, delivers useful insight into application performance on the chosen heterogeneous system. A blend of the two high-level modeling approaches, labeled hybrid modeling, is explored for insightful preliminary performance prediction. The performance prediction models in the multi-level suite are verified and compared for their accuracy and ease of use, allowing developers to choose a model that best fits their level of design space abstraction.

    We also construct a roadmap that guides the user from optimal Application-to-Accelerator (A2A) mapping to fine-grained performance prediction, thereby providing a hierarchical approach to optimal application porting on the target heterogeneous system. The end goal of this dissertation research is to offer the HPC community a thorough, non-architecture-specific performance prediction framework in the form of a hierarchical modeling suite that enables developers to optimally utilize the heterogeneous resources.
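
    A minimal sketch of the regression idea behind the low-level abstraction: fit separate models for host-device computation time and network communication time, then sum their predictions to estimate one synchronous iteration. The features (work items, message bytes) and the synthetic timing data are placeholders, not the SIGE model's actual inputs.

```python
# Hedged sketch of the regression idea: fit separate models for
# device computation and network communication time, then sum them
# to predict overall iteration time. Features and the synthetic data
# generator are illustrative assumptions, not the SIGE model itself.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# Assumed features: work items per GPU, bytes exchanged per iteration.
work_items = rng.uniform(1e5, 1e7, n)
msg_bytes  = rng.uniform(1e3, 1e6, n)

# Synthetic "measured" timings (seconds) with noise, standing in for
# micro-benchmark or profiling data.
t_compute = 2e-8 * work_items + rng.normal(0, 1e-3, n)
t_comm    = 5e-5 + 8e-9 * msg_bytes + rng.normal(0, 1e-4, n)

comp_model = LinearRegression().fit(work_items.reshape(-1, 1), t_compute)
comm_model = LinearRegression().fit(msg_bytes.reshape(-1, 1), t_comm)

def predict_iteration_time(items, nbytes):
    """Predicted time for one synchronous iteration: compute + exchange."""
    return (comp_model.predict([[items]])[0] +
            comm_model.predict([[nbytes]])[0])

print(f"predicted: {predict_iteration_time(5e6, 2e5):.4f} s")
```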

    A Survey on Compiler Autotuning using Machine Learning

    Since the mid-1990s, researchers have been trying to use machine-learning-based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order in which to apply them). The compiler optimization space continues to grow due to the advancement of applications, the increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for compiler optimization, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the results obtained, a fine-grained classification of the different approaches and, finally, the influential papers in the field.
    (Preprint version 5.0, September 2018; accepted at ACM Computing Surveys (CSUR), 2018.)
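
    For the optimization-selection problem, a common setup in this literature is to learn a mapping from static program features to the best-performing flag configuration. The sketch below illustrates that setup with a random-forest classifier; the feature set, flag sets, and labels are synthetic placeholders rather than anything from a specific surveyed paper.

```python
# Hedged sketch of ML-based optimization selection: learn which of a
# few flag sets gives the best speedup from static program features.
# Features, flag sets, and training labels are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FLAG_SETS = ["-O2", "-O3", "-O3 -funroll-loops", "-Os"]

rng = np.random.default_rng(1)
n = 300
# Assumed static features: loop count, average trip count, memory-op ratio.
X = np.column_stack([
    rng.integers(0, 50, n),          # number of loops
    rng.uniform(1, 1e4, n),          # average trip count
    rng.uniform(0, 1, n),            # fraction of memory instructions
])
# Placeholder labels: index of the flag set that was fastest when measured.
y = rng.integers(0, len(FLAG_SETS), n)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

new_program = [[12, 256.0, 0.3]]     # features of an unseen program
print("suggested flags:", FLAG_SETS[clf.predict(new_program)[0]])
```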