Transformations of High-Level Synthesis Codes for High-Performance Computing

Authors: Besta Maciej; Hoefler Torsten; Licht Johannes de Fine; Meierhans Simon
Publication date: 29/10/2019

Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large-scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increases data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolve interface contention, or increase parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers, to tap into the performance potential offered by specialized hardware architectures using HLS.

arXiv.org e-Print Archive

Repository for Publications and Research Data

Design of Embedded Augmented Reality Systems

Authors: Ferrández J. M.; Garrigós J.; Martínez J. J.; Toledo J.; Toledo-Moreo R.
Publication venue: 'IntechOpen'
Publication date: 01/01/2010

IntechOpen

An application of formal semantics to student modelling : an investigation in the domain of teaching Prolog

Author: Fung Pat
Publication date: 01/01/1989

This thesis reports on research undertaken in an exploration of the use of formal semantics for student modelling in intelligent tutoring systems. The domain chosen was that of tutoring programming languages and within that domain Prolog was selected to be the target language for this exploration. The problem considered is one of how to analyse students' errors at a level which allows diagnosis to be more flexible and meaningful than is possible with the 'mal-rules' and 'bug catalogue' approach of existing systems. The ideas put forward by Robin Milner [1980] in his Calculus of Communicating Systems (CCS) form the basis of the formalism which is proposed as a solution to this problem. Based on the findings of an empirical investigation, novices' misconceptions of control flow in Prolog were defined as a suitable area in which to explore the application of this solution. A selection of Prolog programs used in that investigation was formally described in terms of CCS. These formal descriptions were used by a production rule system to generate a number of the incomplete or faulty models of Prolog execution which were identified in the first empirical study. In a second empirical study, a machine-analysis tool, designed to be part of a diagnostic tutoring module, used these models to diagnose students' misconceptions of Prolog control flow. This initial application of CCS to student modelling showed that the models of Prolog execution generated by the system could be used successfully to detect students' misunderstandings. Results from the research reported here indicate that the use of formal semantics to model programming languages has a useful contribution to make to the task of student modelling.

Open Research Online (The Open University)

Xar-Trek: Run-Time Execution Migration among FPGAs and Heterogeneous-ISA CPUs

Authors: Barbalace Antonio; Chuang Ho-Ren; Horta Edson; Olivier Pierre; Philippidis Cesar; Ravindran Binoy; VSathish Naarayanan Rao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/10/2021

Datacenter servers are increasingly heterogeneous: from x86 host CPUs, to ARM or RISC-V CPUs in NICs/SSDs, to FPGAs. Previous works have demonstrated that migrating application execution at run-time across heterogeneous-ISA CPUs can yield significant performance and energy gains, with relatively little programmer effort. However, FPGAs have often been overlooked in that context: hardware acceleration using FPGAs involves statically implementing select application functions, which prohibits dynamic and transparent migration. We present Xar-Trek, a new compiler and run-time software framework that overcomes this limitation. Xar-Trek compiles an application for several CPU ISAs and select application functions for acceleration on an FPGA, allowing execution migration between heterogeneous-ISA CPUs and FPGAs at run-time. Xar-Trek's run-time monitors server workloads and migrates application functions to an FPGA or to heterogeneous-ISA CPUs based on a scheduling policy. We develop a heuristic policy that uses application workload profiles to make scheduling decisions. Our evaluations conducted on a system with x86-64 server CPUs, ARM64 server CPUs, and an Alveo accelerator card reveal 88%-1% performance gains over no-migration baselines.

arXiv.org e-Print Archive

Edinburgh Research Explorer

The Developmental Stages of the Acquisition of Arabic By Adult English-speaking Learners: Processability Theory and the Formulaic Language

Author: Oulhaj Abdellatif
Publication venue: UWM Digital Commons
Publication date: 01/12/2015

The aim of this study is to examine the developmental stages of the acquisition of Arabic as a foreign language by adult English-speaking learners. Processability Theory (PT; Pienemann, 1998, 2005) is adopted to investigate in detail whether acquisition follows the hierarchy stated by PT. The study targeted agreement within seven grammatical structures. The structures belong to three procedural levels of the hierarchy (stages three to five). Six adult learners participated in this study. They were tested via different tasks to elicit data that would either support or disconfirm the predictions of the PT hierarchy. Two participants produced subject-verb agreement (stage 4) at a higher rate than N-Adj / N-N agreement (stage 3). Before the prediction of the PT hierarchy was treated as disconfirmed, the two participants took a second test to make sure the language they produced was processed and not retrieved as a formula. Students were introduced to a set of new vocabulary and were asked to tell a story based on three picture stories. By learning unfamiliar vocabulary in isolation, the two participants applied grammatical relations to combine words. Data in test 2 showed a decrease in the acquisition rate of S-V agreement, therefore confirming the predictions of PT.

University of Wisconsin-Milwaukee

A Modular Approach to Adaptive Reactive Streaming Systems

Author: Neely Christopher E.
Publication venue: Scholar Commons
Publication date: 19/05/2012

The latest generations of FPGA devices offer large resource counts that provide the headroom to implement large-scale and complex systems. However, there are increasing challenges for the designer, not just because of pure size and complexity, but also in harnessing effectively the flexibility and programmability of the FPGA. A central issue is the need to integrate modules from diverse sources to promote modular design and reuse. Further, the capability to perform dynamic partial reconfiguration (DPR) of FPGA devices means that implemented systems can be made reconfigurable, allowing components to be changed during operation. However, use of DPR typically requires low-level planning of the system implementation, adding to the design challenge. This dissertation presents ReShape: a high-level approach for designing systems by interconnecting modules, which gives a ‘plug and play’ look and feel to the designer, is supported by tools that carry out implementation and verification functions, and is carried through to support system reconfiguration during operation. The emphasis is on the inter-module connections and abstracting the communication patterns that are typical between modules – for example, the streaming of data that is common in many FPGA-based systems, or the reading and writing of data to and from memory modules. ShapeUp is also presented as the static precursor to ReShape. In both, the details of wiring and signaling are hidden from view, via metadata associated with individual modules. ReShape allows system reconfiguration at the module level, by supporting type checking of replacement modules and by managing the overall system implementation, via metadata associated with its FPGA floorplan. The methodology and tools have been implemented in a prototype for a broad domain-specific setting – networking systems – and have been validated on real telecommunications design projects.

Scholar Commons - Santa Clara University