Search CORE

517 research outputs found

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

In recent years, the field of Deep Learning has seen many disruptive and impactful advancements. Given the increasing complexity of deep neural networks, the need for efficient hardware accelerators has become more and more pressing to design heterogeneous HPC platforms. The design of Deep Learning accelerators requires a multidisciplinary approach, combining expertise from several areas, spanning from computer architecture to approximate computing, computational models, and machine learning algorithms. Several methodologies and tools have been proposed to design accelerators for Deep Learning, including hardware-software co-design approaches, high-level synthesis methods, specific customized compilers, and methodologies for design space exploration, modeling, and simulation. These methodologies aim to maximize the exploitable parallelism and minimize data movement to achieve high performance and energy efficiency. This survey provides a holistic review of the most influential design methodologies and EDA tools proposed in recent years to implement Deep Learning accelerators, offering the reader a wide perspective in this rapidly evolving field. In particular, this work complements the previous survey proposed by the same authors in [203], which focuses on Deep Learning hardware accelerators for heterogeneous HPC platforms

arXiv.org e-Print Archive

Parallel architectural skeletons, re-usable building blocks for parallel applications

Author: Goswami Dhrubajyoti
Publication venue: 'University of Waterloo'
Publication date: 01/01/2001
Field of study

University of Waterloo's Institutional Repository

Automatic pipelining and vectorization of scientific code for FPGAs

Author: Nabi Syed Waqar
Vanderbauwhede Wim
Publication venue: 'Hindawi Limited'
Publication date: 18/11/2019
Field of study

There is a large body of legacy scientific code in use today that could benefit from execution on accelerator devices like GPUs and FPGAs. Manual translation of such legacy code into device-specific parallel code requires significant manual effort and is a major obstacle to wider FPGA adoption. We are developing an automated optimizing compiler TyTra to overcome this obstacle. The TyTra flow aims to compile legacy Fortran code automatically for FPGA-based acceleration, while applying suitable optimizations. We present the flow with a focus on two key optimizations, automatic pipelining and vectorization. Our compiler frontend extracts patterns from legacy Fortran code that can be pipelined and vectorized. The backend first creates fine and coarse-grained pipelines and then automatically vectorizes both the memory access and the datapath based on a cost model, generating an OpenCL-HDL hybrid working solution for FPGA targets on the Amazon cloud. Our results show up to 4.2× performance improvement over baseline OpenCL code

Enlighten

A NOVEL SOFTWARE-BUILT PARALLEL MACHINES AND THEIR INTERCONNECTIONS

Author: AJIT SINGH
Alexander Christopher
Browne J. C.
Cole Mure
Darlington J.
Dehne F.
DHRUBAJYOTI GOSWAMI
Gamma Erich
Grama Ananth
HON FUNG LI
Johnson Ralph E.
MOHAMMAD MURSALIN AKON
Quinn Michael J.
Richard Stevens W.
XUEMIN (SHERMAN) SHEN
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date
Field of study

Crossref

Early aspects: aspect-oriented requirements engineering and architecture design

Author: Araújo João
Clements Paul
Moreira Ana
Tekinerdogan Bedir
Publication venue: University of Twente, Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/2004
Field of study

This paper reports on the third Early Aspects: Aspect-Oriented Requirements Engineering and Architecture Design Workshop, which has been held in Lancaster, UK, on March 21, 2004. The workshop included a presentation session and working sessions in which the particular topics on early aspects were discussed. The primary goal of the workshop was to focus on challenges to defining methodical software development processes for aspects from early on in the software life cycle and explore the potential of proposed methods and techniques to scale up to industrial applications

CiteSeerX

University of Twente Research Information

FT-PAS-A framework for pattern specific fault-tolerance in parallel programming

Author: Jakadeesan Gopinatha
Publication venue
Publication date: 01/01/2009
Field of study

Fault-tolerance is an important requirement for long running parallel applications. Many approaches are discussed in various literatures about providing fault-tolerance for parallel systems. Most of them exhibit one or more of these shortcomings in delivering fault-tolerance: non-specific solution (i.e., the fault-tolerance solution is general), no separation-of-concern (i.e., the application developer's involvement in implementing the fault tolerance is significant) and limited to inbuilt fault-tolerance solution. In this thesis, we propose a different approach to deliver fault-tolerance to the parallel programs using a-priori knowledge about their patterns. Our approach is based on the observation that different patterns require different fault-tolerance techniques (specificity). Consequently, we have contributed by classifying patterns into sub-patterns based on fault-tolerance strategies. Moreover, the core functionalities of these fault-tolerance strategies can be abstracted and pre-implemented generically, independent of a specific application. Thus, the pre-packaged solution separates their implementation details from the application developer (separation-of-concern). One such fault-tolerance model is designed and implemented here to demonstrate our idea. The Fault-Tolerant Parallel Architectural Skeleton (FT-PAS) model implements various fault-tolerance protocols targeted for a collection of (frequently used) patterns in parallel-programming. Fault-tolerance protocol extension is another important contribution of this research. The FT-PAS model provides a set of basic building blocks as part of protocol extension in order to build new fault- tolerance protocols as needed for available patterns. Finally, the usages of the model from the perspective of two user categories (i.e., an application developer and a protocol designer) are illustrated through examples

Concordia University Research Repository

Tools and Models for High Level Parallel and Grid Programming

Author: Dazzi Patrizio
Publication venue
Publication date: 01/01/2015
Field of study

When algorithmic skeletons were first introduced by Cole in late 1980 the idea had an almost immediate success. The skeletal approach has been proved to be effective when application algorithms can be expressed in terms of skeletons composition. However, despite both their effectiveness and the progress made in skeletal systems design and implementation, algorithmic skeletons remain absent from mainstream practice. Cole and other researchers, focused the problem. They recognized the issues affecting skeletal systems and stated a set of principles that have to be tackled in order to make them more effective and to take skeletal programming into the parallel mainstream. In this thesis we propose tools and models for addressing some among the skeletal programming environments issues. We describe three novel approaches aimed at enhancing skeletons based systems from different angles. First, we present a model we conceived that allows algorithmic skeletons customization exploiting the macro data-flow abstraction. Then we present two results about the exploitation of meta-programming techniques for the run-time generation and optimization of macro data-flow graphs. In particular, we show how to generate and how to optimize macro data-flow graphs accordingly both to programmers provided non-functional requirements and to execution platform features. The last result we present are the Behavioural Skeletons, an approach aimed at addressing the limitations of skeletal programming environments when used for the development of component-based Grid applications. We validated all the approaches conducting several test, performed exploiting a set of tools we developed.Comment: PhD Thesis, 2008, IMT Institute for Advanced Studies, Lucca. arXiv admin note: text overlap with arXiv:1002.2722 by other author

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

A generic framework for process execution and secure multi-party transaction authorization

Author: Weigold T.
Weigold T.
Publication venue
Publication date: 01/01/2010
Field of study

Process execution engines are not only an integral part of workflow and business process management systems but are increasingly used to build process-driven applications. In other words, they are potentially used in all kinds of software across all application domains. However, contemporary process engines and workflow systems are unsuitable for use in such diverse application scenarios for several reasons. The main shortcomings can be observed in the areas of interoperability, versatility, and programmability. Therefore, this thesis makes a step away from domain specific, monolithic workflow engines towards generic and versatile process runtime frameworks, which enable integration of process technology into all kinds of software. To achieve this, the idea and corresponding architecture of a generic and embeddable process virtual machine (ePVM), which supports defining process flows along the theoretical foundation of communicating extended finite state machines, are presented. The architecture focuses on the core process functionality such as control flow and state management, monitoring, persistence, and communication, while using JavaScript as a process definition language. This approach leads to a very generic yet easily programmable process framework. A fully functional prototype implementation of the proposed framework is provided along with multiple example applications. Despite the fact that business processes are increasingly automated and controlled by information systems, humans are still involved, directly or indirectly, in many of them. Thus, for process flows involving sensitive transactions, a highly secure authorization scheme supporting asynchronous multi-party transaction authorization must be available within process management systems. Therefore, along with the ePVM framework, this thesis presents a novel approach for secure remote multi-party transaction authentication - the zone trusted information channel (ZTIC). The ZTIC approach uniquely combines multiple desirable properties such as the highest level of security, ease-of-use, mobility, remote administration, and smooth integration with existing infrastructures into one device and method. Extensively evaluating both, the ePVM framework and the ZTIC, this thesis shows that ePVM in combination with the ZTIC approach represents a unique and very powerful framework for building workflow systems and process-driven applications including support for secure multi-party transaction authorization

WestminsterResearch