Search CORE

73 research outputs found

A Configurable Shared Scratchpad Memory for GPU-like Processors

Author: Cilardo A.
Donnarumma C.
Gagliardi M.
Publication venue
Publication date
Field of study

During the last years Field Programmable Gate Arrays and Graphics Processing Units have become increasingly important for high-performance computing. In particular, a number of industrial solutions and academic projects are proposing design frameworks based on FPGA-implemented GPU-like compute units. Existing GPU-like core projects provide limited hardware support for shared scratch-pad memory and particularly for the problem of bank conflicts, a major source of performance loss with many parallel kernels. In this paper, we present a configurable, GPU-like oriented scratchpad memory with built-in support for bank remapping. The core is fully synthetizable on FPGA with a contained hardware cost. We also validated the presented architecture with a cycle-accurate event-driven emulator written in C++ as well as an RTL simulator tool. Last, we demonstrated the impact of bank remapping and other parameters available with the proposed configurable shared scratchpad memory by evaluating the performance of two real-world parallelized kernels

Università degli Studi di Napoli Federico Il Open Archive

Simulation and implementation of novel deep learning hardware architectures for resource constrained devices

Author: Lammie Corey
Publication venue
Publication date: 01/01/2022
Field of study

Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems

ResearchOnline at James Cook University

Reconfigurable Architectures and Systems for IoT Applications

Author
Publication venue
Publication date: 01/01/2016
Field of study

abstract: Internet of Things (IoT) has become a popular topic in industry over the recent years, which describes an ecosystem of internet-connected devices or things that enrich the everyday life by improving our productivity and efficiency. The primary components of the IoT ecosystem are hardware, software and services. While the software and services of IoT system focus on data collection and processing to make decisions, the underlying hardware is responsible for sensing the information, preprocess and transmit it to the servers. Since the IoT ecosystem is still in infancy, there is a great need for rapid prototyping platforms that would help accelerate the hardware design process. However, depending on the target IoT application, different sensors are required to sense the signals such as heart-rate, temperature, pressure, acceleration, etc., and there is a great need for reconfigurable platforms that can prototype different sensor interfacing circuits. This thesis primarily focuses on two important hardware aspects of an IoT system: (a) an FPAA based reconfigurable sensing front-end system and (b) an FPGA based reconfigurable processing system. To enable reconfiguration capability for any sensor type, Programmable ANalog Device Array (PANDA), a transistor-level analog reconfigurable platform is proposed. CAD tools required for implementation of front-end circuits on the platform are also developed. To demonstrate the capability of the platform on silicon, a small-scale array of 24×25 PANDA cells is fabricated in 65nm technology. Several analog circuit building blocks including amplifiers, bias circuits and filters are prototyped on the platform, which demonstrates the effectiveness of the platform for rapid prototyping IoT sensor interfaces. IoT systems typically use machine learning algorithms that run on the servers to process the data in order to make decisions. Recently, embedded processors are being used to preprocess the data at the energy-constrained sensor node or at IoT gateway, which saves considerable energy for transmission and bandwidth. Using conventional CPU based systems for implementing the machine learning algorithms is not energy-efficient. Hence an FPGA based hardware accelerator is proposed and an optimization methodology is developed to maximize throughput of any convolutional neural network (CNN) based machine learning algorithm on a resource-constrained FPGA.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

ASU Digital Repository

Can DSP48A1 adders be used for high-resolution delay generation?

Author: Arabul Ekin
Dahnoun Naim
Mehmood Shahid
Tancock Scott
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2018
Field of study

Crossref

Explore Bristol Research

Experimental evaluation of neutron-induced errors on a multicore RISC-V platform

Author: Fernandes dos Santos Fernando
Kritikakou Angeliki
Sentieys Olivier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/09/2022
Field of study

Paper accepted on 28th IEEE IOLTS 2022International audienceRISC-V architectures have gained importance in the last years due to their flexibility and open-source Instruction Set Architecture (ISA), allowing developers to efficiently adopt RISC-V processors in several domains with a reduced cost. For application domains, such as safety-critical and mission-critical, the execution must be reliable as a fault can compromise the system's ability to operate correctly. However, the application's error rate on RISC-V processors is not significantly evaluated, as it has been done for standard x86 processors. In this work, we investigate the error rate of a commercial RISC-V ASIC platform, the GAP8, exposed to a neutron beam. We show that for computing-intensive applications, such as classification Convolutional Neural Networks (CNN), the error rate can be 3.2× higher than the average error rate. Additionally, we find that the majority (96.12%) of the errors on the CNN do not generate misclassifications. Finally, we also evaluate the events that cause application interruption on GAP8 and show that the major source of incorrect interruptions is application hangs (i.g., due to an infinite loop or a racing condition)

INRIA a CCSD electronic archive server

funcX: A Federated Function Serving Fabric for Science

Author: Akkus I. E.
CMS
Forde J.
Fox G.
Hightower K.
Hindman B.
Malawski M.
Merkel D.
Spillner J.
Stubbs J.
Wang L.
Waterman D. G.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/05/2020
Field of study

Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX---a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. funcX's endpoint software can transform existing clouds, clusters, and supercomputers into function serving systems, while funcX's cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than more than 130000 concurrent workers.Comment: Accepted to ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap with arXiv:1908.0490

arXiv.org e-Print Archive

Crossref

Acceleration of Biomolecular Simulations using FPGA-based Reconfigurable Computing

Author: Nallamuthu Ananth
Publication venue: Clemson University Libraries
Publication date: 01/05/2010
Field of study

A paradigm shift is occurring in the way compute-intensive scientific applications are developed. Thanks to advancements in commercially viable hybrid architectures for High-Performance Computing (HPC), the focus has shifted from improving performance by merely scaling algorithms on von Neumann computing nodes to fully exploiting additional computational capabilities provided by accelerators such as FPGAs (Field Programmable Gate Arrays) and GPGPUs (General Purpose Graphical Processing Units). Computational chemists use Molecular Dynamics (MD) simulations like LAMMPS (Large Scale Atomic Molecular Massively Parallel Systems) and NAMD (NAnoscale Molecular Dynamics) to simulate biomolecular behaviour such as protein folding and small molecule docking to proteins. MD simulations are computationally complex n-body problems, which are time consuming to simulate in biologically relevant scales. Executing such simulations in best available HPC environments is critical for scientific advancements in the field. Thus, as HPC technology evolves, there is a need to update classical biomolecular simulation applications like LAMMPS to better suit the architecture. In this work, we modify LAMMPS (a classical molecular dynamics simulation program developed for CPU-only clusters) to execute on a reconfigurable computer system, SRC-7 H MAP. The SRC-7 H MAP consists of two Altera FPGA logic chips interfaced to a dual-core Intel Xeon processor. Users can benefit by offloading most compute-intensive tasks of the application to the FPGA logic. This work explores the challenges involved in effectively adapting a production level application code optimized for von Neumann architecture, to an FPGA-based hybrid architecture. We have successfully accelerated the non-bonded force computations, the most compute-intensive module in LAMMPS for biomolecular simulations, by 5.0x over a single 3.0 GHz Xeon processor. This performance includes the data transfer overheads and function calling overheads. Further, using the accelerated non-bonded force computations function, we achieve an overall application speed-up of 2.0x to 2.4

Clemson University: TigerPrints

Open Hardware Solutions in Quantum Technology

Author: Almudever Carmen G.
Bourdeauducq Sébastien
Butko Anastasiia
Cancelo Gustavo
Clark Susan M.
Heinsoo Johannes
Henriet Loïc
Huang Gang
Jurczak Christophe
Kotilahti Janne
Landra Alessandro
LaRose Ryan
Mari Andrea
Nowrouzi Kasra
Ockeloen-Korppi Caspar
Prawiroatmodjo Guen
Roy Anurag Saha
Shammah Nathan
Siddiqi Irfan
Zeng William J.
Publication venue
Publication date: 01/03/2024
Field of study

Quantum technologies such as communications, computing, and sensing offer vast opportunities for advanced research and development. While an open-source ethos currently exists within some quantum technologies, especially in quantum computer programming, we argue that there are additional advantages in developing open quantum hardware (OQH). Open quantum hardware encompasses open-source software for the control of quantum devices in labs, blueprints and open-source toolkits for chip design and other hardware components, as well as openly-accessible testbeds and facilities that allow cloud-access to a wider scientific community. We provide an overview of current projects in the OQH ecosystem, identify gaps, and make recommendations on how to close them today. More open quantum hardware would accelerate technology transfer to and growth of the quantum industry and increase accessibility in science.Comment: 22 pages, 5 figure

arXiv.org e-Print Archive