FPGA technology in process tomography
The aims of this paper are to review process tomography applications that employ field-programmable gate arrays (FPGAs) and to survey current FPGA-related research, in order to assess the feasibility of applying FPGA technology in an ultrasonic process tomography system. FPGAs allow users to implement complete systems on a programmable chip, and the five main benefits of applying FPGA technology are performance, time to market, cost, reliability, and long-term maintenance. These advantages could substantially advance process tomography, especially ultrasonic and electrical process tomography. Future work focuses on ultrasonic process tomography for chemical process column investigation using FPGAs, targeting low cost, high speed, and reconstructed image quality.
Batch Processing of Biomedical Calculations on a Rivyera Supercomputer
This thesis describes batch processing of data on a gate-array hardware architecture represented by the Rivyera supercomputer, whose computational power is provided by several dozen field-programmable gate arrays. The aim of the thesis is to design and implement a solution for remote control and batch data processing on the Rivyera supercomputer, and to provide a method for developing and running computations that exploit the supercomputer's parallelism. The computational methods combine a computational core on the field-programmable gate arrays, described in VHDL, with a host handler application written in Java. As an example of these computational methods, the Rivyera supercomputer is used for the parallel processing of CT images into digitally reconstructed radiographs from a given directional projection.
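The core operation the thesis parallelizes, integrating a CT volume along a projection direction to form a digitally reconstructed radiograph, can be sketched for the simplest parallel-beam case (the function name and axis convention here are illustrative, not taken from the thesis):

```python
import numpy as np

def drr_parallel_beam(ct_volume, axis=0):
    """Project a CT volume into a digitally reconstructed radiograph
    by integrating attenuation along one axis (parallel-beam model)."""
    return ct_volume.sum(axis=axis)

# Toy volume: a dense "rod" running along the projection axis.
volume = np.zeros((4, 3, 3))
volume[:, 1, 1] = 1.0
drr = drr_parallel_beam(volume, axis=0)  # 3x3 radiograph
```

Each FPGA (or each core on it) can compute such a projection for an independent slab or viewing angle, which is what makes the workload embarrassingly parallel.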
FPGA-Based Acceleration of Expectation Maximization Algorithm using High Level Synthesis
Expectation Maximization (EM) is a soft clustering algorithm which iteratively partitions data into M clusters. It is one of the most popular data mining algorithms that uses Gaussian Mixture Models (GMM) for probability density modeling, and it is widely used in applications such as signal processing and Machine Learning (ML). EM requires high computation time and a large amount of memory when dealing with large data sets. Conventionally, an HDL-based design methodology is used to program FPGAs for accelerating computationally intensive algorithms. In many real-world applications, FPGAs provide great speedup along with lower power consumption compared to multicore CPUs and GPUs. The Intel FPGA SDK for OpenCL enables developers with no hardware knowledge to program FPGAs with short development time. This thesis presents an optimized implementation of the EM algorithm on Stratix V and Arria 10 FPGAs using the Intel FPGA SDK for OpenCL. A comparison of performance and power consumption between CPU, GPU, and FPGA is presented for various dimensions and cluster sizes. Compared to an Intel(R) Xeon(R) CPU E5-2637, our fully optimized OpenCL model for EM targeting the Arria 10 FPGA achieved up to 1000X speedup in terms of throughput (Tspeedup) and 5395X speedup in terms of throughput per unit of power consumed (T/Pspeedup). Compared to previous research on EM-GMM implementation on GPUs, the Arria 10 FPGA obtained up to 64.74X Tspeedup and 486.78X T/Pspeedup.
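The EM-GMM iteration the thesis accelerates alternates a soft-assignment E-step with a weighted-update M-step. A minimal one-dimensional sketch (a generic textbook formulation, not the thesis's OpenCL kernel; the quantile-based initialization is an illustrative choice):

```python
import numpy as np

def em_gmm_1d(x, k, iters=50):
    """One-dimensional EM for a k-component Gaussian mixture."""
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)  # spread initial means
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[n, j] proportional to pi_j * N(x_n | mu_j, var_j)
        d = x[:, None] - mu[None, :]
        dens = pi * np.exp(-0.5 * d**2 / var) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances from soft counts
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * d**2).sum(axis=0) / nk + 1e-9
    return pi, mu, var

# Two well-separated clusters around 0 and 8:
x = np.concatenate([np.random.default_rng(1).normal(0, 1, 200),
                    np.random.default_rng(2).normal(8, 1, 200)])
pi, mu, var = em_gmm_1d(x, k=2)
```

The per-point independence of the E-step, and the reductions in the M-step, are exactly the structure that maps well onto FPGA pipelines.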
Application-Specific Memory Subsystems
The disparity in performance between processors and main memories has
led computer architects to incorporate large cache hierarchies in
modern computers. These cache hierarchies are designed to be
general-purpose in that they strive to provide the best possible
performance across a wide range of applications. However, such a memory
subsystem does not necessarily provide the best possible performance for
a particular application.
Although general-purpose memory subsystems are desirable when the
work-load is unknown and the memory subsystem must remain fixed,
when this is not the case a custom memory subsystem may be beneficial.
For example, in an application-specific integrated circuit (ASIC) or
a field-programmable gate array (FPGA) designed to run a particular
application, a custom memory subsystem optimized for that application
would be desirable. In addition, when there are tunable
parameters in the memory subsystem, it may make sense to change these
parameters depending on the application being run. Such a situation
arises today with FPGAs and, to a lesser extent, GPUs, and it is
plausible that general-purpose computers will begin to support
greater flexibility in the memory subsystem in the future.
In this dissertation, we first show that it is possible to create
application-specific memory subsystems that provide much better
performance than a general-purpose memory subsystem. In addition,
we show a way to discover such memory subsystems automatically using
a superoptimization technique on memory address traces gathered
from applications. This allows one to generate a custom memory subsystem
with little effort.
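The superoptimizer's inner loop, scoring a candidate memory subsystem against a recorded address trace, can be illustrated with a minimal direct-mapped cache model (the configuration fields and trace format here are illustrative, not the dissertation's):

```python
def simulate_direct_mapped(trace, line_bytes, num_lines):
    """Count misses for a direct-mapped cache over an address trace."""
    tags = [None] * num_lines
    misses = 0
    for addr in trace:
        line = addr // line_bytes          # which memory line
        idx = line % num_lines             # which cache slot it maps to
        if tags[idx] != line:              # miss: fill the slot
            tags[idx] = line
            misses += 1
    return misses

# A sequential scan touching four distinct 16-byte lines, twice over.
trace = [a for _ in range(2) for a in range(0, 64, 4)]
# All four lines fit, so only the first pass misses.
misses = simulate_direct_mapped(trace, line_bytes=16, num_lines=4)
```

A superoptimizer would evaluate many such candidate configurations (varying line size, associativity, hierarchy depth) against the same trace and keep the best-scoring subsystem.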
We next show that our memory subsystem superoptimization technique can
be used to optimize for objectives other than performance. As an example,
we show that it is possible to reduce the number of writes to the main
memory, which can be useful for main memories with limited write
durability, such as flash or Phase-Change Memory (PCM).
Finally, we show how to superoptimize memory subsystems for streaming
applications, which are a class of parallel applications. In particular, we
show that, through the use of ScalaPipe, we can author and deploy streaming
applications targeting FPGAs with superoptimized memory subsystems.
ScalaPipe is a domain-specific language (DSL) embedded in the Scala
programming language for generating streaming applications that can be
implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we
are able to demonstrate actual performance improvements using the
superoptimized memory subsystem with applications implemented in hardware.
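ScalaPipe itself is a DSL embedded in Scala; purely as an illustration of the streaming model it targets (stage names invented here, not ScalaPipe's API), a pipeline of kernels connected by streams can be sketched with generators:

```python
def source(n):
    """Produce a stream of n integers."""
    for i in range(n):
        yield i

def square(stream):
    """A filter kernel: transform each item as it flows through."""
    for x in stream:
        yield x * x

def total(stream):
    """A sink kernel: reduce the stream to a single value."""
    s = 0
    for x in stream:
        s += x
    return s

result = total(square(source(5)))  # 0 + 1 + 4 + 9 + 16 = 30
```

In the FPGA setting, each kernel becomes a hardware stage and the stream edges become FIFOs, whose buffering is one of the memory-subsystem parameters being superoptimized.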
Performance Debugging Frameworks for FPGA High-Level Synthesis
Using high-level synthesis (HLS) tools for field-programmable gate array (FPGA) design is becoming an increasingly popular choice because HLS tools can generate a high-quality design in a short development time. However, current HLS tools still cannot adequately support users in understanding and fixing the performance issues of the current design. That is, current HLS tools lack performance debugging capability. Previous work on performance debugging automates the process of inserting hardware monitors in low-level register-transfer level (RTL) languages, which limits the comprehensibility of the obtained results. Instead, our HLS-based flows offer analysis at the function or loop level and provide more intuitive feedback that can be used to pinpoint the performance bottleneck of a design. In this dissertation, we present a collection of HLS-based debugging frameworks for various purposes and design characteristics. First, we address a problem in the HLS synthesis step, where an inaccurate cycle estimate is produced if the program has input-dependent behavior. We propose a new performance estimator that automatically instruments code to model the hardware execution behavior and interprets the information from the HLS software simulation. However, the performance estimate from this flow may not be accurate for designs that cannot be simulated correctly by existing HLS software simulators. To handle such cases, we propose a new software simulator that provides cycle-accurate results based on the HLS scheduling information. If the input dataset is not available for software simulation, or high-level models do not exist for all components of the FPGA design, we also present an on-board monitoring flow for automated cycle extraction and stall analysis. Finally, we address the need of HLS programmers to automatically find the best set of directives for FPGA designs.
We propose a design space exploration (DSE) framework to optimize applications with variable loop bounds in the Polybench benchmark suite. A quantitative comparison among the proposed frameworks is shown using the sparse matrix-vector multiplication benchmark.
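The sparse matrix-vector multiplication kernel used as the comparison benchmark can be sketched in CSR form (a generic textbook formulation, not the dissertation's HLS code); its input-dependent inner-loop bound is exactly the kind of behavior that defeats static cycle estimation:

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a sparse matrix A in CSR format."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        # Inner trip count row_ptr[r+1] - row_ptr[r] depends on the data.
        for j in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[j] * x[col_idx[j]]
        y.append(acc)
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]
values, col_idx, row_ptr = [1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0])  # [3.0, 3.0]
```

Because the inner loop's bound varies per row, the achieved cycle count depends on the sparsity pattern, which is why the proposed frameworks measure or simulate it rather than relying on the HLS tool's static estimate.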