6 research outputs found

    FPGA technology in process tomography

    This paper reviews process tomography applications that employ field programmable gate arrays (FPGAs) and surveys current FPGA-related research, in order to assess whether FPGA technology can be applied in an ultrasonic process tomography system. FPGAs allow users to implement complete systems on a programmable chip; the five main benefits of applying FPGA technology are performance, time to market, cost, reliability, and long-term maintenance. These advantages could help advance process tomography, especially ultrasonic process tomography and electrical process tomography. Future work focuses on FPGA-based ultrasonic process tomography for chemical process column investigation, with attention to low cost, high speed, and reconstructed image quality.

    Batch Processing of Biomedical Calculations on a Rivyera Supercomputer

    This thesis describes batch processing of data on a gate-array hardware architecture, represented by the Rivyera supercomputer, whose computational power is provided by several dozen field programmable gate arrays (FPGAs). The aim of the thesis is to design and implement a solution for remote control and batch processing of data on the Rivyera supercomputer, and to provide a procedure for developing and solving computations with a parallel approach on the Rivyera supercomputer. The computational procedures combine a computational core on the FPGAs, described in VHDL, with a host application written in Java. As an example of such a procedure on the Rivyera supercomputer, CT images are processed in parallel into a digitally reconstructed radiograph from a given projection direction.
    Department: 450 - Katedra kybernetiky a biomedicínského inženýrství

    FPGA-Based Acceleration of Expectation Maximization Algorithm using High Level Synthesis

    Expectation Maximization (EM) is a soft clustering algorithm which iteratively partitions data into M clusters. It is one of the most popular data mining algorithms; it uses Gaussian Mixture Models (GMM) for probability density modeling and is widely used in applications such as signal processing and Machine Learning (ML). EM requires high computation time and a large amount of memory when dealing with large data sets. Conventionally, an HDL-based design methodology is used to program FPGAs for accelerating computationally intensive algorithms. In many real-world applications, FPGAs provide great speedup along with lower power consumption compared to multicore CPUs and GPUs. Intel FPGA SDK for OpenCL enables developers with no hardware knowledge to program FPGAs with short development time. This thesis presents an optimized implementation of the EM algorithm on Stratix V and Arria 10 FPGAs using Intel FPGA SDK for OpenCL. A comparison of performance and power consumption between CPU, GPU and FPGA is presented for various dimensions and cluster sizes. Compared to an Intel(R) Xeon(R) CPU E5-2637, our fully optimized OpenCL model for EM targeting the Arria 10 FPGA achieved up to 1000X speedup in terms of throughput (Tspeedup) and 5395X speedup in terms of throughput per unit of power consumed (T/Pspeedup). Compared to previous research on EM-GMM implementation on GPUs, the Arria 10 FPGA obtained up to 64.74X Tspeedup and 486.78X T/Pspeedup.
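
    The iterative soft clustering described in this abstract can be illustrated with a minimal sketch. It is a plain-Python, one-dimensional EM fit of a Gaussian mixture, not the thesis's OpenCL design; the function names `gaussian_pdf` and `em_gmm_1d`, the fixed iteration count, the unit initial variances and the caller-supplied initial means are all assumptions made for illustration.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, init_means, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with len(init_means) components by EM.

    Illustrative assumptions: initial means are supplied by the caller,
    initial variances are 1.0, and mixing weights start uniform.
    """
    m = len(init_means)
    means = list(init_means)
    variances = [1.0] * m
    weights = [1.0 / m] * m

    for _ in range(n_iter):
        # E-step: responsibility (soft assignment) of each cluster for each point.
        resp = []
        for x in data:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(m)]
            total = sum(p) or 1e-300  # guard against all densities underflowing to zero
            resp.append([pk / total for pk in p])

        # M-step: re-estimate weights, means and variances from the soft counts.
        for k in range(m):
            nk = sum(r[k] for r in resp)
            if nk == 0.0:
                continue  # empty cluster: keep its previous parameters
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, min_var)  # avoid collapse to zero variance

    return weights, means, variances


if __name__ == "__main__":
    random.seed(0)
    # Synthetic data: two well-separated clusters (hypothetical example input).
    samples = [random.gauss(0.0, 1.0) for _ in range(200)]
    samples += [random.gauss(10.0, 1.0) for _ in range(200)]
    print(em_gmm_1d(samples, [1.0, 8.0]))
```

    Each iteration touches every data point for every cluster in both steps, which suggests why computation time grows with data set size, dimension and cluster count, as the abstract notes.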

    Application-Specific Memory Subsystems

    The disparity in performance between processors and main memories has led computer architects to incorporate large cache hierarchies in modern computers. These cache hierarchies are designed to be general-purpose in that they strive to provide the best possible performance across a wide range of applications. However, such a memory subsystem does not necessarily provide the best possible performance for a particular application. Although general-purpose memory subsystems are desirable when the workload is unknown and the memory subsystem must remain fixed, when this is not the case a custom memory subsystem may be beneficial. For example, in an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA) designed to run a particular application, a custom memory subsystem optimized for that application would be desirable. In addition, when there are tunable parameters in the memory subsystem, it may make sense to change these parameters depending on the application being run. Such a situation arises today with FPGAs and, to a lesser extent, GPUs, and it is plausible that general-purpose computers will begin to support greater flexibility in the memory subsystem in the future. In this dissertation, we first show that it is possible to create application-specific memory subsystems that provide much better performance than a general-purpose memory subsystem. In addition, we show a way to discover such memory subsystems automatically using a superoptimization technique on memory address traces gathered from applications. This allows one to generate a custom memory subsystem with little effort. We next show that our memory subsystem superoptimization technique can be used to optimize for objectives other than performance. As an example, we show that it is possible to reduce the number of writes to the main memory, which can be useful for main memories with limited write durability, such as flash or Phase-Change Memory (PCM).
    Finally, we show how to superoptimize memory subsystems for streaming applications, which are a class of parallel applications. In particular, we show that, through the use of ScalaPipe, we can author and deploy streaming applications targeting FPGAs with superoptimized memory subsystems. ScalaPipe is a domain-specific language (DSL) embedded in the Scala programming language for generating streaming applications that can be implemented on CPUs and FPGAs. Using the ScalaPipe implementation, we are able to demonstrate actual performance improvements using the superoptimized memory subsystem with applications implemented in hardware.

    FPGA Implementation of EM Algorithm for 3D CT Reconstruction
