4 research outputs found

An embedded system supporting dynamic partial reconfiguration of hardware resources for morphological image processing

Author: Sahu Gyana Ranjan
Publication venue: Digital Commons @ NJIT
Publication date: 31/01/2015

Processors for high-performance computing applications are generally designed with a focus on high clock rates, parallelism of operations and high communication bandwidth, often at the expense of large power consumption. However, the emphasis of many embedded systems and untethered devices is on minimal hardware requirements and reduced power consumption. With the incessant growth of computational needs for embedded applications, which conflict with chip power and area constraints, the burden falls on hardware designers to come up with designs that optimize power and area requirements. This thesis investigates the efficient design of an embedded system for morphological image processing applications on Xilinx FPGAs (Field-Programmable Gate Arrays) by optimizing both area and power usage while delivering high performance. The design leverages a unique capability of FPGAs called dynamic partial reconfiguration (DPR), which allows regions of the FPGA to be reprogrammed with new functionality at runtime while applications are still running in the remainder of the device. The main aim of this thesis is to design an embedded system for morphological image processing that accounts for real-time and area constraints, and to compare it with a statically configured FPGA. IP (Intellectual Property) cores are synthesized for both the static and the dynamically reconfigured design. DPR enables instantiation of more hardware logic over a period of time on an existing device by time-multiplexing the hardware realization of functions. A comparison of power consumption is presented for the statically and dynamically reconfigured designs. Finally, a performance comparison is included for the implementation of the respective algorithms on a hardwired ARM processor as well as on another general-purpose processor. The results demonstrate the viability of DPR for morphological image processing applications.

Digital Commons @ New Jersey Institute of Technology (NJIT)
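
For readers unfamiliar with the workload named in the abstract above, the two basic morphological operators, erosion and dilation, can be sketched in software as follows. This is only an illustrative sketch, not the thesis's hardware design: the 3x3 square structuring element, the zero padding at the image border, and the function names `erode` and `dilate` are all assumptions made for the example.

```python
def _apply(image, combine, pad):
    """Slide a 3x3 square structuring element over a binary image.

    `image` is a list of equal-length rows holding 0/1 values.
    Pixels outside the image are treated as `pad` (an assumption).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    window.append(image[rr][cc] if inside else pad)
            out[r][c] = combine(window)
    return out


def erode(image):
    # A pixel stays set only if every pixel under the element is set.
    return _apply(image, min, pad=0)


def dilate(image):
    # A pixel becomes set if any pixel under the element is set.
    return _apply(image, max, pad=0)


if __name__ == "__main__":
    square = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 0, 0],
    ]
    for row in erode(square):
        print(row)
```

Each output pixel depends only on a small neighbourhood of the input, which is one reason operators of this kind are commonly mapped to dedicated hardware blocks.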

Diseño electrónico en FPGA con herramientas de software libre (Electronic design on FPGAs with free software tools)

Author: Ríos Santillán Iván
Publication date: 01/01/2019

Can anyone use an FPGA? Is it affordable? All vendors of these devices require you to use their own software, which is normally proprietary and licensed. Project IceStorm includes several FOSS tools that cover the entire flow, from compiling the design to loading it onto the chip, for certain FPGA models. Thanks to this, practically any user with basic programming skills can use these devices, and the only cost is the purchase of one of these FPGAs. This work gives a complete explanation of the installation and use of the tools, along with several examples of the whole process, using only the Project IceStorm tools. Bachelor's Degree in Industrial Electronics and Automation Engineering.

e_Buah - Biblioteca Digital de la Universidad de Alcalá

Vector processor virtualization: distributed memory hierarchy and simultaneous multithreading

Author: Rooholamin SeyedAmin
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2016

Taking advantage of DLP (Data-Level Parallelism) is indispensable in most data streaming and multimedia applications. Several architectures have been proposed to improve both the performance and the energy consumption of such applications. Superscalar and VLIW (Very Long Instruction Word) processors, along with SIMD (Single-Instruction Multiple-Data) and vector processor (VP) accelerators, are among the options available to designers. On the other hand, these choices turn out to be large resource and energy consumers, and they are not always used efficiently because of data dependencies among instructions and the limited portion of vectorizable code in the single applications that deploy them. This dissertation proposes an innovative architecture for a multithreaded VP which separates the path for performing data shuffle and memory-indexed accesses from the data path for executing other vector instructions that access the memory. This separation speeds up the most common memory access operations by avoiding extra delays and unnecessary stalls. In this multilane-based VP design, each vector lane uses its own private memory to avoid any stalls during memory access instructions. More importantly, the proposed VP has an innovative multithreaded architecture which makes it highly suitable for concurrent sharing in multicore environments. To this end, the VP, which is developed in VHDL and prototyped on an FPGA (Field-Programmable Gate Array), serves as a coprocessor for one or more scalar cores in the various system architectures presented in the dissertation. In the first system architecture, the VP is allocated exclusively to a single scalar core. Benchmarking shows that the VP can achieve very high performance. The inclusion of distributed data shuffle engines across vector lanes has a spectacular impact on the execution time, primarily for applications like the FFT (Fast Fourier Transform) that require large amounts of data shuffling.
In the second system architecture, a VP virtualization technique is presented which, when applied, enables the multithreaded VP to simultaneously execute many threads of various vector lengths. The threads compete simultaneously for the VP resources, with the goal of improving aggregate VP utilization. This approach yields high VP utilization even when the individual threads have low utilization. A vector register file (VRF) virtualization technique dynamically allocates physical vector registers to running threads. The technique is implemented for a multi-core processor embedded in an FPGA. Under the dynamic creation of threads, benchmarking demonstrates large VP speedups and drastic energy savings when compared to the first system architecture. In the last system architecture, further improvements focus on VP virtualization relying exclusively on hardware. Moreover, a pipelined data shuffle network replaces the non-pipelined shuffle engines. The VP can then take advantage of identical instruction flows that may be present in different vector applications by running in a fused instruction mode that increases its utilization. A power dissipation model is introduced, as well as two optimization policies aimed at minimizing either the consumed energy or the product of energy and runtime for a given application. Benchmarking shows the positive impact of these optimizations.

Digital Commons @ New Jersey Institute of Technology (NJIT)

FPGA and ASIC Square Root Designs for High Performance and Power Efficiency

Authors: Shashank Suresh, Sotirios G. Ziavras, Spiridon F. Beldianu
Publication date: 02/08/2013

Floating-point square root is a fundamental operation in signal processing and various HPC applications. Since this operation is expensive in resources and energy, its efficient implementation should be a priority in future multicores that will face dark silicon issues. This paper presents a low-cost, low-power design to calculate the square root using the IEEE 754 single-precision floating-point format. Two versions of the design are investigated, with and without clock gating (CG). Evaluation involves FPGA and ASIC technologies at 40 and 65 nm. Substantial performance gains and reduced power consumption are achieved compared to a popular iterative solution. The ASIC design demonstrates much lower power consumption, which at 40 nm is lower than that at 65 nm by about a factor of three. At 40 nm, CG for the ASIC realization is justified primarily for low activity rates. Keywords: FPGA, ASIC, floating-point square root, energy consumption, multicore processors.

CiteSeerX
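
To make the operation in the abstract above concrete, the sketch below computes a square root directly on the IEEE 754 single-precision bit fields. It is not the paper's design, and it is not claimed to be the iterative solution the paper compares against: the digit-by-digit (shift-and-subtract) integer root is a textbook technique swapped in for illustration, the names `isqrt_bitwise` and `f32_sqrt` are hypothetical, and only positive normal inputs are handled (an assumption made to keep the example short).

```python
import math
import struct


def isqrt_bitwise(n):
    """Digit-by-digit integer square root: returns floor(sqrt(n))."""
    root = 0
    bit = 1
    while bit * 4 <= n:          # largest power of four not exceeding n
        bit *= 4
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root


def f32_sqrt(x):
    """Square root of a positive normal float32 value, rounded to nearest."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign, exp, frac = bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if sign or exp == 0 or exp == 0xFF:
        raise ValueError("sketch handles positive normal inputs only")

    mant = frac | (1 << 23)      # restore the implicit leading one
    e = exp - 127                # remove the exponent bias
    if e % 2:                    # make the exponent even so it halves exactly
        mant <<= 1
        e -= 1

    # 25-bit root: 24 result bits plus one guard bit for rounding.
    q = isqrt_bitwise(mant << 25)
    result = (q >> 1) + (q & 1)  # an exact tie cannot occur for a square root

    out = ((e // 2 + 127) << 23) | (result & 0x7FFFFF)
    return struct.unpack("<f", struct.pack("<I", out))[0]


if __name__ == "__main__":
    for value in (0.25, 2.0, 4.0, 10.0):
        print(value, f32_sqrt(value), math.sqrt(value))
```

The loop in `isqrt_bitwise` uses only shifts, additions, subtractions and comparisons, which is why schemes of this family are often considered for hardware; whether the paper uses anything similar is not stated in the abstract.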

Crossref