    MINIMIZATION OF RESOURCE UTILIZATION FOR A REAL-TIME DEPTH-MAP COMPUTATIONAL MODULE ON FPGA

    A depth-map algorithm allows a camera system to estimate depth in many applications. The algorithm is computationally intensive and is therefore more effectively implemented on hardware such as a Field Programmable Gate Array (FPGA). However, a recurring issue in FPGA implementation is resource limitation, which is normally resolved by modifying the algorithm. The issue can also be addressed through the hardware architecture itself, without modifying the depth-map algorithm. In this thesis, five depth-map processor architectures for the sum-of-absolute-differences (SAD) depth-map algorithm were designed and implemented on FPGA for real-time operation. Two resource minimization techniques were employed to address resource limitations, and the resource usage and performance of the architectures were compared. Memory contention and bandwidth constraints were resolved using a self-initiating memory controller, FIFOs and line buffers. Parallel processing was used to achieve high processing speed at a low clock frequency. Memory-based line buffers were used instead of register-based line buffers, saving 62.4% of the logic elements (LEs) used at the cost of some additional dedicated memory bits. Replacing repetitive subtractors with registers saved a further 24.75% of LEs. The system achieves a SAD performance of 295 mega pixel-disparities per second (MPDS) for the architecture with a 640x480-pixel image, 3x3-pixel window, 32-pixel disparity range and 30 frames per second, and 590 MPDS for the 64-pixel disparity range architecture. The disparity matching module runs at 10 MHz and produces one result pixel every clock cycle. The results are dense disparity images suitable for high-speed, low-cost, low-power applications.
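
    For reference, a minimal software sketch of the SAD disparity matching described above is given below; the function name, NumPy usage and parameter defaults are illustrative assumptions, modeling the algorithm itself rather than the thesis's hardware pipeline of line buffers and parallel SAD units.

```python
import numpy as np

def sad_disparity(left, right, window=3, max_disp=32):
    """Brute-force SAD disparity map (software model of the hardware module).

    left, right: 2-D uint8 grayscale images of identical shape
    window:      odd SAD window size (the thesis uses 3x3)
    max_disp:    disparity search range (the thesis uses 32 or 64)
    """
    h, w = left.shape
    half = window // 2
    left = left.astype(np.int32)
    right = right.astype(np.int32)
    disp = np.zeros((h, w), dtype=np.uint8)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch_l = left[y - half:y + half + 1, x - half:x + half + 1]
            best_d, best_sad = 0, np.inf
            for d in range(max_disp):
                # Shift the right-image window left by the candidate disparity.
                patch_r = right[y - half:y + half + 1,
                                x - d - half:x - d + half + 1]
                sad = np.abs(patch_l - patch_r).sum()
                if sad < best_sad:
                    best_sad, best_d = sad, d
            disp[y, x] = best_d  # winner-take-all disparity for this pixel
    return disp
```

    The quoted throughput can be checked directly: 640 x 480 pixels x 30 frames per second x 32 candidate disparities is about 294.9 million pixel-disparity evaluations per second, i.e. the stated 295 MPDS.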

    Adapting the interior point method for the solution of LPs on serial, coarse grain parallel and massively parallel computers

    In this paper we describe a unified scheme for implementing an interior point algorithm (IPM) over a range of computer architectures. In the inner iteration of the IPM, a search direction is computed using Newton's method. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of serial, coarse grain parallel and massively parallel computer architectures, are considered in detail. We put forward arguments for why integrating the system within a sparse simplex solver is important, and outline how the system is designed to achieve this integration.
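
    The inner-iteration solve can be illustrated with a short SciPy sketch; the function name ipm_search_direction and the specific solvers (spsolve for the direct route, conjugate gradients for the indirect one) are assumptions standing in for the direct and indirect methods the paper discusses, not its actual implementation.

```python
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def ipm_search_direction(A, x, s, rhs, method="direct"):
    """Solve the SSPD normal-equations system (A D A^T) dy = rhs,
    where D = diag(x / s) is built from the current interior point.

    A:    sparse m x n constraint matrix
    x, s: strictly positive primal/dual iterates of length n
    rhs:  length-m right-hand side of the Newton step
    """
    D = sp.diags(x / s)
    M = (A @ D @ A.T).tocsc()   # sparse symmetric positive definite
    if method == "direct":      # direct: sparse factorization
        return spla.spsolve(M, rhs)
    dy, info = spla.cg(M, rhs)  # indirect: conjugate gradients
    if info != 0:
        raise RuntimeError("CG did not converge")
    return dy
```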

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Flare: Flexible In-Network Allreduce

    The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and reduce network traffic, the operation can be accelerated by offloading it to network switches, which aggregate the data received from the hosts and send the aggregated result back to them. However, existing solutions offer limited customization and can deliver suboptimal performance with custom operators and data types, with sparse data, or when reproducibility of the aggregation is a concern. To address these problems, we design a flexible programmable switch using PsPIN, a RISC-V architecture implementing the sPIN programming model, as a building block. We then design, model, and analyze different algorithms for executing the aggregation on this architecture, showing performance improvements over state-of-the-art approaches.
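
    The switch-side aggregation can be pictured with a toy Python model; the function name and in-memory setup are hypothetical and deliberately ignore the network, PsPIN and sparsity handling, but they show the operation being offloaded and why the order of a floating-point reduction matters for reproducibility.

```python
import numpy as np

def switch_allreduce(host_buffers, op=np.add):
    """Toy model of in-network allreduce: the 'switch' receives one
    buffer per host, reduces them element-wise, and multicasts the
    aggregated result back, so every host ends up with the same array.
    """
    result = host_buffers[0].copy()
    for buf in host_buffers[1:]:
        # A fixed, deterministic host order keeps floating-point
        # aggregation reproducible across runs.
        result = op(result, buf)
    return [result.copy() for _ in host_buffers]

# Example: four hosts each contribute a gradient-like vector.
hosts = [np.full(8, i, dtype=np.float32) for i in range(4)]
reduced = switch_allreduce(hosts)
assert all(np.array_equal(r, reduced[0]) for r in reduced)
```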

    Design of testbed and emulation tools

    The research summarized was concerned with the design of testbed and emulation tools suitable for projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software-based set of hierarchical tools was chosen to provide maximum flexibility, to ease migration to new computers as technology improves, and to take advantage of the inherent reliability and availability of commercially available computing systems.

    Micropipeline controller design and verification with applications in signal processing
