Search CORE

682 research outputs found

A general framework for efficient FPGA implementation of matrix product

Author: Amira A.
Bensaali F.
Sotudeh R.
Publication venue
Publication date: 01/01/2007
Field of study

Original article can be found at: http://www.medjcn.com/ Copyright Softmotor LimitedHigh performance systems are required by the developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Filed-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for matrix algorithm cores generation is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, ranging in three different matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles while the second one is a parallel floating-point matrix multiplier designed for 3D affine transformations.Peer reviewe

University of Hertfordshire Research Archive

Using reconfigurable computing technology to accelerate matrix decomposition and applications

Author: Wang Xinying
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016
Field of study

Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve the dense or sparse linear system of equations in bioinformatics, power system and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions: • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices. • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices. • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each. • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns. • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool-Latent Semantic Indexing with an FPGA-based architecture. • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update. Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5ÃÂ to 43.6ÃÂ in terms of computation time

Digital Repository @ Iowa State University (ISU)

HIGH PERFORMANCE, LOW COST SUBSPACE DECOMPOSITION AND POLYNOMIAL ROOTING FOR REAL TIME DIRECTION OF ARRIVAL ESTIMATION: ANALYSIS AND IMPLEMENTATION

Author: Athi Mrudula V.
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2014
Field of study

This thesis develops high performance real-time signal processing modules for direction of arrival (DOA) estimation for localization systems. It proposes highly parallel algorithms for performing subspace decomposition and polynomial rooting, which are otherwise traditionally implemented using sequential algorithms. The proposed algorithms address the emerging need for real-time localization for a wide range of applications. As the antenna array size increases, the complexity of signal processing algorithms increases, making it increasingly difficult to satisfy the real-time constraints. This thesis addresses real-time implementation by proposing parallel algorithms, that maintain considerable improvement over traditional algorithms, especially for systems with larger number of antenna array elements. Singular value decomposition (SVD) and polynomial rooting are two computationally complex steps and act as the bottleneck to achieving real-time performance. The proposed algorithms are suitable for implementation on field programmable gated arrays (FPGAs), single instruction multiple data (SIMD) hardware or application specific integrated chips (ASICs), which offer large number of processing elements that can be exploited for parallel processing. The designs proposed in this thesis are modular, easily expandable and easy to implement. Firstly, this thesis proposes a fast converging SVD algorithm. The proposed method reduces the number of iterations it takes to converge to correct singular values, thus achieving closer to real-time performance. A general algorithm and a modular system design are provided making it easy for designers to replicate and extend the design to larger matrix sizes. Moreover, the method is highly parallel, which can be exploited in various hardware platforms mentioned earlier. A fixed point implementation of proposed SVD algorithm is presented. The FPGA design is pipelined to the maximum extent to increase the maximum achievable frequency of operation. The system was developed with the objective of achieving high throughput. Various modern cores available in FPGAs were used to maximize the performance and details of these modules are presented in detail. Finally, a parallel polynomial rooting technique based on Newton’s method applicable exclusively to root-MUSIC polynomials is proposed. Unique characteristics of root-MUSIC polynomial’s complex dynamics were exploited to derive this polynomial rooting method. The technique exhibits parallelism and converges to the desired root within fixed number of iterations, making this suitable for polynomial rooting of large degree polynomials. We believe this is the first time that complex dynamics of root-MUSIC polynomial were analyzed to propose an algorithm. In all, the thesis addresses two major bottlenecks in a direction of arrival estimation system, by providing simple, high throughput, parallel algorithms

CiteSeerX

Michigan Technological University

Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations

Author: Cavallaro Joseph R.
Dick Chris
Studer Christoph
Wang Guohui
Wu Michael
Yin Bei
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems, based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose - to the best of our knowledge - the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance.Comment: To appear in the IEEE Journal of Selected Topics in Signal Processin

arXiv.org e-Print Archive

CiteSeerX

Repository for Publications and Research Data

Parallel Searching-Based Sphere Detector for MIMO Downlink OFDM Systems

Author: Cavallaro Joseph R.
Kim Kyeong Jin
Radosavljevic Predrag
Shen Hao
Publication venue: IEEE
Publication date: 01/06/2012
Field of study

In this paper, implementation of a detector with parallel partial candidate-search algorithm is described. Two fully independent partial candidate search processes are simultaneously employed for two groups of transmit antennas based on QR decomposition (QRD) and QL decomposition (QLD) of a multiple-input multiple-output (MIMO) channel matrix. By using separate simultaneous candidate searching processes, the proposed implementation of QRD-QLD searching-based sphere detector provides a smaller latency and a lower computational complexity than the original QRD-M detector for similar error-rate performance in wireless communications systems employing four transmit and four receive antennas with 16-QAM or 64-QAM constellation size. It is shown that in coded MIMO orthogonal frequency division multiplexing (MIMO OFDM) systems, the detection latency and computational complexity of a receiver can be substantially reduced by using the proposed QRD-QLD detector implementation. The QRD-QLD-based sphere detector is also implemented using Field Programmable Gate Array (FPGA) and application specific integrated circuit (ASIC), and its hardware design complexity is compared with that of other sphere detectors reported in the literature.Nokia Renesas MobileTexas InstrumentsXilinxNational Science Foundatio

Crossref

DSpace at Rice University

SUPERVISED QR ALGORITHM TO ESTIMATE THE FINANCIAL PRODUCT DATA PROCESS TIME ANALYSIS BASED ON ONLINE PURCHASE DETAILS

Author: Dhananachezhiyan R
Publication venue: JConsort
Publication date: 28/10/2019
Field of study

TheSupervised QR Algorithm is thus used to estimate the financial production data recipe and time fraction based on the online purchase details. The trading algorithm method used before has a very low rating, and the recipe does not clearly explain the financial and time-related processes. For that, are going to use the QR Method in this area. The method of doing this is very helpful for easy analysis of financial and Times related information analysis. One of the most important in an industry is the financial sector.Moreover, analysis is very important to find the problems in them. Analysis can correct various errors in practice. It shows us information on how to raise funds and improve funding in online purchases. Modern variations in his generating and increasing data and creating a parallel account make many changes simultaneously. These are very helpful in adjusting the financial position of the industry to suit the financial situation

Scholar

Real-time implementation of fast discriminative scale space tracking algorithm

Author: Ahmed Ashfaq
Awais Muhammad
Martina Maurizio
Masera Guido
Walid Walid
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021
Field of study

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

ASIP Design and Prototyping for Wireless Communication Applications

Author: Amer Baghdadi
Atif Raza Jafri
Michel Jezequel
Publication venue: 'IntechOpen'
Publication date: 01/01/2011
Field of study

International audienc

IntechOpen

HAL-Université de Bretagne Occidentale

HAL Descartes

Hal-Diderot