Search CORE

2,282 research outputs found

A pipeline structure for the block QR update in digital signal processing

Author: A Buttari
Antonio M. Vidal
BC Gunter
C Ramiro
D Rio del
Fran J. Alventosa
G Quintana-Ortí
GH Golub
J Benesty
JA Belloch
JA Belloch
Manuel F. Dolz
Pedro Alonso-Jordá
Y Huang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019
Field of study

[EN] There exist problems in the field of digital signal processing, such as filtering of acoustic signals that require processing a large amount of data in real time. The beamforming algorithm, for instance, is a process that can be modeled by a rectangular matrix built on the input signals of an acoustic system and, thus, changes in real time. To obtain the output signals, it is required to compute its QR factorization. In this paper, we propose to organize the concurrent computational resources of a given multicore computer in a pipeline structure to perform this factorization as fast as possible. The pipeline has been implemented using both the application programming interface OpenMP and GrPPI, a library interface to design parallel applications based on parallel patterns. We tackle not only the performance challenge but also the programmability of our idea using parallel programming frameworks.This work was supported by the Spanish Ministry of Economy and Competitiveness under MINECO and FEDER projects TIN2014-53495-R and TEC2015-67387-C4-1-R.Dolz, MF.; Alventosa, FJ.; Alonso-Jordá, P.; Vidal Maciá, AM. (2019). A pipeline structure for the block QR update in digital signal processing. The Journal of Supercomputing. 75(3):1470-1482. https://doi.org/10.1007/s11227-018-2666-1S14701482753Huang Y, Benesty J, Chen J (2006) Acoustic MIMO signal processing (signals and communication technology). Springer, BerlinRamiro C, Vidal AM, González A (2015) MIMOPack: a high performance computing library for MIMO communication systems. J Supercomput 71:751–760Alventosa FJ, Alonso P, Piñero G, Vidal AM (2016) Implementation of the Beamformer algorithm for the NVIDIA Jetson. In: Actas de la Conferencia, Granada, Spain, pp 201–211. ISBN 978-3-319-49955-0Alventosa FJ, Alonso P, Vidal AM, Piñero G, Quintana-Ortí ES (2018) Fast block QR update in digital signal processing. J Supercomput. https://doi.org/10.1007/s11227-018-2298-5del Rio D, Dolz MF, Fernández J, García JD (2017) A generic parallel pattern interface for stream and data processing. Concurr Comput Pract Exp 29(24):e4175Benesty J, Chen J, Huang Y, Dmochowski J (2007) On microphone-array Beamforming from a MIMO acoustic signal processing perspective. IEEE Trans Audio Speech Lang Process 15(3):1053–1065Lorente J, Piñero G, Vidal AM, Belloch JA, González A (2011) Parallel implementations of Beamforming design and filtering for microphone array applications. In: 19th European Signal Processing Conference (EUSIPCO), Barcelona, Spain, pp 501–505Belloch JA, Ferrer M, González A, Martínez-Zaldívar FJ, Vidal AM (2013) Headphone-based virtual spatialization of sound with a GPU accelerator. J Audio Eng Soc 61:546–561Belloch JA, González A, Martínez-Zaldívar FJ, Vidal AM (2011) Real-time massive convolution for audio applications on GPU. J Supercomput 58(3):449–457Golub GH, Van Loan CF (2013) Matrix computations. Johns Hopkins studies in the mathematical sciences. Johns Hopkins University Press, BaltimoreGunter BC, van de Geijn RA (2005) Parallel out-of-core computation and updating the QR factorization. ACM Trans Math Softw 31(1):60–78Buttari A, Langou J, Kurzak J, Dongarra J (2009) A class of parallel tiled linear algebra algorithms for multicore architectures. Parallel Comput 35(1):38–53Dolz MF, Alventosa FJ, Alonso-Jordá P, Vidal AM (2018) A pipeline for the QR update in digital signal processing. In: Proceedings of the 18th International Conference on Computational and Mathematical Methods in Science and Engineering (CMMSE 2018), Rota, Cádiz, Spain, pp 1–5Quintana-Ortí G, Quintana-Ortí ES, Van De Geijn RA, Van Zee FG, Chan E (2009) Programming matrix algorithms-by-blocks for thread-level parallelism. ACM Trans Math Softw 36(3):14:1–14:2

Crossref

Repositori Institucional de la Universitat Jaume I

RiuNet

A Pipeline for the QR Update in Digital Signal Processing

Author: Alonso-Jordá Pedro
Alventosa Fran J.
Dolz Manuel F.
Vidal Maciá Antonio Manuel
Publication venue: 'Wiley'
Publication date: 01/01/2019
Field of study

[EN] The input and output signals of a digital signal processing system can often be represented by a rectangular matrix as it is the case of the beamformer algorithm, a very useful particular algorithm that allows extraction of the original input signal once it is cleaned from noise and room reverberation. We use a version of this algorithm in which the system matrix must be factorized to solve a least squares problem. The matrix changes periodically according to the input signal sampled; therefore, the factorization needs to be recalculated as fast as possible. In this paper, we propose to use parallelism through a pipeline pattern. With our pipeline, some partial computations are advanced so that the final time required to update the factorization is highly reducedThis work was supported by the Spanish Ministry of Economy and Competitiveness under MINECO and FEDER projects TIN2014-53495-R and TEC2015-67387-C4-1-R.Dolz, MF.; Alventosa, FJ.; Alonso-Jordá, P.; Vidal Maciá, AM. (2019). A Pipeline for the QR Update in Digital Signal Processing. Computational and Mathematical Methods. 1:1-13. https://doi.org/10.1002/cmm4.1022S113

RiuNet

A fast algorithm for LR-2 factorization of Toeplitz matrices

Author: Glentis George-Othon
Publication venue: Elsevier
Publication date: 01/01/1995
Field of study

In this paper a new order recursive algorithm for the efficient −1 factorization of Toeplitz matrices is described. The proposed algorithm can be seen as a fast modified Gram-Schmidt method which recursively computes the orthonormal columns i, i = 1,2, …,p, of , as well as the elements of R−1, of a Toeplitz matrix with dimensions L × p. The factor estimation requires 8Lp MADS (multiplications and divisions). Matrix −1 is subsequently estimated using 3p2 MADS. A faster algorithm, based on a mixed and −1 updating scheme, is also derived. It requires 7Lp + 3.5p2 MADS. The algorithm can be efficiently applied to batch least squares FIR filtering and system identification. When determination of the optimal filter is the desired task it can be utilized to compute the least squares filter in an order recursive way. The algorithm operates directly on the experimental data, overcoming the need for covariance estimates. An orthogonalized version of the proposed −1 algorithm is derived. Matlab code implementing the algorithm is also supplied

University of Twente Research Information

Real-time implementation of fast discriminative scale space tracking algorithm

Author: Ahmed Ashfaq
Awais Muhammad
Martina Maurizio
Masera Guido
Walid Walid
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2021
Field of study

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Using reconfigurable computing technology to accelerate matrix decomposition and applications

Author: Wang Xinying
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016
Field of study

Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve the dense or sparse linear system of equations in bioinformatics, power system and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions: • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices. • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices. • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each. • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns. • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool-Latent Semantic Indexing with an FPGA-based architecture. • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update. Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5ÃÂ to 43.6ÃÂ in terms of computation time

Digital Repository @ Iowa State University (ISU)

CORDICのハードウェア構成及び応用に関する研究

Author: NGUYEN THI HONG THU
Publication venue
Publication date: 01/11/2018
Field of study

電気通信大学201

Creative Repository of Electro-Communications

Efficient arithmetic for high speed DSP implementation on FPGAs

Author: Alexander Steven Wilson
Publication venue
Publication date: 01/01/2007
Field of study

The author was sponsored by EnTegra Ltd, a company who develop hardware and software products and services for the real time implementation of DSP and RF systems. The field programmable gate array (FPGA) is being used increasingly in the field of DSP. This is due to the fact that the parallel computing power of such devices is ideal for today’s truly demanding DSP algorithms. Algorithms such as the QR-RLS update are computationally intensive and must be carried out at extremely high speeds (MHz). This means that the DSP processor is simply not an option. ASICs can be used but the expense of developing custom logic is prohibitive. The increased use of the FPGA in DSP means that there is a significant requirement for efficient arithmetic cores that utilises the resources on such devices. This thesis presents the research and development effort that was carried out to produce fixed point division and square root cores for use in a new Electronic Design Automation (EDA) tool for EnTegra, which is targeted at FPGA implementation of DSP systems. Further to this, a new technique for predicting the accuracy of CORDIC systems computing vector magnitudes and cosines/sines is presented. This work allows the most efficient CORDIC design for a specified level of accuracy to be found quickly and easily without the need to run lengthy simulations, as was the case before. The CORDIC algorithm is a technique using mainly shifts and additions to compute many arithmetic functions and is thus ideal for FPGA implementation

Glasgow Theses Service

OpenGrey Repository