177 research outputs found

Look-ahead in the two-sided reduction to compact band forms for symmetric eigenvalue problems and the SVD

Author: Catalán Sandra
Herrero José R.
Quintana-Orti Enrique S.
Rodríguez Sánchez Rafael
Tomás Domínguez Andrés Enrique
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

We address the reduction to compact band forms, via unitary similarity transformations, for the solution of symmetric eigenvalue problems and the computation of the singular value decomposition (SVD). Concretely, in the first case, we revisit the reduction to symmetric band form, while, for the second case, we propose a similar alternative, which transforms the original matrix to (unsymmetric) band form, replacing the conventional reduction method that produces a triangular-band output. In both cases, we describe algorithmic variants of the standard Level 3 Basic Linear Algebra Subroutines (BLAS)-based procedures, enhanced with look-ahead, to overcome the performance bottleneck imposed by the panel factorization. Furthermore, our solutions employ an algorithmic block size that differs from the target bandwidth, illustrating the important performance benefits of this decision. Finally, we show that our alternative compact band form for the SVD is key to introducing an effective look-ahead strategy into the corresponding reduction procedure.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Repositori Institucional de la Universitat Jaume I

A Flexible Numerical Framework for Engineering---A Response Surface Modelling Application

Author: Aldinucci Marco
d'
Lemeire J.
Viviani Paolo
Vucinic D.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

Institutional Research Information System University of Turin

Hybrid CPU-GPU implementation of the transformed spatial domain channel estimation algorithm for mmWave MIMO systems

Author: Aviles Marin Pablo Manuel
Belloch Jose A.
Botella Mascarell Carmen
Lindoso Muñoz Almudena
Lloria Diego
Roger Varea Sandra
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2023

Hybrid platforms combining multicore central processing units (CPU) with manycore hardware accelerators such as graphics processing units (GPU) can be smartly exploited to provide efficient parallel implementations of wireless communication algorithms for Fifth Generation (5G) and beyond systems. Massive multiple-input multiple-output (MIMO) systems are a key element of the 5G standard, involving several tens or hundreds of antenna elements for communication. Such a high number of antennas has a direct impact on the computational complexity of some MIMO signal processing algorithms. In this work, we focus on the channel estimation stage. In particular, we develop a parallel implementation of a recently proposed MIMO channel estimation algorithm. Its performance in terms of execution time is evaluated on both a multicore CPU and a GPU. The results show that some computation blocks of the algorithm are more suitable for multicore implementation, whereas other parts are more efficiently implemented on the GPU, indicating that a hybrid CPU-GPU implementation would achieve the best performance in practical applications based on the tested platform.

Repositori d'Objectes Digitals per a l'Ensenyament la Recerca i la Cultura

Performance analysis of a millimeter wave MIMO channel estimation method in an embedded multi-core processor

Author: Aviles Delgado Pablo Miguel
Belloch Rodríguez José Antonio
Cobos Maximo
Lindoso Muñoz Almudena
Lloria Diego
Roger Sandra
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/08/2022

The emerging Multi-Processor System-on-Chip (MPSoC) technology, which combines heterogeneous computing with the high performance of field programmable gate arrays (FPGA), is a promising platform for a large number of applications, including wireless communications and vehicular technology. In this specific application context, when multiple-input multiple-output (MIMO) scenarios are considered, the system usually has to manage a large number of communication links among sensors and antennas involving different vehicles and users. Millimeter wave (mmWave) communications are one of the key technology enablers toward achieving high data rates in beyond 5G systems (B5G). Communication at these frequency bands usually involves the use of large antenna arrays, often requiring high computational resources. One of the candidate platforms able to manage a huge number of communications is the Xilinx Zynq UltraScale+ EG Heterogeneous MPSoC, which is composed of a dual-core Cortex-R5, a quad-core ARM Cortex-A53, a graphics processing unit (GPU) and a high-end FPGA. This work analyzes the computational performance required by a recent mmWave MIMO channel estimation algorithm on a platform of this kind. As a first approach, we focus on the performance that can be achieved with the quad-core ARM Cortex-A53. To this end, we use the numerical linear algebra libraries BLAS and LAPACK. The results show that our reference implementation is able to manage a large MIMO communication system with 256 antennas without exhausting platform resources.

Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. Thanks to Grant PID2020-113785RB-100 funded by MCIN/AEI/10.13039/501100011033 and the Ramón y Cajal Grant RYC-2017-22101. The work has also been supported by the Spanish Ministry of Science and Innovation under Grants RTI2018-097045-B-C21, PID2019-106455GB-C21 and PID2020-113656RB-C21, as well as the Regional Government of Madrid through the projects MIMACUHSPACE-CM-UC3M (2022/00024/001) and PEJD-2019-PRE/TIC-16327.

Universidad Carlos III de Madrid e-Archivo

randUTV: A blocked randomized algorithm for computing a rank-revealing UTV factorization

Author: Heavner Nathan
Martinsson Per-Gunnar
Quintana-Orti Gregorio
Publication date: 02/03/2017

This manuscript describes the randomized algorithm randUTV for computing a so-called UTV factorization efficiently. Given a matrix $A$, the algorithm computes a factorization $A = UTV^{*}$, where $U$ and $V$ have orthonormal columns, and $T$ is triangular (either upper or lower, whichever is preferred). The algorithm randUTV is developed primarily to be a fast and easily parallelized alternative to algorithms for computing the Singular Value Decomposition (SVD). randUTV provides accuracy very close to that of the SVD for problems such as low-rank approximation, solving ill-conditioned linear systems, determining bases for various subspaces associated with the matrix, etc. Moreover, randUTV produces highly accurate approximations to the singular values of $A$. Unlike the SVD, the randomized algorithm proposed builds a UTV factorization in an incremental, single-stage, and non-iterative way, making it possible to halt the factorization process once a specified tolerance has been met. Numerical experiments comparing the accuracy and speed of randUTV to the SVD are presented. These experiments demonstrate that in comparison to column pivoted QR, which is another factorization that is often used as a relatively economic alternative to the SVD, randUTV compares favorably in terms of speed while providing far higher accuracy.
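
To make the shape of a UTV factorization concrete, the following is a minimal sketch, not the blocked randUTV algorithm from the manuscript. It assumes a small, square, nonsingular real matrix, and it uses a simplified unblocked construction: an orthogonal V is taken from the QR factorization of a randomly sampled matrix, and U and T are taken from the QR factorization of AV, so that A = U T V^T. The helper names (`transpose`, `matmul`, `qr_mgs`, `utv_sketch`) and the example matrix are hypothetical and chosen only for illustration.

```python
import random


def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def matmul(X, Y):
    """Plain matrix product of two lists-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def qr_mgs(X):
    """Modified Gram-Schmidt QR of a square, full-rank matrix: X = Q R."""
    n = len(X)
    q_cols = []                            # orthonormal columns built so far
    R = [[0.0] * n for _ in range(n)]      # upper triangular factor
    for j, col in enumerate(zip(*X)):
        v = list(col)
        for i, q in enumerate(q_cols):
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = sum(a * a for a in v) ** 0.5
        q_cols.append([a / R[j][j] for a in v])
    return transpose(q_cols), R


def utv_sketch(A, seed=0):
    """Simplified randomized UTV (assumption: A is square and nonsingular)."""
    rng = random.Random(seed)
    n = len(A)
    G = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    # V: orthonormal basis obtained from a random sample of the row space of A.
    V, _ = qr_mgs(matmul(transpose(A), G))
    # A V = U T with U orthogonal and T upper triangular, hence A = U T V^T.
    U, T = qr_mgs(matmul(A, V))
    return U, T, V


A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
U, T, V = utv_sketch(A)
B = matmul(matmul(U, T), transpose(V))
# Largest entrywise reconstruction error of U T V^T against A.
print(max(abs(B[i][j] - A[i][j]) for i in range(3) for j in range(3)))
```

The sketch only checks that the three factors have the stated structure; the blocking, the incremental stopping criterion and the accuracy properties described in the abstract are not reproduced here.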

arXiv.org e-Print Archive

Oxford University Research Archive

Solving $k$-means on High-dimensional Big Data

Author: AK Jain
H Steinhaus
J Stallmann
JL Bentley
K Jain
MR Ackermann
MW Mahoney
N Halko
P Drineas
PK Agarwal
T Kanungo
T Zhang
X Wu
Publication date: 01/01/2015

In recent years, there have been major efforts to develop data stream algorithms that process inputs in one pass over the data with little memory requirement. For the $k$-means problem, this has led to the development of several $(1+\varepsilon)$-approximations (under the assumption that $k$ is a constant), but also to the design of algorithms that are extremely fast in practice and compute solutions of high accuracy. However, when not only the length of the stream is high but also the dimensionality of the input points, current methods reach their limits. We propose two algorithms, piecy and piecy-mr, which are based on the recently developed data stream algorithm BICO and can process high-dimensional data in one pass and output a solution of high quality. While piecy is suited for high-dimensional data with a medium number of points, piecy-mr is meant for high-dimensional data that comes in a very long stream. We provide an extensive experimental study to evaluate piecy and piecy-mr that shows the strength of the new algorithms.

Comment: 23 pages, 9 figures, published at the 14th International Symposium on Experimental Algorithms - SEA 201

arXiv.org e-Print Archive

computer science publication server

Crossref

Kölner UniversitätsPublikationsServer