440 research outputs found

    Mimo Systems Low complexity SVD Implementation Analysis

    Full text link
    This paper analyses the implementation of the singular value decomposition (SVD) using approximation to the exact computation for MIMO systems in the case of modulation-mode and power assignment set-up. The study developed in the paper focuses on the use of low complexity algorithm with low computational load oriented to the use of devices with limited resources as FPGA, highlighting some of the advantages and drawbacks against more sophisticated devices. The implementation of the SVD is analyzed through the algorithms that efficiently perform the required computations, seeking for computationally efficient solutions that provide parallelism and low complexity. The CORDIC algorithm seems to be a good candidate for this task since it can efficiently compute the singular value decomposition. It is shown that this algorithm provides an efficient tool for SVD computation with appropriate accuracy and the computational complexity obtained and the required resources make it feasible to be implemented on an FPGA device. System performance degradation is analyzed compared with conventional and exact method for SVD obtaining some key conclusions

    A hierarchically blocked Jacobi SVD algorithm for single and multiple graphics processing units

    Full text link
    We present a hierarchically blocked one-sided Jacobi algorithm for the singular value decomposition (SVD), targeting both single and multiple graphics processing units (GPUs). The blocking structure reflects the levels of GPU's memory hierarchy. The algorithm may outperform MAGMA's dgesvd, while retaining high relative accuracy. To this end, we developed a family of parallel pivot strategies on GPU's shared address space, but applicable also to inter-GPU communication. Unlike common hybrid approaches, our algorithm in a single GPU setting needs a CPU for the controlling purposes only, while utilizing GPU's resources to the fullest extent permitted by the hardware. When required by the problem size, the algorithm, in principle, scales to an arbitrary number of GPU nodes. The scalability is demonstrated by more than twofold speedup for sufficiently large matrices on a Tesla S2050 system with four GPUs vs. a single Fermi card.Comment: Accepted for publication in SIAM Journal on Scientific Computin

    Singular value decomposition on SIMD hypercube and shuffle-exchange computers

    Get PDF
    AbstractThis paper reports several parallel singular value decomposition (SVD) algorithms on the hypercube and shuffle-exchange SIMD computers. Unlike previously published hypercube SVD algorithms which map a column pair of a matrix onto a processor, the algorithms presented in this paper map a matrix column pair onto a column of processors. In this way, a further reduction in time complexity is achieved. The paper also introduces the concept of two-dimensional shuffle-exchange networks, and corresponding SVD algorithms for one-dimensional and two-dimensional shuffle-exchange computers are developed

    ProperCAD II: A Run-Time Library for Portable, Parallel, Object-Oriented Programming with Applications to VLSI CAD

    Get PDF
    Coordinated Science Laboratory was formerly known as Control Systems LaboratorySemiconductor Research Corporation / grant 93-DP-10

    A tutorial on the characterisation and modelling of low layer functional splits for flexible radio access networks in 5G and beyond

    Get PDF
    The centralization of baseband (BB) functions in a radio access network (RAN) towards data processing centres is receiving increasing interest as it enables the exploitation of resource pooling and statistical multiplexing gains among multiple cells, facilitates the introduction of collaborative techniques for different functions (e.g., interference coordination), and more efficiently handles the complex requirements of advanced features of the fifth generation (5G) new radio (NR) physical layer, such as the use of massive multiple input multiple output (MIMO). However, deciding the functional split (i.e., which BB functions are kept close to the radio units and which BB functions are centralized) embraces a trade-off between the centralization benefits and the fronthaul costs for carrying data between distributed antennas and data processing centres. Substantial research efforts have been made in standardization fora, research projects and studies to resolve this trade-off, which becomes more complicated when the choice of functional splits is dynamically achieved depending on the current conditions in the RAN. This paper presents a comprehensive tutorial on the characterisation, modelling and assessment of functional splits in a flexible RAN to establish a solid basis for the future development of algorithmic solutions of dynamic functional split optimisation in 5G and beyond systems. First, the paper explores the functional split approaches considered by different industrial fora, analysing their equivalences and differences in terminology. Second, the paper presents a harmonized analysis of the different BB functions at the physical layer and associated algorithmic solutions presented in the literature, assessing both the computational complexity and the associated performance. Based on this analysis, the paper presents a model for assessing the computational requirements and fronthaul bandwidth requirements of different functional splits. Last, the model is used to derive illustrative results that identify the major trade-offs that arise when selecting a functional split and the key elements that impact the requirements.This work has been partially funded by Huawei Technologies. Work by X. Gelabert and B. Klaiqi is partially funded by the European Union's Horizon Europe research and innovation programme (HORIZON-MSCA-2021-DN-0) under the Marie Skłodowska-Curie grant agreement No 101073265. Work by J. Perez-Romero and O. Sallent is also partially funded by the Smart Networks and Services Joint Undertaking (SNS JU) under the European Union’s Horizon Europe research and innovation programme under Grant Agreements No. 101096034 (VERGE project) and No. 101097083 (BeGREEN project) and by the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033 under ARTIST project (ref. PID2020-115104RB-I00). This last project has also funded the work by D. Campoy.Peer ReviewedPostprint (author's final draft

    LAMANet: A real-time, machine learning-enhanced approximate message passing detector for massive MIMO

    Get PDF

    Parallel alogorithms for MIMD parallel computers

    Get PDF
    This thesis mainly covers the design and analysis of asynchronous parallel algorithms that can be run on MIMD (Multiple Instruction Multiple Data) parallel computers, in particular the NEPTUNE system at Loughborough University. Initially the fundamentals of parallel computer architectures are introduced with different parallel architectures being described and compared. The principles of parallel programming and the design of parallel algorithms are also outlined. Also the main characteristics of the 4 processor MIMD NEPTUNE system are presented, and performance indicators, i.e. the speed-up and the efficiency factors are defined for the measurement of parallelism in a given system. Both numerical and non-numerical algorithms are covered in the thesis. In the numerical solution of partial differential equations, a new parallel 9-point block iterative method is developed. Here, the organization of the blocks is done in such a way that each process contains its own group of 9 points on the network, therefore, they can be run in parallel. The parallel implementation of both 9-point and 4- point block iterative methods were programmed using natural and redblack ordering with synchronous and asynchronous approaches. The results obtained for these different implementations were compared and analysed. Next the parallel version of the A.G.E. (Alternating Group Explicit) method is developed in which the explicit nature of the difference equation is revealed and exploited when applied to derive the solution of both linear and non-linear 2-point boundary value problems. Two strategies have been used in the implementation of the parallel A.G.E. method using the synchronous and asynchronous approaches. The results from these implementations were compared. Also for comparison reasons the results obtained from the parallel A.G.E. were compared with the ~ corresponding results obtained from the parallel versions of the Jacobi, Gauss-Seidel and S.O.R. methods. Finally, a computational complexity analysis of the parallel A.G.E. algorithms is included. In the area of non-numeric algorithms, the problems of sorting and searching were studied. The sorting methods which were investigated was the shell and the digit sort methods. with each method different parallel strategies and approaches were used and compared to find the best results which can be obtained on the parallel machine. In the searching methods, the sequential search algorithm in an unordered table and the binary search algorithms were investigated and implemented in parallel with a presentation of the results. Finally, a complexity analysis of these methods is presented. The thesis concludes with a chapter summarizing the main results
    • …
    corecore