156 research outputs found

    Parallel minimum norm solution of sparse block diagonal column overlapped underdetermined systems

    Underdetermined systems of equations in which the minimum norm solution needs to be computed arise in many applications, such as geophysics, signal processing, and biomedical engineering. In this article, we introduce a new parallel algorithm for obtaining the minimum 2-norm solution of an underdetermined system of equations. The proposed algorithm is based on the Balance scheme, which was originally developed for the parallel solution of banded linear systems. The proposed scheme assumes a generalized banded form where the coefficient matrix has column overlapped block structure in which the blocks could be dense or sparse. In this article, we implement the more general sparse case. The blocks can be handled independently by any existing sequential or parallel QR factorization library. A smaller reduced system is formed and solved before obtaining the minimum norm solution of the original system in parallel. We experimentally compare and confirm the error bound of the proposed method against the QR factorization based techniques by using true single-precision arithmetic. We implement the proposed algorithm by using the message passing paradigm. We demonstrate numerical effectiveness as well as parallel scalability of the proposed algorithm on both shared and distributed memory architectures for solving various types of problems. © 2017 ACM
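
The abstract above leaves the per-block work to any QR factorization library. As a minimal sketch of that sequential building block only (not the Balance-scheme algorithm or its reduced system), the minimum 2-norm solution of a full-row-rank system A x = b can be obtained by factoring A^T = Q R, solving R^T z = b, and setting x = Q z. The function name `min_norm_solution`, the list-of-lists matrix layout, and the use of modified Gram-Schmidt in place of a library QR are assumptions made for illustration.

```python
def min_norm_solution(A, b):
    """Minimum 2-norm solution of A x = b for a full-row-rank A (m <= n).

    Illustrative sketch: factor A^T = Q R by modified Gram-Schmidt,
    solve R^T z = b by forward substitution, then return x = Q z.
    """
    m, n = len(A), len(A[0])
    Q = []                                  # Q[k] is the k-th orthonormal column (length n)
    R = [[0.0] * m for _ in range(m)]       # upper triangular, m x m
    for j in range(m):                      # column j of A^T is row j of A
        v = [float(a) for a in A[j]]
        for k in range(j):
            R[k][j] = sum(q * x for q, x in zip(Q[k], v))
            v = [x - R[k][j] * q for q, x in zip(Q[k], v)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        if R[j][j] == 0.0:
            raise ValueError("A must have full row rank")
        Q.append([x / R[j][j] for x in v])
    # Forward substitution on the lower-triangular system R^T z = b.
    z = [0.0] * m
    for i in range(m):
        s = sum(R[k][i] * z[k] for k in range(i))
        z[i] = (b[i] - s) / R[i][i]
    # x = Q z lies in the row space of A, which is what makes it the minimum-norm solution.
    return [sum(Q[k][i] * z[k] for k in range(m)) for i in range(n)]


x = min_norm_solution([[1, 1, 0], [0, 1, 1]], [1, 1])
```

For this small example the result is approximately (1/3, 2/3, 1/3), which matches the closed form A^T (A A^T)^{-1} b. A production code would more likely use a Householder-based QR from an established library for better numerical stability.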

    A Novel Partitioning Method for Accelerating the Block Cimmino Algorithm

    We propose a novel block-row partitioning method in order to improve the convergence rate of the block Cimmino algorithm for solving general sparse linear systems of equations. The convergence rate of the block Cimmino algorithm depends on the orthogonality among the block rows obtained by the partitioning method. The proposed method takes numerical orthogonality among block rows into account through a row inner-product graph model of the coefficient matrix. In the graph partitioning formulation defined on this graph model, the partitioning objective of minimizing the cutsize directly corresponds to minimizing the sum of inter-block inner products between block rows, thus leading to an improvement in the eigenvalue spectrum of the iteration matrix. This in turn leads to a significant reduction in the number of iterations required for convergence. Extensive experiments conducted on a large set of matrices confirm the validity of the proposed method against a state-of-the-art method.
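
The graph model described above can be sketched as follows, under stated assumptions: each matrix row becomes a vertex, two rows are joined by an edge when their inner product is nonzero, and the edge weight is taken here as the absolute value of that inner product (the abstract does not state the exact weighting). The names `row_inner_product_graph` and `cutsize`, and the `{column: value}` row layout, are hypothetical. The sketch only evaluates the cutsize of a given partition; it does not perform the partitioning.

```python
from itertools import combinations


def row_inner_product_graph(rows):
    """Build edge weights {(i, j): |a_i . a_j|} for i < j.

    rows is a list of sparse rows, each a dict {column_index: value}.
    The all-pairs loop is quadratic in the number of rows and is meant
    for illustration only.
    """
    edges = {}
    for i, j in combinations(range(len(rows)), 2):
        # Iterate over the shorter row and look up matches in the longer one.
        small, large = sorted((rows[i], rows[j]), key=len)
        dot = sum(v * large[c] for c, v in small.items() if c in large)
        if dot != 0:
            edges[(i, j)] = abs(dot)
    return edges


def cutsize(edges, part):
    """Sum of weights of edges whose endpoints lie in different blocks."""
    return sum(w for (i, j), w in edges.items() if part[i] != part[j])


rows = [{0: 1, 1: 2}, {1: 3, 2: 1}, {2: 4}, {3: 5}]
edges = row_inner_product_graph(rows)
```

In this toy matrix, placing rows 0 and 1 in one block and rows 2 and 3 in another cuts only the lighter edge, so the two block rows are closer to mutually orthogonal than if row 0 were separated from row 1.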

    Tomographic reconstruction algorithms using optoelectronic devices

    During the last two decades, iterative computerized tomography (CT) algorithms, such as ART (Algebraic Reconstruction Technique) and SIRT (Simultaneous Iterative Reconstruction Technique), have been applied to the solution of overdetermined and underdetermined systems. These algorithms arrive at the least squares solution of normal equations. In theory, such algorithms converge to the minimum-norm solution when a system is underdetermined if there are no computational errors and the initial vector is chosen properly. In practice, computational errors may lead to failure to converge to a unique solution. The dissertation introduces a method called the projection iterative reconstruction technique (PIRT) which differs from the other reconstruction algorithms used for solving underdetermined systems. Even though the differences between the method outlined in this dissertation and the algorithms proposed earlier are subtle, the proposed scheme guarantees convergence to a unique minimum-norm solution. Several acceleration techniques are discussed in the dissertation. Furthermore, the iterative algorithm can also be generalized and employed to solve other large and sparse linear systems.
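
To illustrate the role of the initial vector mentioned above, here is a minimal sketch of classical ART (the Kaczmarz row-projection iteration), not the dissertation's PIRT method. For a consistent underdetermined system, starting from the zero vector keeps every iterate in the row space of A, so the limit is the minimum-norm solution; a start with a null-space component converges to a different solution. The function name `art` and its parameters `sweeps`, `relax`, and `x0` are assumptions made for illustration.

```python
def art(A, b, sweeps=200, relax=1.0, x0=None):
    """Classical ART / Kaczmarz iteration for a consistent system A x = b.

    Each step projects the current iterate onto the hyperplane of one
    equation. A is a list of rows; x0 defaults to the zero vector.
    """
    n = len(A[0])
    x = [0.0] * n if x0 is None else [float(v) for v in x0]
    for _ in range(sweeps):
        for row, bi in zip(A, b):
            norm2 = sum(a * a for a in row)
            if norm2 == 0.0:
                continue                      # skip empty rows
            residual = bi - sum(a * xi for a, xi in zip(row, x))
            step = relax * residual / norm2
            x = [xi + step * a for xi, a in zip(x, row)]
    return x


A = [[1, 1, 0], [0, 1, 1]]
b = [1, 1]
x_min = art(A, b)                    # zero start: minimum-norm solution
x_other = art(A, b, x0=[1, 0, 0])    # start with a null-space component
```

Both results satisfy the two equations, but only the zero-start run approaches the minimum-norm solution, approximately (1/3, 2/3, 1/3). The other run retains the null-space component of its starting vector.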

    On a method for calculating generalized normal solutions of underdetermined linear systems

    The article presents a new method for computing generalized normal solutions of underdetermined systems of linear algebraic equations, based on special augmented systems. The advantage of this method is that it can solve very ill-conditioned (possibly sparse) large-scale underdetermined linear systems using modern variants of iterative refinement based on the generalized minimal residual method (GMRES-IT). Results of applying the algorithm to the problem of balancing chemical equations (mass balance) are presented. Zhdanov, A.I. On a method for calculating generalized normal solutions of underdetermined linear systems / A.I. Zhdanov, Yu.V. Sidorov // Computer Optics. – 2020. – Vol. 44, No. 1. – P. 133-136. – DOI: 10.18287/2412-6179-CO-607

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve dense or sparse linear systems of equations in bioinformatics, power systems and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions:
    • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices.
    • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices.
    • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each.
    • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns.
    • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool, Latent Semantic Indexing, with an FPGA-based architecture.
    • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update.
    Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5× to 43.6× in terms of computation time.

    REITERATIVE MINIMUM MEAN SQUARE ERROR ESTIMATOR FOR DIRECTION OF ARRIVAL ESTIMATION AND BIOMEDICAL FUNCTIONAL BRAIN IMAGING

    Two novel approaches are developed for direction-of-arrival (DOA) estimation and functional brain imaging estimation, which are denoted as ReIterative Super-Resolution (RISR) and Source AFFine Image REconstruction (SAFFIRE), respectively. Both recursive approaches are based on a minimum mean-square error (MMSE) framework. The RISR estimator recursively determines an optimal filter bank by updating an estimate of the spatial power distribution at each successive stage. Unlike previous non-parametric covariance-based approaches, which require numerous time snapshots of data, RISR is a parametric approach, thus enabling operation on as few as one time snapshot, thereby yielding very high temporal resolution and robustness to the deleterious effects of temporal correlation. RISR has been found to resolve distinct spatial sources several times better than that afforded by the nominal array resolution even under conditions of temporally correlated sources and spatially colored noise. The SAFFIRE algorithm localizes the underlying neural activity in the brain based on the response of a patient under sensory stimuli, such as an auditory tone. The estimator processes electroencephalography (EEG) or magnetoencephalography (MEG) data simulated for sensors outside the patient's head in a recursive manner, converging closer to the true solution at each consecutive stage. The algorithm requires a minimal number of time samples to localize active neural sources, thereby enabling the observation of the neural activity as it progresses over time. SAFFIRE has been applied to simulated MEG data and has been shown to achieve unprecedented spatial and temporal resolution. The estimation approach has also demonstrated the capability to precisely isolate the primary and secondary auditory cortex responses, a challenging problem in the brain MEG imaging community.

    Heterogeneous multicore systems for signal processing

    This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated deskside computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing a crucial kernel in the computation of multi-mesh head models of encephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only a highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to acceleration, it can meet real-time requirements in unprecedented anatomical detail while running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scans has been included.