Search CORE

4,015 research outputs found

Recommended from our members

Alternative methods for representing the inverse of linear programming basis matrices

Author: Mitra G
Tamiz M
Publication venue: Brunel University
Publication date: 01/01/1988
Field of study

Methods for representing the inverse of Linear Programming (LP) basis matrices are closely related to techniques for solving a system of sparse unsymmetric linear equations by direct methods. It is now well accepted that for these problems the static process of reordering the matrix in the lower block triangular (LBT) form constitutes the initial step. We introduce a combined static and dynamic factorisation of a basis matrix and derive its inverse which we call the partial elimination form of the inverse (PEFI). This factorization takes advantage of the LBT structure and produces a sparser representation of the inverse than the elimination form of the inverse (EFI). In this we make use of the original columns (of the constraint matrix) which are in the basis. To represent the factored inverse it is, however, necessary to introduce special data structures which are used in the forward and the backward transformations (the two major algorithmic steps) of the simplex method. These correspond to solving a system of equations and solving a system of equations with the transposed matrix respectively. In this paper we compare the nonzero build up of PEFI with that of EFI. We have also investigated alternative methods for updating the basis inverse in the PEFI representation. The results of our experimental investigation are presented in this pape

Brunel University Research Archive

Super-Positioning of Voltage Sources for Fast Assessment of Wide-Area Thévenin Equivalents

Author: Jóhannsson Hjörtur
Møller Jakob Glarbo
Østergaard Jacob
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Crossref

Online Research Database In Technology

Using reconfigurable computing technology to accelerate matrix decomposition and applications

Author: Wang Xinying
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2016
Field of study

Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve the dense or sparse linear system of equations in bioinformatics, power system and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions: • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices. • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices. • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each. • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns. • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool-Latent Semantic Indexing with an FPGA-based architecture. • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update. Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5ÃÂ to 43.6ÃÂ in terms of computation time

Digital Repository @ Iowa State University (ISU)

Recommended from our members

A Parallel Direct Method for Finite Element Electromagnetic Computations Based on Domain Decomposition

Author: Moshfegh Javad
Publication venue: ScholarWorks@UMass Amherst
Publication date: 15/11/2019
Field of study

High performance parallel computing and direct (factorization-based) solution methods have been the two main trends in electromagnetic computations in recent years. When time-harmonic (frequency-domain) Maxwell\u27s equation are directly discretized with the Finite Element Method (FEM) or other Partial Differential Equation (PDE) methods, the resulting linear system of equations is sparse and indefinite, thus harder to efficiently factorize serially or in parallel than alternative methods e.g. integral equation solutions, that result in dense linear systems. State-of-the-art sparse matrix direct solvers such as MUMPS and PARDISO don\u27t scale favorably, have low parallel efficiency and high memory footprint. This work introduces a new class of sparse direct solvers based on domain decomposition method, termed Direct Domain Decomposition Method (D3M), which is reliable, memory efficient, and offers very good parallel scalability for arbitrary 3D FEM problems. Unlike recent trends in approximate/low-rank solvers, this method focuses on `numerically exact\u27 solution methods as they are more reliable for complex `real-life\u27 models. The proposed method leverages physical insights at every stage of the development through a new symmetric domain decomposition method (DDM) with one set of Lagrange multipliers. Applying a special regularization scheme at the interfaces, either artificial loss or gain is introduced to each domain to eliminate non-physical internal resonances. A block-wise recursive algorithm based on Takahashi relationship is proposed for the efficient computation of discrete Dirichlet-to-Neumann (DtN) map to reduce the volumetric problem from all domains into an auxiliary surfacial problem defined on the domain interfaces only. Numerical results show up to 50% run-time saving in DtN map computation using the proposed block-wise recursive algorithm compared to alternative approaches. The auxiliary unknowns on the domain interfaces form a considerably (approximately an order of magnitude) smaller block-wise sparse matrix, which is efficiently factorized using a customized block LDL

^T

factorization with restricted pivoting to ensure stability. The parallelization of the proposed D3M is realized based on Directed Acyclic Graph (DAG). Recent advances in parallel dense direct solvers, have shifted toward parallel implementation that rely on DAG scheduling to achieve highly efficient asynchronous parallel execution. However, adaptation of such schemes to sparse matrices is harder and often impractical. In D3M, computation of each domain\u27s discrete DtN map ``embarrassingly parallel\u27\u27, whereas the customized block LDLT is suitable for a block directed acyclic graph (B-DAG) task scheduling, similar to that used in dense matrix parallel direct solvers. In this approach, computations are represented as a sequence of small tasks that operate on domains of DDM or dense matrix blocks of the reduced matrix. These tasks can be statically scheduled for parallel execution using their DAG dependencies and weights that depend on estimates of computation and communication costs. Comparisons with state-of-the-art exact direct solvers on electrically large problems suggest up to 20% better parallel efficiency, 30% - 3X less memory and slightly faster in runtime, while maintaining the same accuracy

ScholarWorks@UMass Amherst

HPC compact quasi-Newton algorithm for interface problems

Author: Borrell Pol Ricard
Houzeaux Guillaume
Santiago Alfonso
Vázquez Mariano
Zavala Aké Miguel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020
Field of study

In this work we present a robust interface coupling algorithm called Compact Interface quasi-Newton (CIQN). It is designed for computationally intensive applications using an MPI multi-code partitioned scheme. The algorithm allows to reuse information from previous time steps, feature that has been previously proposed to accelerate convergence. Through algebraic manipulation, an efficient usage of the computational resources is achieved by: avoiding construction of dense matrices and reduce every multiplication to a matrix-vector product and reusing the computationally expensive loops. This leads to a compact version of the original quasi-Newton algorithm. Altogether with an efficient communication, in this paper we show an efficient scalability up to 4800 cores. Three examples with qualitatively different dynamics are shown to prove that the algorithm can efficiently deal with added mass instability and two-field coupled problems. We also show how reusing histories and filtering does not necessarily makes a more robust scheme and, finally, we prove the necessity of this HPC version of the algorithm. The novelty of this article lies in the HPC focused implementation of the algorithm, detailing how to fuse and combine the composing blocks to obtain an scalable MPI implementation. Such an implementation is mandatory in large scale cases, for which the contact surface cannot be stored in a single computational node, or the number of contact nodes is not negligible compared with the size of the domain. \c{opyright} Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/Comment: 33 pages: 23 manuscript, 10 appendix. 16 figures: 4 manuscript, 12 appendix. 10 Tables: 3 manuscript, 7 appendi

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Annals of Economic and Social Measurement, Volume 6, number 5

Author: Arne Drud
Publication venue
Publication date
Field of study

Research Papers in Economics