
    Computation of restricted maximum likelihood estimates of variance components

    The method preferred by animal breeders for the estimation of variance components is restricted maximum likelihood (REML). Various iterative algorithms have been proposed for computing REML estimates. Five different computational strategies for implementing such an algorithm were compared in terms of flops (floating-point operations). These strategies were based, respectively, on the LDL' decomposition, the W transformation, the SWEEP method, tridiagonalization, and diagonalization of the coefficient matrix of the mixed-model equations. The computational requirements of the orthogonal transformations employed in tridiagonalization and diagonalization were found to be rather extensive. However, these transformations are performed prior to the initiation of the iterative estimation process and need not be repeated during the remainder of the process. Subsequent to either diagonalization or tridiagonalization, the flops required per iteration are minimal. Thus, for most applications of mixed-effects linear models with a single set of random effects, the use of an orthogonal transformation prior to the initiation of the iterative process is recommended. For most animal breeding applications, tridiagonalization will generally be more efficient than diagonalization. In most animal breeding applications, the coefficient matrix of the mixed-model equations is extremely sparse and of very large order. The use of sparse-matrix techniques for the numerical evaluation of the log-likelihood function and its first- and second-order partial derivatives was investigated in the case of the simple sire and animal models. Instead of applying these techniques directly to the coefficient matrix of the mixed-model equations to obtain the Cholesky factor, they were used to obtain the Cholesky factor indirectly by carrying out a QR decomposition of an augmented model matrix. The feasibility of the computational method for the simple sire model was investigated by carrying out the most computationally intensive part of this method (the QR decomposition) for an animal breeding data set comprising 180,994 records and 1,264 sires. The total CPU time required for this part (using an NAS AS/9160 computer) was approximately 75,000 seconds.
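
    The indirect route to the Cholesky factor mentioned above rests on the identity R'R = W'W for the R factor of a QR decomposition of the augmented model matrix W. The following NumPy sketch illustrates only that identity on a small dense stand-in; the matrix W, its dimensions, and the random entries are illustrative assumptions, whereas the matrices in the abstract are sparse, of very large order, and include contributions from the variance ratios.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative stand-in for an augmented model matrix (the real ones are huge and sparse)
    n_records, n_effects = 200, 10
    W = rng.standard_normal((n_records, n_effects))

    # Coefficient matrix of the (simplified) mixed-model equations
    C = W.T @ W

    # Direct route: Cholesky factor of C
    L_direct = np.linalg.cholesky(C)

    # Indirect route: QR decomposition of W, which never forms C explicitly
    R = np.linalg.qr(W, mode='r')
    R *= np.sign(np.diag(R))[:, None]   # fix row signs so the diagonal of R is positive
    L_qr = R.T

    print(np.allclose(L_direct, L_qr))  # True, up to floating-point error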

    Tridiagonalization of an arbitrary square matrix


    Quantum Computational Simulations for Condensed Matter Systems

    In condensed matter physics, and especially in the study of strongly correlated electron systems, numerical simulation techniques are crucial for determining the properties of the system, including interesting phases of matter that arise from electron-electron interactions. Many of these interesting phases of matter, including but not limited to Mott-insulating materials and possibly high-temperature superconducting systems, can be modeled by the Hubbard model. Although it is one of the simplest models to include electron-electron interactions, it cannot be solved analytically in more than one dimension, and thus numerical techniques must be employed. Although there have been great strides in classical numerical simulation techniques for quantum many-body systems, all currently known simulation methods suffer from exponential resource scaling in certain parameter regimes. Quantum computing techniques promise to alleviate these exponential scaling issues and allow simulations of larger and more complex systems. In this dissertation, I present methods and results for simulations of the Hubbard model that leverage quantum computing. These include both direct simulation of the Hubbard model and results that solve the Hubbard model using dynamical mean-field theory (DMFT). Dynamical mean-field theory is a self-consistent mapping from the Hubbard model to the Anderson impurity model, which reproduces the physics of the Hubbard model directly in the thermodynamic limit. In terms of utilization of quantum computing techniques, I present both results of a simulation run on real quantum hardware and algorithms developed for future quantum hardware. Specifically, I run a small DMFT simulation that utilizes both classical computing techniques and quantum computation. I also develop multiple algorithmic techniques for preparing quantum many-body states on a quantum computer and a quantum algorithm for calculating a generic response function of the system. Finally, I give an outlook on challenges and future opportunities for using quantum computation to simulate quantum many-body systems.
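
    Because the abstract centres on the Hubbard model, a small classical exact-diagonalization sketch may help fix ideas. It builds a two-site Hubbard Hamiltonian with a Jordan-Wigner encoding in plain NumPy; the parameters t and U, the two-site geometry, and the mode-ordering convention are illustrative assumptions and are not taken from the dissertation.

    import numpy as np

    # Single-mode building blocks for a Jordan-Wigner encoding of fermions
    I2 = np.eye(2)
    c1 = np.array([[0., 1.], [0., 0.]])   # annihilation on one spin-orbital
    Z = np.diag([1., -1.])                # (-1)^n string factor

    def annihilate(mode, n_modes):
        """Jordan-Wigner annihilation operator for one of n_modes spin-orbitals."""
        factors = [Z] * mode + [c1] + [I2] * (n_modes - mode - 1)
        op = factors[0]
        for f in factors[1:]:
            op = np.kron(op, f)
        return op

    # Two sites, two spins; mode index = 2 * site + spin (illustrative convention)
    t, U = 1.0, 4.0
    n_modes = 4
    a = [annihilate(m, n_modes) for m in range(n_modes)]
    adag = [op.T for op in a]
    num = [adag[m] @ a[m] for m in range(n_modes)]

    H = np.zeros((2 ** n_modes, 2 ** n_modes))
    for spin in (0, 1):   # hopping between the two sites
        i, j = 2 * 0 + spin, 2 * 1 + spin
        H += -t * (adag[i] @ a[j] + adag[j] @ a[i])
    for site in (0, 1):   # on-site repulsion
        H += U * (num[2 * site + 0] @ num[2 * site + 1])

    # Restrict to half filling (two electrons) and compare with the known closed form
    filling = np.rint(np.diag(sum(num))).astype(int)
    idx = np.where(filling == 2)[0]
    e0 = np.linalg.eigvalsh(H[np.ix_(idx, idx)])[0]
    print(e0, (U - np.sqrt(U ** 2 + 16 * t ** 2)) / 2)   # the two values agree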

    Fast Algorithm Development for SVD: Applications in Pattern Matching and Fault Diagnosis

    The project aims at fast detection and diagnosis of faults occurring in process plants by designing a low-cost FPGA module for the computation. Fast detection and diagnosis while the process is still operating in a controllable region helps avoid further advancement of the fault and reduces the productivity loss. Model-based methods are not popular in the domain of process control, as obtaining an accurate model is expensive and requires expertise. Data-driven methods like Principal Component Analysis (PCA) are quite popular diagnostic methods for process plants, as they do not require any model. PCA is a widely used tool for dimensionality reduction, thereby reducing the computational effort. The trends are captured in the principal components, as it is difficult to have the same amount of disturbance as simulated in the historical database. The historical database has multiple instances of various kinds of faults and disturbances along with normal operation. A moving-window approach has been employed to detect similar instances in the historical database based on the standard PCA similarity factor. The measurements of the variables of interest over a certain period of time form the snapshot dataset, S. At each instant, a window of the same size as the snapshot dataset is picked from the historical database and forms the historical window, H. The two datasets are then compared using similarity factors such as the standard PCA similarity factor, which quantifies the angular difference between the principal components of the two datasets. Since many of the operating conditions are quite similar to each other and a significant number of misclassifications have been observed, a candidate pool that orders the historical data windows by the value of the similarity factor is formed. Based on the most frequently detected operation among the top-most windows, the operating personnel take the necessary action. The Tennessee Eastman Challenge process has been chosen as an initial case study for evaluating the performance. The measurements are sampled every minute, and the fault having the smallest maximum duration lasts 8 hours. Hence the snapshot window size, m, has been chosen as 500 samples, i.e., 8.33 hours of the most recent data for all 52 variables. Ideally, the moving window should replace the oldest sample with a new one. It would then take approximately as many comparisons as the size of the historical database. The size of the historical database is 4.32 million measurements (the past 8 years of data) for each of the 52 variables. With a software simulation in Matlab, it takes around 80-100 minutes to sweep through the whole 4.32-million-sample historical database. Since most of the computation is spent finding the principal components of the two datasets using SVD, a hardware design has to be incorporated to accelerate the pattern-matching approach. The thesis is organized as follows: Chapter 1 describes the moving-window approach and the various similarity factors and metrics used for pattern matching. The previous work proposed by Ashish Singhal is based on skipping a few samples to reduce the computational effort and also employs windows as large as 5761 samples, i.e., four days of snapshot data. Instead, a new method that skips samples when the similarity factor is quite low has been proposed. A simplified form of the standard PCA similarity factor has been proposed without any trade-off in accuracy.
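
    As a concrete illustration of the window comparison described above, here is a NumPy sketch of the standard (Krzanowski-type) PCA similarity factor between a snapshot window S and a historical window H. The 500-by-52 window shape matches the numbers quoted above, but the low-rank toy data, the choice of k, and the function names are illustrative assumptions, and the simplified variant proposed in the thesis is not reproduced.

    import numpy as np

    def pca_loadings(X, k):
        """Top-k principal directions (right singular vectors) of a mean-centered window."""
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Vt[:k].T                      # columns are principal directions

    def standard_pca_similarity(S, H, k):
        """Standard PCA similarity factor: mean squared cosine of the subspace angles."""
        L, M = pca_loadings(S, k), pca_loadings(H, k)
        return np.trace(L.T @ M @ M.T @ L) / k

    # Toy usage: two windows sharing the same low-dimensional structure
    rng = np.random.default_rng(1)
    F = rng.standard_normal((52, 5))         # shared structure across the 52 variables
    S = rng.standard_normal((500, 5)) @ F.T + 0.1 * rng.standard_normal((500, 52))
    H = rng.standard_normal((500, 5)) @ F.T + 0.1 * rng.standard_normal((500, 52))
    print(standard_pca_similarity(S, H, k=5))   # close to 1 for similar operating conditions
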
Pre-computation over the historical database can also be done, as the data is available a priori, but this entails a large memory requirement, and most of the time is then spent in read/write operations. The large memory requirement is due to the fact that every sample gives rise to a 52×35 matrix, assuming the top 35 PCs are sufficient to capture the variance of the dataset. Chapter 2 describes various popular algorithms for SVD. Apart from Jacobi methods, algorithms such as Golub-Kahan and divide-and-conquer SVD are briefly discussed. While bidiagonalization-based methods are very accurate, they suffer from large latency and are computationally intensive. On the other hand, Jacobi methods are computationally inexpensive and parallelizable, thus reducing the latency. We also evaluated the performance of the proposed hybrid Golub-Kahan Jacobi algorithm for our application. Chapter 3 describes the basic building block, CORDIC, which is used for performing the rotations required for Jacobi methods or for the n-dimensional Householder reflections of Golub-Kahan SVD. CORDIC is widely employed in hardware design for computing trigonometric, exponential, or logarithmic functions, as it makes use of simple shift and add/subtract operations. Two modes of CORDIC, namely the rotation mode and the vectoring mode, are discussed; these are used in the derivation of the two-sided Jacobi SVD. Chapter 4 describes the Jacobi methods of SVD, which are quite popular in hardware implementation as they are amenable to parallel computation. Two variants of Jacobi methods, namely the one-sided and two-sided Jacobi methods, are briefly discussed. The two-sided Jacobi method making use of CORDIC has been derived. The systolic array implementation, which has been popular in hardware implementations for the past three decades, is also discussed. Chapter 5 deals with the hardware implementation of pattern matching and reports a literature survey of various architectures developed for computing SVD. The Xilinx ZC7020 has been chosen as the target device for the FPGA implementation, as it is an inexpensive device with many built-in peripherals. Latency reports with both Vivado HLS and Vivado SDSoC are also given for the application of interest. Evaluation of other case studies and of other data-driven methods similar to PCA, such as Correspondence Analysis (CA) and Independent Component Analysis (ICA), the development of an efficient hybrid method for computing SVD in hardware and of a highly discriminating similarity factor, and the extension of CORDIC to n dimensions for Householder reflections are considered for future research.
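
    The rotation mode of CORDIC described in Chapter 3 can be sketched in a few lines of Python: a 2-D vector is rotated through a sequence of elementary angles arctan(2^-i) using only shift-like scalings, additions, and subtractions, followed by a constant gain correction. The iteration count and the floating-point arithmetic here are illustrative; a hardware implementation would use fixed-point shifts, and this is not the thesis's FPGA design.

    import numpy as np

    def cordic_rotate(x, y, theta, n_iters=32):
        """CORDIC rotation mode: rotate (x, y) by theta using shift-and-add style updates."""
        angles = np.arctan(2.0 ** -np.arange(n_iters))    # elementary rotation angles
        gain = np.prod(np.sqrt(1.0 + 2.0 ** (-2.0 * np.arange(n_iters))))
        z = theta
        for i in range(n_iters):
            d = 1.0 if z >= 0 else -1.0       # choose the direction that drives z to zero
            x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
            z -= d * angles[i]
        return x / gain, y / gain             # undo the accumulated CORDIC gain

    # Rotating (1, 0) by 60 degrees gives approximately (0.5, 0.866)
    print(cordic_rotate(1.0, 0.0, np.pi / 3))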

    A bibliography on parallel and vector numerical algorithms

    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming languages, and other topics of interest to scientific computing. Certain conference proceedings and anthologies that have been published in book form are also listed.

    Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing

    Least-squares (LS) estimation and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms on parallel processing architectures such as systolic arrays, with efficient fault-tolerant schemes, are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. Fault diagnosis, order-degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order-degraded performance, and residual estimation under faulty conditions for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and the other on a rectangular array, are presented for the multi-phase operations with fault-tolerance considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.
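
    A small NumPy sketch of the Givens-rotation update at the core of a QRD RLS array may be useful here: each new regressor row is rotated into the triangular factor, and the rotated desired sample that emerges is the residual quantity exploited by residual-based error detection. The function name, forgetting factor, and toy data are illustrative assumptions; the dissertation's systolic block Householder formulation and its fault-tolerance logic are not reproduced.

    import numpy as np

    def qrd_rls_update(R, p, x, d, lam=0.99):
        """Fold one regressor row x and desired sample d into the triangular factor R and
        the rotated right-hand side p using Givens rotations (one systolic-array sweep)."""
        n = R.shape[0]
        R, p = np.sqrt(lam) * R, np.sqrt(lam) * p     # exponential forgetting
        x, d = np.array(x, dtype=float), float(d)
        for k in range(n):
            r = np.hypot(R[k, k], x[k])
            if r == 0.0:
                continue
            c, s = R[k, k] / r, x[k] / r              # Givens rotation annihilating x[k]
            R[k, k:], x[k:] = c * R[k, k:] + s * x[k:], -s * R[k, k:] + c * x[k:]
            p[k], d = c * p[k] + s * d, -s * p[k] + c * d
        return R, p, d                                # |d| is the rotated residual

    # Usage: the LS weights solve R w = p by back-substitution
    rng = np.random.default_rng(2)
    n = 4
    R, p = np.zeros((n, n)), np.zeros(n)
    w_true = rng.standard_normal(n)
    for _ in range(200):
        x = rng.standard_normal(n)
        R, p, _ = qrd_rls_update(R, p, x, x @ w_true + 1e-3 * rng.standard_normal())
    print(np.linalg.solve(R, p))   # approximately w_true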

    Algorithm and architecture for simultaneous diagonalization of matrices applied to subspace-based speech enhancement

    This thesis presents an algorithm and architecture for the simultaneous diagonalization of matrices. As an example, a subspace-based speech enhancement problem is considered, wherein the covariance matrices of the speech and noise are diagonalized simultaneously. In order to assess the system performance of the proposed algorithm, objective measurements of speech enhancement are shown in terms of the signal-to-noise ratio and mean Bark spectral distortion at various noise levels. In addition, an innovative subband analysis technique for subspace-based time-domain-constrained speech enhancement is proposed. The proposed technique analyses the signal in its subbands to build accurate estimates of the covariance matrices of speech and noise, exploiting the inherently slowly varying characteristics of speech and noise signals in narrow bands. The subband approach also decreases the computation time by reducing the order of the matrices to be simultaneously diagonalized. Simulation results indicate that the proposed technique performs well under extremely low signal-to-noise-ratio conditions. Further, an architecture is proposed to implement the simultaneous diagonalization scheme. The architecture is implemented on an FPGA, primarily to compare the performance measures in hardware and to assess the feasibility of the speech enhancement algorithm in terms of resource utilization, throughput, etc. A Xilinx FPGA is targeted for the implementation. The FPGA resource utilization reinforces the practicability of the design. A projection of the design's feasibility for an ASIC implementation, in terms of transistor count only, is also included.
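
    The simultaneous diagonalization step itself can be sketched with the generalized symmetric eigendecomposition: for a symmetric speech covariance estimate and a positive-definite noise covariance estimate, the generalized eigenvectors diagonalize both matrices at once. The matrix size and the random covariance estimates below are illustrative assumptions; the subband processing and the FPGA architecture of the thesis are not reproduced.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(3)

    # Hypothetical covariance estimates (speech: symmetric PSD, noise: positive definite)
    A = rng.standard_normal((8, 8)); R_speech = A @ A.T
    B = rng.standard_normal((8, 8)); R_noise = B @ B.T + 8 * np.eye(8)

    # Generalized eigendecomposition: columns of V diagonalize both covariances at once
    eigvals, V = eigh(R_speech, R_noise)

    D_speech = V.T @ R_speech @ V   # diagonal matrix of generalized eigenvalues
    D_noise = V.T @ R_noise @ V     # identity matrix
    print(np.allclose(D_speech, np.diag(eigvals)), np.allclose(D_noise, np.eye(8)))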