37 research outputs found

    A bibliography on parallel and vector numerical algorithms

    Get PDF
    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

    Solution of partial differential equations on vector and parallel computers

    Get PDF
    The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed

    Cumulative reports and publications through December 31, 1988

    Get PDF
    This document contains a complete list of ICASE Reports. Since ICASE Reports are intended to be preprints of articles that will appear in journals or conference proceedings, the published reference is included when it is available

    Algorithm-Based Fault Tolerance for Two-Sided Dense Matrix Factorizations

    Get PDF
    The mean time between failure (MTBF) of large supercomputers is decreasing, and future exascale computers are expected to have a MTBF of around 30 minutes. Therefore, it is urgent to prepare important algorithms for future machines with such a short MTBF. Eigenvalue problems (EVP) and singular value problems (SVP) are common in engineering and scientific research. Solving EVP and SVP numerically involves two-sided matrix factorizations: the Hessenberg reduction, the tridiagonal reduction, and the bidiagonal reduction. These three factorizations are computation intensive, and have long running times. They are prone to suffer from computer failures. We designed algorithm-based fault tolerant (ABFT) algorithms for the parallel Hessenberg reduction and the parallel tridiagonal reduction. The ABFT algorithms target fail-stop errors. These two fault tolerant algorithms use a combination of ABFT and diskless checkpointing. ABFT is used to protect frequently modified data . We carefully design the ABFT algorithm so the checksums are valid at the end of each iterative cycle. Diskless checkpointing is used for rarely modified data. These checkpoints are in the form of checksums, which are small in size, so the time and storage cost to store them in main memory is small. Also, there are intermediate results which need to be protected for a short time window. We store a copy of this data on the neighboring process in the process grid. We also designed algorithm-based fault tolerant algorithms for the CPU-GPU hybrid Hessenberg reduction algorithm and the CPU-GPU hybrid bidiagonal reduction algorithm. These two fault tolerant algorithms target silent errors. Our design employs both ABFT and diskless checkpointing to provide data redundancy. The low cost error detection uses two dot products and an equality test. The recovery protocol uses reverse computation to roll back the state of the matrix to a point where it is easy to locate and correct errors. We provided theoretical analysis and experimental verification on the correctness and efficiency of our fault tolerant algorithm design. We also provided mathematical proof on the numerical stability of the factorization results after fault recovery. Experimental results corroborate with the mathematical proof that the impact is mild

    Cumulative reports and publications through December 31, 1990

    Get PDF
    This document contains a complete list of ICASE reports. Since ICASE reports are intended to be preprints of articles that will appear in journals or conference proceedings, the published reference is included when it is available

    High-level FPGA accelerator design for structured-mesh-based numerical solvers

    Get PDF
    Field Programmable Gate Arrays (FPGAs) have become highly attractive as accelerators due to their low power consumption and re-programmability. However, a key limitation is the time and know-how required to program them. Even with high-level synthesis tools, they still require significant hand-tuned/low-level customizations and design space exploration to gain good performance. The need to program FPGAs using the dataflow programming model, much less well known and practised by the high-performance computing (HPC) community, is a major barrier for adoption for HPC. The underlying motivation of this work is to bridge this gap - attaining near-optimal performance vs the ease of programming. To this end, we target the important class of applications based on structured meshes, focusing on numerical algorithms based on explicit and implicit techniques. We leverage the main characteristics of the application class, its computation-communication pattern and the hardware features. For explicit schemes, characterized by stencil computations, we unify the state-of-the-art techniques such as vectorization and unrolling with a number of new high-gain optimizations such as creating perfect data reuse data-paths, batching and tiling. A key new feature is their applicability to multiple stencil loops enabling the development of real-world workloads. For implicit schemes, we re-evaluate the characteristics of the tridiagonal system solver algorithms for FPGAs and develop a new high throughput batched multi-dimensional tridiagonal system solver library with orders of magnitude better performance than the state-of-the-art. New analytic models are developed to support the solvers, elucidating and modelling the critical path of execution and parameterizing the design. This together with the optimal designs and new library lead to a unified design work-flow for synthesis on FPGAs. The new workflow is used to implement a range of applications, from simple single stencil designs, multiple stencil loops to solvers with real-world utility. They are synthesized on the currently dominant Xilinx and Intel FPGAs. Benchmarking indicate the FPGAs matching or outperforming the best GPU implementations, the current best traditional architecture device solution. Over 30% energy saving can also be observed. The performance model demonstrates over 85% accuracy. The thesis discusses the determinants for these applications to be amenable for FPGA implementation, providing insights into the feasibility and profitability of a design. Finally we propose initial steps in automating the workflow to be used through a DSL

    Research summary, January 1989 - June 1990

    Get PDF
    The Research Institute for Advanced Computer Science (RIACS) was established at NASA ARC in June of 1983. RIACS is privately operated by the Universities Space Research Association (USRA), a consortium of 62 universities with graduate programs in the aerospace sciences, under a Cooperative Agreement with NASA. RIACS serves as the representative of the USRA universities at ARC. This document reports our activities and accomplishments for the period 1 Jan. 1989 - 30 Jun. 1990. The following topics are covered: learning systems, networked systems, and parallel systems

    Continuous-time Algorithms and Analog Integrated Circuits for Solving Partial Differential Equations

    Get PDF
    Analog computing (AC) was the predominant form of computing up to the end of World War II. The invention of digital computers (DCs) followed by developments in transistors and thereafter integrated circuits (IC), has led to exponential growth in DCs over the last few decades, making ACs a largely forgotten concept. However, as described by the impending slow-down of Moore鈥檚 law, the performance of DCs is no longer improving exponentially, as DCs are approaching clock speed, power dissipation, and transistor density limits. This research explores the possibility of employing AC concepts, albeit using modern IC technologies at radio frequency (RF) bandwidths, to obtain additional performance from existing IC platforms. Combining analog circuits with modern digital processors to perform arithmetic operations would make the computation potentially faster and more energy-efficient. Two AC techniques are explored for computing the approximate solutions of linear and nonlinear partial differential equations (PDEs), and they were verified by designing ACs for solving Maxwell\u27s and wave equations. The designs were simulated in Cadence Spectre for different boundary conditions. The accuracies of the ACs were compared with finite-deference time-domain (FDTD) reference techniques. The objective of this dissertation is to design software-defined ACs with complementary digital logic to perform approximate computations at speeds that are several orders of magnitude greater than competing methods. ACs trade accuracy of the computation for reduced power and increased throughput. Recent examples of ACs are accurate but have less than 25 kHz of analog bandwidth (Fcompute) for continuous-time (CT) operations. In this dissertation, a special-purpose AC, which has Fcompute = 30 MHz (an equivalent update rate of 625 MHz) at a power consumption of 200 mW, is presented. The proposed AC employes 180 nm CMOS technology and evaluates the approximate CT solution of the 1-D wave equation in space and time. The AC is 100x, 26x, 2.8x faster when compared to the MATLAB- and C-based FDTD solvers running on a computer, and systolic digital implementation of FDTD on a Xilinx RF-SoC ZCU1275 at 900 mW (x15 improvement in power-normalized performance compared to RF-SoC), respectively
    corecore