458 research outputs found

    A high-accuracy optical linear algebra processor for finite element applications

    Get PDF
    Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced

    Parallel alogorithms for MIMD parallel computers

    Get PDF
    This thesis mainly covers the design and analysis of asynchronous parallel algorithms that can be run on MIMD (Multiple Instruction Multiple Data) parallel computers, in particular the NEPTUNE system at Loughborough University. Initially the fundamentals of parallel computer architectures are introduced with different parallel architectures being described and compared. The principles of parallel programming and the design of parallel algorithms are also outlined. Also the main characteristics of the 4 processor MIMD NEPTUNE system are presented, and performance indicators, i.e. the speed-up and the efficiency factors are defined for the measurement of parallelism in a given system. Both numerical and non-numerical algorithms are covered in the thesis. In the numerical solution of partial differential equations, a new parallel 9-point block iterative method is developed. Here, the organization of the blocks is done in such a way that each process contains its own group of 9 points on the network, therefore, they can be run in parallel. The parallel implementation of both 9-point and 4- point block iterative methods were programmed using natural and redblack ordering with synchronous and asynchronous approaches. The results obtained for these different implementations were compared and analysed. Next the parallel version of the A.G.E. (Alternating Group Explicit) method is developed in which the explicit nature of the difference equation is revealed and exploited when applied to derive the solution of both linear and non-linear 2-point boundary value problems. Two strategies have been used in the implementation of the parallel A.G.E. method using the synchronous and asynchronous approaches. The results from these implementations were compared. Also for comparison reasons the results obtained from the parallel A.G.E. were compared with the ~ corresponding results obtained from the parallel versions of the Jacobi, Gauss-Seidel and S.O.R. methods. Finally, a computational complexity analysis of the parallel A.G.E. algorithms is included. In the area of non-numeric algorithms, the problems of sorting and searching were studied. The sorting methods which were investigated was the shell and the digit sort methods. with each method different parallel strategies and approaches were used and compared to find the best results which can be obtained on the parallel machine. In the searching methods, the sequential search algorithm in an unordered table and the binary search algorithms were investigated and implemented in parallel with a presentation of the results. Finally, a complexity analysis of these methods is presented. The thesis concludes with a chapter summarizing the main results

    Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

    Full text link
    Sparse matrices are the key ingredients of several application domains, from scientific computation to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to significantly eliminate zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful in performing faster processing. In other words, although they allow faster data transfer and improve memory bandwidth utilization -- the classic challenge of sparse problems -- their decompression mechanism can potentially create a computation bottleneck. Not only is this challenge not resolved, but also it becomes more serious with the advent of domain-specific architectures (DSAs), as they intend to more aggressively improve performance. The performance implications of using various formats along with DSAs, however, has not been extensively studied by prior work. To fill this gap of knowledge, we characterize the impact of using seven frequently used sparse formats on performance, based on a DSA for sparse matrix-vector multiplication (SpMV), implemented on an FPGA using high-level synthesis (HLS) tools, a growing and popular method for developing DSAs. Seeking a fair comparison, we tailor and optimize the HLS implementation of decompression for each format. We thoroughly explore diverse metrics, including decompression overhead, latency, balance ratio, throughput, memory bandwidth utilization, resource utilization, and power consumption, on a variety of real-world and synthetic sparse workloads.Comment: 11 pages, 14 figures, 2 table

    Combining Synthesis of Cardiorespiratory Signals and Artifacts with Deep Learning for Robust Vital Sign Estimation

    Get PDF
    Healthcare has been remarkably morphing on the account of Big Data. As Machine Learning (ML) consolidates its place in simpler clinical chores, more complex Deep Learning (DL) algorithms have struggled to keep up, despite their superior capabilities. This is mainly attributed to the need for large amounts of data for training, which the scientific community is unable to satisfy. The number of promising DL algorithms is considerable, although solutions directly targeting the shortage of data lack. Currently, dynamical generative models are the best bet, but focus on single, classical modalities and tend to complicate significantly with the amount of physiological effects they can simulate. This thesis aims at providing and validating a framework, specifically addressing the data deficit in the scope of cardiorespiratory signals. Firstly, a multimodal statistical synthesizer was designed to generate large, annotated artificial signals. By expressing data through coefficients of pre-defined, fitted functions and describing their dependence with Gaussian copulas, inter- and intra-modality associations were learned. Thereafter, new coefficients are sampled to generate artificial, multimodal signals with the original physiological dynamics. Moreover, normal and pathological beats along with artifacts were included by employing Markov models. Secondly, a convolutional neural network (CNN) was conceived with a novel sensor-fusion architecture and trained with synthesized data under real-world experimental conditions to evaluate how its performance is affected. Both the synthesizer and the CNN not only performed at state of the art level but also innovated with multiple types of generated data and detection error improvements, respectively. Cardiorespiratory data augmentation corrected performance drops when not enough data is available, enhanced the CNN’s ability to perform on noisy signals and to carry out new tasks when introduced to, otherwise unavailable, types of data. Ultimately, the framework was successfully validated showing potential to leverage future DL research on Cardiology into clinical standards

    Continuous-time Algorithms and Analog Integrated Circuits for Solving Partial Differential Equations

    Get PDF
    Analog computing (AC) was the predominant form of computing up to the end of World War II. The invention of digital computers (DCs) followed by developments in transistors and thereafter integrated circuits (IC), has led to exponential growth in DCs over the last few decades, making ACs a largely forgotten concept. However, as described by the impending slow-down of Moore’s law, the performance of DCs is no longer improving exponentially, as DCs are approaching clock speed, power dissipation, and transistor density limits. This research explores the possibility of employing AC concepts, albeit using modern IC technologies at radio frequency (RF) bandwidths, to obtain additional performance from existing IC platforms. Combining analog circuits with modern digital processors to perform arithmetic operations would make the computation potentially faster and more energy-efficient. Two AC techniques are explored for computing the approximate solutions of linear and nonlinear partial differential equations (PDEs), and they were verified by designing ACs for solving Maxwell\u27s and wave equations. The designs were simulated in Cadence Spectre for different boundary conditions. The accuracies of the ACs were compared with finite-deference time-domain (FDTD) reference techniques. The objective of this dissertation is to design software-defined ACs with complementary digital logic to perform approximate computations at speeds that are several orders of magnitude greater than competing methods. ACs trade accuracy of the computation for reduced power and increased throughput. Recent examples of ACs are accurate but have less than 25 kHz of analog bandwidth (Fcompute) for continuous-time (CT) operations. In this dissertation, a special-purpose AC, which has Fcompute = 30 MHz (an equivalent update rate of 625 MHz) at a power consumption of 200 mW, is presented. The proposed AC employes 180 nm CMOS technology and evaluates the approximate CT solution of the 1-D wave equation in space and time. The AC is 100x, 26x, 2.8x faster when compared to the MATLAB- and C-based FDTD solvers running on a computer, and systolic digital implementation of FDTD on a Xilinx RF-SoC ZCU1275 at 900 mW (x15 improvement in power-normalized performance compared to RF-SoC), respectively

    Integrated Heart - Coupling multiscale and multiphysics models for the simulation of the cardiac function

    Get PDF
    Mathematical modelling of the human heart and its function can expand our understanding of various cardiac diseases, which remain the most common cause of death in the developed world. Like other physiological systems, the heart can be understood as a complex multiscale system involving interacting phenomena at the molecular, cellular, tissue, and organ levels. This article addresses the numerical modelling of many aspects of heart function, including the interaction of the cardiac electrophysiology system with contractile muscle tissue, the sub-cellular activation-contraction mechanisms, as well as the hemodynamics inside the heart chambers. Resolution of each of these sub-systems requires separate mathematical analysis and specially developed numerical algorithms, which we review in detail. By using specific sub-systems as examples, we also look at systemic stability, and explain for example how physiological concepts such as microscopic force generation in cardiac muscle cells, translate to coupled systems of differential equations, and how their stability properties influence the choice of numerical coupling algorithms. Several numerical examples illustrate three fundamental challenges of developing multiphysics and multiscale numerical models for simulating heart function, namely: (i) the correct upscaling from single-cell models to the entire cardiac muscle, (ii) the proper coupling of electrophysiology and tissue mechanics to simulate electromechanical feedback, and (iii) the stable simulation of ventricular hemodynamics during rapid valve opening and closure

    Digital Filter Design Using Improved Teaching-Learning-Based Optimization

    Get PDF
    Digital filters are an important part of digital signal processing systems. Digital filters are divided into finite impulse response (FIR) digital filters and infinite impulse response (IIR) digital filters according to the length of their impulse responses. An FIR digital filter is easier to implement than an IIR digital filter because of its linear phase and stability properties. In terms of the stability of an IIR digital filter, the poles generated in the denominator are subject to stability constraints. In addition, a digital filter can be categorized as one-dimensional or multi-dimensional digital filters according to the dimensions of the signal to be processed. However, for the design of IIR digital filters, traditional design methods have the disadvantages of easy to fall into a local optimum and slow convergence. The Teaching-Learning-Based optimization (TLBO) algorithm has been proven beneficial in a wide range of engineering applications. To this end, this dissertation focusses on using TLBO and its improved algorithms to design five types of digital filters, which include linear phase FIR digital filters, multiobjective general FIR digital filters, multiobjective IIR digital filters, two-dimensional (2-D) linear phase FIR digital filters, and 2-D nonlinear phase FIR digital filters. Among them, linear phase FIR digital filters, 2-D linear phase FIR digital filters, and 2-D nonlinear phase FIR digital filters use single-objective type of TLBO algorithms to optimize; multiobjective general FIR digital filters use multiobjective non-dominated TLBO (MOTLBO) algorithm to optimize; and multiobjective IIR digital filters use MOTLBO with Euclidean distance to optimize. The design results of the five types of filter designs are compared to those obtained by other state-of-the-art design methods. In this dissertation, two major improvements are proposed to enhance the performance of the standard TLBO algorithm. The first improvement is to apply a gradient-based learning to replace the TLBO learner phase to reduce approximation error(s) and CPU time without sacrificing design accuracy for linear phase FIR digital filter design. The second improvement is to incorporate Manhattan distance to simplify the procedure of the multiobjective non-dominated TLBO (MOTLBO) algorithm for general FIR digital filter design. The design results obtained by the two improvements have demonstrated their efficiency and effectiveness
    • …
    corecore