A high-accuracy optical linear algebra processor for finite element applications

Author: Casasent D.
Taylor B. K.

Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuracy to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables), which can exploit the speed of optical processors, require the 32-bit accuracy obtainable from digital machines. To obtain this required 32-bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantifies the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
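
The multiplication by digital convolution that the abstract mentions can be illustrated with a minimal sketch. It assumes the standard technique: each operand is split into digits, the digit sequences are convolved, and the resulting mixed-radix sequence (whose entries may exceed the base) is collapsed by a weighted sum. The function names and the choice of base 2 are illustrative assumptions, not taken from the report.

```python
def to_digits(n, base=2):
    """Return the digits of a non-negative integer, least significant first."""
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(r)
        if n == 0:
            return digits


def convolve(a, b):
    """Discrete linear convolution of two digit sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def multiply_by_convolution(x, y, base=2):
    """Multiply two non-negative integers via convolution of their digits."""
    # The convolution output is in "mixed" form: entries may exceed base - 1.
    mixed = convolve(to_digits(x, base), to_digits(y, base))
    # Weighting entry k by base**k resolves the carries in one step.
    return sum(d * base**k for k, d in enumerate(mixed))


product = multiply_by_convolution(13, 11)
```

Because every digit product is small, the analog stage only has to resolve a few levels per output element, which is why encoding of this kind relaxes the dynamic range requirement.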

Parallel algorithms for MIMD parallel computers

Author: Hilal M. Yousif
Publication date: 01/01/1986

This thesis mainly covers the design and analysis of asynchronous parallel algorithms that can be run on MIMD (Multiple Instruction Multiple Data) parallel computers, in particular the NEPTUNE system at Loughborough University. Initially the fundamentals of parallel computer architectures are introduced, with different parallel architectures being described and compared. The principles of parallel programming and the design of parallel algorithms are also outlined. The main characteristics of the 4-processor MIMD NEPTUNE system are presented, and performance indicators, i.e. the speed-up and the efficiency factors, are defined for the measurement of parallelism in a given system. Both numerical and non-numerical algorithms are covered in the thesis. In the numerical solution of partial differential equations, a new parallel 9-point block iterative method is developed. Here, the blocks are organized so that each process contains its own group of 9 points on the network, and the processes can therefore be run in parallel. The parallel implementations of both the 9-point and 4-point block iterative methods were programmed using natural and red-black ordering with synchronous and asynchronous approaches. The results obtained for these different implementations were compared and analysed. Next, the parallel version of the A.G.E. (Alternating Group Explicit) method is developed, in which the explicit nature of the difference equation is revealed and exploited when applied to derive the solution of both linear and non-linear 2-point boundary value problems. Two strategies have been used in the implementation of the parallel A.G.E. method, using the synchronous and asynchronous approaches. The results from these implementations were compared. For comparison, the results obtained from the parallel A.G.E. method were also compared with the corresponding results obtained from the parallel versions of the Jacobi, Gauss-Seidel and S.O.R. methods.
Finally, a computational complexity analysis of the parallel A.G.E. algorithms is included. In the area of non-numeric algorithms, the problems of sorting and searching were studied. The sorting methods investigated were the shell sort and the digit sort. With each method, different parallel strategies and approaches were used and compared to find the best results which can be obtained on the parallel machine. Among the searching methods, the sequential search algorithm in an unordered table and the binary search algorithm were investigated and implemented in parallel, with a presentation of the results. Finally, a complexity analysis of these methods is presented. The thesis concludes with a chapter summarizing the main results.
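
To make the red-black ordering mentioned in the abstract concrete, here is a minimal sketch. It is not the thesis's 9-point or 4-point block method: it assumes the standard point-wise Gauss-Seidel iteration for Laplace's equation on a square grid with the usual 5-point stencil, and the function name is hypothetical. Points are coloured like a chessboard; every point of one colour depends only on points of the other colour, so all points of the same colour could be updated concurrently.

```python
def red_black_sweep(u):
    """One Gauss-Seidel sweep over the interior of grid u in red-black order.

    u is a square list of lists; its boundary rows and columns hold fixed
    boundary values and are never modified.
    """
    n = len(u)
    for colour in (0, 1):  # 0 = "red" points, 1 = "black" points
        # Every update within one colour reads only the other colour,
        # so this double loop is safe to split across processes.
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == colour:
                    u[i][j] = 0.25 * (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    )


# Boundary values u = j (a harmonic function); interior starts at zero.
size = 6
grid = [
    [float(j) if i in (0, size - 1) or j in (0, size - 1) else 0.0
     for j in range(size)]
    for i in range(size)
]
for _ in range(200):
    red_black_sweep(grid)
```

In a synchronous implementation the processes would wait at a barrier between the two colours; an asynchronous one lets each process proceed with whatever neighbour values are currently available.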

Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads

Author: Asgari Bahar
Dierberger Joshua
Hadidi Ramyad
Kim Hyesoon
Marfatia Amaan
Steinichen Charlotte
Publication date: 18/10/2021

Sparse matrices are the key ingredients of several application domains, from scientific computation to machine learning. The primary challenge with sparse matrices has been efficiently storing and transferring data, for which many sparse formats have been proposed to significantly eliminate zero entries. Such formats, essentially designed to optimize memory footprint, may not be as successful in performing faster processing. In other words, although they allow faster data transfer and improve memory bandwidth utilization -- the classic challenge of sparse problems -- their decompression mechanism can potentially create a computation bottleneck. Not only is this challenge not resolved, but it also becomes more serious with the advent of domain-specific architectures (DSAs), as they intend to more aggressively improve performance. The performance implications of using various formats along with DSAs, however, have not been extensively studied by prior work. To fill this gap of knowledge, we characterize the impact of using seven frequently used sparse formats on performance, based on a DSA for sparse matrix-vector multiplication (SpMV), implemented on an FPGA using high-level synthesis (HLS) tools, a growing and popular method for developing DSAs. Seeking a fair comparison, we tailor and optimize the HLS implementation of decompression for each format. We thoroughly explore diverse metrics, including decompression overhead, latency, balance ratio, throughput, memory bandwidth utilization, resource utilization, and power consumption, on a variety of real-world and synthetic sparse workloads. Comment: 11 pages, 14 figures, 2 tables
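
As an illustration of what a compression format and its decompression step look like in SpMV, the sketch below uses compressed sparse row (CSR). CSR is a widely used format, but the abstract does not name the seven formats studied, so treating it as one of them is an assumption; the function names are hypothetical and the code is plain Python, not the paper's HLS implementation.

```python
def dense_to_csr(matrix):
    """Compress a dense row-major matrix into CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # nonzero value
                col_idx.append(j)  # its column index
        row_ptr.append(len(values))  # where the next row starts in values
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x directly from the CSR arrays."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0
        # "Decompression": row_ptr locates the row, col_idx locates each operand.
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


A = [[4, 0, 0],
     [0, 0, 2],
     [1, 0, 3]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_spmv(values, col_idx, row_ptr, [1, 2, 3])
```

The indirect access `x[col_idx[k]]` is the kind of per-element index work that a format adds on top of the arithmetic.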

arXiv.org e-Print Archive

Combining Synthesis of Cardiorespiratory Signals and Artifacts with Deep Learning for Robust Vital Sign Estimation

Author: Silva Diogo Filipe Pereira Fontes Fernandes
Publication date: 01/01/2019

Healthcare has been changing remarkably on account of Big Data. As Machine Learning (ML) consolidates its place in simpler clinical chores, more complex Deep Learning (DL) algorithms have struggled to keep up, despite their superior capabilities. This is mainly attributed to the need for large amounts of training data, which the scientific community is unable to satisfy. The number of promising DL algorithms is considerable, but solutions directly targeting the shortage of data are lacking. Currently, dynamical generative models are the best bet, but they focus on single, classical modalities and tend to become significantly more complicated as the number of physiological effects they can simulate grows. This thesis aims at providing and validating a framework that specifically addresses the data deficit in the scope of cardiorespiratory signals. Firstly, a multimodal statistical synthesizer was designed to generate large, annotated artificial signals. By expressing data through coefficients of pre-defined, fitted functions and describing their dependence with Gaussian copulas, inter- and intra-modality associations were learned. Thereafter, new coefficients are sampled to generate artificial, multimodal signals with the original physiological dynamics. Moreover, normal and pathological beats along with artifacts were included by employing Markov models. Secondly, a convolutional neural network (CNN) was conceived with a novel sensor-fusion architecture and trained with synthesized data under real-world experimental conditions to evaluate how its performance is affected. Both the synthesizer and the CNN not only performed at state-of-the-art level but also innovated with multiple types of generated data and detection error improvements, respectively.
Cardiorespiratory data augmentation corrected performance drops when not enough data is available, and enhanced the CNN’s ability to perform on noisy signals and to carry out new tasks when introduced to otherwise unavailable types of data. Ultimately, the framework was successfully validated, showing potential to leverage future DL research on Cardiology into clinical standards.
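
The idea of describing dependence between fitted coefficients with a Gaussian copula and then sampling new coefficients can be sketched for the two-variable case. This is only an illustration under stated assumptions: the correlation `rho` is supplied directly instead of being learned from data, and the function name is hypothetical. Correlated standard normals are drawn and pushed through the normal CDF, which yields uniform variables that keep the dependence structure.

```python
import math
import random
from statistics import NormalDist


def sample_gaussian_copula(rho, n, seed=0):
    """Draw n pairs (u, v) from a bivariate Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    phi = NormalDist().cdf  # standard normal CDF
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # Build z2 so that corr(z1, z2) = rho.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # The CDF maps each normal to a uniform on [0, 1], preserving dependence.
        pairs.append((phi(z1), phi(z2)))
    return pairs


pairs = sample_gaussian_copula(rho=0.9, n=2000, seed=1)
```

Each uniform sample would then be mapped through the inverse CDF of the corresponding fitted marginal distribution to obtain a new coefficient value.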

Continuous-time Algorithms and Analog Integrated Circuits for Solving Partial Differential Equations

Author: Galabada Kankanamge Nilan Udayanga
Publication venue: FIU Digital Commons
Publication date: 12/11/2019

Analog computing (AC) was the predominant form of computing up to the end of World War II. The invention of digital computers (DCs), followed by developments in transistors and thereafter integrated circuits (IC), has led to exponential growth in DCs over the last few decades, making ACs a largely forgotten concept. However, as described by the impending slow-down of Moore’s law, the performance of DCs is no longer improving exponentially, as DCs are approaching clock speed, power dissipation, and transistor density limits. This research explores the possibility of employing AC concepts, albeit using modern IC technologies at radio frequency (RF) bandwidths, to obtain additional performance from existing IC platforms. Combining analog circuits with modern digital processors to perform arithmetic operations would make the computation potentially faster and more energy-efficient. Two AC techniques are explored for computing the approximate solutions of linear and nonlinear partial differential equations (PDEs), and they were verified by designing ACs for solving Maxwell's and wave equations. The designs were simulated in Cadence Spectre for different boundary conditions. The accuracies of the ACs were compared with finite-difference time-domain (FDTD) reference techniques. The objective of this dissertation is to design software-defined ACs with complementary digital logic to perform approximate computations at speeds that are several orders of magnitude greater than competing methods. ACs trade accuracy of the computation for reduced power and increased throughput. Recent examples of ACs are accurate but have less than 25 kHz of analog bandwidth (Fcompute) for continuous-time (CT) operations. In this dissertation, a special-purpose AC, which has Fcompute = 30 MHz (an equivalent update rate of 625 MHz) at a power consumption of 200 mW, is presented.
The proposed AC employs 180 nm CMOS technology and evaluates the approximate CT solution of the 1-D wave equation in space and time. The AC is 100x, 26x, and 2.8x faster when compared to the MATLAB- and C-based FDTD solvers running on a computer, and a systolic digital implementation of FDTD on a Xilinx RF-SoC ZCU1275 at 900 mW (x15 improvement in power-normalized performance compared to RF-SoC), respectively.
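
For orientation, the kind of digital FDTD reference computation the abstract compares against can be sketched for the 1-D wave equation u_tt = c^2 u_xx. This is a generic textbook leapfrog scheme, not the dissertation's solver: the fixed (zero) boundaries, the sine initial profile, zero initial velocity, and the function name are all assumptions made for the example. The parameter `courant` is c*dt/dx and should not exceed 1 for stability.

```python
import math


def simulate_wave(n_intervals=50, courant=1.0, steps=100):
    """Advance a plucked string with fixed ends by `steps` leapfrog time steps."""
    c2 = courant ** 2
    # Initial displacement: half a sine wave on [0, 1], zero initial velocity.
    prev = [math.sin(math.pi * i / n_intervals) for i in range(n_intervals + 1)]
    prev[0] = prev[-1] = 0.0
    if steps == 0:
        return prev
    # Special first step that accounts for the zero initial velocity.
    curr = prev[:]
    for i in range(1, n_intervals):
        curr[i] = prev[i] + 0.5 * c2 * (prev[i + 1] - 2 * prev[i] + prev[i - 1])
    # Standard leapfrog update for the remaining steps.
    for _ in range(steps - 1):
        nxt = [0.0] * (n_intervals + 1)  # end points stay fixed at zero
        for i in range(1, n_intervals):
            nxt[i] = (2 * curr[i] - prev[i]
                      + c2 * (curr[i + 1] - 2 * curr[i] + curr[i - 1]))
        prev, curr = curr, nxt
    return curr


initial = simulate_wave(steps=0)
after_half_period = simulate_wave(n_intervals=50, courant=1.0, steps=50)
after_full_period = simulate_wave(n_intervals=50, courant=1.0, steps=100)
```

With `courant=1.0` and 50 intervals, one oscillation period takes 100 steps, so the string is inverted after 50 steps and back at its initial shape after 100.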

Integrated Heart - Coupling multiscale and multiphysics models for the simulation of the cardiac function

Author: Alfio Quarteroni
Ricardo Ruiz-Baier
Simone Rossi
Toni Lassila
Publication venue: Elsevier BV
Publication date: 01/01/2016

Mathematical modelling of the human heart and its function can expand our understanding of various cardiac diseases, which remain the most common cause of death in the developed world. Like other physiological systems, the heart can be understood as a complex multiscale system involving interacting phenomena at the molecular, cellular, tissue, and organ levels. This article addresses the numerical modelling of many aspects of heart function, including the interaction of the cardiac electrophysiology system with contractile muscle tissue, the sub-cellular activation-contraction mechanisms, as well as the hemodynamics inside the heart chambers. Resolution of each of these sub-systems requires separate mathematical analysis and specially developed numerical algorithms, which we review in detail. By using specific sub-systems as examples, we also look at systemic stability, and explain, for example, how physiological concepts such as microscopic force generation in cardiac muscle cells translate to coupled systems of differential equations, and how their stability properties influence the choice of numerical coupling algorithms. Several numerical examples illustrate three fundamental challenges of developing multiphysics and multiscale numerical models for simulating heart function, namely: (i) the correct upscaling from single-cell models to the entire cardiac muscle, (ii) the proper coupling of electrophysiology and tissue mechanics to simulate electromechanical feedback, and (iii) the stable simulation of ventricular hemodynamics during rapid valve opening and closure.

Archivio istituzionale della ricerca - Politecnico di Milano

Oxford University Research Archive

Digital Filter Design Using Improved Teaching-Learning-Based Optimization

Author: Zhang Miao
Publication venue: University of Windsor Leddy Library
Publication date: 12/09/2019

Digital filters are an important part of digital signal processing systems. Digital filters are divided into finite impulse response (FIR) digital filters and infinite impulse response (IIR) digital filters according to the length of their impulse responses. An FIR digital filter is easier to implement than an IIR digital filter because of its linear phase and stability properties. In terms of the stability of an IIR digital filter, the poles generated in the denominator are subject to stability constraints. In addition, digital filters can be categorized as one-dimensional or multi-dimensional according to the dimensions of the signal to be processed. However, for the design of IIR digital filters, traditional design methods have the disadvantages of easily falling into a local optimum and converging slowly. The Teaching-Learning-Based Optimization (TLBO) algorithm has been proven beneficial in a wide range of engineering applications. To this end, this dissertation focusses on using TLBO and its improved algorithms to design five types of digital filters, which include linear phase FIR digital filters, multiobjective general FIR digital filters, multiobjective IIR digital filters, two-dimensional (2-D) linear phase FIR digital filters, and 2-D nonlinear phase FIR digital filters. Among them, linear phase FIR digital filters, 2-D linear phase FIR digital filters, and 2-D nonlinear phase FIR digital filters are optimized with single-objective TLBO algorithms; multiobjective general FIR digital filters are optimized with the multiobjective non-dominated TLBO (MOTLBO) algorithm; and multiobjective IIR digital filters are optimized with MOTLBO with Euclidean distance. The design results of the five types of filter designs are compared to those obtained by other state-of-the-art design methods. In this dissertation, two major improvements are proposed to enhance the performance of the standard TLBO algorithm.
The first improvement is to apply gradient-based learning in place of the TLBO learner phase to reduce approximation error(s) and CPU time without sacrificing design accuracy for linear phase FIR digital filter design. The second improvement is to incorporate Manhattan distance to simplify the procedure of the multiobjective non-dominated TLBO (MOTLBO) algorithm for general FIR digital filter design. The design results obtained by the two improvements have demonstrated their efficiency and effectiveness.
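
For readers unfamiliar with the standard TLBO algorithm that the dissertation builds on, the following is a minimal single-objective sketch with its usual teacher and learner phases. It does not include the dissertation's improvements (the gradient-based learner replacement or the Manhattan-distance MOTLBO), and the function name, parameter defaults, and the sphere test objective are illustrative assumptions; in a filter design setting the objective would instead be an approximation error over the filter coefficients.

```python
import random


def tlbo_minimize(objective, bounds, pop_size=20, iterations=100, seed=0):
    """Minimize objective over box bounds with basic TLBO."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]

    for _ in range(iterations):
        # Teacher phase: move each learner toward the best solution,
        # away from a multiple (teaching factor 1 or 2) of the class mean.
        teacher = pop[cost.index(min(cost))][:]
        mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.choice((1, 2))
            cand = clip([pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])
                         for d in range(dim)])
            c = objective(cand)
            if c < cost[i]:  # greedy acceptance
                pop[i], cost[i] = cand, c

        # Learner phase: each learner interacts with a random classmate and
        # moves toward it if it is better, away from it otherwise.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1.0 if cost[i] < cost[j] else -1.0
            cand = clip([pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d])
                         for d in range(dim)])
            c = objective(cand)
            if c < cost[i]:
                pop[i], cost[i] = cand, c

    best = cost.index(min(cost))
    return pop[best], cost[best]


def sphere(x):
    return sum(v * v for v in x)


best_x, best_cost = tlbo_minimize(sphere, bounds=[(-5.0, 5.0)] * 3)
```

The only tunable quantities here are the population size and the iteration count, with no further algorithm-specific control parameters to set.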