265 research outputs found

    Solution of partial differential equations on vector and parallel computers

    Get PDF
    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to the characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial-boundary value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
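
    As an illustration of the iterative methods the survey covers for elliptic equations, the sketch below shows a Jacobi sweep for the 2-D Poisson equation; Jacobi is a classic example of an update that vectorizes and parallelizes well because every point depends only on the previous iterate. The grid size, source term, and tolerance are illustrative assumptions, not values from the paper.

        import numpy as np

        def jacobi_poisson(f, h, tol=1e-6, max_iter=10_000):
            """Solve -laplace(u) = f on the unit square with u = 0 on the
            boundary, using Jacobi iteration on a uniform grid of spacing h."""
            u = np.zeros_like(f)
            for _ in range(max_iter):
                u_new = u.copy()
                # Every interior point is updated from the previous iterate only,
                # so the whole sweep is one data-parallel array operation.
                u_new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                            u[1:-1, :-2] + u[1:-1, 2:] +
                                            h**2 * f[1:-1, 1:-1])
                if np.max(np.abs(u_new - u)) < tol:
                    return u_new
                u = u_new
            return u

        # Illustrative use: constant source term on a 65 x 65 grid (assumed values).
        n = 65
        u = jacobi_poisson(np.ones((n, n)), h=1.0 / (n - 1))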

    Research in applied mathematics, numerical analysis, and computer science

    Get PDF
    Research conducted at the Institute for Computer Applications in Science and Engineering (ICASE) in applied mathematics, numerical analysis, and computer science is summarized, and abstracts of published reports are presented. The major categories of the ICASE research program are: (1) numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; (2) control and parameter identification; (3) computational problems in engineering and the physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and (4) computer systems and software, especially vector and parallel computers.

    Continuous-time Algorithms and Analog Integrated Circuits for Solving Partial Differential Equations

    Get PDF
    Analog computing (AC) was the predominant form of computing up to the end of World War II. The invention of digital computers (DCs), followed by developments in transistors and thereafter integrated circuits (ICs), has led to exponential growth in DCs over the last few decades, making AC a largely forgotten concept. However, as reflected in the impending slowdown of Moore's law, the performance of DCs is no longer improving exponentially, as DCs are approaching clock-speed, power-dissipation, and transistor-density limits. This research explores the possibility of employing AC concepts, albeit using modern IC technologies at radio frequency (RF) bandwidths, to obtain additional performance from existing IC platforms. Combining analog circuits with modern digital processors to perform arithmetic operations would make the computation potentially faster and more energy-efficient. Two AC techniques are explored for computing the approximate solutions of linear and nonlinear partial differential equations (PDEs), and they were verified by designing ACs for solving Maxwell's and wave equations. The designs were simulated in Cadence Spectre for different boundary conditions. The accuracies of the ACs were compared with finite-difference time-domain (FDTD) reference techniques. The objective of this dissertation is to design software-defined ACs with complementary digital logic to perform approximate computations at speeds that are several orders of magnitude greater than competing methods. ACs trade accuracy of the computation for reduced power and increased throughput. Recent examples of ACs are accurate but have less than 25 kHz of analog bandwidth (Fcompute) for continuous-time (CT) operations. In this dissertation, a special-purpose AC with Fcompute = 30 MHz (an equivalent update rate of 625 MHz) at a power consumption of 200 mW is presented. The proposed AC employs 180 nm CMOS technology and evaluates the approximate CT solution of the 1-D wave equation in space and time. The AC is 100x, 26x, and 2.8x faster than MATLAB-based and C-based FDTD solvers running on a computer and a systolic digital implementation of FDTD on a Xilinx RF-SoC ZCU1275 at 900 mW, respectively (a 15x improvement in power-normalized performance over the RF-SoC).
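
    For context, the FDTD reference technique that the abstract benchmarks against can be sketched as a leapfrog update of the 1-D wave equation; the grid size, Courant number, and initial pulse below are illustrative assumptions, not parameters of the fabricated AC.

        import numpy as np

        def fdtd_wave_1d(n=200, steps=500, courant=0.9):
            """Second-order FDTD update of the 1-D wave equation u_tt = c^2 u_xx
            with fixed (u = 0) ends; `courant` is c * dt / dx."""
            u_prev = np.zeros(n)      # field at time step k - 1
            u_curr = np.zeros(n)      # field at time step k
            u_curr[n // 2] = 1.0      # illustrative pulse in the middle of the domain
            c2 = courant ** 2
            for _ in range(steps):
                u_next = np.zeros(n)
                u_next[1:-1] = (2 * u_curr[1:-1] - u_prev[1:-1]
                                + c2 * (u_curr[2:] - 2 * u_curr[1:-1] + u_curr[:-2]))
                u_prev, u_curr = u_curr, u_next
            return u_curr

        field = fdtd_wave_1d()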

    Accelerating Linear Algebra and Machine Learning Kernels on a Massively Parallel Reconfigurable Architecture

    Get PDF
    This thesis presents efficient implementations of several linear algebra kernels, machine learning kernels, and a neural network-based recommender systems engine on a massively parallel reconfigurable architecture, Transformer. The linear algebra kernels include Triangular Matrix Solver (TRSM), LU Decomposition (LUD), QR Decomposition (QRD), and Matrix Inversion. The machine learning kernels include an LSTM (Long Short-Term Memory) cell and a GRU (Gated Recurrent Unit) cell used in recurrent neural networks. The neural network-based recommender systems engine consists of multiple kernels including fully connected layers, an embedding layer, 1-D batchnorm, the Adam optimizer, etc. Transformer is a massively parallel reconfigurable multicore architecture designed at the University of Michigan. The Transformer configuration considered here is 4 tiles and 16 General Processing Elements (GPEs) per tile. It supports a two-level cache hierarchy where the L1 and L2 caches can operate in shared (S) or private (P) modes. The architecture was modeled using Gem5, and cycle-accurate simulations were done to evaluate the performance in terms of execution time, giga-operations per second per watt (GOPS/W), and giga-floating-point-operations per second per watt (GFLOPS/W). This thesis shows that for linear algebra kernels, each kernel achieves high performance for a certain cache mode and that this cache mode can change when the matrix size changes. For instance, for smaller matrix sizes, the L1P, L2P cache mode is best for TRSM, while L1S, L2S is the best cache mode for LUD, and L1P, L2S is the best for QRD. For each kernel, the optimal cache mode changes when the matrix size is increased. For instance, for TRSM, the L1P, L2P cache mode is best for smaller matrix sizes (N = 64, 128, 256, 512) and changes to L1S, L2P for larger matrix sizes (N = 1024). For machine learning kernels, L1P, L2P is the best cache mode for all network parameter sizes. Gem5 simulations show that the peak performance for TRSM, LUD, QRD, and Matrix Inversion in the 14 nm node is 97.5, 59.4, 133.0, and 83.05 GFLOPS/W, respectively. For LSTM and GRU, the peak performance is 44.06 and 69.3 GFLOPS/W. The neural network-based recommender system was implemented in the L1S, L2S cache mode. It includes a forward pass and a backward pass and is significantly more complex in terms of both computational complexity and data movement. The most computationally intensive block is the fully connected layer, followed by the Adam optimizer. The overall performance of the recommender systems engine is 54.55 GFLOPS/W and 169.12 GOPS/W.
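
    As a reference point for the simplest of the linear algebra kernels mentioned, the sketch below writes a lower-triangular solve (the serial computation behind a TRSM kernel) as forward substitution; the matrix size and values are arbitrary assumptions, and the thesis's blocked, parallel mapping onto the Transformer GPEs is not reproduced here.

        import numpy as np

        def forward_substitution(L, b):
            """Solve L x = b for a non-singular lower-triangular matrix L."""
            n = L.shape[0]
            x = np.zeros(n)
            for i in range(n):
                # Subtract the contribution of the already-computed unknowns.
                x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
            return x

        # Illustrative check against NumPy's general solver on a random 8 x 8 system.
        rng = np.random.default_rng(0)
        L = np.tril(rng.standard_normal((8, 8))) + 8 * np.eye(8)
        b = rng.standard_normal(8)
        assert np.allclose(forward_substitution(L, b), np.linalg.solve(L, b))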

    Design and Implementation of Hardware Accelerators for Neural Processing Applications

    Full text link
    The primary motivation for this work was the need to implement hardware accelerators for a newly proposed ANN structure called Auto Resonance Network (ARN) for robotic motion planning. ARN is an approximating, feed-forward, hierarchical, and explainable network. It can be used in various AI applications, but its application base was small. Therefore, the objective of the research was twofold: to develop a new application using ARN and to implement a hardware accelerator for ARN. As per the suggestions given by the Doctoral Committee, an image recognition system using ARN has been implemented. An accuracy of around 94% was achieved with only 2 layers of ARN. The network also required a small training data set of about 500 images. The publicly available MNIST dataset was used for this experiment. All the coding was done in Python. The massive parallelism seen in ANNs presents several challenges to CPU design. For a given functionality, e.g., multiplication, several copies of a serial module can be realized within the same area as a parallel module. The advantage of using serial modules compared to parallel modules under area constraints has been discussed. One module often useful in ANNs is a multi-operand adder. One problem in its implementation is estimating the number of carry bits required when the number of operands changes. A theorem to calculate the exact number of carry bits required for a multi-operand addition is presented in the thesis, which alleviates this problem. The main advantage of the modular approach to multi-operand addition is the possibility of pipelined addition with low reconfiguration overhead. This results in an overall increase in throughput for the large numbers of additions typically seen in several DNN configurations.
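
    The carry-bit question can be illustrated with the standard width bound: the sum of k unsigned n-bit operands is at most k(2^n - 1), so ceil(log2(k)) extra result bits always suffice. The helper below verifies that bound exhaustively for tiny cases; it is a generic illustration of the bound, not the exact-count theorem derived in the thesis.

        from itertools import product
        from math import ceil, log2

        def extra_bits(k: int) -> int:
            """Extra result bits sufficient when adding k unsigned operands:
            the sum is at most k * (2**n - 1) < 2**(n + ceil(log2(k)))."""
            return ceil(log2(k))

        def check(n: int, k: int) -> bool:
            """Exhaustively verify the width bound for k operands of n bits each."""
            limit = 2 ** (n + extra_bits(k))
            return all(sum(ops) < limit for ops in product(range(2 ** n), repeat=k))

        assert all(check(n, k) for n in (1, 2, 3) for k in (2, 3, 4, 5))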

    Development, Validation, and Clinical Application of a Numerical Model for Pulse Wave Velocity Propagation in a Cardiovascular System with Application to Noninvasive Blood Pressure Measurements

    Get PDF
    High blood pressure is an important risk factor for cardiovascular disease and affects almost one-third of the U.S. adult population. Historical cuff-less, non-invasive techniques used to monitor blood pressure are not accurate, highlighting the need for first-principles models. The first model is a one-dimensional model for pulse wave velocity (PWV) propagation in compliant arteries that accounts for nonlinear fluid behavior in a linear elastic, thin-walled vessel. The results indicate an inverse quadratic relationship (R^2 = 0.99) between ejection time and PWV, with ejection time dominating the PWV shifts (12%). The second model predicts the general relationship between PWV and blood pressure with a rigorous account of nonlinearities in the fluid dynamics, blood vessel elasticity, and finite dynamic deformation of a membrane-type thin anisotropic wall. The nonlinear model achieves the best match with the experimental data. To retrieve individual vascular information for a patient, the inverse problem of hemodynamics is presented, calculating local orthotropic hyperelastic properties of the arterial wall. The final model examines the impact of a thick arterial wall with different material properties in the radial direction. For a hypertensive subject, the thick-wall model provides improved accuracy of up to 8.4% in PWV prediction over its thin-wall counterpart. This translates to nearly 20% improvement in blood pressure prediction based on a PWV measure. The models highlight that flow velocity is additive to the classic pressure wave, suggesting that flow velocity correction may be important for cuff-less, non-invasive blood pressure measures. Systolic flow correction of the measured PWV improves the R^2 correlation to systolic blood pressure from 0.81 to 0.92 for the mongrel dog study, and from 0.34 to 0.88 for the human subjects study. The algorithms and insight resulting from this work can enable the development of an integrated microsystem for cuff-less, non-invasive blood pressure monitoring.
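
    For reference, the classical starting point for a linear elastic, thin-walled vessel model of this kind is the Moens-Korteweg relation, PWV = sqrt(E h / (rho D)); the sketch below evaluates it with textbook-scale vessel parameters that are assumptions for illustration, not data or results from this work.

        from math import sqrt

        def moens_korteweg_pwv(E, h, rho, D):
            """Classical pulse wave velocity of a thin-walled, linear elastic tube:
            PWV = sqrt(E * h / (rho * D)), with elastic modulus E (Pa), wall
            thickness h (m), blood density rho (kg/m^3), and inner diameter D (m)."""
            return sqrt(E * h / (rho * D))

        # Illustrative, textbook-scale values for a large artery (assumed, not measured).
        pwv = moens_korteweg_pwv(E=400e3, h=1.5e-3, rho=1060.0, D=8e-3)
        print(f"PWV ~ {pwv:.1f} m/s")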

    Digital Twin of Cardiovascular Systems

    Get PDF
    Patient-specific modelling using numerical methods is widely used in understanding diseases and disorders. It produces medical analysis based on the current state of a patient's health. Concurrently, as a parallel development, emerging data-driven Artificial Intelligence (AI) has accelerated patient care. It provides medical analysis using algorithms that rely upon knowledge from larger human population data. AI systems are also known to have the capacity to provide a prognosis with overall accuracy levels that are better than those provided by trained professionals. When these two independent and robust methods are combined, the concept of human digital twins arises. A digital twin is a digital replica of any given system or process. Digital twins combine knowledge from general data with subject-oriented knowledge for past, current, and future analyses and predictions. Assumptions made during numerical modelling are compensated for using knowledge from general data. For humans, they can provide an accurate current diagnosis as well as possible future outcomes. This allows for precautions to be taken so as to avoid further degradation of the patient's health. In this thesis, we explore primary forms of human digital twins for the cardiovascular system that are capable of replicating various aspects of the cardiovascular system using different types of data. Since different types of medical data are available, such as images, videos, and waveforms, and the kinds of analysis required may be offline or online in nature, digital twin systems should be uniquely designed to capture each type of data for different kinds of analysis. Therefore, passive, active, and semi-active digital twins, as the three primary forms of digital twins for different kinds of applications, are proposed in this thesis. By virtue of the applications and the kind of data involved in each of them, the performance and importance of human digital twins for the cardiovascular system are demonstrated. The idea behind these twins is to allow for the application of the digital twin concept to online analysis, offline analysis, or a combination of the two in healthcare. In active digital twins, active data, such as signals, is analysed online in real time; in semi-active digital twins, some of the components being analysed are active but the analysis itself is carried out offline; and finally, passive digital twins perform offline analysis of data that involves no active component. For the passive digital twin, an automatic workflow to calculate Fractional Flow Reserve (FFR) is proposed and tested on a cohort of 25 patients with acceptable results. For the semi-active digital twin, detection of carotid stenosis and its severity using face videos is proposed and tested with satisfactory results from one carotid stenosis patient and a small cohort of healthy adults. Finally, for the active digital twin, an enabling model is proposed using inverse analysis, and its application to the detection of Abdominal Aortic Aneurysm (AAA) and its severity is demonstrated with the help of a virtual patient database. This enabling model detected artificially generated AAA with an accuracy as high as 99.91% and classified its severity with an acceptable accuracy of 97.79%. Further, for the active digital twin, a truly active model is proposed for continuous cardiovascular state monitoring. It is tested on a small cohort of five patients from a publicly available database for three 10-minute periods, wherein this model satisfactorily replicated and forecasted the patients' cardiovascular state. In addition to the three forms of human digital twins for the cardiovascular system, additional work on patient prioritisation for ITU care among pneumonia patients using a data-driven digital twin is also proposed. The severity indices calculated by these models are assessed using the standard benchmark of Area Under the Receiver Operating Characteristic Curve (AUROC). The results indicate that, using these models, ITU care and mechanical ventilation can be prioritised correctly with an AUROC value as high as 0.89.
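
    The AUROC benchmark used to assess the severity indices can be computed, for example, with scikit-learn; the labels and scores below are made-up placeholders, not data from the study.

        from sklearn.metrics import roc_auc_score

        # Hypothetical example: 1 = patient ended up needing ITU care, 0 = did not,
        # paired with the severity index produced by the data-driven digital twin.
        labels = [0, 0, 1, 0, 1, 1, 0, 1]
        severity_index = [0.12, 0.40, 0.78, 0.35, 0.66, 0.91, 0.20, 0.58]

        print(f"AUROC = {roc_auc_score(labels, severity_index):.2f}")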

    Assessment of a Neural Network-Based Subspace MRI Reconstruction Method for Myocardial T1 Mapping Using Inversion-Recovery Radial FLASH

    Get PDF
    In cardiovascular MRI, myocardial T1 mapping provides an imaging biomarker for the non-invasive characterization of the myocardial tissue, with the potential to replace invasive biopsy for the diagnosis of several pathological heart muscle conditions such as fibrosis, iron overload, or amyloid infiltration. Over the last few years, deep learning has become increasingly appealing for image reconstruction, improving upon the commonly employed user-dependent regularization terms by automatically learning image properties from the training dataset. This thesis investigates a novel neural network-based subspace MRI reconstruction method for myocardial T1 mapping, which uses a single-shot inversion-recovery radial FLASH sequence. The neural network utilized in this study is NLINV-Net, which draws inspiration from the NLINV image reconstruction technique. NLINV-Net addresses the nonlinear inverse problem for parallel imaging by unrolling the iteratively regularized Gauss-Newton method and incorporating neural network-based regularization terms into the process. It learned, in a self-supervised fashion, i.e., without a reference, the correlations between the individual parameters encoded with the FLASH sequence and, consequently, a well-tuned regularization. NLINV-Net outperformed NLINV in terms of T1 precision and generated high-quality T1 maps. The T1 maps computed using NLINV-Net were comparable to those obtained with another baseline method, which combines parallel imaging and compressed sensing using l1-Wavelet regularization when solving the linear inverse problem for parallel imaging. In this case, the advantage of NLINV-Net is that it removes the subjective regularization parameter tuning that comes with the aforementioned benchmark method. Thus, it provides an excellent basis for myocardial T1 mapping using a single-shot inversion-recovery radial FLASH sequence.
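
    The iteratively regularized Gauss-Newton method (IRGNM) that NLINV-Net unrolls can be sketched in a few lines for a generic nonlinear forward operator; this toy NumPy version uses a plain Tikhonov penalty in place of the learned, network-based regularizer, and the operator and data are arbitrary assumptions rather than anything from the MRI reconstruction itself.

        import numpy as np

        def irgnm(F, dF, y, x0, iters=6, alpha0=1.0, q=0.5):
            """Toy iteratively regularized Gauss-Newton method for F(x) = y.
            Each step solves the linearized, Tikhonov-regularized normal equations;
            NLINV-Net replaces this fixed quadratic penalty with learned
            regularization inside an unrolled version of this loop."""
            x = x0.copy()
            alpha = alpha0
            for _ in range(iters):
                J = dF(x)                              # Jacobian at the current iterate
                A = J.T @ J + alpha * np.eye(x.size)
                b = J.T @ (y - F(x)) + alpha * (x0 - x)
                x = x + np.linalg.solve(A, b)
                alpha *= q                             # shrink the regularization each step
            return x

        # Illustrative nonlinear problem: recover x from y = (x1 * x2, x1 + x2^2).
        F = lambda x: np.array([x[0] * x[1], x[0] + x[1] ** 2])
        dF = lambda x: np.array([[x[1], x[0]], [1.0, 2.0 * x[1]]])
        x_rec = irgnm(F, dF, y=F(np.array([2.0, 3.0])), x0=np.array([1.0, 1.0]))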