160 research outputs found
Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator
Neural networks have demonstrated their outstanding performance in a wide range of tasks. Specifically, recurrent architectures based on long short-term memory (LSTM) cells have shown excellent capability to model time dependencies in real-world data. However, standard recurrent architectures cannot estimate their uncertainty, which is essential for safety-critical applications such as medicine. In contrast, Bayesian recurrent neural networks (RNNs) are able to provide uncertainty estimation with improved accuracy. Nonetheless, Bayesian RNNs are computationally and memory demanding, which limits their practicality despite their advantages. To address this issue, we propose an FPGA-based hardware design to accelerate Bayesian LSTM-based RNNs. To further improve the overall algorithmic-hardware performance, a co-design framework is proposed to explore the most fitting algorithmic-hardware configurations for Bayesian RNNs. We conduct extensive experiments on healthcare applications to demonstrate the improvement of our design and the effectiveness of our framework. Compared with GPU implementation, our FPGA-based design can achieve up to 10 times speedup with nearly 106 times higher energy efficiency. To the best of our knowledge, this is the first work targeting acceleration of Bayesian RNNs on FPGAs.
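As an illustration of the uncertainty estimation that the abstract above attributes to Bayesian RNNs, the following minimal sketch draws repeated stochastic forward passes and summarizes them as a predictive mean and spread. It assumes Monte Carlo dropout as the approximate inference method and uses a toy linear model rather than an LSTM; neither choice is stated in the abstract, and the names `stochastic_forward` and `mc_predict` are hypothetical.

```python
import random
import statistics


def stochastic_forward(x, weights, drop_prob, rng):
    """One forward pass of a toy linear model with dropout left active.

    Assumption: dropout at inference time stands in for sampling from an
    approximate weight posterior (Monte Carlo dropout).
    """
    keep = 1.0 - drop_prob
    total = 0.0
    for x_i, w_i in zip(x, weights):
        if rng.random() < keep:
            # Inverted-dropout scaling keeps the expected output unchanged.
            total += x_i * w_i / keep
    return total


def mc_predict(x, weights, drop_prob=0.2, num_samples=200, seed=0):
    """Return (predictive mean, predictive standard deviation)."""
    rng = random.Random(seed)
    samples = [
        stochastic_forward(x, weights, drop_prob, rng)
        for _ in range(num_samples)
    ]
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    mean, std = mc_predict([1.0, 2.0, 3.0], [0.5, -0.25, 0.1])
    print(f"prediction = {mean:.3f}, uncertainty (std) = {std:.3f}")
```

Each prediction costs `num_samples` forward passes instead of one, which is one plausible reading of why the abstract describes Bayesian RNNs as computationally demanding.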
Review: Recent Directions in ECG-FPGA Researches
The last few years have witnessed increased interest in using field programmable gate arrays (FPGAs) for a variety of applications. This interest is driven mostly by advances in flexible FPGA resource configuration, increased speed, relatively low cost, and low energy consumption. The introduction of FPGAs in medicine and health care generally aims to replace costly and bulky medical monitoring and diagnostic equipment with much smaller, possibly portable systems that exploit the design flexibility of FPGAs. Much recent research focuses on FPGA systems that process the well-known yet very important electrocardiogram (ECG) signal, both to accelerate and improve performance and to propose new ideas for such implementations. This paper introduces the recent directions in ECG-FPGA research.
Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic
processors, new opportunities are emerging for applying deep and Spiking Neural
Network (SNN) algorithms to healthcare and biomedical applications at the edge.
This can facilitate the advancement of the medical Internet of Things (IoT)
systems and Point of Care (PoC) devices. In this paper, we provide a tutorial
describing how various technologies ranging from emerging memristive devices,
to established Field Programmable Gate Arrays (FPGAs), and mature Complementary
Metal Oxide Semiconductor (CMOS) technology can be used to develop efficient DL
accelerators to solve a wide variety of diagnostic, pattern recognition, and
signal processing problems in healthcare. Furthermore, we explore how spiking
neuromorphic processors can complement their DL counterparts for processing
biomedical signals. After providing the required background, we unify the
sparsely distributed research on neural network and neuromorphic hardware
implementations as applied to the healthcare domain. In addition, we benchmark
various hardware platforms by performing a biomedical electromyography (EMG)
signal processing task and drawing comparisons among them in terms of inference
delay and energy. Finally, we provide our analysis of the field and share a
perspective on the advantages, disadvantages, challenges, and opportunities
that different accelerators and neuromorphic processors introduce to healthcare
and biomedical domains. This paper can serve a large audience, ranging from
nanoelectronics researchers, to biomedical and healthcare practitioners in
grasping the fundamental interplay between hardware, algorithms, and clinical
adoption of these tools, as we shed light on the future of deep networks and
spiking neuromorphic processing systems as proponents for driving biomedical
circuits and systems forward.
Comment: Submitted to IEEE Transactions on Biomedical Circuits and Systems (21
pages, 10 figures, 5 tables)
Demonstrating Analog Inference on the BrainScaleS-2 Mobile System
We present the BrainScaleS-2 mobile system as a compact analog inference
engine based on the BrainScaleS-2 ASIC and demonstrate its capabilities at
classifying a medical electrocardiogram dataset. The analog network core of the
ASIC is utilized to perform the multiply-accumulate operations of a
convolutional deep neural network. At a system power consumption of 5.6 W, we
measure a total energy consumption of 192 µJ for the ASIC and achieve a
classification time of 276 µs per electrocardiographic patient sample. Patients
with atrial fibrillation are correctly identified with a detection rate of
(93.7 ± 0.7)% at (14.0 ± 1.0)% false positives. The system is directly
applicable to edge inference applications due to its small size, power
envelope, and flexible I/O capabilities. It has enabled the BrainScaleS-2 ASIC
to be operated reliably outside a specialized lab setting. In future
applications, the system allows for a combination of conventional machine
learning layers with online learning in spiking neural networks on a single
neuromorphic platform
Wearable Technologies and AI at the Far Edge for Chronic Heart Failure Prevention and Management: A Systematic Review and Prospects
Smart wearable devices enable personalized at-home healthcare by unobtrusively collecting patient health data and facilitating the development of intelligent platforms to support patient care and management. The accurate analysis of data obtained from wearable devices is crucial for interpreting and contextualizing health data and facilitating the reliable diagnosis and management of critical and chronic diseases. The combination of edge computing and artificial intelligence has provided real-time, time-critical, and privacy-preserving data analysis solutions. However, based on the envisioned service, evaluating the additive value of edge intelligence to the overall architecture is essential before implementation. This article aims to comprehensively analyze the current state of the art on smart health infrastructures implementing wearable and AI technologies at the far edge to support patients with chronic heart failure (CHF). In particular, we highlight the contribution of edge intelligence in supporting the integration of wearable devices into IoT-aware technology infrastructures that provide services for patient diagnosis and management. We also offer an in-depth analysis of open challenges and provide potential solutions to facilitate the integration of wearable devices with edge AI solutions to provide innovative technological infrastructures and interactive services for patients and doctors
Arrhythmia Classifier Based on Ultra-Lightweight Binary Neural Network
Reasonably and effectively monitoring arrhythmias through ECG signals has
significant implications for human health. With the development of deep
learning, numerous ECG classification algorithms based on deep learning have
emerged. However, most existing algorithms achieve high accuracy at the cost of
complex models, resulting in high storage usage and power consumption. This also
inevitably increases the difficulty of implementation on wearable Artificial
Intelligence-of-Things (AIoT) devices with limited resources. In this study, we
proposed a universally applicable ultra-lightweight binary neural network (BNN)
that is capable of 5-class and 17-class arrhythmia classification based on ECG
signals. Our BNN achieves 96.90% (full precision 97.09%) and 97.50% (full
precision 98.00%) accuracy for 5-class and 17-class classification,
respectively, with state-of-the-art storage usage (3.76 KB and 4.45 KB).
Compared to other binarization works, our approach excels in supporting two
multi-classification modes while achieving the smallest known storage space.
Moreover, our model achieves optimal accuracy in 17-class classification and
boasts an elegantly simple network architecture. The algorithm we use is
optimized specifically for hardware implementation. Our research showcases the
potential of lightweight deep learning models in the healthcare industry,
specifically in wearable medical devices, which hold great promise for
improving patient outcomes and quality of life. Code is available at:
https://github.com/xpww/ECG_BNN_Net
Comment: 6 pages, 3 figures
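To make the storage argument in the abstract above concrete, here is a minimal sketch of sign-based weight binarization with one bit stored per weight. It is not the authors' code: the sign rule (zero maps to +1), the bit order, and the helper names `binarize`, `pack_bits` and `storage_bytes` are all assumptions made for illustration.

```python
def binarize(weights):
    """Map each real-valued weight to +1 or -1 by sign (0 maps to +1)."""
    return [1 if w >= 0 else -1 for w in weights]


def pack_bits(signs):
    """Pack +1/-1 values into bytes, one bit per weight (+1 -> 1, -1 -> 0).

    Bits are filled most-significant first; unused trailing bits stay 0.
    """
    packed = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s == 1:
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)


def storage_bytes(num_weights, bits_per_weight):
    """Bytes needed to store the weights, rounded up to a whole byte."""
    return (num_weights * bits_per_weight + 7) // 8


if __name__ == "__main__":
    weights = [0.3, -1.2, 0.0, 0.7, -0.1, -0.4, 2.0, -3.0, 0.5]
    packed = pack_bits(binarize(weights))
    print(len(weights), "weights ->", len(packed), "bytes")
    # Hypothetical layer of 1000 weights: 1-bit versus 32-bit storage.
    print(storage_bytes(1000, 1), "vs", storage_bytes(1000, 32), "bytes")
```

Under these assumptions a binarized layer needs one thirty-second of the storage of its 32-bit floating-point counterpart, which is the kind of saving that makes kilobyte-scale models plausible.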
Neuromorphic computing based on stochastic spiking reservoir for heartbeat classification
Heart disease is the leading cause of mortality worldwide. Precise heartbeat classification usually requires a large number of extracted features, and heartbeats of the same class may behave differently across patients. This leads to computation overhead and challenges in hardware implementation because of the large number of nodes used in reservoir computing (RC) networks. In this work, a reservoir computing-based stochastic spiking neural network (SSNN) is proposed for heartbeat rhythm classification, enabling a patient-adaptable and more efficient hardware implementation whose computation overhead is kept low by extracting a minimal set of features. Only a single feature is employed in template matching to achieve patient adaptability with minimal computation overhead. This single feature, the QRS complex, is extracted and fed into a neural reservoir of 20 neurons in a cyclic topology for arrhythmia similarity calculation and classification. The work uses 43 electrocardiogram (ECG) recordings, containing both normal and arrhythmic beats, from the MIT-BIH Arrhythmia Database obtained from PhysioNet. The proposed stochastic spiking reservoir achieves a sensitivity of 99.6% and an accuracy of 96.91%, indicating that the system is accurate and efficient in classifying normal and arrhythmic heartbeats.
Simulation and implementation of novel deep learning hardware architectures for resource constrained devices
Corey Lammie designed mixed signal memristive-complementary metal–oxide–semiconductor (CMOS) and field programmable gate arrays (FPGA) hardware architectures, which were used to reduce the power and resource requirements of Deep Learning (DL) systems; both during inference and training. Disruptive design methodologies, such as those explored in this thesis, can be used to facilitate the design of next-generation DL systems
Reconfigurable acceleration of Recurrent Neural Networks
Recurrent Neural Networks (RNNs) have been successful in a wide range of applications involving temporal sequences such as natural language processing, speech recognition and video analysis. However, RNNs often require a significant amount of memory and computational resources. In addition, the recurrent nature and data dependencies in RNN computations can lead to system stall, resulting in low throughput and high latency.
This work describes novel parallel hardware architectures for accelerating RNN inference using Field-Programmable Gate Array (FPGA) technology, which considers the data dependencies and high computational costs of RNNs.
The first contribution of this thesis is a latency-hiding architecture that utilizes column-wise matrix-vector multiplication instead of the conventional row-wise operation to eliminate data dependencies and improve the throughput of RNN inference designs. This architecture is further enhanced by a configurable checkerboard tiling strategy which allows large dimensions of weight matrices, while supporting element-based parallelism and vector-based parallelism. The presented reconfigurable RNN designs show significant speedup over CPU, GPU, and other FPGA designs.
The second contribution of this thesis is a weight reuse approach for large RNN models with weights stored in off-chip memory, running with a batch size of one. A novel blocking-batching strategy is proposed to optimize the throughput of large RNN designs on FPGAs by reusing the RNN weights. Performance analysis is also introduced to enable FPGA designs to achieve the best trade-off between area, power consumption and performance. Promising power efficiency improvement has been achieved in addition to speeding up over CPU and GPU designs.
The third contribution of this thesis is a low latency design for RNNs based on a partially-folded hardware architecture. It also introduces a technique that balances initiation interval of multi-layer RNN inferences to increase hardware efficiency and throughput while reducing latency. The approach is evaluated on a variety of applications, including gravitational wave detection and Bayesian RNN-based ECG anomaly detection.
To facilitate the use of this approach, we open source an RNN template which enables the generation of low-latency FPGA designs with efficient resource utilization using high-level synthesis tools.
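The column-wise matrix-vector multiplication named in the first contribution of the thesis abstract above can be sketched in software as follows. This is only an illustration of the arithmetic, not the thesis's hardware design; the function names are hypothetical. The stated motivation in the comments (that column-wise accumulation can begin before the whole input vector is available) is an interpretation of the abstract's "eliminate data dependencies" claim.

```python
def matvec_row_wise(matrix, vector):
    """Conventional order: each output is a full dot product.

    Every output element needs the complete input vector before it can
    be finished.
    """
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matvec_column_wise(matrix, vector_stream):
    """Column-wise order: consume one input element at a time.

    Each arriving element x_j scales column j and is accumulated into all
    outputs, so work can start as soon as the first element is produced
    (for example, by the previous time step of a recurrent network).
    """
    rows = len(matrix)
    acc = [0.0] * rows
    for j, x_j in enumerate(vector_stream):
        for i in range(rows):
            acc[i] += matrix[i][j] * x_j
    return acc


if __name__ == "__main__":
    W = [[1, 2, 3],
         [4, 5, 6]]
    x = [1, 0, -1]
    # The input is passed as an iterator to mimic elements arriving in turn.
    print(matvec_row_wise(W, x), matvec_column_wise(W, iter(x)))
```

Both orders compute the same product; they differ only in when each partial result becomes available, which is the property a latency-hiding design would exploit.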