104 research outputs found

    Kirby research group


    Accelerating SNN Training with Stochastic Parallelizable Spiking Neurons

    Spiking neural networks (SNNs) can learn spatiotemporal features while using less energy, especially on neuromorphic hardware. The most widely used spiking neuron in deep learning is the Leaky Integrate-and-Fire (LIF) neuron. However, LIF neurons operate sequentially, since the computation of the state at time t relies on the state at time t-1 having been computed. This limitation is shared with Recurrent Neural Networks (RNNs) and results in slow training on Graphics Processing Units (GPUs). In this paper, we propose the Stochastic Parallelizable Spiking Neuron (SPSN) to overcome the sequential training limitation of LIF neurons. By separating the linear integration component from the non-linear spiking function, the SPSN can be run in parallel over time. On the Spiking Heidelberg Digits (SHD) dataset, the proposed approach achieves performance comparable with the state of the art for feedforward neural networks, outperforming LIF networks while training 10 times faster and outperforming non-spiking networks with the same architecture. For longer input sequences of 10,000 time steps, we show that the proposed approach trains 4,000 times faster, demonstrating its potential to accelerate SNN training on very large datasets.
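    The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea under assumed details: leaky integration, v[t] = alpha * v[t-1] + I[t], is a causal exponential filter, so every time step can be computed at once, after which a stochastic spiking non-linearity (a sigmoid is assumed here) is applied elementwise.

    ```python
    import numpy as np

    def spsn_forward(inputs, alpha=0.9, rng=None):
        # Parallel leaky integration: v[t] = sum_{k<=t} alpha**(t-k) * I[k].
        # Building the lower-triangular decay matrix lets all time steps be
        # computed at once; an FFT convolution or parallel scan scales better
        # than this O(T^2) version, which is kept simple for illustration.
        rng = np.random.default_rng() if rng is None else rng
        T = inputs.shape[0]
        t = np.arange(T)
        expo = t[:, None] - t[None, :]
        decay = np.where(expo >= 0, alpha ** np.maximum(expo, 0), 0.0)
        v = decay @ inputs
        p = 1.0 / (1.0 + np.exp(-v))              # assumed sigmoid spike probability
        return (rng.random(T) < p).astype(float)  # stochastic, non-sequential spikes

    spikes = spsn_forward(np.random.default_rng(0).normal(size=1000))
    ```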

    Electrocardiogram Analysis Reveals Ionic Current Dysregulation Relevant for Atrial Fibrillation

    Antiarrhythmic drug choice for atrial fibrillation (AF) neglects the individual ionic current profile of the patient, even though it determines drug safety and efficacy. We hypothesize that the electrocardiogram (ECG) might contain information critical for pharmacological treatment personalization. Thus, this study aims to identify the extent of atrial ionic information embedded in the ECG, using multi-scale modeling and simulation. A dataset of 1,000 simulated ECGs was computed using a population of human-based whole-atria models with 200 individual ionic profiles and 5 different torso-atria orientations. A regression neural network was built to predict key atrial ionic conductances based on P- and Ta-wave biomarkers. The neural network predicted, with >80% precision, the density of seven ionic currents relevant for AF, namely, the ultra-rapid (I_Kur), rapid (I_Kr), transient outward (I_to), inward rectifier K+ (I_K1), L-type Ca2+ (I_CaL), Na+/K+ pump (I_NaK), and fast Na+ (I_Na) currents. These ionic densities were identified through the P-wave (i.e., I_Na), the Ta-wave (i.e., I_K1, I_NaK), or both waves (i.e., I_Kur, I_Kr, I_to, I_CaL), providing a non-invasive characterization of the atrial electrophysiology. This could improve patient stratification as well as the safety and efficacy of AF pharmacological treatment.
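    As a rough illustration of the regression set-up described above (the paper's actual architecture, biomarker set, and hyperparameters are not given here, so every size and setting below is an assumption), a multi-output neural network regressor can map wave biomarkers to the seven conductances:

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    # Placeholder data standing in for the study's simulated dataset:
    # 1,000 ECGs, an assumed set of 12 P-/Ta-wave biomarkers, and the
    # 7 target conductances (IKur, IKr, Ito, IK1, ICaL, INaK, INa).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 12))
    y = rng.normal(size=(1000, 7))

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)
    model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                         random_state=0)
    model.fit(X_train, y_train)
    print("held-out R^2:", model.score(X_test, y_test))  # ~0 on random data
    ```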

    Sensing and Signal Processing in Smart Healthcare

    In the last decade, we have witnessed the rapid development of electronic technologies that are transforming our daily lives. Such technologies are often integrated with various sensors that facilitate the collection of human motion and physiological data and are equipped with wireless communication modules such as Bluetooth, radio frequency identification, and near-field communication. In smart healthcare applications, designing ergonomic and intuitive human–computer interfaces is crucial because a system that is not easy to use will create a huge obstacle to adoption and may significantly reduce the efficacy of the solution. Signal and data processing is another important consideration in smart healthcare applications because it must ensure high accuracy with a high level of confidence in order for the applications to be useful for clinicians in making diagnostic and treatment decisions. This Special Issue is a collection of 10 articles selected from a total of 26 contributions. These contributions span the areas of signal processing and smart healthcare systems, mostly contributed by authors from Europe, including Italy, Spain, France, Portugal, Romania, Sweden, and the Netherlands. Authors from China, Korea, Taiwan, Indonesia, and Ecuador are also included.

    Design and Implementation of a Stepped Frequency Continuous Wave Radar System for Biomedical Applications

    There is a need to detect human vital signs (e.g., respiration and heart rate) with a noncontact method in a number of applications, such as search and rescue operations (e.g., earthquakes, fires), health monitoring of the elderly, and performance monitoring of athletes. Ultra-wideband radar systems can be utilized for noncontact vital sign monitoring and for tracking the activities of more than one subject. Therefore, a stepped-frequency continuous wave (SFCW) radar system with wideband performance is designed and implemented for vital sign detection and fall event monitoring. The SFCW radar system is first developed using off-the-shelf discrete components. Later, the system is implemented using surface-mount components to make it portable and low cost. The measured results prove accurate within ±5% for both heart rate and respiration rate when compared with contact measurements. Furthermore, an electromagnetic model has been developed using a multi-layer dielectric model of the human subject to validate the experimental results. The agreement between measured and simulated results is good for distances up to 2 m and at various subject orientations with respect to the radar, even in the presence of more than one subject. The compressive sensing (CS) technique is utilized to reduce the size of the acquired data to levels significantly below the Nyquist threshold. In our demonstration, we use the phase information contained in the obtained complex high-resolution range profile (HRRP) to derive the motion characteristics of the human subject. The obtained data have been successfully utilized for non-contact walk, fall, and limping detection and for healthcare monitoring. The effectiveness of the proposed method is validated using measured results.
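    To make the phase-based processing concrete, here is a minimal sketch (all parameter values and signal details are assumptions, not taken from the paper) of how a respiration rate can be recovered from the unwrapped phase of the complex slow-time signal at the subject's range bin:

    ```python
    import numpy as np

    fs = 20.0                                   # slow-time sampling rate, Hz (assumed)
    t = np.arange(0, 60, 1 / fs)
    # Synthetic echo at the subject's range bin: 0.25 Hz chest motion
    # phase-modulates the complex HRRP sample.
    echo = np.exp(1j * 0.5 * np.sin(2 * np.pi * 0.25 * t))
    phase = np.unwrap(np.angle(echo))           # displacement is proportional to phase
    spectrum = np.abs(np.fft.rfft(phase - phase.mean()))
    freqs = np.fft.rfftfreq(phase.size, 1 / fs)
    band = (freqs > 0.1) & (freqs < 0.6)        # plausible respiration band (assumed)
    rate = freqs[band][np.argmax(spectrum[band])] * 60
    print(f"estimated respiration rate: {rate:.1f} breaths/min")  # ~15
    ```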

    Efficient Architectures of Heterogeneous FPGA-GPU for 3-D Medical Image Compression

    The development of three-dimensional (3-D) imaging modalities has generated massive amounts of volumetric data in 3-D images such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US). An existing survey reveals a significant gap for further research in exploiting reconfigurable computing for 3-D medical image compression. This research proposes an FPGA-based co-processing solution to accelerate such medical imaging systems. The HWT block is implemented on the sbRIO-9632 FPGA prototyping board, which carries a Spartan-3 (XC3S2000) chip. Analysis and performance evaluation of the 3-D images were conducted. Furthermore, a novel architecture is proposed for the context-based adaptive binary arithmetic coder (CABAC), the advanced entropy coding tool employed by the main and higher profiles of H.264/AVC. This research focuses on a GPU implementation of CABAC and on a comparative study of 3-D medical image compression systems with and without the discrete wavelet transform (DWT). Implementation results on MRI and CT images show the GPU significantly outperforming a single-threaded CPU implementation. Overall, the CT and MRI modalities with DWT outperform images without the DWT process in terms of compression ratio, peak signal-to-noise ratio (PSNR), and latency. For heterogeneous computing, MRI images of various sizes and formats, such as JPEG and DICOM, were evaluated. Evaluation results show that, for each memory iteration, transfers from GPU to CPU consume more bandwidth; for a 786,486-byte JPEG image, the bandwidth consumed in the two directions tends to balance. Bandwidth is relative to the transfer size: larger transfers incur higher latency and throughput. Finally, an OpenCL implementation of concurrent tasks on a dedicated FPGA is presented. The findings reveal that OpenCL in batch-processing mode with AOC techniques offers substantial results, where the amounts of logic, area, registers, and memory increase proportionally with the batch count, because the kernel block is replicated according to the batch number and the memory banks grow accordingly. A comparative study found that the balanced-tree and unrolled-loop architecture achieves better results in terms of local memory, latency, and throughput.
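    For orientation, one level of a separable 3-D wavelet decomposition of the kind applied before entropy coding can be sketched as follows (a Haar kernel is assumed purely for illustration; the thesis implements its wavelet transform in FPGA hardware):

    ```python
    import numpy as np

    def haar3d_level(vol):
        # One decomposition level applied separably along each axis: the
        # average half carries the approximation band, the difference half
        # the detail band (orthonormal Haar scaling by 1/sqrt(2)).
        for axis in range(3):
            vol = np.moveaxis(vol, axis, 0)
            lo = (vol[0::2] + vol[1::2]) / np.sqrt(2)
            hi = (vol[0::2] - vol[1::2]) / np.sqrt(2)
            vol = np.moveaxis(np.concatenate([lo, hi], axis=0), 0, axis)
        return vol

    volume = np.random.default_rng(0).random((64, 64, 64))  # stand-in for a CT/MRI volume
    coeffs = haar3d_level(volume)  # most energy concentrates in the approximation corner
    ```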

    An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

    Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes the energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65 nm technology, consumes less than 20 mW on average at 0.8 V, achieving an efficiency of up to 70 pJ/B in encryption, 50 pJ/px in convolution, or up to 25 MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16 pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition at 5.74 pJ/op; and seizure detection with encrypted data collection from EEG within 12.7 pJ/op. (15 pages, 12 figures; accepted for publication in the IEEE Transactions on Circuits and Systems I: Regular Papers.)
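    As a quick sanity check on how these figures relate (simple arithmetic, not taken from the paper), the average power budget and the per-unit energy efficiencies bound the sustainable processing throughput:

    ```python
    avg_power_w = 20e-3               # < 20 mW average at 0.8 V
    enc_energy_per_byte = 70e-12      # up to 70 pJ/B for encryption
    conv_energy_per_px = 50e-12       # up to 50 pJ/px for convolution

    # Throughput sustainable within the average power budget:
    print(avg_power_w / enc_energy_per_byte / 1e6, "MB/s encrypted")   # ~286 MB/s
    print(avg_power_w / conv_energy_per_px / 1e6, "Mpx/s convolved")   # ~400 Mpx/s
    ```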

    Enabling Detailed, Biophysics-Based Skeletal Muscle Models on HPC Systems

    Realistic simulations of detailed, biophysics-based, multi-scale models often require very high resolution and, thus, large-scale compute facilities. Existing simulation environments, especially for biomedical applications, are typically designed to allow for high flexibility and generality in model development. This flexibility, however, is often a limiting factor for large-scale simulations, so new models are typically tested and run on small-scale compute facilities. Using a detailed biophysics-based, chemo-electromechanical skeletal muscle model and the international open-source software library OpenCMISS as an example, we present an approach to upgrade an existing muscle simulation framework from a moderately parallel version toward a massively parallel one that scales both in terms of problem size and in terms of the number of parallel processes. For this purpose, we investigate different modeling, algorithmic, and implementation aspects. We present improvements addressing both numerical and parallel scalability. In addition, our approach includes a novel visualization environment that is based on the MegaMol framework and is capable of handling large amounts of simulated data. We present the results of a number of scaling studies carried out on the Tier-1 supercomputer Hazel Hen at the High Performance Computing Center Stuttgart (HLRS). We improve the overall runtime by a factor of up to 2.6 and achieve good scalability on up to 768 cores.
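    To put scaling claims like these in context, strong-scaling efficiency is conventionally computed relative to a baseline core count as E(p) = T(p0) * p0 / (T(p) * p). A small helper, with made-up runtimes purely for illustration, might look like:

    ```python
    def strong_scaling_efficiency(runtimes):
        # runtimes: dict mapping core count -> wall-clock time; efficiency
        # is measured relative to the smallest core count in the dict.
        p0 = min(runtimes)
        t0 = runtimes[p0]
        return {p: (t0 * p0) / (t * p) for p, t in runtimes.items()}

    # Illustrative numbers only, not measurements from the paper:
    print(strong_scaling_efficiency({96: 1000.0, 192: 520.0, 384: 270.0, 768: 150.0}))
    ```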

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as the final publication of the COST Action IC1406 "High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)" project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to afford a better discernment of the domain at hand, their representation becomes increasingly demanding for computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication, and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Design and Implementation of a Framework for FPGA-Based Embedded Systems Applied to an Optimum-Path Forest Classifier

    Advisors: Eurípedes Guilherme de Oliveira Nóbrega, Isabelle Fantoni-Coichot, Vincent Frémont. Doctoral thesis - Universidade Estadual de Campinas, Faculdade de Engenharia Mecânica, and Université de Technologie de Compiègne.
    Many modern applications rely on Artificial Intelligence methods such as automatic classification. However, the computational cost associated with these techniques limits their use in resource-constrained embedded platforms. A high amount of data may overwhelm the computational power available in such embedded environments, making the process of designing them a challenging task. Common processing pipelines use many functions of high computational cost, which brings the necessity of combining high computational capacity with energy efficiency. One strategy to overcome this limitation and provide sufficient computational power allied with low energy consumption is the use of specialized hardware such as FPGAs. This class of devices is widely known for its performance-to-consumption ratio, making it an interesting alternative for building capable embedded systems. This thesis proposes an FPGA-based framework for performance acceleration of a classification algorithm to be implemented in an embedded system. Acceleration is achieved using a SIMD-based parallelization scheme, taking advantage of the fine-grained parallelism of FPGAs. The proposed system is implemented and tested on actual FPGA hardware. For the architecture validation, a graph-based classifier, the Optimum-Path Forest (OPF), is evaluated in an application proposition and afterwards mapped onto the proposed architecture. The study of the OPF led to the proposition of a new learning algorithm for it, using evolutionary computation concepts and aiming at reducing classification processing time; combined with the hardware implementation, this offers sufficient performance acceleration to be applied in a variety of embedded systems. (Doctorate in Mechanical Engineering, Solid Mechanics and Mechanical Design; grant 3077/2013-09, CAPES.)
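    For readers unfamiliar with the classifier, the OPF decision rule is simple to sketch: a test sample receives the label of the training node that minimizes the fmax connectivity value max(cost(s), d(s, t)). The per-node distance computations are independent, which is what makes a SIMD mapping natural. The snippet below is an illustrative simplification, not the thesis implementation:

    ```python
    import numpy as np

    def opf_classify(x, train_feats, train_labels, train_costs):
        # Distances from the test sample to every trained node are independent,
        # hence trivially data-parallel (the property a SIMD scheme exploits).
        d = np.linalg.norm(train_feats - x, axis=1)
        conn = np.maximum(train_costs, d)   # fmax path-cost extension to x
        return train_labels[np.argmin(conn)]

    # Toy usage with made-up training data:
    feats = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
    labels = np.array([0, 0, 1])
    costs = np.array([0.0, 0.3, 0.0])       # path costs assigned during OPF training
    print(opf_classify(np.array([3.5, 3.9]), feats, labels, costs))  # -> 1
    ```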