
    A neural network face detector design using bit-width reduced FPU in FPGA

    This thesis implemented a field programmable gate array (FPGA)-based face detector using a neural network (NN) and a bit-width reduced floating-point unit (FPU). An NN was used to separate face data from non-face data in the detector. The NN performs time-consuming repetitive calculations; this problem was addressed with an FPGA device and a bit-width reduced FPU. Reducing the floating-point bit-width provided significant savings in hardware resources such as area and power. An analytical error model, based on the maximum relative representation error (MRRE) and the average relative representation error (ARRE), was developed to obtain the maximum and average output errors of the bit-width reduced FPUs. After the development of the analytical error model, the bit-width reduced FPUs and the NN were designed using MATLAB and VHDL, and the analytical (MATLAB) results were compared with the experimental (VHDL) results; the two showed conformity of shape. It was also found that, while maintaining 94.1% detection accuracy, reducing the bit-width from 32 bits to 16 bits halved the size of the memory and arithmetic units and reduced total power consumption by 14.7%.
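    As a purely illustrative sketch of the idea (not the thesis's MATLAB/VHDL implementation), the snippet below emulates a bit-width reduced FPU by truncating the float32 mantissa and measures the resulting relative representation error against the 2^-m truncation bound; the function name and the mantissa widths tried are assumptions chosen for illustration.

```python
import numpy as np

def reduce_mantissa(x: np.ndarray, mantissa_bits: int) -> np.ndarray:
    """Emulate a bit-width reduced FPU by truncating the 23-bit
    float32 mantissa down to `mantissa_bits` stored bits."""
    bits = x.astype(np.float32).view(np.uint32)
    drop = 23 - mantissa_bits
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 10.0, 100_000).astype(np.float32)
for m in (16, 10, 7):  # hypothetical reduced mantissa widths
    err = np.abs(reduce_mantissa(x, m) - x) / np.abs(x)
    # For truncation, the maximum relative representation error is ~2^-m,
    # since the significand lies in [1, 2).
    print(f"{m:2d} bits: max={err.max():.2e} mean={err.mean():.2e} bound={2.0**-m:.2e}")
```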

    Pattern Recognition

    Pattern recognition is a very wide research field, involving factors as diverse as sensors, feature extraction, pattern classification, decision fusion, and applications. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments in pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented work. The authors of these 25 works present and advocate recent achievements of their research in the field of pattern recognition.

    Belle II Technical Design Report

    The Belle detector at the KEKB electron-positron collider has collected almost 1 billion Y(4S) events in its decade of operation. Super-KEKB, an upgrade of KEKB, is under construction to increase the luminosity by two orders of magnitude during a three-year shutdown, with an ultimate luminosity goal of 8 × 10^35 cm^-2 s^-1. To exploit the increased luminosity, an upgrade of the Belle detector has been proposed, and a new international collaboration, Belle II, is being formed. The Technical Design Report presents the physics motivation, the basic methods of the accelerator upgrade, and the key improvements of the detector. Edited by Z. Doležal and S. Uno.

    VLSI Design

    This book provides some recent advances in the design of nanometer VLSI chips. The selected topics present open problems and challenges in important areas ranging from design tools, new post-silicon devices, GPU-based parallel computing, and emerging 3D integration to antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc.

    Neuro-critical multimodal Edge-AI monitoring algorithm and IoT system design and development

    In recent years, with the continuous development of neurocritical medicine, the success rate of treating patients with traumatic brain injury (TBI) has continued to increase, and the prognosis has also improved. The condition of TBI patients is usually very complicated, and after treatment patients often need an extended time to recover; the degree of recovery is also related to prognosis. However, as a young discipline, neurocritical medicine still has many shortcomings. In most hospitals, the condition of the Neuro-intensive Care Unit (NICU) is uneven, the equipment has limited functionality, and there is no unified data specification. Most of the instruments are cumbersome and expensive, and patients often face high medical expenses. Recent years have seen rapid development of big data and artificial intelligence (AI) technology, which is advancing the medical IoT field, but further development and a wider range of applications are needed to achieve widespread adoption. Based on the above premises, the main contributions of this thesis are the following. First, the design and development of a multi-modal brain monitoring system acquiring 8-channel electroencephalography (EEG) signals, dual-channel NIRS signals, and intracranial pressure (ICP) signals. Furthermore, an integrated platform was designed to display and analyse the multi-modal physiological data in real time. The thesis also describes the use of the Qt signal and slot event-processing mechanism and multi-threading to improve the real-time performance of data processing. In addition, multi-modal electrophysiological data storage and processing was realized on a cloud server. The system includes a custom-built Django cloud server that provides real-time transmission between the server and a WeChat applet; based on the WebSocket protocol, the data transmission delay is less than 10 ms. The analysis platform can be equipped with deep learning models to monitor patients with epileptic seizures and to assess the level of consciousness of patients with Disorders of Consciousness (DOC). This thesis combines the standard open-source CHB-MIT data set, a clinical data set provided by Huashan Hospital, and additional data collected by the system described here. These data sets are merged to build deep learning network models and to develop applications for automatic disease diagnosis in smart medical IoT systems. This mainly includes using the clinical data to analyse the characteristics of the EEG signals of DOC patients and building a CNN model to evaluate a patient's level of consciousness automatically. Epilepsy is also a common disease in neurocritical care; in this regard, the thesis analyses how various deep learning models differ between the CHB-MIT data set and the clinical data set for epilepsy monitoring, in order to select the most appropriate model for the system being designed and developed. Finally, the AI-assisted analysis models are verified. The results show that the CNN model for consciousness assessment reaches 82% accuracy on the clinical data set, and the CNN+STFT model for epilepsy monitoring reaches 90% accuracy on clinical data. The multi-modal brain monitoring system is also fully verified: the EEG signal collected by the system has a high signal-to-noise ratio and strong anti-interference ability, and the system performs well in real-time operation and stability.
    Keywords: TBI, neurocritical care, multi-modal, consciousness assessment, seizure detection, deep learning, CNN, IoT
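    As a minimal sketch of the CNN+STFT pattern the abstract names (not the thesis's actual model), the snippet below converts multi-channel EEG windows to short-time Fourier transform magnitude spectrograms and feeds them to a small convolutional classifier; the channel count, window length, and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class STFTCNN(nn.Module):
    """Toy CNN over STFT magnitude spectrograms of multi-channel EEG."""
    def __init__(self, n_channels=8, n_fft=256, hop=64, n_classes=2):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.net = nn.Sequential(
            nn.Conv2d(n_channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):  # x: (batch, channels, samples)
        b, c, t = x.shape
        spec = torch.stft(x.reshape(b * c, t), self.n_fft, self.hop,
                          window=torch.hann_window(self.n_fft),
                          return_complex=True).abs()   # magnitude spectrogram
        spec = spec.reshape(b, c, *spec.shape[-2:])    # (b, c, freq, time)
        return self.net(torch.log1p(spec))             # log-scaled input

model = STFTCNN()
eeg = torch.randn(4, 8, 2048)  # 4 windows, 8 channels, 2048 samples
logits = model(eeg)            # (4, 2) seizure / non-seizure scores
```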

    Towards Intelligent Data Acquisition Systems with Embedded Deep Learning on MPSoC

    Large-scale scientific experiments rely on dedicated high-performance data-acquisition systems to sample, read out, analyse, and store experimental data. However, with the rapid development of detector technology in various fields, the number of channels and the data rates are increasing. For trigger and control tasks, data acquisition systems need to satisfy real-time constraints, offer short latency, and provide the possibility of integrating intelligent data processing. In recent years, machine learning approaches have been used successfully in many applications. This dissertation studies how machine learning techniques can be integrated directly into the data acquisition of large-scale experiments. A universal data acquisition platform for multiple data channels has been developed, and different machine learning implementation methods and applications have been realized on this system. On the hardware side, recent FPGAs provide not only high-performance parallel logic but also more and more additional features, such as ultra-fast transceivers and embedded ARM processors. TSMC's 16nm FinFET Plus (16FF+) 3D transistor technology enables the Xilinx Zynq UltraScale+ FPGA devices to increase the performance/watt ratio by 2 to 5 times compared to the previous generation. The selected main processor, the ZU11EG, has 32 GTH transceivers, each operating at up to 16.3 Gb/s, and 16 GTY transceivers, each operating at up to 32.75 Gb/s. These transceivers are routed to a x16-lane Gen 3/4 PCIe interface, 12 lanes of full-duplex FireFly electrical/optical data links, and a VITA 57.4 FMC+ connector. The new Zynq UltraScale+ device provides at least three major advantages for advanced data acquisition systems: first, the 16nm FinFET+ programmable logic (PL) provides high-speed readout capability through its high-speed transceivers; second, the built-in quad-core 64-bit ARM Cortex-A53 processor can host an embedded Linux system, so web servers, slow control, and monitoring applications can be realized in an embedded processor environment; third, the Zynq Multiprocessor System-on-Chip technology connects the programmable logic and the microprocessors. In this thesis, the benefits of such architectures for the integration of machine learning algorithms in data acquisition systems and control applications are demonstrated. On the algorithm side, there have been many achievements in the field of machine learning over the last decades. Existing machine learning algorithms fall into several categories depending on how the learning phase is organized: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Most commonly used in scientific applications are supervised learning and reinforcement learning. Supervised learning learns from labelled inputs and outputs and generates a function that can map new inputs to the appropriate outputs; a common application is classification. These categories differ widely in their underlying mathematics, training, inference, and implementation. One natural hardware solution is Application-Specific Integrated Circuit (ASIC) Artificial Intelligence (AI) chips; a typical example is the Google Tensor Processing Unit (TPU), which covers training and inference for both supervised and reinforcement learning. A major issue is that such chips provide high compute power but not high data-transfer bandwidth.
    By comparison, the Xilinx UltraScale+ FPGA also provides raw compute power and efficiency for all data types, down to a single bit. From a deployment point of view, the training part of supervised learning is typically performed by a CPU/GPU/TPU on a fixed dataset. For reinforcement learning, the training phase is more complex: the algorithm needs to interact periodically with the controlled system and execute a Markov Decision Process (MDP). There is no static training dataset; the data are obtained in real time, and the time slot between steps depends on the dynamics of the controlled system. Inference is also bound to this sampling time, because the algorithm must interact with the environment and decide on the appropriate action in response, which imposes tighter timing demands. This thesis gives solutions for both training and inference of reinforcement learning. First, the requirements are analysed; then the algorithm is derived from scratch, training is implemented on the PS part of the Zynq device, and inference is implemented on the FPGA side, analogous to the supervised learning solution. The results for Policy Gradient show substantial improvement over a CPU/GPU-based machine learning framework, and Deep Deterministic Policy Gradient also improves in both training latency and stability. This implementation method provides a low-latency approach for the on-field training of reinforcement learning.
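    The thesis's gains are specific to its PS/PL partitioning; as a plain-software reference for the algorithm family involved, the sketch below implements a vanilla policy-gradient (REINFORCE) update on a toy environment. The environment, network shape, and hyperparameters are assumptions for illustration, not the thesis's setup.

```python
import torch
import torch.nn as nn

# Toy episodic environment: the agent is rewarded for matching the state.
def step(state, action):
    reward = 1.0 if action == state else 0.0
    return torch.randint(0, 4, (1,)).item(), reward

policy = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for episode in range(500):
    state = torch.randint(0, 4, (1,)).item()
    log_probs, rewards = [], []
    for _ in range(16):                           # fixed-length episode
        logits = policy(torch.eye(4)[state])      # one-hot state encoding
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        state, r = step(state, action.item())
        rewards.append(r)
    # REINFORCE: ascend E[sum_t log pi(a_t|s_t) * G_t] (undiscounted here)
    returns = torch.tensor(rewards).flip(0).cumsum(0).flip(0)
    loss = -(torch.stack(log_probs) * returns).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
```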

    Fast algorithm for real-time rings reconstruction

    The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for decision making. The definition of real time depends on the application under study, ranging from response times of μs up to several hours for very compute-intensive tasks. At this conference we presented our work on low-level triggers [1][2] and high-level triggers [3] in high-energy physics experiments, and on specific applications in nuclear magnetic resonance (NMR) [4][5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm, developed for trigger applications, that accelerates ring reconstruction in RICH detectors when seeds from external trackers are not available.
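    The GAP algorithm itself is seedless and GPU-oriented; as a generic point of comparison only, the sketch below shows the classic Hough-transform approach to seedless ring finding for a known radius, in which each hit votes for every centre lying at that radius from it. Function names, grid sizes, and the synthetic data are assumptions; this is not the algorithm presented in the contribution.

```python
import numpy as np

def hough_ring_center(hits, radius, bins=64, extent=1.0):
    """Classic Hough vote: each hit votes for all candidate centres at
    `radius` from it; the accumulator maximum is the best ring centre."""
    acc = np.zeros((bins, bins))
    theta = np.linspace(0.0, 2 * np.pi, 90, endpoint=False)
    for x, y in hits:
        cx = x + radius * np.cos(theta)   # candidate centre x coordinates
        cy = y + radius * np.sin(theta)   # candidate centre y coordinates
        ix = ((cx + extent) / (2 * extent) * bins).astype(int)
        iy = ((cy + extent) / (2 * extent) * bins).astype(int)
        ok = (ix >= 0) & (ix < bins) & (iy >= 0) & (iy < bins)
        np.add.at(acc, (ix[ok], iy[ok]), 1)
    i, j = np.unravel_index(acc.argmax(), acc.shape)
    return ((i + 0.5) / bins * 2 * extent - extent,
            (j + 0.5) / bins * 2 * extent - extent)

# Synthetic ring of 32 noisy hits around (0.2, -0.1) with radius 0.4
rng = np.random.default_rng(1)
phi = rng.uniform(0, 2 * np.pi, 32)
hits = (np.c_[0.2 + 0.4 * np.cos(phi), -0.1 + 0.4 * np.sin(phi)]
        + rng.normal(0, 0.01, (32, 2)))
print(hough_ring_center(hits, radius=0.4))
```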

    Advanced Knowledge Application in Practice

    The integration and interdependency of the world economy lead towards the creation of a global market that offers more opportunities, but is also more complex and competitive than ever before. Widespread research activity is therefore necessary for anyone who wants to remain successful in the market. This book is the result of research and development activities by a number of researchers worldwide, covering concrete fields of research.