69 research outputs found

    Embedded Parallel Computing Platform for Real-Time Recognition of Power Quality Disturbance Based on Deep Network

    Systems powered by scattered sustainable power sources are highly susceptible to disturbances in power quality. Power Quality Disturbance (PQD) signals can degrade the functionality of grid-powered appliances. Older techniques for recognizing PQD signals rely on feature extraction. Manual analysis requires setting up a digital signal processor platform, which can be time-consuming and error-prone. Real-time PQD (RPQD) recognition techniques have advanced through Embedded Parallel Computing Platforms (EPCP), various signal processing methods, artificial intelligence, and Deep Network (DN) methodologies; the proposed approach, EPCP-RPQD-DN, combines them to recognize RPQD signals in real-time scenarios. First, the proposed algorithm implements a hybridized Deep Belief Network and Long Short-Term Memory (DBN-LSTM) model to accurately recognize real-time PQD signals. Second, the DBN module maximizes the input signal features for the generation of PQD in a fixed period by training directly on raw PQD input signals and forwards them to the LSTM module. Third, the LSTM module analyzes the time-series nature of PQD signals using three layers, which allows it to run on the EPCP model. PQD sample signals are used to train the DBN on a central monitoring server. A series of PQD signals generated in the EPCP simulation environment is used to validate the effectiveness of the EPCP-RPQD-DN approach. Electromagnetic fault conditions in the power system are simulated in real time on Real Time Digital Simulator (RTDS) hardware. Experimental evaluation shows that DN learning improves the accuracy rate, reduces computational overhead, and minimizes the error rate compared to existing approaches.

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Machine Learning (ML), a subset of Artificial Intelligence (AI), is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that at one time we thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing challenge, one governed by the computational complexity of ML algorithms, the performance and availability of hardware platforms, and the application's budget (power constraint, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle the aforementioned challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom Hardware Accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with the introduction of the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms which could be used for implementation. Afterward, a survey of the existing approximate computing techniques and hardware accelerator design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs to machine learning algorithms is provided. 
Then, three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN). The three accelerators offer real-time classification with monumental reductions in hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain a high classification accuracy, with a loss of at most 5%.
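
    To illustrate the idea behind a kNN classifier that uses selection rather than a full sort, the following is a minimal software sketch. It is not the dissertation's hardware design; the function name `knn_classify`, the squared Euclidean distance, and the majority vote are assumptions made only for illustration. The point is that only the k smallest distances need to be kept, so a full sort of all distances is unnecessary.

```python
from collections import Counter


def knn_classify(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical software analogue of a selection-based sorter: keep only
    the k smallest distances seen so far instead of sorting all of them.
    """
    best = []  # (distance, label) pairs, kept in ascending order, length <= k
    for point, label in zip(train, labels):
        # Squared Euclidean distance (assumed metric for this sketch).
        d = sum((a - b) ** 2 for a, b in zip(point, query))
        if len(best) < k or d < best[-1][0]:
            # Find the insertion position that keeps `best` ordered.
            i = len(best)
            while i > 0 and best[i - 1][0] > d:
                i -= 1
            best.insert(i, (d, label))
            if len(best) > k:
                best.pop()  # discard the current largest distance
    votes = Counter(label for _, label in best)
    return votes.most_common(1)[0][0]
```

    Each training sample costs at most k comparisons, which is why a selection structure of this kind maps to far less logic than a full sorting network when k is small.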

    FPGA Accelerator for Meta-Recognition Anomaly Detection: Case of Burned Area Detection

    Optical remote sensing instruments accumulate abundant data from across all of the earth's land surfaces, making it possible both to understand the effects of climate change and to monitor, investigate, and manage ground-level events in detail. Processing data using resources located near on-board satellite sensors can bring major benefits in terms of minimizing analysis time and quickly initiating action in critical situations. In satellite missions, long-running on-board algorithms may encounter unexplored samples, i.e., abnormal ground-level events, and need to be able to discriminate them and take the correct action. To this end, the authors present a field programmable gate array (FPGA)-based solution for natural anomaly detection in multispectral imagery using deep convolutional neural networks. The effects of weather-induced hazards and natural disasters, considered anomalies in this sense, are discovered by modeling an anomaly detector on a hybrid system that is hardware efficient. The proposed approach is assembled on a Xilinx Zynq UltraScale+ XCZU9EG multiprocessor system-on-chip (MPSoC) device, where a deep convolutional model is scaled into the FPGA logic, followed by a downstream statistical meta-recognition predictor. The proposed anomaly detection accelerator has produced notable results in identifying a contemporary natural hazard, i.e., burned areas, in scenes acquired by Sentinel-2 over Europe, i.e., Spain and France. On the FPGA accelerator, the implemented algorithm achieved a speedup of 4.46× and 4.5× lower power consumption than the equivalent implementation on the Tesla K80 GPU.

    Energy-efficient embedded machine learning algorithms for smart sensing systems

    Embedded autonomous electronic systems are required in numerous application domains such as Internet of Things (IoT), wearable devices, and biomedical systems. Embedded electronic systems usually host sensors, and each sensor hosts multiple input channels (e.g., tactile, vision), tightly coupled to the electronic computing unit (ECU). The ECU extracts information by often employing sophisticated methods, e.g., Machine Learning. However, embedding Machine Learning algorithms poses essential challenges in terms of hardware resources and energy consumption because of: 1) the high amount of data to be processed; 2) computationally demanding methods. Leveraging the trade-off between quality requirements and computational complexity and latency could reduce the system complexity without affecting the performance. The objectives of the thesis are to develop: 1) energy-efficient arithmetic circuits outperforming state-of-the-art solutions for embedded machine learning algorithms, 2) an energy-efficient embedded electronic system for the "electronic-skin" (e-skin) application. As such, this thesis exploits two main approaches: 1) Approximate Computing: In recent years, the approximate computing paradigm has become a major field of research since it is able to enhance the energy efficiency and performance of digital systems. "Approximate Computing" (AC) has turned out to be a practical approach to trade accuracy for better power, latency, and size. AC targets error-resilient applications and offers promising benefits by conserving some resources. Usually, approximate results are acceptable for many applications, e.g., tactile data processing, image processing, and data mining; thus, it is highly recommended to take advantage of energy reduction with minimal variation in performance. 
In our work, we developed two approximate multipliers: 1) the first one is called the "META" multiplier and is based on the Error Tolerant Adder (ETA), 2) the second one is called the "Approximate Baugh-Wooley (BW)" multiplier, where the approximations are implemented in the generation of the partial products. We showed that the proposed approximate arithmetic circuits could achieve a relevant reduction in power consumption and time delay of around 80.4% and 24%, respectively, with respect to the exact BW multiplier. Next, to prove the feasibility of AC in real-world applications, we explored the approximate multipliers in a case study, the e-skin application. The e-skin application is defined as multiple sensing components, including 1) structural materials, 2) signal processing, 3) data acquisition, and 4) data processing. In particular, processing the data originating from the e-skin into low- or high-level information is the main problem to be addressed by the embedded electronic system. Many studies have shown that Machine Learning is a promising approach to processing tactile data when classifying input touch modalities. In our work, we proposed a methodology for evaluating the behavior of the system when introducing approximate arithmetic circuits in the main stages (i.e., signal and data processing stages) of the system. Based on the proposed methodology, we first implemented the approximate multipliers in the low-pass Finite Impulse Response (FIR) filter in the signal processing stage of the application. We noticed that the FIR filter based on the Approximate BW multiplier (Approx-BW) outperforms state-of-the-art solutions, while respecting the trade-off between accuracy and power consumption, with an SNR degradation of 1.39 dB. 
Second, we implemented approximate adders and multipliers in the Coordinate Rotational Digital Computer (CORDIC) and the Singular Value Decomposition (SVD) circuits, respectively, since CORDIC and SVD account for a significant part of the computationally expensive Machine Learning algorithms employed in tactile data processing. We showed benefits of up to 21% and 19% in power reduction at the cost of less than 5% accuracy loss for the CORDIC and SVD circuits when scaling the number of approximated bits. 2) Parallel Computing Platforms (PCP): Exploiting parallel architectures for near-threshold computing based on multi-core clusters is a promising approach to improve the performance of smart sensing systems. In our work, we exploited a novel computing platform embedding a Parallel Ultra Low Power processor (PULP), called "Mr. Wolf," for the implementation of Machine Learning (ML) algorithms for touch modalities classification. First, we tested the ML algorithms at the software level; for RGB images as a case study and for a tactile dataset, we achieved accuracies of 97% and 83.5%, respectively. After validating the effectiveness of the ML algorithm at the software level, we performed the on-board classification of two touch modalities, demonstrating the promising use of Mr. Wolf for smart sensing systems. Moreover, we proposed a memory management strategy for storing the needed amount of trained tensors (i.e., 50 trained tensors for each class) in the on-chip memory. We evaluated the execution cycles for Mr. Wolf using a single core, 2 cores, and 3 cores, taking advantage of the benefits of parallelization. We presented a comparison with the popular low-power ARM Cortex-M4F microcontroller, which is usually employed in battery-operated devices. We showed that the ML algorithm on the proposed platform runs 3.7 times faster than on the ARM Cortex-M4F (STM32F40), consuming only 28 mW. 
The proposed platform achieves 15× better energy efficiency than the classification done on the STM32F40, consuming 81 mJ per classification and 150 pJ per operation.
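
    The abstract above describes a multiplier whose approximations are introduced when the partial products are generated. As a rough illustration of that general idea only, the sketch below models an unsigned multiplier that simply omits the partial-product bits in the lowest columns. This truncation scheme, the function name `approx_multiply`, and the parameters `width` and `dropped_columns` are assumptions for illustration; they are not the thesis's META or Approximate Baugh-Wooley designs, which handle signed operands and are not specified here.

```python
def approx_multiply(a, b, width=8, dropped_columns=4):
    """Unsigned multiply that skips low-order partial-product bits.

    Each partial-product bit a_i AND b_j has weight 2**(i + j). Bits whose
    column index (i + j) is below `dropped_columns` are never generated,
    which in hardware would save the corresponding AND gates and adders.
    Hypothetical truncation scheme, used here only as an illustration.
    """
    result = 0
    for i in range(width):
        if not (a >> i) & 1:
            continue
        for j in range(width):
            if (b >> j) & 1 and i + j >= dropped_columns:
                result += 1 << (i + j)
    return result


def max_truncation_error(dropped_columns):
    """Worst-case error: column c holds c + 1 bits, each of weight 2**c.

    Valid while dropped_columns does not exceed the operand width.
    """
    return sum((c + 1) << c for c in range(dropped_columns))
```

    With `dropped_columns=0` the function returns the exact product. With four dropped columns the result is never larger than the exact product and is off by at most 49, so `approx_multiply(255, 255)` gives 64976 instead of 65025. Scaling `dropped_columns` is one simple way to explore the accuracy versus hardware cost trade-off that the abstract refers to.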

    Efficient machine learning: models and accelerations

    One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models, such as deep neural networks, typically consist of multiple cascaded layers and contain at least millions to hundreds of millions of parameters (i.e., weights). Larger-scale models tend to enable the extraction of more complex high-level features and therefore lead to a significant improvement in overall accuracy. On the other hand, the layered deep structure and large model sizes also demand more computational capability and memory. In order to achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interest. The first trend is acceleration, while the second is model compression. The underlying goal of both trends is to preserve the high quality of the models so that they provide accurate predictions. In this thesis, we address these two problems and utilize different computing paradigms to solve real-life deep learning problems. To explore these two domains, this thesis first presents the cogent confabulation network for the sentence completion problem. We use the Chinese language as a case study to describe our exploration of cogent confabulation based text recognition models. The exploration and optimization of the cogent confabulation based models have been conducted through various comparisons. The optimized network offers better accuracy for sentence completion. To accelerate sentence completion in a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduces runtime, improves the recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintains balanced progress in status updates among all neurons. 
A lexicon scheduling algorithm is presented to further improve the model performance. As deep neural networks have proven effective in many real-life applications and are deployed on low-power devices, we then investigate the acceleration of neural network inference using a hardware-friendly computing paradigm, stochastic computing. It is an approximate computing paradigm which requires a small hardware footprint and achieves high energy efficiency. Applying stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss due to the approximation. The synthesis results show that the proposed design achieves remarkably low hardware cost and power/energy consumption. Modern neural networks usually involve a huge number of parameters, which cannot fit into embedded devices, so compression of deep learning models together with acceleration attracts our attention. We introduce structured-matrix-based neural networks to address this problem. A circulant matrix is one such structured matrix: it can be represented by a single vector, so the matrix is compressed. We further investigate a more flexible structure based on the circulant matrix, called the block-circulant matrix. It partitions a matrix into several smaller blocks and makes each submatrix circulant, so the compression ratio is controllable. With the help of Fourier Transform based equivalent computation, the inference of the deep neural network can be accelerated energy-efficiently on FPGAs. We also optimize the training algorithm for block-circulant-matrix-based neural networks to obtain high accuracy after compression.
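
    The Fourier Transform based equivalent computation mentioned above rests on a standard fact: multiplying by a circulant matrix is a circular convolution, which becomes an element-wise product in the frequency domain. The sketch below shows this for a block-circulant matrix-vector product. It is a pure-Python illustration, not the thesis's FPGA implementation; the function names, the choice of the first column as the defining vector of each block, and the power-of-two block size required by the small radix-2 FFT are all assumptions.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two.

    The inverse transform is left unnormalized (the caller divides by n).
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def circulant_matvec(c, x):
    """Compute y = C x where C[i][j] = c[(i - j) % n], using FFTs.

    Only the n entries of `c` are stored instead of the n * n entries of C.
    """
    n = len(c)
    product = [a * b for a, b in zip(fft(c), fft(x))]
    return [v.real / n for v in fft(product, inverse=True)]


def block_circulant_matvec(blocks, x, b):
    """Multiply a block-circulant matrix by x.

    `blocks[p][q]` is the defining vector (length b) of the circulant
    block in block-row p and block-column q.
    """
    out = []
    for row in blocks:
        acc = [0.0] * b
        for q, c in enumerate(row):
            part = circulant_matvec(c, x[q * b:(q + 1) * b])
            acc = [s + t for s, t in zip(acc, part)]
        out.extend(acc)
    return out
```

    Each b-by-b block is stored as b numbers rather than b squared, so the block size b directly sets the compression ratio, and each block product costs on the order of b log b operations rather than b squared.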

    Non-Contact Evaluation Methods for Infrastructure Condition Assessment

    United States infrastructure, e.g., roads and bridges, is in critical condition. Inspecting, monitoring, and maintaining this infrastructure in the traditional manner can be expensive, dangerous, time-consuming, and tied to human judgment (the inspector). Non-contact methods can help overcome these challenges. In this dissertation, two aspects of non-contact methods are explored: inspections using unmanned aerial systems (UASs), and condition assessment using image processing and machine learning techniques. The dissertation presents a set of investigations to determine a guideline for remote autonomous bridge inspections.

    Embedded Artificial Intelligence for Tactile Sensing

    Electronic tactile sensing has become an active research field, whether for prosthetic applications, robotics, virtual reality, or the rehabilitation of post-stroke patients. To achieve such sensing, an array of sensors is used to retrieve human-skin-like information; this array is called Electronic skin (E-skin). Humans, through their skin, are able to collect different types of information, e.g., pressure, temperature, and texture, which are then passed to the nervous system and finally to the brain in order to extract high-level information from these sensory data. To make E-skin capable of such a task, data acquired from the E-skin should be filtered, processed, and then conveyed to the user (or robot). Processing this sensory information should occur in real time, taking into consideration the power limitations of such applications, especially prosthetic applications. The power consumption itself is related to different factors: one factor is the complexity of the algorithm, e.g., the number of FLOPs, and another is the memory consumption. In this thesis, I focus on the processing of real tactile information by 1) exploring different algorithms and methods for tactile data classification, 2) data organization and preprocessing of such tactile data, and 3) hardware implementation. More precisely, the focus is on deep learning algorithms for tactile data processing, mainly CNNs and RNNs, with energy-efficient embedded implementations. The proposed solution requires less memory, fewer FLOPs, and lower latency than the state of the art (including tensorial SVM) when applied to real tactile sensor data. Keywords: E-skin, tactile data processing, deep learning, CNN, RNN, LSTM, GRU, embedded, energy-efficient algorithms, edge computing, artificial intelligence

    Low-power dynamic object detection and classification with freely moving event cameras

    We present the first purely event-based, energy-efficient approach for dynamic object detection and categorization with a freely moving event camera. Compared to systems built on traditional cameras, event-based object recognition systems are considerably behind in terms of accuracy and algorithmic maturity. To this end, this paper presents an event-based feature extraction method devised by accumulating local activity across the image frame and then applying principal component analysis (PCA) to the normalized neighborhood region. Subsequently, we propose a backtracking-free k-d tree mechanism for efficient feature matching by taking advantage of the low dimensionality of the feature representation. Additionally, the proposed k-d tree mechanism allows for feature selection to obtain a lower-dimensional object representation when hardware resources are too limited to implement PCA. Consequently, the proposed system can be realized on a field-programmable gate array (FPGA) device, leading to a high performance-to-resource ratio. The proposed system is tested on real-world event-based datasets for object categorization, showing superior classification performance compared to state-of-the-art algorithms. Additionally, we verified the real-time FPGA performance of the proposed object detection method, trained with limited data as opposed to deep learning methods, under a closed-loop aerial vehicle flight mode. We also compare the proposed object categorization framework to pre-trained convolutional neural networks using transfer learning and highlight the drawbacks of using frame-based sensors under dynamic camera motion. Finally, we provide critical insights into how the feature extraction method and the classification parameters affect system performance, which aids in adapting the framework to various low-power (less than a few watts) application scenarios.
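
    One plausible reading of a backtracking-free k-d tree lookup is a single descent from the root to a leaf that never revisits the branches it skipped. The sketch below illustrates that reading; it is an assumption about the general technique and not the paper's actual mechanism. The names `build_kdtree` and `greedy_lookup` and the dictionary-based node layout are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Build a k-d tree from a list of (vector, label) pairs.

    The splitting axis cycles through the dimensions with depth, and the
    median point along that axis is stored at each node.
    """
    if not points:
        return None
    axis = depth % len(points[0][0])
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def greedy_lookup(node, query):
    """Walk one root-to-leaf path and return the closest point seen.

    No backtracking is performed, so the result is an approximate nearest
    neighbor, but the cost is bounded by the tree depth.
    """
    best, best_d = None, float("inf")
    while node is not None:
        vec, _label = node["point"]
        d = sum((a - b) ** 2 for a, b in zip(vec, query))
        if d < best_d:
            best, best_d = node["point"], d
        axis = node["axis"]
        node = node["left"] if query[axis] < vec[axis] else node["right"]
    return best
```

    The fixed, short traversal is what makes such a lookup attractive for FPGA pipelines: the number of comparisons per query is known in advance, at the price of occasionally missing the true nearest neighbor.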