873 research outputs found

    Efficient FPGA Implementation of PCA Algorithm for Large Data using High Level Synthesis

    Principal Component Analysis (PCA) is a widely used method for dimensionality reduction in many application areas, including microwave imaging, where the input data is large. Despite its popularity, PCA is difficult to use because of its high computational complexity, especially for high-dimensional data. In recent years, several FPGA implementations have been proposed to accelerate PCA computation. However, most of them rely on manual RTL design, which requires more design and development time. In this paper, we propose an FPGA implementation of PCA using High Level Synthesis (HLS), which allows us to explore the design space more efficiently than with hand-coded RTL. Starting from a PCA algorithm written in C++, we apply various hardware optimization techniques to the same code using Vivado HLS in order to explore the design space quickly. Our experiments show that the design obtained with the proposed method outperforms the state-of-the-art RTL design in terms of resource utilization, latency and frequency.
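The abstract does not say which PCA formulation the authors implement. As a minimal sketch, assuming the common covariance-based formulation and using power iteration to estimate the leading component (both choices, and all function names, are illustrative assumptions rather than the paper's method), the core computation looks roughly like this:

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix of the mean-centered rows (list of lists)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(rows, iterations=200):
    """Estimate the dominant eigenvector of the covariance by power iteration.

    Assumption: the start vector is not orthogonal to the dominant eigenvector.
    """
    cov = covariance_matrix(rows)
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        # Matrix-vector product followed by normalization.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def project(rows, component):
    """Project mean-centered rows onto one component (reduces d dimensions to 1)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) * component[j] for j in range(d)) for r in rows]

if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
    pc = first_principal_component(data)
    print(pc, project(data, pc))
```

The nested multiply-accumulate loops in `covariance_matrix` dominate the cost for large inputs, which is plausibly the kind of loop an HLS flow would pipeline or unroll.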

    FPGA Acceleration of Domain-specific Kernels via High-Level Synthesis

    The abstract is in the attachment.

    A committee machine gas identification system based on dynamically reconfigurable FPGA

    This paper proposes a gas identification system based on the committee machine (CM) classifier, which combines various gas identification algorithms to obtain a unified decision with improved accuracy. The CM combines five different classifiers: K nearest neighbors (KNNs), multilayer perceptron (MLP), radial basis function (RBF), Gaussian mixture model (GMM), and probabilistic principal component analysis (PPCA). Experiments on real sensor data demonstrated the effectiveness of our system, with improved accuracy over the individual classifiers. Due to the computationally intensive nature of the CM, its implementation requires significant hardware resources. To overcome this problem, we propose a novel time-multiplexed hardware implementation using a dynamically reconfigurable field programmable gate array (FPGA) platform. The processing is divided into three stages: sampling and preprocessing, pattern recognition, and decision. A dynamically reconfigurable FPGA technique is used to implement the system in a sequential manner, thus using only limited hardware resources of the FPGA chip. The system is successfully tested in a combustible gas identification application using our in-house tin-oxide gas sensors.
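The abstract does not state how the five classifier outputs are combined into a unified decision. A minimal sketch of the decision stage, assuming simple majority voting with ties broken in favor of the earliest voter (an assumption, not the paper's stated rule; the gas labels are placeholders), could be:

```python
from collections import Counter

def committee_decision(votes):
    """Return the label predicted by the most committee members.

    Assumption: plain majority vote; on a tie, the label voted first wins.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label

if __name__ == "__main__":
    # Hypothetical outputs of the KNN, MLP, RBF, GMM and PPCA members.
    print(committee_decision(["CO", "H2", "CO", "CH4", "CO"]))
```

Because each member only contributes one label to the vote, the members can be evaluated one after another, which fits the sequential, time-multiplexed scheme the abstract describes.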

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Machine Learning (ML), a subset of Artificial Intelligence (AI), is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that we once thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing challenge. This challenge is governed by the computational complexity of ML algorithms, the performance and availability of hardware platforms, and the application's budget (power constraint, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle the aforementioned challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom hardware accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with an introduction to the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms that could be used for implementation. Afterward, a survey of existing approximate computing techniques and hardware accelerator design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs to machine learning algorithms is provided. Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN). The three accelerators offer real-time classification with monumental reductions in hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain a high classification accuracy with a loss of at most 5%.
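The abstract names a kNN accelerator "based on a selection-based sorter" without describing it. One plausible reading, sketched here purely as an assumption, is that only the k smallest distances are selected instead of sorting all of them. The helper names and the sample data are hypothetical:

```python
from collections import Counter

def k_smallest_by_selection(items, k, key):
    """Pick the k items with the smallest key using k selection passes.

    Each pass scans the remaining pool once, so no full sort is needed.
    """
    pool = list(items)
    chosen = []
    for _ in range(min(k, len(pool))):
        best = min(range(len(pool)), key=lambda i: key(pool[i]))
        chosen.append(pool.pop(best))
    return chosen

def knn_classify(train, query, k=3):
    """Classify query by majority label among its k nearest training samples.

    train is a list of (feature_tuple, label) pairs; distance is squared Euclidean.
    """
    def squared_distance(sample):
        features, _ = sample
        return sum((a - b) ** 2 for a, b in zip(features, query))

    neighbors = k_smallest_by_selection(train, k, squared_distance)
    labels = [label for _, label in neighbors]
    return Counter(labels).most_common(1)[0][0]

if __name__ == "__main__":
    train = [((0, 0), "soft"), ((0, 1), "soft"),
             ((5, 5), "hard"), ((6, 5), "hard"), ((5, 6), "hard")]
    print(knn_classify(train, (0.2, 0.1), k=3))
```

For small k, selection costs roughly k passes over the distances, which is cheaper than a full sort when k is much smaller than the training set size.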

    Design and implementation a prototype system for fusion image by using SWT-PCA algorithm with FPGA technique

    Image fusion has been a dominant research topic in recent years, with real-time applications proposed in areas such as military use and remote sensing, and it is very effective in digital image processing. Multi-sensor image fusion combines the relevant information from two or more images into a single image. FPGAs are among the most suitable and widespread implementation technologies. Modern devices provide a very large number of logic elements, which permits complex algorithms to be implemented. In this paper, filters are designed and implemented on an FPGA for detecting specific diseases in CT/MRI scans, using various medical images of the human brain as samples, and fusion is performed with the Stationary Wavelet Transform and Principal Component Analysis (SWT-PCA) technique. This technique increases the accuracy of the output image because it eliminates down-sampling, so the result is not affected by blurring and artifacts. The SWT-PCA algorithm is evaluated using quality measurements such as NCC, MSE, PSNR, coefficients and eigenvalues. The significant advantages of this system are real-time operation, rapid time to market and portability, in addition to continued parametric change in the DWT transform. The proposed system was designed and simulated using MATLAB Simulink and System Generator blocks, synthesized with the Xilinx Synthesis Tool (XST), and implemented on a Xilinx Spartan-6 SP605 device.

    MLP neural network based gas classification system on Zynq SoC

    Systems based on Wireless Gas Sensor Networks (WGSN) offer a powerful tool to observe and analyse data in complex environments over long monitoring periods. Since the reliability of sensors is very important in those systems, gas classification is a critical process within gas safety precautions. A gas classification system has to react quickly so that essential actions can be taken when a fault is detected. This paper proposes a low-latency, real-time gas classification service system, which uses a Multi-Layer Perceptron (MLP) Artificial Neural Network (ANN) to detect and classify the gas sensor data. An accurate MLP is developed to work with the data set obtained from an array of tin oxide (SnO2) gas sensors based on convex micro hotplates (MHP). The overall system acquires the gas sensor data through RFID and processes it with the proposed MLP classifier implemented on a System on Chip (SoC) platform from Xilinx. The hardware implementation of the classifier is optimized to achieve very low latency for real-time applications. The proposed architecture has been implemented on a ZYNQ SoC using a fixed-point format, and the results show an accuracy of 97.4%.
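The abstract mentions a fixed-point format but gives neither the word length nor the activation function. The following sketch assumes 8 fractional bits and a ReLU activation, both chosen only for illustration, to show how one MLP layer can be evaluated with integer arithmetic alone:

```python
FRAC_BITS = 8  # assumed number of fractional bits; not given in the abstract

def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a signed fixed-point integer."""
    return int(round(x * (1 << frac_bits)))

def to_float(q, frac_bits=FRAC_BITS):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values and rescale the product."""
    return (a * b) >> frac_bits

def dense_relu_fixed(inputs, weights, biases):
    """One fully connected layer with ReLU, all values in fixed point.

    weights[i] holds the weights of output neuron i.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        acc = bias
        for w, x in zip(neuron_weights, inputs):
            acc += fixed_mul(w, x)  # integer multiply-accumulate
        outputs.append(max(acc, 0))  # ReLU (assumed activation)
    return outputs

if __name__ == "__main__":
    x = [to_fixed(1.0), to_fixed(2.0)]
    w = [[to_fixed(0.5), to_fixed(-0.25)]]
    b = [to_fixed(0.125)]
    print([to_float(v) for v in dense_relu_fixed(x, w, b)])
```

Integer multiply-accumulate followed by a shift maps directly onto FPGA DSP resources, which is one reason fixed-point formats are commonly chosen for low-latency classifiers.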

    Low-power dynamic object detection and classification with freely moving event cameras

    We present the first purely event-based, energy-efficient approach for dynamic object detection and categorization with a freely moving event camera. Compared to traditional cameras, event-based object recognition systems are considerably behind in terms of accuracy and algorithmic maturity. To this end, this paper presents an event-based feature extraction method devised by accumulating local activity across the image frame and then applying principal component analysis (PCA) to the normalized neighborhood region. Subsequently, we propose a backtracking-free k-d tree mechanism for efficient feature matching by taking advantage of the low dimensionality of the feature representation. Additionally, the proposed k-d tree mechanism allows for feature selection to obtain a lower-dimensional object representation when hardware resources are too limited to implement PCA. Consequently, the proposed system can be realized on a field-programmable gate array (FPGA) device, leading to a high performance-to-resource ratio. The proposed system is tested on real-world event-based datasets for object categorization, showing superior classification performance compared to state-of-the-art algorithms. Additionally, we verified the real-time FPGA performance of the proposed object detection method, trained with limited data as opposed to deep learning methods, under a closed-loop aerial vehicle flight mode. We also compare the proposed object categorization framework to pre-trained convolutional neural networks using transfer learning and highlight the drawbacks of using frame-based sensors under dynamic camera motion. Finally, we provide critical insights about the effect of the feature extraction method and the classification parameters on system performance, which aid in understanding how to adapt the framework to various low-power (less than a few watts) application scenarios.
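The abstract does not detail the backtracking-free k-d tree. As a rough sketch under the assumption that "backtracking-free" means a single root-to-leaf descent that never revisits sibling subtrees (so the result is an approximate nearest neighbor), the matching step might look like this; the class and function names are hypothetical:

```python
class Node:
    """One k-d tree node holding a point and its split axis."""
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build_kdtree(points, depth=0):
    """Build a k-d tree by splitting on the median along cycling axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid], depth + 1),
                build_kdtree(ordered[mid + 1:], depth + 1))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def descend_without_backtracking(root, query):
    """Return the closest point seen on one root-to-leaf path.

    No sibling subtree is revisited, so the answer is approximate.
    """
    best, best_dist = None, float("inf")
    node = root
    while node is not None:
        dist = squared_distance(node.point, query)
        if dist < best_dist:
            best, best_dist = node.point, dist
        # Follow only the side of the splitting plane that contains the query.
        node = node.left if query[node.axis] < node.point[node.axis] else node.right
    return best

if __name__ == "__main__":
    tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
    print(descend_without_backtracking(tree, (9, 2)))
```

A single descent visits at most one node per tree level, which gives a bounded, data-independent latency; that property is attractive for hardware, at the cost of occasionally missing the true nearest neighbor.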

    Dimensionality reduction using parallel ICA and its implementation on FPGA in hyperspectral image analysis

    Hyperspectral images, although they provide abundant information about the object, also impose a high computational burden on data processing. This thesis studies the challenging problem of dimensionality reduction in Hyperspectral Image (HSI) analysis. Currently, there are two methods to reduce the dimension: band selection and feature extraction. This thesis presents a band selection technique based on Independent Component Analysis (ICA), an unsupervised signal separation algorithm. Given only the observations of hyperspectral images, the ICA-based band selection picks the independent bands that contain most of the spectral information of the original images. Due to the high volume of hyperspectral images, ICA-based band selection is a time-consuming process. This thesis develops a parallel ICA algorithm that divides the decorrelation process into internal decorrelation and external decorrelation, so that the computational burden can be distributed from a single processor to multiple processors and the ICA process can run in parallel. Hardware implementation is always a faster, real-time solution for HSI analysis. To date, there have been few hardware designs for ICA-related processes. This thesis synthesizes the parallel ICA-based band selection on a Field Programmable Gate Array (FPGA), which is the best choice for moderate designs and fast implementations. Compared to other design syntheses, the synthesis presented in this thesis develops three reconfigurable ICA components for the purpose of reusability. In addition, this thesis demonstrates the relationship between the design and the capacity utilization of a single FPGA, then discusses the features of High Performance Reconfigurable Computing (HPRC) to accommodate large capacity and design requirements. Experiments are conducted on three data sets obtained from different sources. Experimental results show the effectiveness of the proposed ICA-based band selection, parallel ICA and its synthesis on FPGA.