453 research outputs found

    A scalable dataflow accelerator for real time onboard hyperspectral image classification

    No full text
    © Springer International Publishing Switzerland 2016.Real-time hyperspectral image classification is a necessary primitive in many remotely sensed image analysis applications. Previous work has shown that Support Vector Machines (SVMs) can achieve high classification accuracy, but unfortunately it is very computationally expensive. This paper presents a scalable dataflow accelerator on FPGA for real-time SVM classification of hyperspectral images.To address data dependencies, we adapt multi-class classifier based on Hamming distance. The architecture is scalable to high problem dimensionality and available hardware resources. Implementation results show that the FPGA design achieves speedups of 26x, 1335x, 66x and 14x compared with implementations on ZYNQ, ARM, DSP and Xeon processors. Moreover, one to two orders of magnitude reduction in power consumption is achieved for the AVRIS hyperspectral image datasets

    An Efficient Classification of Hyperspectral Remotely Sensed Data Using Support Vector Machine

    Get PDF
    This work present an efficient hardware architecture of Support Vector Machine (SVM) for the classification of Hyperspectral remotely sensed data using High Level Synthesis (HLS) method. The high classification time and power consumption in traditional classification of remotely sensed data is the main motivation for this work. Therefore presented work helps to classify the remotely sensed data in real-time and to take immediate action during the natural disaster. An embedded based SVM is designed and implemented on Zynq SoC for classification of hyperspectral images. The data set of remotely sensed data are tested on different platforms and the performance is compared with existing works. Novelty in our proposed work is extend the HLS based FPGA implantation to the onboard classification system in remote sensing. The experimental results for selected data set from different class shows that our architecture on Zynq 7000 implementation generates a delay of 11.26 µs and power consumption of 1.7 Watts, which is extremely better as compared to other Field Programmable Gate Array (FPGA) implementation using Hardware description Language (HDL)  and Central Processing Unit (CPU) implementation

    Towards efficient on-board deployment of DNNs on intelligent autonomous systems

    Get PDF
    With their unprecedented performance in major AI tasks, deep neural networks (DNNs) have emerged as a primary building block in modern autonomous systems. Intelligent systems such as drones, mobile robots and driverless cars largely base their perception, planning and application-specific tasks on DNN models. Nevertheless, due to the nature of these applications, such systems require on-board local processing in order to retain their autonomy and meet latency and throughput constraints. In this respect, the large computational and memory demands of DNN workloads pose a significant barrier on their deployment on the resource-and power-constrained compute platforms that are available on-board. This paper presents an overview of recent methods and hardware architectures that address the system-level challenges of modern DNN-enabled autonomous systems at both the algorithmic and hardware design level. Spanning from latency-driven approximate computing techniques to high-throughput mixed-precision cascaded classifiers, the presented set of works paves the way for the on-board deployment of sophisticated DNN models on robots and autonomous systems

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Get PDF
    Machine Learning (ML) a subset of Artificial Intelligence (AI) is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that at one time we thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing challenge. A challenge that is controlled by the computational complexity of ML algorithms, the performance/availability of hardware platforms, and the application\u2019s budget (power constraint, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle the aforementioned challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom Hardware Accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with the introduction of the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms which could be used for implementation. Afterward, a survey on the existing approximate computing techniques and hardware accelerators design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs on machine learning algorithms is provided. Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN). The three accelerators offer a real-time classification with monumental reductions in the hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain a high classification accuracy with a loss of at most 5%
    corecore