42 research outputs found

    Proposal of the CAD System for Melanoma Detection Using Reconfigurable Computing

    This work proposes dedicated hardware for real-time cancer detection using Field-Programmable Gate Arrays (FPGAs). The presented hardware combines a Multilayer Perceptron (MLP) Artificial Neural Network (ANN) with Digital Image Processing (DIP) techniques. The DIP techniques extract features from the analyzed skin, and the MLP classifies the lesion as melanoma or non-melanoma. The classification results are validated with an open-access database. Finally, analyses of execution time, hardware resource usage, and power consumption are performed. The results of these analyses are then compared to an equivalent software implementation embedded in an ARM A9 microprocessor.
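    To make the pipeline concrete, the following minimal sketch (in Python rather than hardware, with a hypothetical feature count, layer sizes, and untrained weights; the abstract does not specify the MLP topology) shows the DIP-features-in, melanoma/non-melanoma-out classification stage:

        import numpy as np

        def mlp_classify(features, W1, b1, W2, b2):
            # One-hidden-layer MLP: the DIP stage supplies `features`,
            # and the output neuron scores the lesion in (0, 1).
            sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
            hidden = sigmoid(W1 @ features + b1)
            score = sigmoid(W2 @ hidden + b2)
            return "melanoma" if float(score[0]) >= 0.5 else "non-melanoma"

        # Hypothetical sizes: 8 DIP features, 4 hidden neurons, 1 output.
        rng = np.random.default_rng(0)
        features = rng.random(8)                    # stand-in for DIP features
        W1, b1 = rng.standard_normal((4, 8)), np.zeros(4)
        W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)
        print(mlp_classify(features, W1, b1, W2, b2))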

    HLS-based dataflow hardware architecture for Support Vector Machine in FPGA

    Implementing fast and accurate Support Vector Machine (SVM) classifiers in embedded systems with limited compute and memory capacity, and in applications with real-time constraints such as continuous medical monitoring for anomaly detection, can be challenging and calls for low-cost, low-power, and resource-efficient hardware accelerators. In this paper, we propose a flexible FPGA-based SVM accelerator highly optimized through a dataflow architecture. Thanks to High-Level Synthesis (HLS) and the dataflow method, our design is scalable and can be used for large data dimensions when on-chip memory is limited. The hardware parallelism is adjustable and can be specified according to the available FPGA resources. The performance of different SVM kernels is evaluated in hardware. In addition, an efficient fixed-point implementation is proposed to improve speed. We compared our design with recent SVM accelerators and achieved a minimum 10x speed-up over other HLS-based designs and 4.4x over HDL-based designs.
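    As a hedged sketch of the computation such an accelerator pipelines (the kernel choice, Q-format, and shapes below are illustrative assumptions, not the paper's exact design), the SVM decision function and a simple fixed-point quantisation step look like this in Python:

        import numpy as np

        def rbf_kernel(svs, x, gamma=0.5):
            # One kernel evaluation per support vector (rows of `svs`).
            return np.exp(-gamma * np.sum((svs - x) ** 2, axis=-1))

        def svm_decide(x, svs, alpha_y, b):
            # f(x) = sign(sum_i alpha_i * y_i * K(sv_i, x) + b)
            return np.sign(alpha_y @ rbf_kernel(svs, x) + b)

        def to_fixed(v, frac_bits=12):
            # Round to a Q*.12 fixed-point grid, as an HLS design might
            # do to replace floating-point arithmetic with integers.
            scale = 1 << frac_bits
            return np.round(np.asarray(v) * scale) / scale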

    FPGA Acceleration of Domain-specific Kernels via High-Level Synthesis

    The abstract is in the attachment.

    Smart embedded system for skin cancer classification

    The very good results achieved with recent deep-learning-based image classification algorithms have enabled new applications in many domains. The medical field is one that can greatly benefit from these algorithms to help medical professionals formulate their diagnoses. In particular, portable devices for medical image classification are useful in scenarios where a full analysis system is not an option or is difficult to obtain. Algorithms based on deep learning models are computationally demanding; therefore, it is difficult to run them on low-cost devices with low energy consumption and high efficiency. In this paper, a low-cost system is proposed to classify skin cancer images. Two approaches were followed to achieve a fast and accurate system. At the algorithmic level, a cascade inference technique was considered, where two models are used for inference. At the architectural level, the deep learning processing unit from Vitis-AI was considered in order to design very efficient accelerators in FPGA. The dual model was trained and implemented for skin cancer detection on a ZYNQ UltraScale+ MPSoC ZCU104 evaluation kit with a ZU7EV device. The core was integrated into a full system-on-chip solution and tested with the HAM10000 dataset. It achieves a performance of 13.5 FPS with an accuracy of 87%, using only 33k LUTs, 80 DSPs, 70 BRAMs, and 1 URAM.
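    The cascade idea itself is simple; a minimal sketch (the model callables and the 0.9 confidence threshold are assumptions for illustration, not the paper's tuned values) is:

        import numpy as np

        def cascade_predict(image, small_model, large_model, threshold=0.9):
            # Stage 1: a light model answers the easy, confident cases.
            probs = small_model(image)          # vector of class probabilities
            if np.max(probs) >= threshold:
                return int(np.argmax(probs))
            # Stage 2: only uncertain inputs pay for the heavier model.
            return int(np.argmax(large_model(image)))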

    Recent Advances in Embedded Computing, Intelligence and Applications

    The latest proliferation of Internet of Things deployments and edge computing combined with artificial intelligence has led to new exciting application scenarios, where embedded digital devices are essential enablers. Moreover, new powerful and efficient devices are appearing to cope with workloads formerly reserved for the cloud, such as deep learning. These devices allow processing close to where data are generated, avoiding bottlenecks due to communication limitations. The efficient integration of hardware, software, and artificial intelligence capabilities deployed in real sensing contexts empowers the edge intelligence paradigm, which will ultimately foster the offloading of processing functionalities to the edge. In this Special Issue, researchers have contributed nine peer-reviewed papers covering a wide range of topics in the area of edge intelligence, including hardware-accelerated implementations of deep neural networks, IoT platforms for extreme edge computing, neuro-evolvable and neuromorphic machine learning, and embedded recommender systems.

    An Ultra-low-power Real-time Machine Learning based fNIRS Motion Artefacts Detection


    Energy-efficient embedded machine learning algorithms for smart sensing systems

    Embedded autonomous electronic systems are required in numerous application domains such as the Internet of Things (IoT), wearable devices, and biomedical systems. Embedded electronic systems usually host sensors, and each sensor hosts multiple input channels (e.g., tactile, vision), tightly coupled to the electronic computing unit (ECU). The ECU extracts information by often employing sophisticated methods, e.g., Machine Learning. However, embedding Machine Learning algorithms poses essential challenges in terms of hardware resources and energy consumption because of: 1) the high amount of data to be processed; 2) computationally demanding methods. Leveraging the trade-off between quality requirements and computational complexity and time latency can reduce system complexity without affecting performance. The objectives of the thesis are to develop: 1) energy-efficient arithmetic circuits outperforming state-of-the-art solutions for embedded machine learning algorithms, and 2) an energy-efficient embedded electronic system for the "electronic-skin" (e-skin) application. As such, this thesis exploits two main approaches.

    1) Approximate Computing: In recent years, the approximate computing paradigm has become a significant field of research since it can enhance the energy efficiency and performance of digital systems. "Approximate Computing" (AC) has turned out to be a practical approach to trading accuracy for better power, latency, and size. AC targets error-resilient applications and offers promising benefits by conserving some resources. Approximate results are usually acceptable for many applications, e.g., tactile data processing, image processing, and data mining; thus, it is highly recommended to take advantage of energy reduction with minimal variation in performance. In our work, we developed two approximate multipliers: 1) the first, called the "META" multiplier, is based on the Error Tolerant Adder (ETA); 2) the second, called the "Approximate Baugh-Wooley (BW)" multiplier, implements the approximations in the generation of the partial products. We showed that the proposed approximate arithmetic circuits achieve a relevant reduction in power consumption and time delay, around 80.4% and 24% respectively, with respect to the exact BW multiplier. Next, to prove the feasibility of AC in real-world applications, we explored the approximate multipliers in a case study: the e-skin application. The e-skin application comprises multiple sensing components, including 1) structural materials, 2) signal processing, 3) data acquisition, and 4) data processing. In particular, processing the data originating from the e-skin into low- or high-level information is the main problem to be addressed by the embedded electronic system. Many studies have shown that Machine Learning is a promising approach for processing tactile data when classifying input touch modalities. In our work, we proposed a methodology for evaluating the behavior of the system when introducing approximate arithmetic circuits in the main stages (i.e., the signal and data processing stages) of the system. Based on the proposed methodology, we first implemented the approximate multipliers in the low-pass Finite Impulse Response (FIR) filter in the signal processing stage of the application. We noticed that the FIR filter based on Approx-BW outperforms state-of-the-art solutions while respecting the trade-off between accuracy and power consumption, with an SNR degradation of 1.39 dB. Second, we implemented approximate adders and multipliers in the Coordinate Rotation Digital Computer (CORDIC) and the Singular Value Decomposition (SVD) circuits respectively, since CORDIC and SVD account for a significant part of the computationally expensive Machine Learning algorithms employed in tactile data processing. We showed benefits of up to 21% and 19% in power reduction, at the cost of less than 5% accuracy loss, for the CORDIC and SVD circuits when scaling the number of approximated bits.

    2) Parallel Computing Platforms (PCP): Exploiting parallel architectures for near-threshold computing based on multi-core clusters is a promising approach to improve the performance of smart sensing systems. In our work, we exploited a novel computing platform embedding a Parallel Ultra Low Power processor (PULP), called "Mr. Wolf", for the implementation of Machine Learning (ML) algorithms for touch modality classification. First, we tested the ML algorithms at the software level; for an RGB-image case study and a tactile dataset, we achieved accuracies of 97% and 83.5% respectively. After validating the effectiveness of the ML algorithm at the software level, we performed on-board classification of two touch modalities, demonstrating the promising use of Mr. Wolf for smart sensing systems. Moreover, we proposed a memory management strategy for storing the needed amount of trained tensors (i.e., 50 trained tensors for each class) in the on-chip memory. We evaluated the execution cycles for Mr. Wolf using a single core, 2 cores, and 3 cores, taking advantage of the benefits of parallelization. We present a comparison with the popular low-power ARM Cortex-M4F microcontroller usually employed for battery-operated devices. We showed that the ML algorithm on the proposed platform runs 3.7 times faster than on the ARM Cortex-M4F (STM32F40), consuming only 28 mW. The proposed platform achieves 15.7x better energy efficiency than the classification done on the STM32F40, consuming 81 mJ per classification and 150 pJ per operation.
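    To illustrate the approximate-multiplier idea behind META and Approx-BW (this is a generic truncated partial-product model in Python, not the exact circuits of the thesis), dropping the low-order columns of the partial-product array trades a small, bounded error for less hardware and switching activity:

        def approx_multiply(a: int, b: int, drop_bits: int = 4) -> int:
            # Unsigned multiply that ignores partial-product columns below
            # `drop_bits`; the corresponding adders would be removed in hardware.
            mask = ~((1 << drop_bits) - 1)
            acc = 0
            for i in range(b.bit_length()):
                if (b >> i) & 1:
                    acc += (a << i) & mask      # truncated partial product
            return acc

        print(approx_multiply(1234, 5678), 1234 * 5678)  # small relative error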

    Majorization-Minimization for sparse SVMs

    Several decades ago, Support Vector Machines (SVMs) were introduced for performing binary classification tasks under a supervised framework. Nowadays, they often outperform other supervised methods and remain one of the most popular approaches in the machine learning arena. In this work, we investigate the training of SVMs through a smooth, sparsity-promoting, regularized squared hinge loss minimization. This choice paves the way to the application of fast training methods built on majorization-minimization approaches, benefiting from the Lipschitz differentiability of the loss function. Moreover, the proposed approach allows us to handle sparsity-preserving regularizers promoting the selection of the most significant features, thus enhancing the performance. Numerical tests and comparisons conducted on three different datasets demonstrate the good performance of the proposed methodology in terms of qualitative metrics (accuracy, precision, recall, and F1 score) as well as computational cost.
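    As a hedged illustration of this setup (the notation below is assumed, not copied from the paper): with training pairs (x_i, y_i), y_i in {-1, +1}, one smooth sparsity-promoting formulation and the standard quadratic tangent majorant that makes each majorization-minimization step tractable read, in LaTeX:

        \min_{w}\; F(w) \;=\; \tfrac{1}{2}\sum_{i=1}^{n}\max\bigl(0,\,1-y_i\,w^{\top}x_i\bigr)^{2}
        \;+\;\lambda\sum_{j=1}^{d}\psi(w_j),
        \qquad \psi(t)=\sqrt{t^{2}+\delta},

        \psi(t)\;\le\;\psi(u)+\frac{t^{2}-u^{2}}{2\,\psi(u)}\quad\text{for all } t,u .

    The inequality follows from the concavity of s -> sqrt(s + delta) applied at s = t^2, so each iteration minimizes a reweighted quadratic surrogate of F; the squared hinge term is already smooth with a Lipschitz gradient, as the abstract notes.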

    Massively-parallel and concurrent SVM architectures

    This work presents several Support Vector Machine (SVM) architectures developed by the Author with the intent of exploiting the inherent parallel structures and potential concurrency underpinning the SVM's mathematical operation. Two SVM training subsystem prototypes are presented: a brute-force search classification training architecture, and Artificial Neural Network (ANN)-mapped optimisation architectures for both SVM classification training and SVM regression training. This work also proposes and prototypes a set of parallelised SVM Digital Signal Processor (DSP) pipeline architectures. The parallelised SVM DSP pipeline architectures have been modelled in C and implemented in VHDL for synthesis and fitting on an Altera Stratix V FPGA. Each system presented in this work has been applied to a problem domain appropriate to the SVM system's architectural limitations, including the novel application of the SVM as a chaotic and non-linear system parameter-identification tool.

    The SVM brute-force search classification training architecture has been modelled for 2-dimensional datasets composed of linear and non-linear problems requiring only 4 support vectors, utilising the linear kernel and the polynomial kernel respectively. The system has been implemented in Matlab and non-exhaustively verified using the holdout method with a trivial linearly separable classification problem dataset and a trivial non-linear XOR classification problem dataset. While the architecture was a feasible design for software-based implementations targeting 2-dimensional datasets, the architectural complexity and unmanageable number of parallelisable operations introduced by increasing data dimensionality and the number of support vectors subsequently led the Author to pursue different parallelised-architecture strategies.

    Two distinct ANN-mapped optimisation strategies, developed and proposed for SVM classification training and SVM regression training, have been modelled in Matlab; the architectures have been designed such that a dataset of any dimensionality can be applied by configuring the appropriate dimensionality and support vector parameters. Through Monte-Carlo testing using the datasets examined in this work, the gain parameters inherent in the architectural design of the systems were found to be difficult to tune, and system convergence to acceptable sets of training support vectors was not achieved. The ANN-mapped optimisation strategies were thus deemed inappropriate for SVM training with the applied datasets without further design effort and architectural modification work.

    The parallelised SVM DSP pipeline architecture prototypes' dataset dimensionality, support vector set counts, and latency ranges follow. In each case the Field Programmable Gate Array (FPGA) pipeline prototype latency unsurprisingly outclassed the corresponding C-software model execution times by at least 3 orders of magnitude. The SVM classification training DSP pipeline FPGA prototypes are compatible with datasets spanning 2 to 8 dimensions and support vector sets of up to 16 support vectors, with a pipeline latency ranging from 0.18 to 0.28 microseconds. The SVM classification function evaluation DSP pipeline FPGA prototypes are compatible with datasets spanning 2 to 8 dimensions and support vector sets of up to 32 support vectors, with a pipeline latency ranging from 0.16 to 0.24 microseconds. The SVM regression training DSP pipeline FPGA prototypes are compatible with datasets spanning 2 to 8 dimensions and support vector sets of up to 16 support vectors, with a pipeline latency ranging from 0.20 to 0.30 microseconds. The SVM regression function evaluation DSP pipeline FPGA prototypes are likewise compatible with datasets spanning 2 to 8 dimensions and support vector sets of up to 16 support vectors, with a pipeline latency ranging from 0.20 to 0.30 microseconds.

    Finally, utilising LIBSVM training and the parallelised SVM DSP pipeline function evaluation architecture prototypes, SVM classification and SVM regression were successfully applied to Rajkumar's oil and gas pipeline fault detection and failure system legacy dataset, yielding excellent results. Also utilising LIBSVM training and the parallelised SVM DSP pipeline function evaluation architecture prototypes, both SVM classification and SVM regression were applied to several chaotic systems as a feasibility study into the application of the SVM machine learning paradigm for chaotic and non-linear dynamical system parameter-identification. SVM classification was applied to the Lorenz attractor and an ANN-based chaotic oscillator with a reasonably acceptable degree of success. SVM classification was applied to the Mackey-Glass attractor, yielding poor results. SVM regression was applied to the Lorenz attractor and an ANN-based chaotic oscillator, yielding average but encouraging results. SVM regression was applied to the Mackey-Glass attractor, yielding poor results.
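    The parallelism these pipelines exploit is easy to state in software terms: every support-vector kernel evaluation is data-independent of the others, so each maps to its own hardware lane, followed by an adder-tree reduction. A minimal sketch (the polynomial kernel and shapes are illustrative assumptions, not the thesis's exact pipeline) is:

        import numpy as np

        def decision_parallel(x, svs, alpha_y, b, degree=2):
            # Each row of `svs` is one independent "lane": in the FPGA
            # pipeline these kernel evaluations proceed concurrently.
            lanes = (svs @ x + 1.0) ** degree   # one poly-kernel value per SV
            weighted = alpha_y * lanes          # per-lane multiply
            return np.sign(weighted.sum() + b)  # adder-tree style reduction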
