13 research outputs found

    Automatic generation of hardware Tree Classifiers

    Full text link
    Machine learning is growing in popularity and spreading across different fields for various applications. Owing to this trend, machine learning algorithms are being run and experimented with on different hardware platforms to obtain high test accuracy and throughput. FPGAs are a well-suited hardware platform for machine learning because of their re-programmability and low power consumption. However, programming FPGAs for machine learning algorithms requires substantial engineering time and effort compared with a software implementation. We propose a software-assisted design flow to program FPGAs for machine learning algorithms using our hardware library. The hardware library is highly parameterized and accommodates tree classifiers; at present it consists of the components required to implement decision trees and random forests. The whole automation is wrapped in a Python script that takes the user from the first step, a dataset and a set of design choices, to the last step, hardware description code for the trained machine learning model.
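
    As an illustration of what such a flow produces, here is a minimal sketch (not the authors' library; scikit-learn and the pseudo-Verilog emitter are assumptions) that trains a small decision tree in Python and unrolls it into a nested if/else block of hardware-style code:

        from sklearn.datasets import load_iris
        from sklearn.tree import DecisionTreeClassifier

        def emit_node(tree, node, indent="    "):
            # Leaf node: emit a constant assignment with the majority class.
            if tree.children_left[node] == -1:
                return f"{indent}label = {int(tree.value[node].argmax())};\n"
            feat, thr = tree.feature[node], tree.threshold[node]
            left = emit_node(tree, tree.children_left[node], indent + "    ")
            right = emit_node(tree, tree.children_right[node], indent + "    ")
            return (f"{indent}if (x{feat} <= {thr:.4f}) begin\n{left}"
                    f"{indent}end else begin\n{right}{indent}end\n")

        X, y = load_iris(return_X_y=True)
        clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
        print(emit_node(clf.tree_, 0))  # pseudo-HDL body for the trained tree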

    Decision Tree-Based Multiple Classifier Systems: An FPGA Perspective

    Get PDF
    Combining a hardware approach with a multiple classifier method can greatly improve system performance: the multiple classifier system enhances classification accuracy with respect to a single classifier, while a hardware implementation leads to systems able to classify samples with high throughput and short latency. To the best of our knowledge, no paper in the literature takes the multiple classifier scheme into account as an additional design parameter, mainly because of the lack of an efficient hardware combiner architecture. In order to fill this gap, in this paper we will first propose a novel approach for an efficient hardware implementation of the majority voting combining rule. We will then illustrate a design methodology to embed in a digital device a multiple classifier system with Decision Trees as base classifiers and a majority voting rule as combiner. Bagging, Boosting and Random Forests will be taken into account. We will prove the effectiveness of the proposed approach on two real case studies related to Big Data issues.
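
    For reference, the majority voting combining rule itself is simple; the paper's contribution is its efficient hardware realization. A minimal software sketch of the rule (illustrative only, not the proposed combiner architecture):

        from collections import Counter

        def majority_vote(predictions):
            # predictions: one predicted class label per base Decision Tree.
            label, _ = Counter(predictions).most_common(1)[0]
            return label

        print(majority_vote([2, 0, 2, 2, 1]))  # five trees vote -> class 2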

    An Efficient FPGA-based Accelerator for Deep Forest

    Full text link
    Deep Forest is a prominent machine learning algorithm known for its high forecasting accuracy. Compared with deep neural networks, Deep Forest involves almost no multiplication operations and performs better on small datasets. However, due to its deep structure and large number of forests, it suffers from a large amount of computation and memory consumption. In this paper, an efficient hardware accelerator is proposed for Deep Forest models, which is also the first work to implement Deep Forest on an FPGA. Firstly, a dedicated node computing unit (NCU) is designed to improve inference speed. Secondly, based on the NCU, an efficient architecture and an adaptive dataflow are proposed in order to alleviate the problem of node-computing imbalance in the classification process. Moreover, an optimized storage scheme in this design further improves hardware utilization and power efficiency. The proposed design is implemented on an FPGA board, Intel Stratix V, and evaluated on two typical datasets, ADULT and Face Mask Detection. The experimental results show that the proposed design achieves around a 40x speedup compared with a 40-core high-performance x86 CPU. Comment: 5 pages, 5 figures, conference
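
    Functionally, the node computation that the NCU accelerates is a compare-and-branch per tree level. A hedged sketch of that traversal over an array-encoded tree (illustrative layout, not the paper's RTL or storage scheme):

        import numpy as np

        # Array-encoded decision tree: each row is (feature_index, threshold,
        # left_child, right_child, leaf_label); a left_child of -1 marks a leaf.
        def traverse(nodes, sample):
            i = 0
            while True:
                feat, thr, left, right, label = nodes[i]
                if left == -1:                      # leaf reached
                    return int(label)
                # One NCU-style compare-and-branch step per tree level.
                i = int(left) if sample[int(feat)] <= thr else int(right)

        nodes = np.array([
            [0, 0.5,  1,  2, -1],   # root: x0 <= 0.5 ?
            [-1, 0,  -1, -1,  0],   # leaf: class 0
            [-1, 0,  -1, -1,  1],   # leaf: class 1
        ])
        print(traverse(nodes, [0.8]))  # -> 1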

    High performance modified bit-vector based packet classification module on low-cost FPGA

    Get PDF
    Packet classification plays a significant role in many network systems: incoming packets must be categorized into different flows, and specific actions must be taken on them according to functional and application requirements. As network speeds keep increasing, the demands on the packet classifier also increase, and the classifier's complexity grows further because multiple header fields must be matched against a large number of rules. In this manuscript, an efficient, high-performance modified bit-vector (MBV) based packet classification (PC) module is designed and implemented on a low-cost Artix-7 FPGA. The proposed MBV-based PC employs a pipelined architecture, which offers low latency and high throughput. It utilizes less than 2% of the slices, operates at 493.102 MHz, and consumes 0.1 W of total power on the Artix-7 FPGA. The proposed PC needs only 4 clock cycles to classify an incoming packet and provides 74.95 Gbps throughput. Hardware utilization and performance of the proposed work are compared with existing PC approaches, showing improvements across these constraints.
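
    In the classic bit-vector scheme that MBV builds on, each header-field lookup returns a bit-vector with one bit per rule, and ANDing the per-field vectors yields the rules that match on all fields; the highest-priority set bit wins. A minimal sketch under that assumption (not the proposed pipelined module):

        def classify(field_values, field_tables):
            # field_tables: one dict per header field mapping a field value to a
            # bit-vector (int) whose bit i is set if rule i matches that value.
            match = ~0                                  # start with all rules matching
            for value, table in zip(field_values, field_tables):
                match &= table.get(value, 0)            # AND the per-field bit-vectors
            if match == 0:
                return None                             # no rule matches this packet
            return (match & -match).bit_length() - 1    # lowest set bit = highest-priority rule

        # Two rules over (src_port, dst_port): rule 0 matches (80, 443), rule 1 matches (80, 22).
        src_table = {80: 0b11}
        dst_table = {443: 0b01, 22: 0b10}
        print(classify((80, 443), (src_table, dst_table)))  # -> 0 (rule 0)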

    Towards Efficient and Scalable Acceleration of Online Decision Tree Learning on FPGA

    Full text link
    Decision trees are machine learning models commonly used in various application scenarios. In the era of big data, traditional decision tree induction algorithms are not suitable for learning large-scale datasets due to their stringent data storage requirements. Online decision tree learning algorithms have been devised to tackle this problem by training concurrently with incoming samples while providing inference results. However, even the most up-to-date online tree learning algorithms still suffer from either high memory usage or high computational intensity with dependencies and long latency, making them challenging to implement in hardware. To overcome these difficulties, we introduce a new quantile-based algorithm to improve the induction of the Hoeffding tree, one of the state-of-the-art online learning models. The proposed algorithm is lightweight in terms of both memory and computational demand, while still maintaining high generalization ability. A series of optimization techniques dedicated to the proposed algorithm have been investigated from the hardware perspective, including coarse-grained and fine-grained parallelism, dynamic and memory-based resource sharing, and pipelining with data forwarding. We further present a high-performance, hardware-efficient and scalable online decision tree learning system on a field-programmable gate array (FPGA) with system-level optimization techniques. Experimental results show that our proposed algorithm outperforms the state-of-the-art Hoeffding tree learning method, leading to 0.05% to 12.3% improvement in inference accuracy. A real implementation of the complete learning system on the FPGA demonstrates a 384x to 1581x speedup in execution time over the state-of-the-art design. Comment: appears as a conference paper in FCCM 201
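
    The induction rule in Hoeffding-tree learning rests on the Hoeffding bound: split on the best attribute once the observed gain gap between the two best candidates exceeds epsilon = sqrt(R^2 ln(1/delta) / (2n)). A small sketch of that test (standard formulation, not the proposed quantile-based variant):

        import math

        def hoeffding_bound(value_range, delta, n):
            # With probability 1 - delta, the observed mean of n samples is within
            # epsilon of the true mean, for a statistic with the given value range R.
            return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

        def should_split(best_gain, second_best_gain, value_range, delta, n):
            # Split when the gap between the two best candidate attributes exceeds
            # the bound, i.e. the best attribute is truly best with prob. 1 - delta.
            return (best_gain - second_best_gain) > hoeffding_bound(value_range, delta, n)

        # Example: information gain lies in [0, 1], so R = 1.
        print(should_split(0.42, 0.30, 1.0, delta=1e-6, n=1500))  # -> True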

    FPGA accelerator for gradient boosting decision trees

    Get PDF
    A decision tree is a well-known machine learning technique. Recently its popularity has increased thanks to the powerful Gradient Boosting ensemble method, which allows accuracy to be increased gradually at the cost of executing a large number of decision trees. In this paper we present an accelerator designed to optimize the execution of these trees while reducing energy consumption. We have implemented it on an FPGA for embedded systems and tested it on a relevant case study: pixel classification of hyperspectral images. In our experiments with different images, our accelerator can process the hyperspectral images at the same speed at which they are generated by the hyperspectral sensors. Compared to a high-performance processor running optimized software, on average our design is twice as fast and consumes 72 times less energy. Compared to an embedded processor, it is 30 times faster and consumes 23 times less energy.
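
    Gradient boosting inference is a sum of per-tree outputs added to a base score, which is why execution cost grows with the number of trees. A minimal sketch of the arithmetic (toy trees, illustrative only, not the accelerator's dataflow):

        def gbdt_predict(trees, sample, base_score=0.0, learning_rate=0.1):
            # Each "tree" is a callable returning its leaf value for the sample;
            # the ensemble prediction is the scaled sum of all leaf values.
            score = base_score
            for tree in trees:
                score += learning_rate * tree(sample)
            return score

        trees = [lambda x: 1.0 if x[0] > 0.5 else -1.0,
                 lambda x: 0.5 if x[0] > 0.2 else -0.5]
        print(gbdt_predict(trees, [0.7]))  # 0.0 + 0.1*1.0 + 0.1*0.5 ~= 0.15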

    Software and FPGA-Based Hardware to Accelerate Machine Learning Classifiers

    Get PDF
    This thesis improves the accuracy and run-time of two selected machine learning algorithms, the first in software and the second on a field-programmable gate array (FPGA) device. We first implement triplet loss and triplet mining methods for large-margin metric learning, inspired by Siamese networks, and we analyze the proposed methods. In addition, we propose a new hierarchical approach to accelerate the optimization, where triplets are selected by stratified sampling in hierarchical hyperspheres. The method results in faster optimization and, in almost all cases, improved accuracy. This method is further studied for high-dimensional feature spaces with the goal of finding a projection subspace that increases the inter-class variance and decreases the intra-class variance. We also study hardware acceleration of random forests (RFs) to improve classification run-time for large datasets. RFs are a widely used classification and regression algorithm, typically implemented in software. Hardware implementations can accelerate RFs, especially on FPGA platforms, thanks to concurrent memory access and parallel computation. This thesis proposes a method to decrease the training time by expanding memory usage on an Intel Arria 10 (10AX115N 3F 45I2SG) FPGA, while keeping accuracy comparable with CPU implementations.
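
    For context, the standard triplet loss over anchor/positive/negative embeddings looks as follows (a generic sketch of the usual formulation, not necessarily the exact variant used in the thesis):

        import numpy as np

        def triplet_loss(anchor, positive, negative, margin=1.0):
            # Hinge on squared distances: the positive should be at least `margin`
            # closer to the anchor than the negative.
            d_pos = np.sum((anchor - positive) ** 2)
            d_neg = np.sum((anchor - negative) ** 2)
            return max(d_pos - d_neg + margin, 0.0)

        a = np.array([0.0, 0.0])
        p = np.array([0.1, 0.0])   # same class, close to the anchor
        n = np.array([2.0, 0.0])   # different class, far from the anchor
        print(triplet_loss(a, p, n))  # -> 0.0 (constraint already satisfied)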