2 research outputs found

    Ensemble Model Compression for Fast and Energy-Efficient Ranking on FPGAs

    We investigate novel SoC-FPGA solutions for fast and energy-efficient ranking based on machine-learned ensembles of decision trees. Since the memory footprint of ranking ensembles limits the effective exploitation of programmable logic for large-scale inference tasks, we investigate binning and quantization techniques to reduce the memory occupation of the learned model, and we optimize the state-of-the-art ensemble-traversal algorithm for deployment on low-cost, energy-efficient FPGA devices. The results of experiments conducted on publicly available Learning-to-Rank datasets show that our model compression techniques do not significantly impact accuracy. Moreover, the reduced space requirements allow the models and the logic to be replicated on the FPGA device so that several inference tasks execute in parallel. We discuss in detail the experimental settings and the feasibility of deploying the proposed solution in a real setting. The results show that our FPGA solution achieves state-of-the-art performance and consumes from 9x up to 19.8x less energy than an equivalent multi-threaded CPU implementation.
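    The binning idea in the abstract can be illustrated with a minimal, hypothetical sketch: replace each floating-point split threshold in the ensemble with a small index into a shared table of representative edges, so each threshold is stored in one byte instead of a 32- or 64-bit float. The quantile-style edge selection and `n_bins=256` below are assumptions for illustration, not the paper's actual scheme.

```python
import bisect
import random

def bin_thresholds(thresholds, n_bins=256):
    """Illustrative binning sketch (not the paper's implementation):
    pick n_bins representative edges at evenly spaced ranks of the
    sorted thresholds, then encode each threshold as a bin index
    that fits in 8 bits when n_bins <= 256."""
    s = sorted(thresholds)
    edges = [s[int(i * (len(s) - 1) / (n_bins - 1))] for i in range(n_bins)]
    # Each threshold is replaced by the index of the nearest edge at or above it.
    idx = [min(bisect.bisect_left(edges, t), n_bins - 1) for t in thresholds]
    return edges, idx

random.seed(0)
thresholds = [random.uniform(0.0, 1.0) for _ in range(10_000)]
edges, idx = bin_thresholds(thresholds)

# Decoding uses the shared edge table; the per-node storage is 8 bits.
approx = [edges[i] for i in idx]
max_err = max(abs(a - t) for a, t in zip(approx, thresholds))
print(f"max reconstruction error with 8-bit indices: {max_err:.4f}")
```

    A small reconstruction error on the thresholds is what makes accuracy loss negligible in practice, while the 8-bit encoding shrinks the ensemble enough to replicate it across the programmable logic.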

    Accelerating position-aware top-k listnet for ranking under custom precision regimes

    Document ranking orders query results by relevance using ranking models. ListNet is a well-known approach for constructing and training learning-to-rank models. Compared with traditional learning approaches, ListNet delivers better accuracy, but it is computationally too expensive to learn models from large datasets because of the large number of permutations involved in computing the gradients. This paper introduces a position-aware sampling approach, which takes the importance of ranking positions into account and shows better accuracy than previous sampling methods. We also propose an effective quantisation method, targeting FPGA devices, for the ListNet algorithm: it organises the gradient values into several batches and associates each batch with a different fractional precision. We implemented our approach on a Xilinx UltraScale+ board and applied it to the MQ2008 benchmark ranking dataset. The experimental results show a 4.42x speedup over an Nvidia GTX 1080 Ti GPU implementation with 2% accuracy loss.
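    The batched fractional-precision idea can be sketched informally: group gradient values into batches and quantise each batch to fixed point with its own number of fractional bits. The batch boundaries and bit widths below are illustrative assumptions; the abstract does not specify how batches are formed or which precisions are used.

```python
def quantize_fixed_point(values, frac_bits):
    """Round each float to the nearest fixed-point value with the
    given number of fractional bits (illustrative, not the paper's
    exact hardware format)."""
    scale = 1 << frac_bits
    return [round(v * scale) / scale for v in values]

# Hypothetical example: gradients grouped by magnitude into batches,
# with small-magnitude batches given more fractional bits so they do
# not underflow to zero (an assumed, illustrative assignment).
gradients = [0.91, -0.48, 0.07, 0.012, -0.003, 0.0008]
batches = [gradients[:2], gradients[2:4], gradients[4:]]
frac_bits = [4, 8, 12]
quantized = [quantize_fixed_point(b, f) for b, f in zip(batches, frac_bits)]
for b, f, q in zip(batches, frac_bits, quantized):
    print(f"{f:2d} fractional bits: {b} -> {q}")
```

    Assigning a single precision per batch keeps the FPGA datapath simple (one fixed shift per batch) while still adapting the representation to the range of the gradient values.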