Search CORE

2 research outputs found

Acceleration of ListNet for ranking using reconfigurable architecture

Author: Li Qiang
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/04/2020
Field of study

Document ranking is used to order query results by relevance with ranking models. ListNet is a well-known ranking approach for constructing and training learning-to-rank models. Compared with traditional learning approaches, ListNet delivers better accuracy, but is computationally too expensive to learn models with large data sets due to the large number of permutations and documents involved in computing the gradients. Currently, the long training time limits the practicality of ListNet in ranking applications such as breaking news search and stock prediction, and this situation is getting worse with the increase in data-set size. In order to tackle the challenge of long training time, this thesis optimises the ListNet algorithm, and designs hardware accelerators for learning the ListNet algorithm using Field Programmable Gate Arrays (FPGAs), making the algorithm more practical for real-world application. The contributions of this thesis include: 1) A novel computation method of the ListNet algorithm for ranking. The proposed computation method exposes more fine-grained parallelism for FPGA implementation. 2) A weighted sampling method that takes into account the ranking positions, along with an effective quantisation method based on FPGA devices. The proposed design achieves a 4.42x improvement over GPU implementation speed, while still guaranteeing the accuracy. 3) A full reconfigurable architecture for the ListNet training using multiple bitstream kernels. The proposed method achieves a higher model accuracy than pure fixed point training, and a better throughput than pure floating point training. This thesis has resulted in the acceleration of the ListNet algorithm for ranking using FPGAs by applying the above techniques. Significant improvements in speed have been achieved in this work against CPU and GPU implementations.Open Acces

Spiral - Imperial College Digital Repository

Efficient Design, Training, and Deployment of Artificial Neural Networks

Author: Tran Dat Thanh
Publication venue: Tampere University
Publication date: 04/03/2022
Field of study

Over the last decade, artificial neural networks, especially deep neural networks, have emerged as the main modeling tool in Machine Learning, allowing us to tackle an increasing number of real-world problems in various fields, most notably, in computer vision, natural language processing, biomedical and financial analysis. The success of deep neural networks can be attributed to many factors, namely the increasing amount of data available, the developments of dedicated hardware, the advancements in optimization techniques, and especially the invention of novel neural network architectures. Nowadays, state-of-the-arts neural networks that achieve the best performance in any field are usually formed by several layers, comprising millions, or even billions of parameters. Despite spectacular performances, optimizing a single state-of- the-arts neural network often requires a tremendous amount of computation, which can take several days using high-end hardware. More importantly, it took several years of experimentation for the community to gradually discover effective neural network architectures, moving from AlexNet, VGGNet, to ResNet, and then DenseNet. In addition to the expensive and time-consuming experimentation process, deep neural networks, which require powerful processors to operate during the deployment phase, cannot be easily deployed to mobile or embedded devices. For these reasons, improving the design, training, and deployment of deep neural networks has become an important area of research in the Machine Learning field. This thesis makes several contributions in the aforementioned research area, which can be grouped into two main categories. The first category consists of research works that focus on designing efficient neural network architectures not only in terms of accuracy but also computational complexity. In the first contribution under this category, the computational efficiency is first addressed at the filter level through the incorporation of a handcrafted design for convolutional neural networks, which are the basis of most deep neural networks. More specifically, the multilinear convolution filter is proposed to replace the linear convolution filter, which is a fundamental element in a convolutional neural network. The new filter design not only better captures multidimensional structures inherent in CNNs but also requires far fewer parameters to be estimated. While using efficient algebraic transforms and approximation techniques to tackle the design problem can significantly reduce the memory and computational footprint of neural network models, this approach requires a lot of trial and error. In addition, the simple neuron model used in most neural networks nowadays, which only performs a linear transformation followed by a nonlinear activation, cannot effectively mimic the diverse activities of biological neurons. For this reason, the second and third contributions transition from a handcrafted, manual design approach to an algorithmic approach in which the type of transformations performed by each neuron as well as the topology of neural networks are optimized in a systematic and completely data-dependent manner. As a result, the algorithms proposed in the second and third contributions are capable of designing highly accurate and compact neural networks while requiring minimal human efforts or intervention in the design process. Despite significant progress has been made to reduce the runtime complexity of neural network models on embedded devices, the majority of them have been demonstrated on powerful embedded devices, which are costly in applications that require large-scale deployment such as surveillance systems. In these scenarios, complete on-device processing solutions can be infeasible. On the contrary, hybrid solutions, where some preprocessing steps are conducted on the client side while the heavy computation takes place on the server side, are more practical. The second category of contributions made in this thesis focuses on efficient learning methodologies for hybrid solutions that take into ac- count both the signal acquisition and inference steps. More concretely, the first contribution under this category is the formulation of the Multilinear Compressive Learning framework in which multidimensional signals are compressively acquired, and inference is made based on the compressed signals, bypassing the signal reconstruction step. In the second contribution, the relationships be- tween the input signal resolution, the compression rate, and the learning performance of Multilinear Compressive Learning systems are empirically analyzed systematically, leading to the discovery of a surrogate performance indicator that can be used to approximately rank the learning performances of different sensor configurations without conducting the entire optimization process. Nowadays, many communication protocols provide support for adaptive data transmission to maximize the data throughput and minimize energy consumption depending on the network’s strength. The last contribution of this thesis proposes an extension of the Multilinear Compressive Learning framework with an adaptive compression capability, which enables us to take advantage of the adaptive rate transmission feature in existing communication protocols to maximize the informational content throughput of the whole system. Finally, all methodological contributions of this thesis are accompanied by extensive empirical analyses demonstrating their performance and computational advantages over existing methods in different computer vision applications such as object recognition, face verification, human activity classification, and visual information retrieval

Trepo - Institutional Repository of Tampere University