814 research outputs found

    Towards Lightweight AI: Leveraging Stochasticity, Quantization, and Tensorization for Forecasting

    Get PDF
    The deep neural network is an intriguing prognostic model capable of learning meaningful patterns that generalize to new data. The deep learning paradigm has been widely adopted across many domains, including for natural language processing, genomics, and automatic music transcription. However, deep neural networks rely on a plethora of underlying computational units and data, collectively demanding a wealth of compute and memory resources for practical tasks. This model complexity prohibits the use of larger deep neural networks for resource-critical applications, such as edge computing. In order to reduce model complexity, several research groups are actively studying compression methods, hardware accelerators, and alternative computing paradigms. These orthogonal research explorations often leave a gap in understanding the interplay of the optimization mechanisms and their overall feasibility for a given task. In this thesis, we address this gap by developing a holistic solution to assess the model complexity reduction theoretically and quantitatively at both high-level and low-level abstractions for training and inference. At the algorithmic level, a novel deep, yet lightweight, recurrent architecture is proposed that extends the conventional echo state network. The architecture employs random dynamics, brain-inspired plasticity mechanisms, tensor decomposition, and hierarchy as the key features to enrich learning. Furthermore, the hyperparameter landscape is optimized via a particle swarm optimization algorithm. To deploy these networks efficiently onto low-end edge devices, both ultra-low and mixed-precision numerical formats are studied within our feedforward deep neural network hardware accelerator. More importantly, the tapered-precision posit format with a novel exact-dot-product algorithm is employed in the low-level digital architectures to study its efficacy in resource utilization. The dynamics of the architecture are characterized through neuronal partitioning and Lyapunov stability, and we show that superlative networks emerge beyond the edge of chaos with an agglomeration of weak learners. We also demonstrate that tensorization improves model performance by preserving correlations present in multi-way structures. Low-precision posits are found to consistently outperform other formats on various image classification tasks and, in conjunction with compression, we achieve magnitudes of speedup and memory savings for both training and inference for the forecasting of chaotic time series and polyphonic music tasks. This culmination of methods greatly improves the feasibility of deploying rich predictive models on edge devices

    Design and Development of an FPGA-based Hardware Accelerator for Corner Feature Extraction and Genetic Algorithm-based SLAM System

    Get PDF
    Simultaneous Localization and Mapping (SLAM) systems are crucial parts of mobile robots. These systems require a large number of computing units, have significant real-time requirements and are also a vital factor which can determine the stability, operability and power consumption of robots. This thesis aims to improve the calculation speed of a lidar-based SLAM system in domestic scenes, reduce the power consumption of the SLAM algorithm, and reduce the overall cost of the whole platform. Lightweight, low-power and parallel optimization of SLAM algorithms are researched. In the thesis, two SLAM systems are designed and developed with a focus on energy-efficient and fast hardware-level design: a geometric method based on corner extraction and a genetic algorithm-based approach. Finally, an FPGA-based hardware accelerated SLAM is implemented and realized, and compared to a software-based system. As for the front-end SLAM system, a method of using a Corner Feature Extraction (CFE) algorithm on FPGA platforms is first proposed to improve the speed of the feature extraction. Considering building a back-end SLAM system with low power consumption, a SLAM system based on genetic algorithm combined with algorithms such as Extended Kalman Filter (EKF) and FastSLAM to reduce the amount of calculation in the SLAM system is also proposed. Finally, the thesis also proposes and implements an adaptive feature map which can replace a grid point map to reduce the amount of calculation and utilization of hardware resources. In this thesis, the lidar SLAM system with front-end and back-end parts mentioned above is implemented on the Xilinx PYNQ Z2 Platform. The implementation is operated on a mobile robot prototype and evaluated in real scenes. Compared with the implementation on the Raspberry Pi 3B+, the implementation in this thesis can save 86.25% of power consumption. The lidar SLAM system only takes 20 ms for location calculation in each scan which is 5.31 times faster compared with the software implementation with EKF

    Clustering Arabic Tweets for Sentiment Analysis

    Get PDF
    The focus of this study is to evaluate the impact of linguistic preprocessing and similarity functions for clustering Arabic Twitter tweets. The experiments apply an optimized version of the standard K-Means algorithm to assign tweets into positive and negative categories. The results show that root-based stemming has a significant advantage over light stemming in all settings. The Averaged Kullback-Leibler Divergence similarity function clearly outperforms the Cosine, Pearson Correlation, Jaccard Coefficient and Euclidean functions. The combination of the Averaged Kullback-Leibler Divergence and root-based stemming achieved the highest purity of 0.764 while the second-best purity was 0.719. These results are of importance as it is contrary to normal-sized documents where, in many information retrieval applications, light stemming performs better than root-based stemming and the Cosine function is commonly used
    corecore