50 research outputs found

An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning

Authors: Matsutani Hiroki, Tsukada Mineto, Watanabe Hirohisa
Publication date: 23/03/2021

DQN (Deep Q-Network) is a method for performing Q-learning in reinforcement learning using deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, making them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses an OS-ELM (Online Sequential Extreme Learning Machine) based training algorithm. In addition, we propose a combination of L2 regularization and spectral normalization for on-device reinforcement learning so that the output values of the neural network fit within a certain range and the reinforcement learning becomes stable. The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform. Evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and 89.40x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.

arXiv.org e-Print Archive
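
The backpropagation-free training mentioned in the abstract above can be illustrated with a minimal OS-ELM sketch. It assumes a single output, a batch size of 1, a tanh hidden layer, and a toy regression target; the class name `OSELM`, the parameter `lam`, and the demo data are hypothetical and not taken from the paper. It shows only the L2-regularized sequential update, not the Q-learning loop, spectral normalization, or the FPGA design.

```python
import math
import random


def hidden(x, W, b):
    """Fixed random hidden layer with tanh activation; ELM never trains it."""
    return [math.tanh(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b_j)
            for w, b_j in zip(W, b)]


class OSELM:
    """Single-output OS-ELM with batch size 1 and L2 regularization `lam`."""

    def __init__(self, n_in, n_hidden, lam=1e-3, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-1, 1) for _ in range(n_in)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.beta = [0.0] * n_hidden
        # P tracks (H^T H + lam * I)^-1; with no data yet it is I / lam.
        self.P = [[1.0 / lam if i == j else 0.0 for j in range(n_hidden)]
                  for i in range(n_hidden)]

    def predict(self, x):
        h = hidden(x, self.W, self.b)
        return sum(h_i * beta_i for h_i, beta_i in zip(h, self.beta))

    def update(self, x, t):
        """Absorb one (input, target) pair without backpropagation."""
        h = hidden(x, self.W, self.b)
        Ph = [sum(p * h_j for p, h_j in zip(row, h)) for row in self.P]
        denom = 1.0 + sum(h_i * ph_i for h_i, ph_i in zip(h, Ph))
        err = t - sum(h_i * beta_i for h_i, beta_i in zip(h, self.beta))
        for i in range(len(h)):
            # Sherman-Morrison rank-1 update of P (P is symmetric).
            for j in range(len(h)):
                self.P[i][j] -= Ph[i] * Ph[j] / denom
            # The updated gain P_new h equals Ph / denom.
            self.beta[i] += Ph[i] / denom * err


def mean_abs_error(model, data):
    return sum(abs(model.predict(x) - t) for x, t in data) / len(data)


# Toy demo (assumed data): learn sin(x) one sample at a time.
rng = random.Random(1)
data = [([x], math.sin(x)) for x in (rng.uniform(-2, 2) for _ in range(200))]
model = OSELM(n_in=1, n_hidden=16)
err_before = mean_abs_error(model, data)
for x, t in data:
    model.update(x, t)
err_after = mean_abs_error(model, data)
print(f"mean abs error: {err_before:.3f} -> {err_after:.3f}")
```

Each update costs a fixed number of multiply-adds and needs no replay buffer, which is the property that makes this family of methods attractive for small devices.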

Communication Size Reduction of Federated Learning using Neural ODE Models

Authors: Hoshino Yuto, Kawakami Hiroki, Matsutani Hiroki
Publication date: 10/03/2023

Federated learning is a machine learning approach in which data is not aggregated on a server; instead, models are trained locally at clients, in consideration of security and privacy. ResNet is a classic but representative neural network that succeeds in deepening the neural network by learning a residual function that adds the inputs and outputs together. In federated learning, the server and clients communicate to exchange weight parameters. Since ResNet has deep layers and a large number of parameters, the communication size becomes large. In this paper, we use Neural ODE as a lightweight model of ResNet to reduce communication size in federated learning. In addition, we introduce a flexible federated learning scheme using Neural ODE models with different numbers of iterations, which correspond to ResNet models with different depths. Evaluation results using the CIFAR-10 dataset show that the use of Neural ODE reduces communication size by up to 92.4% compared to ResNet. We also show that the proposed flexible federated learning can merge models with different iteration counts or depths.

arXiv.org e-Print Archive

An FPGA Acceleration and Optimization Techniques for 2D LiDAR SLAM Algorithm

Authors: Matsutani Hiroki, Sugiura Keisuke
Publication date: 31/08/2020

An efficient hardware implementation of Simultaneous Localization and Mapping (SLAM) methods is necessary for mobile autonomous robots with limited computational resources. In this paper, we propose a resource-efficient FPGA implementation for accelerating scan matching computations, which typically cause a major bottleneck in 2D LiDAR SLAM methods. Scan matching is the process of correcting a robot pose by aligning the latest LiDAR measurements with an occupancy grid map, which encodes information about the surrounding environment. We exploit the inherent parallelism in Rao-Blackwellized Particle Filter (RBPF) based algorithms to perform scan matching computations for multiple particles in parallel. In the proposed design, several techniques are employed to reduce resource utilization and to achieve maximum throughput. Experimental results using benchmark datasets show that scan matching is accelerated by 5.31-8.75x and the overall throughput is improved by 3.72-5.10x without seriously degrading the quality of the final outputs. Furthermore, our proposed IP core requires only 44% of the total resources available on the TUL Pynq-Z2 FPGA board, thus facilitating the realization of SLAM applications on indoor mobile robots.

arXiv.org e-Print Archive

An On-Device Federated Learning Approach for Cooperative Anomaly Detection

Authors: Ito Rei, Matsutani Hiroki, Tsukada Mineto
Publication date: 01/01/2021

Most edge AI focuses on prediction tasks on resource-limited edge devices while the training is done on server machines. However, retraining or customizing a model is required at edge devices as the model becomes outdated due to environmental changes over time. To follow such concept drift, a neural-network-based on-device learning approach was recently proposed, so that edge devices train on incoming data at runtime to update their model. In this case, since training is done at distributed edge devices, the issue is that only a limited amount of training data can be used at each edge device. To address this issue, one approach is cooperative learning or federated learning, where edge devices exchange their trained results and update their model by using those collected from the other devices. In this paper, as an on-device learning algorithm, we focus on OS-ELM (Online Sequential Extreme Learning Machine) to sequentially train a model based on recent samples and combine it with an autoencoder for anomaly detection. We extend it to on-device federated learning so that edge devices can exchange their trained results and update their model by using those collected from the other edge devices. This cooperative model update is one-shot, while it can be repeatedly applied to synchronize their models. Our approach is evaluated with anomaly detection tasks generated from a driving dataset of cars, a human activity dataset, and the MNIST dataset. The results demonstrate that the proposed on-device federated learning can produce a merged model by integrating trained results from multiple edge devices as accurately as traditional backpropagation-based neural networks and a traditional federated learning approach, with lower computation or communication cost.

arXiv.org e-Print Archive

Directory of Open Access Journals
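
The one-shot cooperative update described in the abstract above can be sketched for an ELM-style model as follows. This is an illustration under stated assumptions, not the paper's implementation: every device is assumed to share the same fixed random hidden layer, and the exchanged "trained results" are assumed to be the least-squares statistics named `U` (H^T H) and `V` (H^T t) here. The function names, the regularization value `lam`, and the toy data are hypothetical.

```python
import math
import random


def hidden(x, W, b):
    """Shared fixed random hidden layer (tanh); identical on every device."""
    return [math.tanh(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b_j)
            for w, b_j in zip(W, b)]


def local_stats(samples, W, b):
    """Per-device statistics U = H^T H and V = H^T t; raw data stays local."""
    n = len(b)
    U = [[0.0] * n for _ in range(n)]
    V = [0.0] * n
    for x, t in samples:
        h = hidden(x, W, b)
        for i in range(n):
            V[i] += h[i] * t
            for j in range(n):
                U[i][j] += h[i] * h[j]
    return U, V


def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y_i] for row, y_i in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def merge(stats, lam=1e-2):
    """One-shot merge: add the statistics, then solve (sum U + lam I) beta = sum V."""
    n = len(stats[0][1])
    U = [[sum(s[0][i][j] for s in stats) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    V = [sum(s[1][i] for s in stats) for i in range(n)]
    return solve(U, V)


def predict(x, W, b, beta):
    return sum(h_i * beta_i for h_i, beta_i in zip(hidden(x, W, b), beta))


# Toy demo (assumed data): two devices observe different halves of sin(x).
rng = random.Random(0)
n_hidden = 6
W = [[rng.uniform(-1, 1)] for _ in range(n_hidden)]
b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
device_a = [([x], math.sin(x)) for x in (rng.uniform(-2, 0) for _ in range(50))]
device_b = [([x], math.sin(x)) for x in (rng.uniform(0, 2) for _ in range(50))]

beta_merged = merge([local_stats(device_a, W, b), local_stats(device_b, W, b)])
beta_central = merge([local_stats(device_a + device_b, W, b)])
max_gap = max(abs(predict([x], W, b, beta_merged) - predict([x], W, b, beta_central))
              for x in (-1.5, -0.5, 0.5, 1.5))
print(f"largest prediction gap, merged vs. centralized: {max_gap:.2e}")
```

Because the statistics are sums over samples, adding them across devices gives the same regularized least-squares solution as training on the pooled data, up to floating-point rounding, while each device transmits only an n-by-n matrix and a length-n vector.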


LaKe: The Power of In-Network Computing

Authors: Matsutani Hiroki, Tokusashi Yuta, Zilberman N
Publication venue: 2018 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2018
Publication date: 01/01/2018

In-network computing accelerates applications natively running on the host by executing them within network devices. While in-network computing offers significant performance improvements, its limitations and design trade-offs have not been explored. To usefully and efficiently run applications within the network, we first need to understand the implications of their design. In this work we introduce LaKe, a Layered Key-Value Store design, running as an in-network application. LaKe is a scalable design, enabling the exploration of design decisions and their effect on throughput, latency and power efficiency. LaKe achieves full line rate throughput, while maintaining a latency of 1.1μs and better power efficiency than existing hardware-based memcached designs. This work was supported by JSPS Research Fellowship and Keio University Research Grant for Young Researcher’s Program. This work was supported by JST CREST Grant Number JPMJCR1785, Japan. We acknowledge the support of the Leverhulme Trust (ECF-2016-289) and the Isaac Newton Trust.

Apollo (Cambridge)