Human-robot interaction (HRI) is a rapidly growing field that encompasses
social and industrial applications. Machine learning plays a vital role in
industrial HRI by enhancing the adaptability and autonomy of robots in complex
environments. However, data privacy is a crucial concern in the interaction
between humans and robots, as companies need to protect sensitive data while
machine learning algorithms require access to large datasets. Federated
Learning (FL) offers a solution by enabling the distributed training of models
without sharing raw data. Despite extensive research on FL
for tasks such as natural language processing (NLP) and image classification,
the question of how to use FL for HRI remains an open research problem. The
traditional FL approach involves transmitting large neural network parameter
matrices between the server and clients, which can lead to high communication
costs that often become a bottleneck in FL. This paper proposes a
communication-efficient FL framework for human-robot interaction (CEFHRI) to
address the challenges of data heterogeneity and communication costs. The
framework leverages pre-trained models and introduces a trainable
spatiotemporal adapter for video understanding tasks in HRI. Experimental
results on three human-robot interaction benchmark datasets (HRI30, InHARD, and
COIN) demonstrate the superiority of CEFHRI over full fine-tuning in terms of
communication costs. The proposed methodology provides a secure and efficient
approach to HRI federated learning, particularly in industrial environments
with data privacy concerns and limited communication bandwidth. Our code is
available at
https://github.com/umarkhalidAI/CEFHRI-Efficient-Federated-Learning.
Comment: Accepted in IROS 202
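
The communication saving described above can be illustrated with a minimal sketch, which is not the CEFHRI implementation. It assumes a frozen pre-trained backbone shared by all clients and a small trainable adapter, so that only adapter parameters are uploaded and averaged with sample-weighted federated averaging. The `adapter.` name prefix, the helper functions, and the toy parameter values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# A model is a flat mapping from parameter name to a list of values.
Params = Dict[str, List[float]]

# Hypothetical naming convention: adapter parameters share this prefix.
ADAPTER_PREFIX = "adapter."


def trainable_subset(params: Params) -> Params:
    """Keep only the adapter parameters; the frozen backbone is never sent."""
    return {
        name: list(values)
        for name, values in params.items()
        if name.startswith(ADAPTER_PREFIX)
    }


def fedavg(updates: List[Params], num_samples: List[int]) -> Params:
    """Average client updates, weighting each client by its sample count."""
    total = sum(num_samples)
    averaged: Params = {}
    for name in updates[0]:
        length = len(updates[0][name])
        averaged[name] = [
            sum(n * update[name][i] for update, n in zip(updates, num_samples)) / total
            for i in range(length)
        ]
    return averaged


def num_values(params: Params) -> int:
    """Count the scalar values in a parameter set."""
    return sum(len(values) for values in params.values())


# Toy models: an 8-value frozen backbone and a 2-value adapter per client.
backbone = {"backbone.w": [0.1] * 8}
client_a = {**backbone, "adapter.w": [1.0, 2.0]}
client_b = {**backbone, "adapter.w": [3.0, 4.0]}

# Each client uploads only its adapter; the server averages the uploads.
uploads = [trainable_subset(client_a), trainable_subset(client_b)]
global_adapter = fedavg(uploads, num_samples=[10, 30])

# Fraction of the full model that each client transmits per round.
upload_fraction = num_values(uploads[0]) / num_values(client_a)

print(global_adapter)
print(upload_fraction)
```

In this toy setting each client transmits 2 of its 10 values per round. The saving in a real system depends on the size of the adapter relative to the backbone.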