Hierarchical Zero-Shot Approach for Human Activity Recognition in Smart Homes

Abstract

Accurate Human Activity Recognition (HAR) in smart homes is crucial for applications such as health monitoring and elderly care. Traditional HAR models typically require extensive labelled training data specific to each activity. This poses a challenge when models are transferred to new environments, where residents may perform a different range of activities. Conventional HAR methods generalize poorly when faced with multiple, diverse, and unseen activity classes. To address this gap, we leverage pre-trained language models and embed contextual knowledge, such as location information, to narrow down the number of potential activity labels. Our approach involves engineering rich textual representations derived from the given sensor data. We then apply language models in a hierarchical two-step manner: we first identify the person’s location in the home and then classify activities based on a reduced set of location-specific labels. We evaluate our approach using several major proprietary and open-source language models, including 1) Chat-GPT, 2) Mistral Instruct, and 3) SBERT. Experiments were performed on the “van Kasteren” and “van Kasteren houses” datasets. While Chat-GPT achieves a macro-average F-Score of 26.63% on the “van Kasteren” dataset, our two-step SBERT approach significantly outperforms it with a score of 37.83%. On the more challenging “van Kasteren houses” dataset, our method using Mistral Instruct achieves a macro-average F-Score of 19.33%, compared to 15.46% for Chat-GPT. These results demonstrate the effectiveness of our method in enhancing HAR accuracy, while also ensuring computational efficiency by utilizing models that are significantly smaller than state-of-the-art LLMs like Chat-GPT.
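The hierarchical two-step idea described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the location-to-activity map, the label texts, and the function names are hypothetical, and a toy bag-of-words cosine similarity stands in for a real sentence encoder such as SBERT so that the example runs without external dependencies.

```python
import math
from collections import Counter

# Hypothetical map from locations to location-specific activity labels
# (illustrative only; not the label set of the van Kasteren datasets).
LOCATION_ACTIVITIES = {
    "kitchen": ["cook a meal on the stove", "take a drink from the fridge"],
    "bathroom": ["take a shower", "use the toilet flush"],
    "bedroom": ["sleep in bed", "get dressed"],
}


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a sentence encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, candidates):
    """Return the candidate label most similar to the query text."""
    q = embed(query)
    return max(candidates, key=lambda c: cosine(q, embed(c)))


def classify(sensor_text):
    """Step 1: pick the location. Step 2: pick an activity among that
    location's labels only, which shrinks the candidate set."""
    location = best_match(sensor_text, list(LOCATION_ACTIVITIES))
    activity = best_match(sensor_text, LOCATION_ACTIVITIES[location])
    return location, activity


print(classify("kitchen fridge door opened"))
```

With these toy labels, the call prints `('kitchen', 'take a drink from the fridge')`. The second step compares the sensor description against two kitchen labels rather than all six, which is the label reduction the hierarchy is meant to provide.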
