Agents must continuously monitor their partners' affective states to
understand and engage in social interactions. However, methods for evaluating
affect recognition do not account for changes in classification performance
that may occur during occlusions or transitions between affective states. This
paper investigates temporal patterns in affect classification performance in the
context of an infant-robot interaction, where infants' affective states
contribute to their ability to participate in a therapeutic leg movement
activity. To support robustness to facial occlusions in video recordings, we
trained infant affect recognition classifiers using both facial and body
features. Next, we conducted an in-depth analysis of our best-performing models
to evaluate how performance changed over time as the models encountered missing
data and changing infant affect. During time windows when features were
extracted with high confidence, a unimodal model trained on facial features
achieved the same optimal performance as multimodal models trained on both
facial and body features. However, multimodal models outperformed unimodal
models when evaluated on the entire dataset. Additionally, model performance
was weakest when predicting an affective state transition and improved after
multiple predictions of the same affective state. These findings emphasize the
benefits of incorporating body features in continuous affect recognition for
infants. Our work highlights the importance of evaluating variability in model
performance both over time and in the presence of missing data when applying
affect recognition to social interactions.

Comment: 8 pages, 6 figures, 10th International Conference on Affective Computing and Intelligent Interaction (ACII 2022)