Automatic distinction between posed and spontaneous expressions is an unsolved problem. Previous studies in the cognitive sciences have indicated that posed expressions can be separated from spontaneous ones automatically using the face modality. However, little is known about the information carried by head and shoulder motion. In this work, we propose to (i) distinguish between posed and spontaneous smiles by fusing the head, face, and shoulder modalities, (ii) investigate which modalities carry important information and how the modalities relate to each other, and (iii) determine to what extent the temporal dynamics of these signals contribute to solving the problem. A cylindrical head tracker is used to track head motion, and two particle filtering techniques are used to track facial and shoulder motion. Classification is performed by kernel methods combined with ensemble learning techniques. We investigated two aspects of multimodal fusion: the level of abstraction (i.e., early, mid-level, and late fusion) and the fusion rule used (i.e., sum, product, and weight criteria). Experimental results on 100 videos displaying posed smiles and 102 videos displaying spontaneous smiles are presented. The best results were obtained with late fusion of all modalities, with 94.0% of the videos classified correctly.

Categories and Subject Descriptors: I.2.10 [Vision and Scene Understanding]: Video Analysis
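The late-fusion scheme with sum, product, and weight rules can be sketched as follows. This is a minimal illustration of score-level fusion over three modality classifiers, not the paper's exact implementation; the function name, score values, and weights are hypothetical.

```python
import numpy as np

def late_fuse(scores, rule="sum", weights=None):
    """Fuse per-modality class scores.

    scores: list of per-modality score vectors, one entry per class
            (here: [posed, spontaneous]).
    rule:   "sum", "product", or "weight" fusion rule.
    Returns the index of the winning class.
    """
    s = np.asarray(scores, dtype=float)  # shape: (n_modalities, n_classes)
    if rule == "sum":
        fused = s.sum(axis=0)            # add scores across modalities
    elif rule == "product":
        fused = s.prod(axis=0)           # multiply scores across modalities
    elif rule == "weight":
        w = np.asarray(weights, dtype=float)
        fused = (w[:, None] * s).sum(axis=0)  # weighted sum of modality scores
    else:
        raise ValueError(f"unknown fusion rule: {rule}")
    return int(np.argmax(fused))

# Hypothetical [posed, spontaneous] scores from head, face, and shoulder classifiers
head, face, shoulder = [0.4, 0.6], [0.3, 0.7], [0.55, 0.45]
pred = late_fuse([head, face, shoulder], rule="sum")  # -> 1 (spontaneous)
```

Each modality's classifier is trained independently; only their output scores are combined, which is what distinguishes late fusion from fusing features before classification (early fusion).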