On the Automatic Recognition of Facial Non-Verbal Communication Meaning in Informal, Spontaneous Conversation.

Abstract

Non-Verbal Communication (NVC) comprises all forms of inter-personal communication apart from those based on words. NVC is essential to understanding communicated meaning in common social situations, such as informal conversation. The expression and perception of NVC depend on many factors, including social and cultural context. The development of methods to automatically recognise NVC enables new, intuitive computer interfaces for novel applications, particularly when combined with emotion or natural speech recognition. This thesis addresses two questions: how can facial NVC signals be automatically recognised, given cultural differences in NVC perception, and what do automatic recognition methods tell us about facial behaviour during informal conversations? A new data set was created based on recordings of people engaged in informal conversation. Minimal constraints were applied during the recording of the participants to ensure that the conversations were spontaneous. These conversations were annotated by volunteer observers, as well as by paid workers via the Internet. This resulted in three sets of culturally specific annotations based on the geographical location of the annotator (Great Britain, India, Kenya). The cultures differed in the average label that their annotators assigned to each video clip. Annotations were based on four NVC signals: agreement, thinking, questioning and understanding, all of which commonly occur in conversations. An automatic NVC recognition system was trained on culturally specific annotation data and was able to make predictions that reflected cultural differences in annotation. Various visual feature extraction methods and classifiers were compared to find an effective recognition approach.
The problem was also considered from the perspective of regression of dimensional, continuous-valued annotation labels, using Support Vector Regression (SVR), which enables the prediction of labels that have richer information content than discrete classes. The use of Sequential Backward Elimination (SBE) feature selection was shown to greatly increase recognition performance. With a method for extracting the relevant facial features, it becomes possible to investigate human behaviour in informal conversation using computer tools. Firstly, the areas of the face used by the automatic recognition system can be identified and visualised. The involvement of gaze in thinking is confirmed, and a new association between gestures and NVC is identified, i.e. brow lowering (AU4) during questioning. These findings provide clues as to the way humans perceive NVC. Secondly, the existence of coupling in human expression is quantified and visualised. Patterns exist in both mutual head pose and in the mouth area, some of which may relate to mutual smiling. This coupling effect is used in an automatic NVC recognition system based on backchannel signals.
