During face-to-face interaction, facial motion conveys
information at several levels: a person's emotional state, their
position in a discourse, and, while they are speaking, phonetic
details about the speech sounds being produced. Naturally,
measuring face motion is a prerequisite for any further analysis
of its functional characteristics or information content. It is
possible to measure facial locations precisely with systems that
track motion by means of active or passive markers placed directly
on the face. Such systems, however, require specialised equipment,
which restricts their use outside the laboratory, and they are
invasive in the sense that the markers must be attached to the
subject's face.
To overcome these limitations, we developed a video-based system that
measures face motion from standard video recordings by deforming the
surface of an ellipsoidal mesh fitted to the face. The mesh is
initialised manually on a reference frame and then projected onto
subsequent video frames. Location changes between successive frames
for each mesh node are determined adaptively within a well-defined
area around each mesh node, using a two-dimensional cross-correlation
analysis on a two-dimensional wavelet transform of the
frames. Position parameters are propagated in three steps, from a
coarse mesh paired with a correspondingly high scale of the wavelet
transform down to the final fine mesh and a low scale of the wavelet
transform. The sequential changes in position of the mesh nodes
represent the facial motion. The method takes advantage of inherent
constraints of the facial surface, which distinguishes it from more
general image motion estimation methods, and, unlike feature-based
approaches, it returns measurement points distributed globally over
the facial surface.
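
The coarse-to-fine matching scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the system described above: a Haar-style averaging pyramid stands in for the full two-dimensional wavelet transform, only a single node is tracked, and the function names, patch size, and search radius are invented for the example.

```python
import numpy as np

def haar_approx(img, level):
    """Coarse approximation of the image: average 2x2 blocks `level`
    times (the scaling part of a Haar wavelet decomposition)."""
    for _ in range(level):
        h, w = img.shape
        img = img[:h - h % 2, :w - w % 2]
        img = 0.25 * (img[0::2, 0::2] + img[1::2, 0::2]
                      + img[0::2, 1::2] + img[1::2, 1::2])
    return img

def match_node(ref, cur, y, x, dy0, dx0, patch=4, search=3):
    """Refine the displacement of the patch centred on (y, x) in `ref`
    by maximising normalised cross-correlation over a small search
    window around the prior estimate (dy0, dx0)."""
    tpl = ref[y - patch:y + patch + 1, x - patch:x + patch + 1]
    tpl = tpl - tpl.mean()
    best, best_d = -np.inf, (dy0, dx0)
    for dy in range(dy0 - search, dy0 + search + 1):
        for dx in range(dx0 - search, dx0 + search + 1):
            win = cur[y + dy - patch:y + dy + patch + 1,
                      x + dx - patch:x + dx + patch + 1]
            if win.shape != tpl.shape:      # candidate falls off the frame
                continue
            win = win - win.mean()
            denom = np.sqrt((tpl ** 2).sum() * (win ** 2).sum())
            if denom > 0:
                score = (tpl * win).sum() / denom
                if score > best:
                    best, best_d = score, (dy, dx)
    return best_d

def track_node(ref, cur, y, x, levels=2):
    """Coarse-to-fine tracking of one mesh node: estimate the
    displacement at the coarsest scale, then propagate the estimate
    down and refine it at each finer scale."""
    dy = dx = 0
    for lvl in range(levels, -1, -1):
        r, c = haar_approx(ref, lvl), haar_approx(cur, lvl)
        dy, dx = match_node(r, c, y >> lvl, x >> lvl, dy, dx)
        if lvl > 0:  # scale the estimate up to the next finer level
            dy, dx = 2 * dy, 2 * dx
    return dy, dx
```

For a smoothly textured test image shifted by a known amount (e.g. three rows down and two columns left), tracking a central node with one pyramid level recovers the integer displacement exactly, because the fine-scale correlation peaks at the true offset once the coarse estimate has brought the search window close enough.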