This paper investigates the use of unlabeled data to complement labeled data for audio-visual event recognition in meetings. To handle situations in which collecting enough labeled data to capture event characteristics is difficult but collecting a large amount of unlabeled data is easy, we present a semi-supervised framework based on HMM adaptation techniques. Instead of directly training one model per event, we first train a well-estimated general event model for all events using both labeled and unlabeled data, and then adapt this general model into each specific event model using that event's own labeled data. We illustrate the proposed approach on a set of eight audio-visual events defined in meetings. Experiments and comparison with a fully supervised baseline method show the validity of the proposed semi-supervised approach.
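The general-then-adapt scheme can be illustrated with a deliberately simplified sketch. The abstract does not name the specific HMM adaptation technique, so this sketch assumes MAP adaptation and reduces each event model to a single Gaussian mean; the "general model" is then just the mean of all pooled labeled and unlabeled frames. The function names, the relevance factor `tau`, and the event labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def pooled_mean(frames: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of feature vectors (ML estimate of a Gaussian mean)."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   tau: float = 10.0) -> Vector:
    """MAP update of a Gaussian mean (assumed adaptation technique).

    Interpolates between the general model's mean and the event's own
    labeled frames; a larger relevance factor tau trusts the prior more.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(f[d] for f in frames) for d in range(dim)]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


def train_event_models(labeled: Dict[str, List[Vector]],
                       unlabeled: List[Vector],
                       tau: float = 10.0) -> Dict[str, Vector]:
    """Step 1: estimate one general model from labeled + unlabeled frames.
    Step 2: adapt it to each event using only that event's labeled frames."""
    all_frames = [f for frames in labeled.values() for f in frames] + unlabeled
    general = pooled_mean(all_frames)
    return {event: map_adapt_mean(general, frames, tau)
            for event, frames in labeled.items()}


if __name__ == "__main__":
    # Toy 2-D features; event names are hypothetical placeholders.
    labeled = {"event_a": [[1.0, 2.0], [1.2, 2.2]],
               "event_b": [[3.0, 0.5]]}
    unlabeled = [[2.0, 1.0], [2.2, 1.1], [1.8, 0.9], [2.5, 1.4]]
    for event, mu in train_event_models(labeled, unlabeled, tau=2.0).items():
        print(event, [round(x, 3) for x in mu])
```

In this simplified setting, an event with few labeled frames (small relative to `tau`) keeps a mean close to the general model, while an event with more labeled data moves toward its own sample mean. This illustrates why adapting a well-estimated general model can be preferable to training each event model from scarce labels alone.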