Abstract—In this paper, we present a face annotation system to automatically collect and label celebrity faces from the web. With the proposed system, we have constructed a large-scale dataset called “Celebrities on the Web, ” which contains 2.45 million distinct images of 421 436 celebrities and is orders of magnitude larger than previous datasets. Collecting and labeling such a large-scale dataset pose great challenges on current multimedia mining methods. In this work, a two-step face annotation approach is proposed to accomplish this task. In the first step, an image annotation system is proposed to label an input image with a list of celebrities. To utilize the noisy textual data, we construct a large-scale celebrity name vocabulary to identify candidate names from the surrounding text. Moreover, we expand the scope of analysis to the surrounding text of webpages hosting near-duplicates of the input image. In the second step, the celebrity names are assigned to the faces by label propagation on a facial similarity graph. To cope with the large variance in the facial appearances, a context likelihood is proposed to constrain the name assignment process. In an evaluation on 21 735 faces, both the image annotation system and name assignment algorithm significantly outperform previous techniques. Index Terms—Face recognition, image annotation, image database, image retrieval. I
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.