37,749 research outputs found

    PersonRank: Detecting Important People in Images

    Full text link
    Always, some individuals in images are more important/attractive than others in some events such as presentation, basketball game or speech. However, it is challenging to find important people among all individuals in images directly based on their spatial or appearance information due to the existence of diverse variations of pose, action, appearance of persons and various changes of occasions. We overcome this difficulty by constructing a multiple Hyper-Interaction Graph to treat each individual in an image as a node and inferring the most active node referring to interactions estimated by various types of clews. We model pairwise interactions between persons as the edge message communicated between nodes, resulting in a bidirectional pairwise-interaction graph. To enrich the personperson interaction estimation, we further introduce a unidirectional hyper-interaction graph that models the consensus of interaction between a focal person and any person in a local region around. Finally, we modify the PageRank algorithm to infer the activeness of persons on the multiple Hybrid-Interaction Graph (HIG), the union of the pairwise-interaction and hyperinteraction graphs, and we call our algorithm the PersonRank. In order to provide publicable datasets for evaluation, we have contributed a new dataset called Multi-scene Important People Image Dataset and gathered a NCAA Basketball Image Dataset from sports game sequences. We have demonstrated that the proposed PersonRank outperforms related methods clearly and substantially.Comment: 8 pages, conferenc

    Learnable PINs: Cross-Modal Embeddings for Person Identity

    Full text link
    We propose and investigate an identity sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task, that is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of using the joint embedding for automatically retrieving and labelling characters in TV dramas.Comment: To appear in ECCV 201
    • …
    corecore