56 research outputs found

    Automatic propagation of manual annotations for multimodal person identification in TV shows

    No full text
    In this paper, an approach to propagating human annotations for person identification in a multimodal context is proposed. A system combining speaker diarization and face clustering is used to produce multimodal clusters. Whole multimodal clusters, rather than single tracks, are then annotated, with labels spread by propagation. An optical character recognition system provides the initial annotation. Four different strategies for selecting annotation candidates are tested. The initial results of annotation propagation are promising, and with a proper active-learning selection strategy, human annotator involvement could be reduced even further.
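    A minimal sketch of the propagation idea, assuming invented structures (the names Cluster, seed_from_ocr, and the thresholds are illustrative, not the authors' code): a label attached to any track in a multimodal cluster spreads to every track in that cluster, and a simple active-learning strategy picks the next cluster for the human annotator.

        from dataclasses import dataclass, field
        from typing import Dict, List, Optional

        @dataclass
        class Cluster:
            # One multimodal cluster: co-occurring speaker turns and face tracks.
            track_ids: List[str]
            label: Optional[str] = None    # person identity, once known
            ocr_votes: Dict[str, int] = field(default_factory=dict)  # name -> votes

        def seed_from_ocr(clusters: List[Cluster]) -> None:
            # Seed labels from OCR of on-screen name overlays (illustrative rule:
            # accept the majority name if it wins at least two thirds of the votes).
            for c in clusters:
                if c.ocr_votes:
                    name, votes = max(c.ocr_votes.items(), key=lambda kv: kv[1])
                    if 3 * votes >= 2 * sum(c.ocr_votes.values()):
                        c.label = name

        def propagate(cluster: Cluster, name: str) -> Dict[str, str]:
            # Annotating one track labels the whole cluster: every track inherits it.
            cluster.label = name
            return {t: name for t in cluster.track_ids}

        def next_to_annotate(clusters: List[Cluster]) -> Optional[Cluster]:
            # One of several possible selection strategies: ask the human about the
            # largest still-unlabelled cluster (biggest payoff per question).
            unlabeled = [c for c in clusters if c.label is None]
            return max(unlabeled, key=lambda c: len(c.track_ids), default=None)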

    The DRIVE-SAFE project: signal processing and advanced information technologies for improving driving prudence and accidents

    Get PDF
    In this paper, we describe the DRIVE-SAFE project, whose aim is to create conditions for prudent driving on highways and roadways in order to reduce accidents caused by driver behavior. To achieve these goals, critical data is being collected from multimodal sensors (cameras, microphones, and others) to build a unique databank on driver behavior. We are developing systems and technologies for analyzing the data and automatically detecting potentially dangerous situations (such as driver fatigue or distraction). Based on the findings from these studies, we will propose systems that warn drivers and take other precautionary measures to avoid accidents once a dangerous situation is detected. To address these issues, a national consortium has been formed, including the Automotive Research Center (OTAM), Koç University, Istanbul Technical University, Sabancı University, Ford A.Ş., Renault A.Ş., and Fiat A.Ş.
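    As a purely illustrative sketch of the kind of rule such a databank could support (signal names and thresholds here are invented, not DRIVE-SAFE values), a warning module might combine camera-derived cues as follows:

        def assess_driver_state(eye_closure_ratio: float,
                                gaze_off_road_s: float,
                                steering_reversals_per_min: int) -> str:
            # Toy multimodal rule for fatigue/distraction; thresholds are placeholders.
            if eye_closure_ratio > 0.4:            # PERCLOS-style drowsiness cue
                return "warn: possible fatigue"
            if gaze_off_road_s > 2.0:              # sustained off-road gaze
                return "warn: possible distraction"
            if steering_reversals_per_min > 8:     # erratic corrective steering
                return "warn: erratic steering"
            return "ok"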

    Acquiring and Maintaining Knowledge by Natural Multimodal Dialog

    Get PDF

    Meetings and Meeting Modeling in Smart Environments

    Get PDF
    In this paper we survey our research on smart meeting rooms and its relevance to augmented-reality meeting support and the real-time or off-line generation of meetings in virtual reality. The research reported here forms part of the European 5th and 6th framework programme projects Multi-Modal Meeting Manager (M4) and Augmented Multi-party Interaction (AMI). Both projects aim at building a smart meeting environment that can collect multimodal captures of the activities and discussions in a meeting room, with the aim of using this information as input to tools for real-time support, browsing, retrieval, and summarization of meetings. Our aim is to research (semantic) representations of what takes place during meetings in order to allow the generation, e.g. in virtual reality, of meeting activities (discussions, presentations, voting, etc.). Being able to do so also allows us to look at tools that provide support during a meeting, and at tools that let those unable to be physically present take part in a virtual way. This may lead to situations where the differences between real meeting participants, human-controlled virtual participants, and (semi-)autonomous virtual participants disappear.
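    A minimal sketch of the kind of (semantic) meeting representation argued for here, with invented field names: once a meeting is reduced to an ordered list of typed activities, the same structure can drive browsing tools, real-time support, or virtual-reality replay.

        from dataclasses import dataclass
        from typing import List

        @dataclass
        class MeetingEvent:
            kind: str                # e.g. "discussion", "presentation", "voting"
            start_s: float           # onset within the recording, in seconds
            end_s: float
            participants: List[str]  # who takes part in this activity

        # A captured meeting becomes an event list that a browser can index
        # or a virtual-reality renderer can replay.
        meeting = [
            MeetingEvent("presentation", 0.0, 540.0, ["A"]),
            MeetingEvent("discussion", 540.0, 900.0, ["A", "B", "C"]),
            MeetingEvent("voting", 900.0, 930.0, ["A", "B", "C", "D"]),
        ]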

    Learnable PINs: Cross-Modal Embeddings for Person Identity

    Full text link
    We propose and investigate an identity-sensitive joint embedding of face and voice. Such an embedding enables cross-modal retrieval from voice to face and from face to voice. We make the following four contributions: first, we show that the embedding can be learnt from videos of talking faces, without requiring any identity labels, using a form of cross-modal self-supervision; second, we develop a curriculum learning schedule for hard negative mining targeted to this task, which is essential for learning to proceed successfully; third, we demonstrate and evaluate cross-modal retrieval for identities unseen and unheard during training over a number of scenarios and establish a benchmark for this novel task; finally, we show an application of the joint embedding to automatically retrieving and labelling characters in TV dramas. Comment: To appear in ECCV 2018
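    A minimal PyTorch-style sketch of the core idea, not the paper's implementation: faces and voices cut from the same video form free positive pairs (the self-supervision signal, so no identity labels are needed), and a single curriculum parameter moves negative mining from random towards hardest-in-batch. The margin and mixing rule are assumptions.

        import torch
        import torch.nn.functional as F

        def cross_modal_loss(face_emb: torch.Tensor,
                             voice_emb: torch.Tensor,
                             hardness: float,
                             margin: float = 0.6) -> torch.Tensor:
            # face_emb, voice_emb: (N, D); row i of both comes from the same video.
            # hardness in [0, 1]: 0 = random negatives, 1 = hardest in-batch negative.
            n = face_emb.size(0)
            f = F.normalize(face_emb, dim=1)
            v = F.normalize(voice_emb, dim=1)
            sim = f @ v.t()                            # (N, N) cosine similarities
            pos = sim.diag()                           # matched face/voice pairs
            off = sim - 2.0 * torch.eye(n)             # push positives out of max()
            hard = off.max(dim=1).values               # hardest negative per face
            shift = torch.randint(1, n, (n,))          # random off-diagonal index
            rand = sim[torch.arange(n), (torch.arange(n) + shift) % n]
            mined = hardness * hard + (1.0 - hardness) * rand
            return F.relu(margin - pos + mined).mean() # hinge on the mined negative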

    Collaborative Annotation for Person Identification in TV Shows

    No full text
    This paper presents a collaborative annotation framework for person identification in TV shows. The web annotation front-end will be demonstrated during the Show and Tell session. All of the annotation code is available on GitHub. The tool can also be used in a crowd-sourcing environment.

    Research and Practice on Fusion of Visual and Audio Perception

    Get PDF
    With the rapid development of intelligent monitoring systems, surveillance data plays an increasingly important role in traffic, environment, security, and other fields. Inspired by models of human perception, exploiting the complementary effect of audio and video data to perceive a scene is of real research value. However, the resulting mass of surveillance data is increasingly difficult to search, which forces the search for more effective analysis methods that free people from repetitive labor. Audio-visual fusion perception therefore has not only significant theoretical research value but also broad application prospects. This thesis surveys the current state of the audio-visual fusion perception field and, building on a traditional video surveillance platform, designs an architecture for audio-visual fusion perception. Grounded in audio-visual content analysis, it studies a violence-scene analysis model based on audio-visual fusion. The main contributions are as follows: 1. Taking the audio-visual fusion surveillance platform as a starting point, designed...
    Degree: Master of Engineering; Department: School of Information Science and Technology, Computer Science and Technology; Student ID: 2302012115292
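    A minimal late-fusion sketch of the complementary-cue idea (weights, scores, and the threshold are placeholders, not the thesis model): per-segment violence scores from an audio classifier and a visual classifier are merged into one decision, so that, for example, screams can raise the alarm even when the picture alone is ambiguous.

        def fuse_scores(audio_score: float, visual_score: float,
                        w_audio: float = 0.4, threshold: float = 0.5):
            # Weighted late fusion of per-segment scores in [0, 1];
            # weights and threshold are illustrative, not tuned values.
            fused = w_audio * audio_score + (1.0 - w_audio) * visual_score
            return fused >= threshold, fused

        # Example: a strong audio cue compensates for an unclear visual cue.
        is_violent, score = fuse_scores(audio_score=0.9, visual_score=0.3)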