6 research outputs found

    Unsupervised person naming in TV broadcasts: using written names, spoken names, or both?

    Person identification in TV broadcasts is a valuable tool for indexing this type of video, but using biometric models is not a viable option without prior knowledge of the people present in the videos. Spoken or written names can provide a list of hypothesis names. We compare the potential of these two modalities (spoken and written names) for extracting the names of the people who speak and/or appear. Spoken names offer a larger number of citation occurrences, but transcription and detection errors halve the potential of this modality. Written names benefit from the steadily improving quality of the videos and are more easily detected. Moreover, affiliating written names to speakers/faces remains simpler than for spoken names.

    Unsupervised automatic person annotation in TV programs

    The vast amount of visual data generated today creates the need for annotation tools that allow searching for and retrieving the desired information in videos. One of the most important pieces of information in a video is the identity of the people in it. In this context, annotation consists of determining who appears and when.

    Person annotation in video sequences

    In recent years, the demand for tools to automatically annotate and classify large audiovisual datasets has increased considerably. One specific task in this field applies to TV broadcast videos: determining who appears in a video sequence and when. This work builds on the ALBAYZIN evaluation series presented at IberSPEECH-RTVE 2018 in Barcelona, and the purpose of this thesis is to improve the results obtained and to compare different face detection and tracking methods. We will evaluate the performance of classic face detection techniques and of techniques based on machine learning on a closed dataset of 34 known people; the remaining characters in the audiovisual document will be labelled as "unknown". We will work with short videos and images of each known character to build his/her model and, finally, evaluate the performance of the ALBAYZIN algorithm on a 2-hour video called "La noche en 24H", whose format is similar to that of a news program. We will analyze the results, the types of errors and scenarios we encountered, and the solutions we propose for each of them where applicable. In this work we focus only on monomodal face recognition and tracking.
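    To make the closed-set setting concrete, below is a minimal sketch (not the thesis code) of labelling detected faces as one of the known identities or as "unknown" via a distance threshold against an enrolment gallery. The Haar cascade stands in for the "classic" detector; the embedding function, threshold and gallery structure are illustrative assumptions.

```python
# Minimal sketch: closed-set face labelling with an "unknown" reject option.
# Assumptions: OpenCV Haar cascade as the classic detector; the embedding
# below is a placeholder patch descriptor, not the method used in the thesis.
import cv2
import numpy as np

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def embed(face_img):
    """Placeholder embedding: a normalised, resized grayscale patch."""
    patch = cv2.resize(face_img, (32, 32)).astype(np.float32).ravel()
    return patch / (np.linalg.norm(patch) + 1e-8)

def label_faces(frame_gray, gallery, threshold=0.6):
    """frame_gray: 2-D grayscale image.
    gallery: dict name -> list of embeddings built from the enrolment videos/images."""
    labels = []
    for (x, y, w, h) in detector.detectMultiScale(frame_gray, 1.1, 5):
        e = embed(frame_gray[y:y + h, x:x + w])
        # Nearest known identity by cosine distance over the gallery.
        best_name, best_dist = "unknown", np.inf
        for name, embs in gallery.items():
            d = min(1.0 - float(e @ g) for g in embs)
            if d < best_dist:
                best_name, best_dist = name, d
        labels.append((x, y, w, h, best_name if best_dist < threshold else "unknown"))
    return labels
```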

    Fusion of Speech, Faces and Text for Person Identification in TV Broadcast

    Poster session: WS21 - Workshop on Information Fusion in Computer Vision for Concept Recognition. The REPERE challenge is a project aiming at the evaluation of systems for supervised and unsupervised multimodal recognition of people in TV broadcasts. In this paper, we describe, evaluate and discuss the QCompere consortium submissions to the 2012 REPERE evaluation campaign dry-run. Speaker identification (and face recognition) can be greatly improved when combined with name detection through video optical character recognition. Moreover, we show that unsupervised multimodal person recognition systems can achieve performance nearly as good as that of supervised monomodal ones (which rely on several hundred identity models).
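    As a rough illustration of combining speaker, face and OCR-name cues, here is a minimal score-level late-fusion sketch; the actual QCompere systems are more involved, and the names, weights and scores below are purely illustrative assumptions.

```python
# Minimal sketch of score-level late fusion across modalities.
# Weights and per-modality scores are illustrative, not the QCompere submission.
from collections import defaultdict

def fuse_scores(modality_scores, weights):
    """modality_scores: dict modality -> dict person -> score in [0, 1]."""
    fused = defaultdict(float)
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 0.0)
        for person, score in scores.items():
            fused[person] += w * score
    return max(fused, key=fused.get) if fused else None

# Illustrative usage: the name read by OCR tips the decision.
scores = {
    "speaker":  {"Alice Martin": 0.55, "Bob Durand": 0.45},
    "face":     {"Alice Martin": 0.40, "Bob Durand": 0.50},
    "ocr_name": {"Bob Durand": 1.0},
}
print(fuse_scores(scores, {"speaker": 0.4, "face": 0.3, "ocr_name": 0.3}))
```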

    Discriminative Appearance Models for Face Alignment

    The proposed face alignment algorithm uses local gradient features as the appearance representation. These features are obtained by pixel value comparison, which provides robustness against changes in illumination and, owing to their locality, against partial occlusion and local deformation. The adopted features are modeled in three discriminative methods, which correspond to different alignment cost functions. The discriminative appearance modeling alleviates the generalization problem to some extent.
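    To make the feature construction concrete, here is a minimal sketch of pixel-comparison features sampled in a patch around a landmark; the patch size, number of point pairs and sampling scheme are assumptions for illustration, not the paper's exact recipe.

```python
# Minimal sketch of local pixel-comparison (binary) features around a landmark.
# Patch size and number of point pairs are illustrative assumptions.
import numpy as np

PATCH, N_PAIRS = 16, 128
# Fixed random point pairs inside the local patch, shared by all images/landmarks.
_rng = np.random.default_rng(0)
_PAIRS = _rng.integers(-PATCH // 2, PATCH // 2 + 1, size=(N_PAIRS, 2, 2))

def pixel_comparison_features(image, landmark):
    """image: 2-D grayscale array; landmark: (x, y).
    Returns a binary vector: signs of intensity differences between fixed point pairs."""
    x, y = landmark
    h, w = image.shape
    feats = np.zeros(N_PAIRS, dtype=np.uint8)
    for i, ((dx1, dy1), (dx2, dy2)) in enumerate(_PAIRS):
        p1 = image[np.clip(y + dy1, 0, h - 1), np.clip(x + dx1, 0, w - 1)]
        p2 = image[np.clip(y + dy2, 0, h - 1), np.clip(x + dx2, 0, w - 1)]
        feats[i] = p1 > p2
    return feats
```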

    Contextual Person Identification in Multimedia Data

    We propose methods to improve automatic person identification, regardless of the visibility of a face, by integrating multiple cues, including multiple modalities and contextual information. We propose a joint learning approach that uses contextual information from videos to improve learned face models, and we integrate additional modalities in a global fusion framework. We evaluate our approaches on a novel TV series dataset consisting of over 100,000 annotated faces.