
    Automatic propagation of manual annotations for multimodal person identification in TV shows

    In this paper, an approach to propagating human annotations for person identification in a multimodal context is proposed. A system combining speaker diarization and face clustering is used to produce multimodal clusters. Whole multimodal clusters, rather than single tracks, are then annotated by propagation. An optical character recognition system provides the initial annotation. Four different strategies for selecting annotation candidates are tested. The initial results of annotation propagation are promising: with a proper active-learning selection strategy, the involvement of the human annotator could be reduced even further.
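    The core mechanism described here is simple to sketch: a manual label attached to one track is propagated to every track in the same multimodal cluster, and a selection strategy decides which track to annotate next. The following is a minimal illustration under assumed data structures (cluster and label dictionaries are hypothetical, not the paper's actual interface):

```python
# Minimal sketch of cluster-level annotation propagation, assuming tracks
# have already been grouped into multimodal clusters (speaker diarization
# + face clustering). Structures and names are illustrative.

def propagate_annotations(clusters, manual_labels):
    """clusters: dict cluster_id -> list of track_ids
    manual_labels: dict track_id -> person name (sparse manual annotations)
    Returns a dense track_id -> name mapping after propagation."""
    track_labels = {}
    for cluster_id, tracks in clusters.items():
        # Find any manually annotated track in this cluster.
        label = next((manual_labels[t] for t in tracks if t in manual_labels), None)
        if label is not None:
            # Propagate the label to the whole cluster, not just one track.
            for t in tracks:
                track_labels[t] = label
    return track_labels

def select_next_for_annotation(clusters, track_labels):
    """One plausible selection strategy: pick a track from the largest
    still-unlabeled cluster, so one annotation covers many tracks."""
    unlabeled = {cid: ts for cid, ts in clusters.items()
                 if not any(t in track_labels for t in ts)}
    if not unlabeled:
        return None
    return max(unlabeled.values(), key=len)[0]
```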

    QCompere @ REPERE 2013

    We describe the QCompere consortium submissions to the REPERE 2013 evaluation campaign. The REPERE challenge aims at gathering four communities (face recognition, speaker identification, optical character recognition and named entity detection) towards the same goal: multimodal person recognition in TV broadcast. First, four mono-modal components are introduced (one for each of the foregoing communities), constituting the elementary building blocks of our various submissions. Then, depending on the target modality (speaker or face recognition) and on the task (supervised or unsupervised recognition), four different fusion techniques are introduced: they can be summarized as propagation-, classifier-, rule- or graph-based approaches. Finally, their performance is evaluated on the REPERE 2013 test set and their advantages and limitations are discussed.
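    Of the four fusion families named above, the rule-based one is the easiest to convey in a few lines. The sketch below is a hypothetical rule in that spirit, not the consortium's actual system: trust a name written on screen when one co-occurs with a speech turn, and fall back to the acoustic speaker-ID hypothesis otherwise (the 0.5 threshold is an invented placeholder).

```python
# Hypothetical rule-based fusion for speaker identification.
# All names and thresholds are illustrative assumptions.

def fuse_rule_based(speech_turns, ocr_names, speaker_id_hypotheses):
    """speech_turns: list of turn ids
    ocr_names: dict turn_id -> name overlaid on screen during the turn (or absent)
    speaker_id_hypotheses: dict turn_id -> (name, confidence) from the acoustic model
    """
    decisions = {}
    for turn in speech_turns:
        written = ocr_names.get(turn)
        if written is not None:
            decisions[turn] = written  # overlaid names are highly reliable
        else:
            name, conf = speaker_id_hypotheses.get(turn, (None, 0.0))
            decisions[turn] = name if conf > 0.5 else None  # abstain when unsure
    return decisions
```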

    LIG at MediaEval 2015 Multimodal Person Discovery in Broadcast TV Task

    In this working-notes paper, the contribution of the LIG team (a partnership between Univ. Grenoble Alpes and Ozyegin University) to the Multimodal Person Discovery in Broadcast TV task at MediaEval 2015 is presented. The task focused on unsupervised learning techniques. The team submitted two different approaches. In the first, new features for the face and speech modalities were tested. In the second, an alternative way to calculate the distance between face tracks and speech segments is presented; it also achieved a competitive MAP score and was able to beat the baseline.
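    The abstract does not state the team's actual distance formulation, but a natural baseline for a face-track/speech-segment distance is purely temporal. The sketch below, under that assumption, scores a pair by the overlap of their time spans (a Jaccard-style distance on intervals):

```python
# Illustrative temporal distance between a face track and a speech segment.
# The exact formulation used by LIG is not given in the abstract; this is
# a plausible stand-in based on temporal overlap.

def temporal_distance(face_track, speech_segment):
    """Both arguments are (start, end) tuples in seconds.
    Returns 0.0 for identical spans and 1.0 for disjoint spans."""
    f_start, f_end = face_track
    s_start, s_end = speech_segment
    overlap = max(0.0, min(f_end, s_end) - max(f_start, s_start))
    union = max(f_end, s_end) - min(f_start, s_start)
    return 1.0 - overlap / union if union > 0 else 1.0
```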

    Identificación no-supervisada de personas en programas de TV (Unsupervised identification of people in TV shows)

    The enormous amount of visual data generated nowadays creates a strong need for annotation tools that enable search and retrieval of the information present in videos. One of the most relevant pieces of information is the identity of people. The aim of this project is to implement unsupervised text- and face-recognition algorithms to identify relevant people appearing in broadcast TV, yielding an automatic annotation system that avoids manual annotation.

    EUMSSI team at the MediaEval Person Discovery Challenge 2016

    We present the results of the EUMSSI team's participation in the Multimodal Person Discovery task. The goal is to identify all people who simultaneously appear and speak in a video corpus. In the proposed system, besides improving each modality, we emphasize the ranking of multiple results from both the audio stream and the visual stream.
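    Combining ranked hypotheses from two streams is commonly done by late score fusion followed by re-ranking. The snippet below is an illustrative weighted-sum fusion under that assumption; the weight is a made-up hyperparameter, not a value reported by the EUMSSI team:

```python
# Illustrative late fusion of per-person scores from the audio and visual
# streams, as a weighted sum followed by ranking.

def fuse_and_rank(audio_scores, face_scores, audio_weight=0.5):
    """audio_scores, face_scores: dict person_name -> score in [0, 1].
    Returns (name, fused_score) pairs, best first."""
    names = set(audio_scores) | set(face_scores)
    fused = {
        n: audio_weight * audio_scores.get(n, 0.0)
           + (1.0 - audio_weight) * face_scores.get(n, 0.0)
        for n in names
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```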

    UPC system for the 2015 MediaEval multimodal person discovery in broadcast TV task

    This paper describes a system to identify people in broadcast TV shows in a purely unsupervised manner. The system outputs the identities of people who appear, talk, and can be identified using information appearing in the show (in our case, text with person names). Three types of mono-modal technologies are used: speech diarization, video diarization, and text detection / named entity recognition. These technologies are combined using a linear programming approach in which several restrictions are imposed.
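    The paper formulates the combination as a constrained linear program; as a simplified stand-in (a deliberate substitution, not the UPC formulation), the sketch below casts "which detected name belongs to which speaker cluster" as an assignment problem and solves it with the Hungarian algorithm, encoding one plausible restriction: each name is used at most once.

```python
# Simplified stand-in for the paper's linear program: name-to-cluster
# assignment maximizing total co-occurrence, one name per cluster.
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_names_to_clusters(cooccurrence):
    """cooccurrence: (n_clusters, n_names) array of co-occurrence durations
    between speaker clusters and OCR-detected names (hypothetical input).
    Returns {cluster_index: name_index}."""
    cost = -np.asarray(cooccurrence, dtype=float)  # maximize => negate costs
    rows, cols = linear_sum_assignment(cost)
    return dict(zip(rows.tolist(), cols.tolist()))
```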

    Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast

    We propose an approach to unsupervised speaker identification in TV broadcast videos, combining acoustic speaker diarization with person names obtained via video OCR from overlaid texts. Three methods for propagating the overlaid names to the speech turns are compared, taking into account the co-occurrence duration between the speaker clusters and the names provided by the video OCR, and using a task-adapted variant of the TF-IDF information retrieval coefficient. These methods were tested on the REPERE dry-run evaluation corpus, containing 3 hours of annotated videos. Our best unsupervised system reaches an F-measure of 70.2% when considering all speakers, and 81.7% if anchor speakers are left out. By comparison, a mono-modal, supervised speaker identification system with 535 speaker models, trained on matching development data and additional TV and radio data, only achieved a 57.5% F-measure when considering all speakers and 45.7% without anchors.
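    The TF-IDF analogy suggests a concrete weighting: a name that co-occurs for a long time with one speaker cluster but appears alongside few clusters overall should score high for that cluster. The sketch below follows that intuition under assumed inputs; the paper's actual task-adapted coefficient may differ.

```python
# TF-IDF-flavoured scoring of overlaid names per speaker cluster.
import math

def name_scores(cooccurrence):
    """cooccurrence: dict cluster_id -> dict name -> co-occurrence duration (s)."""
    n_clusters = len(cooccurrence)
    # Document frequency: in how many clusters does each name appear?
    df = {}
    for durations in cooccurrence.values():
        for name in durations:
            df[name] = df.get(name, 0) + 1
    scores = {}
    for cluster, durations in cooccurrence.items():
        total = sum(durations.values()) or 1.0
        scores[cluster] = {
            # "Term frequency" = share of the cluster's co-occurrence time;
            # "IDF" = penalize names seen with many different clusters.
            name: (dur / total) * math.log(n_clusters / df[name])
            for name, dur in durations.items()
        }
    return scores

def best_name(cooccurrence):
    """Pick the highest-scoring name for each cluster, if any."""
    return {c: max(s, key=s.get) if s else None
            for c, s in name_scores(cooccurrence).items()}
```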

    Active Selection with Label Propagation for Minimizing Human Effort in Speaker Annotation of TV Shows

    In this paper, an approach minimizing human involvement in the manual annotation of speakers is presented. At each iteration, a selection strategy chooses the most suitable speech track for manual annotation, and the label is then associated with all the tracks in the cluster that contains it. The study makes use of a system that propagates speaker track labels using agglomerative clustering with constraints. Several different unsupervised active-learning selection strategies are evaluated. Additionally, the presented approach can be used to efficiently generate sets of speech tracks for training biometric models; in this case, both the length of the speech tracks for a given person and their purity are taken into consideration. The REPERE video corpus was used to evaluate the system. Along with the speech tracks extracted from the videos, an optical character recognition system was adapted to extract the names of potential speakers, which served as the 'cold start' for the selection method.
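    One concrete selection strategy in the spirit described above is to weight clusters by total speech duration rather than track count, so each manual annotation covers as much audio as possible. The sketch below assumes hypothetical cluster/duration/label dictionaries; the paper evaluates several strategies, of which this is only one plausible instance.

```python
# Duration-weighted active selection over clustered speech tracks.

def next_track(clusters, durations, labels):
    """clusters: dict cluster_id -> list of track_ids
    durations: dict track_id -> duration in seconds
    labels: dict track_id -> name for already-annotated tracks
            (OCR-derived names could seed this as the 'cold start')."""
    best_cluster, best_time = None, -1.0
    for cid, tracks in clusters.items():
        if any(t in labels for t in tracks):
            continue  # already covered by label propagation
        time = sum(durations[t] for t in tracks)
        if time > best_time:
            best_cluster, best_time = cid, time
    if best_cluster is None:
        return None  # every cluster already has a label
    # Ask the annotator about the longest track, the easiest to judge.
    return max(clusters[best_cluster], key=lambda t: durations[t])
```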

    Fuzzy-Based Segmentation for Variable Font-Sized Text Extraction from Images/Videos

    Textual information embedded in multimedia provides a vital tool for indexing and retrieval. A great deal of work has been done on text localization and detection because of its fundamental importance. One of the biggest challenges of text detection is dealing with variation in font sizes and image resolution; the problem is exacerbated by under-segmentation or over-segmentation of regions in an image. This paper addresses the problem with a novel fuzzy-based post-processing segmentation method that can handle variation in text sizes and image resolution. The methodology is tested on the ICDAR 2011 Robust Reading Challenge dataset, which amply demonstrates the strength of the recommended method.
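    To make the fuzzy post-processing idea concrete, here is a toy illustration only: a candidate region's height relative to an estimated text-line height maps to fuzzy memberships for "fragment", "plausible line" and "merged blob", which drive a merge/keep/split decision. The membership shapes and breakpoints are invented for illustration, not the paper's actual functions.

```python
# Toy fuzzy merge/keep/split decision for candidate text regions.

def triangular(x, a, b, c):
    """Standard triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def segmentation_decision(region_height, expected_height):
    """Decide how to post-process a region from its height ratio."""
    r = region_height / expected_height
    frag = triangular(r, 0.0, 0.3, 0.8)   # over-segmented fragment: merge
    line = triangular(r, 0.5, 1.0, 1.8)   # plausible single text line: keep
    blob = triangular(r, 1.2, 2.5, 6.0)   # under-segmented multi-line blob: split
    return max((frag, "merge"), (line, "keep"), (blob, "split"))[1]
```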