    Text region detection in video sequences

    The article considers the problem of detecting text regions on a non-uniform background in video sequences. It proposes a two-stage scheme, an algorithm, and a technique for detecting text regions using a continuous wavelet transform with either automatic scale selection or iterative processing at different scales.
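As a rough illustration of what processing at different scales can look like, the sketch below computes a 1-D continuous wavelet transform of one image row with a Mexican-hat (Ricker) wavelet and picks the scale with the largest response energy. The wavelet choice, the energy criterion, and the names `ricker`, `cwt_row`, and `best_scale` are assumptions made for this example; the abstract does not specify how the article selects the scale.

```python
import math

def ricker(t, scale):
    """Mexican-hat wavelet sampled at offset t, dilated by `scale` (assumed wavelet)."""
    x = t / scale
    return (1.0 - x * x) * math.exp(-x * x / 2.0) / math.sqrt(scale)

def cwt_row(signal, scale):
    """CWT coefficients of a 1-D intensity profile at a single scale."""
    half = int(4 * scale)  # truncate the wavelet support at +/- 4 * scale samples
    n = len(signal)
    coeffs = []
    for i in range(n):
        acc = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < n:  # samples outside the row are treated as zero
                acc += signal[j] * ricker(k, scale)
        coeffs.append(acc)
    return coeffs

def best_scale(signal, scales):
    """Hypothetical selection rule: keep the scale with the highest response energy."""
    def energy(scale):
        return sum(c * c for c in cwt_row(signal, scale))
    return max(scales, key=energy)

# One row of a frame: flat background with a few stroke-like intensity jumps.
row = [0, 0, 9, 9, 0, 0, 9, 9, 0, 0, 9, 9, 0, 0, 0, 0]
chosen = best_scale(row, [1, 2, 4])
```

The iterative alternative mentioned in the abstract would instead run the detector once per scale in the list and merge the candidate regions.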

    Automatic propagation of manual annotations for multimodal person identification in TV shows

    In this paper, an approach to propagating human annotations for person identification in a multimodal context is proposed. The system combines speaker diarization and face clustering to produce multimodal clusters. Whole multimodal clusters, rather than single tracks, are then annotated by propagation. An optical character recognition system provides the initial annotation. Four strategies for selecting annotation candidates are tested. The initial results of annotation propagation are promising. With a suitable active learning selection strategy, the human annotator's involvement could be reduced even further.

    Active Selection with Label Propagation for Minimizing Human Effort in Speaker Annotation of TV Shows

    In this paper, an approach that minimizes human involvement in the manual annotation of speakers is presented. At each iteration, a selection strategy chooses the most suitable speech track for manual annotation, and the resulting label is then associated with all the tracks in the cluster that contains it. The study makes use of a system that propagates the speaker track labels by means of agglomerative clustering with constraints. Several unsupervised active learning selection strategies are evaluated. Additionally, the presented approach can be used to efficiently generate sets of speech tracks for training biometric models. In this case, both the length of the speech track for a given person and its purity are taken into consideration. The system was evaluated on the REPERE video corpus. Along with the speech tracks extracted from the videos, an optical character recognition system was adapted to extract the names of potential speakers. These names were then used as the 'cold start' for the selection method.
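The select-annotate-propagate loop described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's system: the clusters are taken as given (the constrained agglomerative clustering step is not re-run), the selection rule (largest total speech duration first) is a hypothetical stand-in for the strategies the paper evaluates, and the names `propagate_annotations` and `oracle` are invented for the example.

```python
def propagate_annotations(clusters, durations, oracle, budget):
    """Label tracks by querying one track per cluster and propagating the answer.

    clusters:  dict mapping cluster id -> list of track ids
    durations: dict mapping track id -> speech duration in seconds
    oracle:    callable standing in for the human annotator (track id -> name)
    budget:    maximum number of manual annotations allowed
    """
    labels = {}
    queries = 0
    remaining = dict(clusters)  # clusters that have not been annotated yet
    while remaining and queries < budget:
        # Assumed selection strategy: the cluster covering the most speech time.
        cid = max(remaining, key=lambda c: sum(durations[t] for t in remaining[c]))
        tracks = remaining.pop(cid)
        # Ask the annotator about the longest track in that cluster.
        query = max(tracks, key=lambda t: durations[t])
        name = oracle(query)
        queries += 1
        # Propagation: every track in the cluster inherits the label.
        for t in tracks:
            labels[t] = name
    return labels, queries

# Toy data: three speech tracks grouped into two clusters.
clusters = {"c1": ["t1", "t2"], "c2": ["t3"]}
durations = {"t1": 5.0, "t2": 3.0, "t3": 10.0}
truth = {"t1": "Alice", "t2": "Alice", "t3": "Bob"}
labels, used = propagate_annotations(clusters, durations, truth.get, budget=2)
```

With two queries, all three tracks receive a label, which is the saving in annotator effort the abstract refers to. Names extracted by optical character recognition could pre-fill `labels` before the loop starts, in the spirit of the 'cold start' mentioned above.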

    Fuzzy-Based Segmentation for Variable Font-Sized Text Extraction from Images/Videos

    Textual information embedded in multimedia can provide a vital tool for indexing and retrieval. Because of its fundamental importance, text localization and detection has been studied extensively. One of the biggest challenges in text detection is dealing with variation in font size and image resolution. The problem is aggravated by undersegmentation or oversegmentation of the regions in an image. This paper addresses the problem with a novel fuzzy-based method: a postprocessing segmentation step designed to handle variation in text size and image resolution. The method is tested on the ICDAR 2011 Robust Reading Challenge dataset, and the results demonstrate the strength of the proposed method.