182 research outputs found

Text Extraction in Video

Author: Dhanashri Holgare, Rutuja Talewar, Prof. Karishma Dhule
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/03/2018

The detection and extraction of scene and caption text from unconstrained, general-purpose video is an important research problem in the context of content-based retrieval and summarization of visual information. The current state of the art for extracting text from video either makes simplistic assumptions as to the nature of the text to be found, or restricts itself to a subclass of the wide variety of text that can occur in broadcast video. Most published methods only work on artificial text (captions) that is composited on the video frame. Also, these methods were developed for extracting text from images and then applied to video frames. They do not use the additional temporal information in video to good effect. This thesis presents a reliable system for detecting, localizing, extracting, tracking and binarizing text from unconstrained, general-purpose video. In developing methods for extraction of text from video it was observed that no single algorithm could detect all forms of text. The strategy is a multi-pronged approach to the problem, one that involves multiple methods and algorithms operating in functional parallelism. The system utilizes the temporal information available in video. The system can operate on JPEG images, MPEG-1 bit streams, as well as live video feeds. It is also possible to operate the methods individually and independently.

International Journal on Future Revolution in Computer Science & Communication Engineering

ICDAR 2023 Video Text Reading Competition for Dense and Small Text

Author: Bai Xiang, Karatzas Dimosthenis, Li Jiahong, Li Zhuang, Pal Umapada, Shou Mike Zheng, Wu Weijia, Zhao Yuzhong
Publication date: 10/04/2023

Recently, video text detection, tracking, and recognition in natural scenes have become very popular in the computer vision community. However, most existing algorithms and benchmarks focus on common text cases (e.g., normal size, density) and single scenarios, while ignoring extreme video text challenges, i.e., dense and small text in various scenarios. In this competition report, we establish a video text reading benchmark, DSText, which focuses on dense and small text reading challenges in video across various scenarios. Compared with previous datasets, the proposed dataset mainly includes three new challenges: 1) Dense video texts, a new challenge for video text spotters. 2) High-proportioned small texts. 3) Various new scenarios, e.g., games and sports. The proposed DSText includes 100 video clips from 12 open scenarios, supporting two tasks (i.e., video text tracking (Task 1) and end-to-end video text spotting (Task 2)). During the competition period (opened on 15th February 2023 and closed on 20th March 2023), a total of 24 teams participated in the three proposed tasks with around 30 valid submissions, respectively. In this article, we describe detailed statistical information of the dataset, tasks, evaluation protocols and result summaries of the ICDAR 2023 competition on DSText. Moreover, we hope the benchmark will promote video text research in the community.

arXiv.org e-Print Archive

Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation

Author: A Dempster, C Garcia, Christophe Garcia, D Chen, Franck Mamalet, H Li, J Lim, J Weinman, K Jung, Khaoula Elagouni, L Bahl, M Li, Pascale Sébillot, Q Ye, R Casey, R Yager, S Lucas, T Sato, Y LeCun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2014

Text embedded in multimedia documents represents important semantic information that helps to automatically access the content. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second one avoids the segmentation step by integrating a multi-scale scanning scheme that jointly localizes and recognizes characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results, showing that the proposed approaches outperform the state-of-the-art methods.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

HAL-Rennes 1

Satellite fixed communications service: A forecast of potential domestic demand through the year 2000. Volume 3: Appendices

Author: Al-Kinani G., Bhushan C., Bowyer J., Kaushal D., Kratochvil D., Steinnagel K.

Voice applications, data applications, video applications, impacted baseline forecasts, market distribution model, net long haul forecasts, trunking earth station definition and costs, trunking space segment cost, trunking entrance/exit links, trunking network costs and crossover distances with terrestrial tariffs, net addressable forecasts, capacity requirements, improving spectrum utilization, satellite system market development, and the 30/20 net accessible market are considered.

NASA Technical Reports Server

Smart Video Text: An Intelligent Video Database System

Author: Elmagarmid Ahmed K., Houstis Elias N., Jiang H., Kokkoras F., Vlahavas I.
Publication venue: 'Purdue University (bepress)'
Publication date: 01/10/1997

Purdue E-Pubs

A hierarchical multi-modal approach to story segmentation in news video

Author: LEKHA CHAISORN
Publication date: 30/05/2005

Ph.D. (Doctor of Philosophy)

ScholarBank@NUS