Text Extraction in Video
The detection and extraction of scene and caption text from unconstrained, general-purpose video is an important research problem in the context of content-based retrieval and summarization of visual information. The current state of the art for extracting text from video either makes simplistic assumptions as to the nature of the text to be found, or restricts itself to a subclass of the wide variety of text that can occur in broadcast video. Most published methods only work on artificial text (captions) that is composited on the video frame. Moreover, these methods were developed for extracting text from still images and have merely been applied to video frames; they do not use the additional temporal information in video to good effect. This thesis presents a reliable system for detecting, localizing, extracting, tracking and binarizing text from unconstrained, general-purpose video. In developing methods for extraction of text from video, it was observed that no single algorithm could detect all forms of text. The strategy is therefore a multi-pronged approach to the problem, one that involves multiple methods and algorithms operating in functional parallelism. The system utilizes the temporal information available in video. The system can operate on JPEG images, MPEG-1 bit streams, as well as live video feeds. It is also possible to operate the methods individually and independently.
ICDAR 2023 Video Text Reading Competition for Dense and Small Text
Recently, video text detection, tracking, and recognition in natural scenes
have become very popular in the computer vision community. However, most
existing algorithms and benchmarks focus on common text cases (e.g., normal
size, density) and single scenarios, while ignoring extreme video text
challenges, i.e., dense and small text in various scenarios. In this
competition report, we establish a video text reading benchmark, DSText, which
focuses on dense and small text reading challenges in the video with various
scenarios. Compared with previous datasets, the proposed dataset mainly
includes three new challenges: 1) Dense video texts, a new challenge for video
text spotters. 2) High-proportioned small texts. 3) Various new scenarios, e.g.,
games and sports. The proposed DSText includes 100 video clips from 12 open
scenarios, supporting two tasks (i.e., video text tracking (Task 1) and
end-to-end video text spotting (Task 2)). During the competition period (opened
on 15th February 2023 and closed on 20th March 2023), a total of 24 teams
participated in the proposed tasks with around 30 valid submissions.
In this article, we describe detailed statistical information of
the dataset, tasks, evaluation protocols, and result summaries of the ICDAR
2023 DSText competition. Moreover, we hope the benchmark will promote video
text research in the community.
Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation
Text embedded in multimedia documents represents important semantic information that helps to automatically access the content. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second one avoids the segmentation step by integrating a multi-scale scanning scheme that jointly localizes and recognizes characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results, showing that the proposed approaches outperform the state-of-the-art methods.
Satellite fixed communications service: A forecast of potential domestic demand through the year 2000. Volume 3: Appendices
Voice applications, data applications, video applications, impacted baseline forecasts, market distribution model, net long haul forecasts, trunking earth station definition and costs, trunking space segment cost, trunking entrance/exit links, trunking network costs and crossover distances with terrestrial tariffs, net addressable forecasts, capacity requirements, improving spectrum utilization, satellite system market development, and the 30/20 net accessible market are considered
A hierarchical multi-modal approach to story segmentation in news video
Ph.D, DOCTOR OF PHILOSOPHY