Automatic detection and extraction of artificial text in video
A significant challenge in large multimedia databases is the provision of efficient means for semantic indexing and retrieval of visual information. Artificial text in video is normally generated to supplement or summarise the visual content and thus is an important carrier of information that is highly relevant to the content of the video. As such, it is a potential ready-to-use source of semantic information. In this paper we present an algorithm for detection and localisation of artificial text in video using a horizontal difference magnitude measure and morphological processing. The result of character segmentation, based on a modified version of the Wolf-Jolion algorithm [1][2], is enhanced using smoothing and multiple binarisation. The output text is input to an "off-the-shelf" non-commercial OCR. Detection, localisation and recognition results for a 20-minute MPEG-1-encoded television programme are presented.
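The horizontal difference magnitude measure mentioned above can be illustrated with a minimal sketch (this is an illustration of the general idea, not the paper's actual implementation): each row's magnitude is the sum of absolute intensity differences between horizontally adjacent pixels, so rows crossing high-contrast text strokes score far higher than smooth background rows.

```python
# Sketch of a horizontal difference magnitude measure for text-row
# detection (illustrative only; not the paper's exact algorithm).

def horizontal_difference_magnitude(gray):
    """Per-row sum of absolute horizontal intensity differences.

    gray: 2D list of grayscale values (rows of equal length).
    Text rows, with many stroke transitions, score higher than
    smooth background rows.
    """
    return [
        sum(abs(row[x + 1] - row[x]) for x in range(len(row) - 1))
        for row in gray
    ]

# A smooth background row vs. a row crossing high-contrast "text" strokes.
background = [100, 101, 100, 101, 100, 101, 100, 101]
text_row = [0, 255, 0, 255, 0, 255, 0, 255]
mags = horizontal_difference_magnitude([background, text_row])
```

Candidate text regions are the rows whose magnitude exceeds a threshold; morphological processing would then merge adjacent candidate rows into boxes.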
Super Imposed Method for Text Extraction in a Sports Video
Video is one of the richest sources of valuable information, containing sequences of images together with audio and text. Text present in video provides useful information for automatic annotation, structuring, mining, indexing and retrieval of video. Nowadays, mechanically added (superimposed) text in video sequences provides useful information about their contents: supplemental but important cues for video indexing and retrieval. A large number of techniques have been proposed to address this problem. This paper provides a novel method of detecting video text regions containing player information and scores in sports videos, and proposes an improved algorithm for the automatic extraction of superimposed text in sports video. First, we identified key frames from the video using the color histogram technique to minimize the number of video frames. Then, the key frames were converted into gray images for efficient text detection. Since superimposed text is generally displayed in the bottom part of the image in sports video, we cropped the text regions from the gray image. We then applied the Canny edge detection algorithm for text edge detection. ESPN cricket video data was taken for our experiment, and the superimposed text region was extracted from the video. Using an OCR tool, the text region image was converted to ASCII text and the result was verified.
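The crop-then-edge-detect steps of this pipeline can be sketched as follows. This is a simplified stand-in: a plain gradient threshold replaces Canny edge detection, the "bottom third" crop and all frame data are illustrative assumptions, and the OCR step is omitted.

```python
# Simplified sketch of the superimposed-text pipeline: crop the bottom
# region of a key frame, then build a binary edge map. A gradient
# threshold stands in for Canny; OCR is omitted.

def crop_bottom(gray):
    """Keep the bottom third of the frame, where score overlays
    usually sit (an assumption for this sketch)."""
    return gray[2 * len(gray) // 3:]

def edge_map(gray, threshold=50):
    """Binary edge map from horizontal intensity differences
    (a crude stand-in for Canny edge detection)."""
    return [
        [1 if abs(row[x + 1] - row[x]) > threshold else 0
         for x in range(len(row) - 1)]
        for row in gray
    ]

# A 6-row synthetic frame: flat background on top, a high-contrast
# "score bar" in the bottom two rows.
frame = [[100] * 8 for _ in range(4)] + [[0, 255] * 4 for _ in range(2)]
roi = crop_bottom(frame)
edges = edge_map(roi)
edge_count = sum(sum(r) for r in edges)
```

In the real pipeline, dense edge regions in the cropped area would be grouped into text boxes and passed to an OCR tool.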
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
This paper presents OmniDataComposer, an innovative approach for multimodal
data fusion and unlimited data generation, intended to refine and simplify the
interplay among diverse data modalities. At its core, it introduces a
cohesive data structure capable of processing
and merging multimodal data inputs, which include video, audio, and text. Our
crafted algorithm leverages advancements across multiple operations such as
video/image caption extraction, dense caption extraction, Automatic Speech
Recognition (ASR), Optical Character Recognition (OCR), the Recognize Anything
Model (RAM), and object tracking. OmniDataComposer is capable of identifying
over 6400 categories of objects, substantially broadening the spectrum of
visual information. It amalgamates these diverse modalities, promoting
reciprocal enhancement among modalities and facilitating cross-modal data
correction. The final output metamorphoses each video input into an
elaborate sequential document, virtually transmuting videos into thorough
narratives and making them easier for large language models to process. Future
prospects include optimizing datasets for each modality to encourage unlimited
data generation. This robust base will offer priceless insights to models like
ChatGPT, enabling them to create higher quality datasets for video captioning
and easing question-answering tasks based on video content. OmniDataComposer
inaugurates a new stage in multimodal learning, imparting enormous potential
for augmenting AI's understanding and generation of complex, real-world data.
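A unified per-video record of the kind this abstract describes might look like the sketch below. The field names, the `to_document` merge format, and all sample data are assumptions for illustration, not the paper's actual data structure.

```python
# Hypothetical sketch of a unified multimodal record in the spirit of
# OmniDataComposer: per-modality outputs (captions, ASR, OCR, objects)
# merged into one sequential text document for a language model.
from dataclasses import dataclass, field

@dataclass
class VideoRecord:
    video_id: str
    captions: list = field(default_factory=list)  # video/image captions
    asr_text: list = field(default_factory=list)  # speech transcripts
    ocr_text: list = field(default_factory=list)  # on-screen text
    objects: list = field(default_factory=list)   # detected/tracked objects

    def to_document(self):
        """Merge all modalities into one sequential text document."""
        parts = [f"Video {self.video_id}"]
        if self.captions:
            parts.append("Captions: " + "; ".join(self.captions))
        if self.asr_text:
            parts.append("Speech: " + " ".join(self.asr_text))
        if self.ocr_text:
            parts.append("On-screen text: " + " ".join(self.ocr_text))
        if self.objects:
            parts.append("Objects: " + ", ".join(self.objects))
        return "\n".join(parts)

rec = VideoRecord(
    video_id="clip-001",
    captions=["a batsman plays a cover drive"],
    asr_text=["and that's four runs"],
    ocr_text=["IND 142/3"],
    objects=["person", "cricket bat"],
)
doc = rec.to_document()
```

Keeping each modality in its own field while exposing a single flattened document is what lets the modalities cross-check one another (e.g. OCR confirming an ASR score mention) before the text reaches a language model.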
Recent Trends and Techniques in Text Detection and Text Localization in a Natural Scene: A Survey
Text information extraction from natural scene images is a rising area of research. Since text in natural scene images generally carries valuable details, detecting and recognizing scene text has been deemed essential for a variety of advanced computer vision applications. Much effort has been put into extracting text regions from scene text images in an effective and reliable manner. As most text recognition applications demand robust algorithms for detecting and localizing text in a given scene image, researchers mainly focus on the two key stages: text detection and text localization. This paper provides a review of various techniques for text detection and text localization.
ATLAS: Adaptive Text Localization Algorithm in High Color Similarity Background
One of the major problems in the text localization process is color similarity between text and the background image. The limitations of localization algorithms under high color similarity are highlighted in several research papers. Hence, this research focuses on improving text localization capability in images with high text-background color similarity by introducing an adaptive text localization algorithm (ATLAS). ATLAS is an edge-based text localization algorithm that consists of two parts: the Text-Background Similarity Index (TBSI), which measures the similarity index of every text region, and the Multi Adaptive Threshold (MAT), which performs multiple adaptive-threshold calculations using size filtration and degree deviation to locate possible text regions. In this research, ATLAS is verified and compared with other localization techniques on two parameters, localizing strength and precision. The experiment was implemented and verified using two datasets: a generated text color spectrum dataset and the ICDAR (Document Analysis and Recognition) dataset. The results show that ATLAS achieves a significant improvement in localizing strength and a slight improvement in precision compared with other localization algorithms on high-color-similarity text-background images.
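The motivation for adapting thresholds to colour similarity can be sketched in a few lines. The similarity index and threshold rule below are simplified assumptions for illustration, not ATLAS's actual TBSI or MAT formulas: the idea is simply that the more similar text and background are, the fainter the text edges, so the edge threshold must drop to retain them.

```python
# Illustrative sketch of adaptive thresholding under high text-background
# colour similarity (simplified assumptions, not ATLAS's TBSI/MAT).

def similarity_index(text_mean, bg_mean):
    """Similarity in [0, 1]: 1.0 means identical mean intensity."""
    return 1.0 - abs(text_mean - bg_mean) / 255.0

def adaptive_edge_threshold(text_mean, bg_mean, base=50.0):
    """Lower the edge threshold as text and background grow more
    similar, so faint text edges are still retained."""
    return base * (1.0 - similarity_index(text_mean, bg_mean))

# Dissimilar pair keeps a high threshold; a near-identical pair
# drives the threshold toward zero.
t_far = adaptive_edge_threshold(text_mean=30, bg_mean=220)
t_near = adaptive_edge_threshold(text_mean=120, bg_mean=130)
```

A fixed-threshold edge detector would discard the faint edges in the near-identical case, which is exactly the failure mode this line of work targets.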