Automatic detection and extraction of artificial text in video
A significant challenge in large multimedia databases is the provision of efficient means for semantic indexing and retrieval of visual information. Artificial text in video is normally generated to supplement or summarise the visual content and thus is an important carrier of information that is highly relevant to the content of the video. As such, it is a potential ready-to-use source of semantic information. In this paper we present an algorithm for detection and localisation of artificial text in video using a horizontal difference magnitude measure and morphological processing. The result of character segmentation, based on a modified version of the Wolf-Jolion algorithm [1][2], is enhanced using smoothing and multiple binarisation. The output text is input to an "off-the-shelf" non-commercial OCR. Detection, localisation and recognition results for a 20-minute MPEG-1-encoded television programme are presented.
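The horizontal difference magnitude measure mentioned above can be illustrated with a minimal sketch (this is an illustration of the general idea, not the paper's actual implementation): each row's magnitude is the sum of absolute intensity differences between horizontally adjacent pixels, so rows crossing high-contrast text strokes score far higher than smooth background rows.

```python
# Sketch of a horizontal difference magnitude measure for text-row
# detection (illustrative only; not the paper's exact algorithm).

def horizontal_difference_magnitude(gray):
    """Per-row sum of absolute horizontal intensity differences.

    gray: 2D list of grayscale values (rows of equal length).
    Text rows, with many stroke transitions, score higher than
    smooth background rows.
    """
    return [
        sum(abs(row[x + 1] - row[x]) for x in range(len(row) - 1))
        for row in gray
    ]

# A smooth background row vs. a row crossing high-contrast "text" strokes.
background = [100, 101, 100, 101, 100, 101, 100, 101]
text_row = [0, 255, 0, 255, 0, 255, 0, 255]
mags = horizontal_difference_magnitude([background, text_row])
```

Candidate text regions are the rows whose magnitude exceeds a threshold; morphological processing would then merge adjacent candidate rows into boxes.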
Super Imposed Method for Text Extraction in a Sports Video
Video is one of the richest sources of valuable information, containing sequences of images together with audio and text. Text present in video provides useful information for automatic annotation, structuring, mining, indexing and retrieval of video. Nowadays, mechanically added (superimposed) text in video sequences provides useful information about their contents: supplemental but important cues for video indexing and retrieval. A large number of techniques have been proposed to address this problem. This paper provides a novel method of detecting video text regions containing player information and scores in sports videos, and proposes an improved algorithm for the automatic extraction of superimposed text in sports video. First, we identified key frames from the video using the color histogram technique to minimize the number of video frames. Then, the key frames were converted into gray images for efficient text detection. Since superimposed text is generally displayed in the bottom part of the image in sports video, we cropped the text regions from the gray image. We then applied the Canny edge detection algorithm for text edge detection. ESPN cricket video data was taken for our experiment, and the superimposed text region was extracted from the video. Using an OCR tool, the text region image was converted to ASCII text and the result was verified.
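The crop-then-edge-detect steps of this pipeline can be sketched as follows. This is a simplified stand-in: a plain gradient threshold replaces Canny edge detection, the "bottom third" crop and all frame data are illustrative assumptions, and the OCR step is omitted.

```python
# Simplified sketch of the superimposed-text pipeline: crop the bottom
# region of a key frame, then build a binary edge map. A gradient
# threshold stands in for Canny; OCR is omitted.

def crop_bottom(gray):
    """Keep the bottom third of the frame, where score overlays
    usually sit (an assumption for this sketch)."""
    return gray[2 * len(gray) // 3:]

def edge_map(gray, threshold=50):
    """Binary edge map from horizontal intensity differences
    (a crude stand-in for Canny edge detection)."""
    return [
        [1 if abs(row[x + 1] - row[x]) > threshold else 0
         for x in range(len(row) - 1)]
        for row in gray
    ]

# A 6-row synthetic frame: flat background on top, a high-contrast
# "score bar" in the bottom two rows.
frame = [[100] * 8 for _ in range(4)] + [[0, 255] * 4 for _ in range(2)]
roi = crop_bottom(frame)
edges = edge_map(roi)
edge_count = sum(sum(r) for r in edges)
```

In the real pipeline, dense edge regions in the cropped area would be grouped into text boxes and passed to an OCR tool.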
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
This paper presents OmniDataComposer, an innovative approach for multimodal
data fusion and unlimited data generation, intended to refine and simplify the
interplay among diverse data modalities. At its core, it introduces a
cohesive data structure capable of processing
and merging multimodal data inputs, which include video, audio, and text. Our
crafted algorithm leverages advancements across multiple operations such as
video/image caption extraction, dense caption extraction, Automatic Speech
Recognition (ASR), Optical Character Recognition (OCR), the Recognize Anything
Model (RAM), and object tracking. OmniDataComposer is capable of identifying
over 6400 categories of objects, substantially broadening the spectrum of
visual information. It amalgamates these diverse modalities, promoting
reciprocal enhancement among modalities and facilitating cross-modal data
correction. The final output metamorphoses each video input into an
elaborate sequential document, virtually transmuting videos into thorough
narratives and making them easier for large language models to process. Future
prospects include optimizing datasets for each modality to encourage unlimited
data generation. This robust base will offer priceless insights to models like
ChatGPT, enabling them to create higher quality datasets for video captioning
and easing question-answering tasks based on video content. OmniDataComposer
inaugurates a new stage in multimodal learning, imparting enormous potential
for augmenting AI's understanding and generation of complex, real-world data.
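A unified per-video record of the kind this abstract describes might look like the sketch below. The field names, the `to_document` merge format, and all sample data are assumptions for illustration, not the paper's actual data structure.

```python
# Hypothetical sketch of a unified multimodal record in the spirit of
# OmniDataComposer: per-modality outputs (captions, ASR, OCR, objects)
# merged into one sequential text document for a language model.
from dataclasses import dataclass, field

@dataclass
class VideoRecord:
    video_id: str
    captions: list = field(default_factory=list)  # video/image captions
    asr_text: list = field(default_factory=list)  # speech transcripts
    ocr_text: list = field(default_factory=list)  # on-screen text
    objects: list = field(default_factory=list)   # detected/tracked objects

    def to_document(self):
        """Merge all modalities into one sequential text document."""
        parts = [f"Video {self.video_id}"]
        if self.captions:
            parts.append("Captions: " + "; ".join(self.captions))
        if self.asr_text:
            parts.append("Speech: " + " ".join(self.asr_text))
        if self.ocr_text:
            parts.append("On-screen text: " + " ".join(self.ocr_text))
        if self.objects:
            parts.append("Objects: " + ", ".join(self.objects))
        return "\n".join(parts)

rec = VideoRecord(
    video_id="clip-001",
    captions=["a batsman plays a cover drive"],
    asr_text=["and that's four runs"],
    ocr_text=["IND 142/3"],
    objects=["person", "cricket bat"],
)
doc = rec.to_document()
```

Keeping each modality in its own field while exposing a single flattened document is what lets the modalities cross-check one another (e.g. OCR confirming an ASR score mention) before the text reaches a language model.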
Recent Trends and Techniques in Text Detection and Text Localization in a Natural Scene: A Survey
Text information extraction from natural scene images is a rising area of research. Since text in natural scene images generally carries valuable details, detecting and recognizing scene text has been deemed essential for a variety of advanced computer vision applications. Much effort has been put into extracting text regions from scene text images in an effective and reliable manner. As most text recognition applications demand robust algorithms for detecting and localizing text in a given scene image, researchers mainly focus on the two key stages: text detection and text localization. This paper provides a review of various techniques for text detection and text localization.
ATLAS: Adaptive Text Localization Algorithm in High Color Similarity Background
One of the major problems in the text localization process is color similarity between text and the background image. The limitations of localization algorithms under high color similarity are highlighted in several research papers. Hence, this research focuses on improving text localization capability in images with high text-background color similarity by introducing an adaptive text localization algorithm (ATLAS). ATLAS is an edge-based text localization algorithm that consists of two parts: the Text-Background Similarity Index (TBSI), which measures the similarity index of every text region, and the Multi Adaptive Threshold (MAT), which performs multiple adaptive-threshold calculations using size filtration and degree deviation to locate possible text regions. In this research, ATLAS is verified and compared with other localization techniques on two parameters, localizing strength and precision. The experiment was implemented and verified using two datasets: a generated text color spectrum dataset and the ICDAR (Document Analysis and Recognition) dataset. The results show that ATLAS achieves a significant improvement in localizing strength and a slight improvement in precision compared with other localization algorithms on high-color-similarity text-background images.
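The motivation for adapting thresholds to colour similarity can be sketched in a few lines. The similarity index and threshold rule below are simplified assumptions for illustration, not ATLAS's actual TBSI or MAT formulas: the idea is simply that the more similar text and background are, the fainter the text edges, so the edge threshold must drop to retain them.

```python
# Illustrative sketch of adaptive thresholding under high text-background
# colour similarity (simplified assumptions, not ATLAS's TBSI/MAT).

def similarity_index(text_mean, bg_mean):
    """Similarity in [0, 1]: 1.0 means identical mean intensity."""
    return 1.0 - abs(text_mean - bg_mean) / 255.0

def adaptive_edge_threshold(text_mean, bg_mean, base=50.0):
    """Lower the edge threshold as text and background grow more
    similar, so faint text edges are still retained."""
    return base * (1.0 - similarity_index(text_mean, bg_mean))

# Dissimilar pair keeps a high threshold; a near-identical pair
# drives the threshold toward zero.
t_far = adaptive_edge_threshold(text_mean=30, bg_mean=220)
t_near = adaptive_edge_threshold(text_mean=120, bg_mean=130)
```

A fixed-threshold edge detector would discard the faint edges in the near-identical case, which is exactly the failure mode this line of work targets.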