Indexing, browsing and searching of digital video
Video is a communications medium that normally brings together moving pictures with a synchronised audio track into a discrete piece or pieces of information. The size of a 'piece' of video can variously be referred to as a frame, a shot, a scene, a clip, a programme or an episode, and these are distinguished by their lengths and by their composition. We shall return to the definition of each of these in section 4 of this chapter. In modern society, video is ver
OCR for TIFF Compressed Document Images Directly in Compressed Domain Using Text segmentation and Hidden Markov Model
In today's technological era, document images play an integral part in
day-to-day life, and, particularly since the surge of Covid-19, digitally
scanned documents have become a key means of communication because they avoid
the risk of infection through physical contact. Storing and transmitting
scanned document images is memory intensive, so compression techniques are
used to reduce the image size before archival and transmission. There are two
ways to extract information from, or operate on, compressed images. The first
is to decompress the image, operate on it, and then compress it again for
efficient storage and transmission. The second is to use the characteristics
of the underlying compression algorithm to process the images directly in
their compressed form, without decompression and re-compression. In this
paper, we propose a novel idea of developing an OCR for CCITT (The
International Telegraph and Telephone Consultative Committee) compressed
machine printed TIFF document images directly in the compressed domain. After
segmenting text regions into lines and words, an HMM is applied for recognition
using three coding modes of CCITT: horizontal, vertical, and pass mode.
Experimental results show that OCR on pass modes gives promising results.
Comment: The paper has 14 figures and 1 tabl
Visual Representation of Text in Web Documents and Its Interpretation
This paper examines the uses of text and its representation on Web documents in terms of the challenges in its interpretation. Particular attention is paid to the significant problem of non-uniform representation of text. This non-uniformity is mainly due to the presence of semantically important text in image form as opposed to the standard encoded text. The issues surrounding text representation in Web documents are discussed in the context of colour perception and spatial representation. The characteristics of the representation of text in image form are examined and research towards interpreting these images of text is briefly described
T2CI-GAN: Text to Compressed Image generation using Generative Adversarial Network
The problem of generating textual descriptions for visual data has gained
research attention in recent years. In contrast, the problem of
generating visual data from textual descriptions is still very challenging,
because it requires combining Natural Language Processing (NLP)
and Computer Vision techniques. Existing methods use Generative
Adversarial Networks (GANs) and generate uncompressed images from textual
descriptions. However, in practice, most visual data are processed and
transmitted in a compressed representation. Hence, the proposed work attempts
to generate the visual data directly in the compressed representation,
using Deep Convolutional GANs (DCGANs), to achieve storage and computational
efficiency. We propose GAN models for compressed image generation from text.
The first model is trained directly with JPEG compressed DCT images (compressed
domain) to generate compressed images from text descriptions. The second
model is trained with RGB images (pixel domain) to generate a JPEG compressed DCT
representation from text descriptions. The proposed models are tested on the
open-source benchmark dataset Oxford-102 Flower images, using both RGB and JPEG
compressed versions, and achieve state-of-the-art performance in the
JPEG compressed domain. The code will be publicly released on GitHub after
acceptance of the paper.
Comment: Accepted for publication at IAPR's 6th CVIP 202
Information extraction from multimedia web documents: an open-source platform and testbed
The LivingKnowledge project aimed to enhance the current state of the art in search, retrieval and knowledge management on the web by advancing the use of sentiment and opinion analysis within multimedia applications. To achieve this aim, a diverse set of novel and complementary analysis techniques has been integrated into a single but extensible software platform on which such applications can be built. The platform combines state-of-the-art techniques for extracting facts, opinions and sentiment from multimedia documents and, unlike earlier platforms, exploits both visual and textual techniques to support multimedia information retrieval. Foreseeing the usefulness of this software in the wider community, the platform has been made generally available as an open-source project. This paper describes the platform design, gives an overview of the analysis algorithms integrated into the system and describes two applications that utilise the system for multimedia information retrieval