5,002 research outputs found
Zone-based Keyword Spotting in Bangla and Devanagari Documents
In this paper we present a word spotting system for text lines in offline
Indic scripts such as Bangla (Bengali) and Devanagari. It was recently shown
that zone-wise recognition improves word recognition performance over
conventional full-word recognition in Indic scripts. Inspired by this idea, we
adopt the zone segmentation approach and use middle-zone information to improve
traditional word spotting performance. To avoid the problems of heuristic zone
segmentation, we propose an HMM-based approach to segment the upper- and
lower-zone components from the text line images. Candidate keywords are
searched within a line without segmenting characters or words. We also propose
a novel feature that combines foreground and background information of text
line images for keyword spotting with character filler models; using both
sources of information together yields a significant improvement over either
one alone. The Pyramid Histogram of Oriented Gradients (PHOG) feature is used
in our word spotting framework. Experiments show that the proposed
zone-segmentation-based system outperforms traditional word spotting
approaches.
Comment: Preprint Submitted
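The PHOG descriptor used in the framework above concatenates orientation histograms computed over progressively finer spatial grids. The following is a minimal illustrative sketch of that idea in plain NumPy; the bin count, pyramid depth, and normalization are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np

def phog(image, bins=8, levels=3):
    """Pyramid Histogram of Oriented Gradients (illustrative sketch).

    `image` is a 2-D grayscale array. Gradient orientations are
    histogrammed over a 1x1, 2x2, ... grid and the histograms are
    concatenated into one L2-normalised vector.
    """
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)                             # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned orientation
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)

    feats = []
    h, w = image.shape
    for level in range(levels):
        n = 2 ** level                                 # n x n grid at this level
        for i in range(n):
            for j in range(n):
                ys = slice(i * h // n, (i + 1) * h // n)
                xs = slice(j * w // n, (j + 1) * w // n)
                hist = np.bincount(bin_idx[ys, xs].ravel(),
                                   weights=mag[ys, xs].ravel(),
                                   minlength=bins)
                feats.append(hist)
    vec = np.concatenate(feats)
    return vec / (np.linalg.norm(vec) + 1e-9)          # L2-normalise
```

With 8 bins and 3 levels the descriptor has 8 × (1 + 4 + 16) = 168 dimensions; in a spotting framework such vectors would be compared with a distance such as chi-squared or cosine.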
Text line Segmentation in Compressed Representation of Handwritten Document using Tunneling Algorithm
In this work, we perform text line segmentation directly in the compressed
representation of an unconstrained handwritten document image. We build on
text-line terminal points, the current state of the art: the terminal points
spotted along the left and right margins of a document image for every text
line are treated as source and target, respectively. The tunneling algorithm
uses a single agent (or robot) to identify the coordinate positions in the
compressed representation at which to perform text-line segmentation of the
document. The agent starts at a source point, progressively tunnels a path
between two adjacent text lines, and reaches the probable target. The agent's
navigation from source to target, bypassing obstacles if any, segregates the
two adjacent text lines. However, the target point becomes known only when the
agent reaches its destination; this holds for all source points, which lets us
analyze the correspondence between source and target nodes. Expert-system
techniques, dynamic programming, and greedy strategies are employed for every
search space while tunneling. Exhaustive experiments are carried out on various
benchmark datasets, including ICDAR13, and the performance is reported.
Comment: Compressed Representation, Handwritten Document Image, Text-Line Terminal Point, Text-Line Segmentation, Search Space, Gri
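The dynamic-programming flavour of the tunneling agent can be illustrated with a toy seam search: starting from a source row at the left margin, the path moves column by column, shifting at most one row per step, and accumulates the least "ink" cost until it exits at the right margin. This is a simplified stand-in on an uncompressed cost map, not the paper's compressed-domain algorithm.

```python
import numpy as np

def tunnel_path(ink, start_row):
    """Trace a left-to-right separating path between two text lines.

    `ink` is a 2-D cost array (e.g. 1 for text pixels, 0 for background).
    Dynamic programming finds the cheapest path from `start_row` at the
    left margin to the right margin, moving at most one row up or down
    per column -- a toy version of the tunnelling agent.
    """
    h, w = ink.shape
    cost = np.full((h, w), np.inf)
    cost[start_row, 0] = ink[start_row, 0]
    back = np.zeros((h, w), dtype=int)                 # predecessor rows
    for x in range(1, w):
        for y in range(h):
            best_prev, best_cost = y, cost[y, x - 1]
            for dy in (-1, 1):                         # try row above/below
                yy = y + dy
                if 0 <= yy < h and cost[yy, x - 1] < best_cost:
                    best_prev, best_cost = yy, cost[yy, x - 1]
            cost[y, x] = best_cost + ink[y, x]
            back[y, x] = best_prev
    # backtrack from the cheapest cell in the last column
    y = int(np.argmin(cost[:, -1]))
    path = [y]
    for x in range(w - 1, 0, -1):
        y = back[y, x]
        path.append(y)
    return path[::-1]          # row index of the path in each column
```

On an image with two horizontal text lines, the recovered path threads the blank corridor between them, which is exactly the segregating boundary the abstract describes.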
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Recognizing text in the wild is a challenging task because of complex
backgrounds, varied illumination, and diverse distortions, even with deep
neural networks (convolutional and recurrent neural networks). During
end-to-end training for scene text recognition, the outputs of deep neural
networks at different iterations exhibit diversity and complementarity with
respect to the target text. Here, a simple but effective deep learning method,
an adaptive ensemble of deep neural networks (AdaDNNs), is proposed to select
and adaptively combine classifier components from different iterations of the
whole learning process. The ensemble is further formulated in a Bayesian
framework for classifier weighting and combination. Experiments on several
widely acknowledged benchmarks, i.e., the ICDAR Robust Reading Competition
(Challenges 1, 2 and 4) datasets, verify the marked improvement over the
baseline DNNs and the effectiveness of AdaDNNs compared with recent
state-of-the-art methods.
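The core idea of combining classifier snapshots from different training iterations can be sketched as a weighted average of their class-probability outputs. The softmax-over-validation-accuracy weighting below is a hypothetical, Bayesian-flavoured stand-in for the paper's actual weighting scheme.

```python
import numpy as np

def ensemble_predict(snapshot_probs, val_acc, temperature=1.0):
    """Combine classifier snapshots from different training iterations.

    `snapshot_probs` has shape (n_snapshots, n_samples, n_classes):
    each snapshot's class-probability outputs on the same samples.
    `val_acc` holds each snapshot's validation accuracy. Snapshots are
    weighted by a softmax over accuracies (an assumption for
    illustration, not the authors' exact formulation).
    """
    a = np.asarray(val_acc, dtype=float) / temperature
    w = np.exp(a - a.max())
    w /= w.sum()                                        # normalised weights
    combined = np.tensordot(w, snapshot_probs, axes=1)  # weighted average
    return combined.argmax(axis=-1), combined
```

Because each snapshot's rows are probability distributions, the weighted average is again a distribution, so the ensemble can be read as a soft vote over the learning trajectory.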
Text Line Segmentation of Historical Documents: a Survey
Libraries and national archives hold huge numbers of historical documents that
have not yet been exploited electronically. Although automatic reading of
complete pages remains, in most cases, a long-term objective, tasks such as
word spotting, text/image alignment, authentication, and extraction of
specific fields are in use today. For all these tasks, a major step is
document segmentation into text lines. Because of the low quality and
complexity of these documents (background noise, artifacts due to aging,
interfering lines), automatic text line segmentation remains an open research
field. The objective of this paper is to survey existing methods, developed
during the last decade, that are dedicated to documents of historical
interest.
Comment: 25 pages, submitted version; to appear in the International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3
Direct Processing of Document Images in Compressed Domain
With the rapid growth of big data in this digital era, fax documents,
invoices, receipts, etc., are routinely compressed for efficient storage and
transfer. To process these documents, however, they must first be
decompressed, which incurs additional computing resources. This limitation
motivates research into directly processing compressed images. In this paper,
we summarize work on performing different operations straight from run-length
compressed documents, without a decompression stage. The operations
demonstrated are feature extraction; text-line, word, and character
segmentation; document block segmentation; and font size detection, all
carried out on the compressed version of the document. The feature extraction
methods show how to extract conventionally defined features such as the
projection profile, run-histogram, and entropy directly from the compressed
document data. Document segmentation extracts compressed segments of text
lines, words, and characters using vertical and horizontal projection profile
features. Further, an attempt is made to segment an arbitrary block of
interest from the compressed document and then to characterize the segmented
block absolutely and relatively, which has real-time applications in the
automatic processing of bank cheques, challans, etc., in the compressed
domain. Finally, an application that detects font size at the text line level
is also investigated. All the proposed algorithms are validated experimentally
on a sufficient dataset of compressed documents.
Comment: 2014 Fourth IDRBT Doctoral Colloquium, December 11-12, 2014, Hyderabad, India
Semantic speech retrieval with a visually grounded model of untranscribed speech
There is growing interest in models that can learn from unlabelled speech
paired with visual context. This setting is relevant for low-resource speech
processing, robotics, and human language acquisition research. Here we study
how a visually grounded speech model, trained on images of scenes paired with
spoken captions, captures aspects of semantics. We use an external image tagger
to generate soft text labels from images, which serve as targets for a neural
model that maps untranscribed speech to (semantic) keyword labels. We introduce
a newly collected data set of human semantic relevance judgements and an
associated task, semantic speech retrieval, where the goal is to search for
spoken utterances that are semantically relevant to a given text query. Without
seeing any text, the model trained on parallel speech and images achieves a
precision of almost 60% on its top ten semantic retrievals. Compared to a
supervised model trained on transcriptions, our model matches human judgements
better by some measures, especially in retrieving non-verbatim semantic
matches. We perform an extensive analysis of the model and its resulting
representations.
Comment: 10 pages, 3 figures, 5 tables; accepted to the IEEE/ACM Transactions on Audio, Speech and Language Processing
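The semantic speech retrieval task described above reduces, at query time, to ranking utterances by the model's predicted relevance score for the query keyword. A minimal sketch of that ranking step, assuming the model has already produced per-utterance keyword probabilities:

```python
import numpy as np

def semantic_retrieve(keyword_probs, vocab, query, top_k=10):
    """Rank spoken utterances by relevance to a text query.

    `keyword_probs` has shape (n_utterances, n_keywords): each row is
    the model's predicted probability that the utterance is
    semantically relevant to each keyword (as produced, in the paper's
    setting, by a network trained on soft image-tagger labels rather
    than transcriptions). Returns the indices and scores of the top-k
    utterances for `query`.
    """
    j = vocab.index(query)                 # column for the query keyword
    scores = keyword_probs[:, j]
    order = np.argsort(-scores)            # sort descending by score
    return order[:top_k].tolist(), scores[order[:top_k]].tolist()
```

The reported precision@10 of almost 60% would then be the fraction of these top-10 utterances that human judges marked as semantically relevant.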
Extraction of Projection Profile, Run-Histogram and Entropy Features Straight from Run-Length Compressed Text-Documents
Document image analysis, like any digital image analysis, requires the
identification and extraction of proper features. These are generally
extracted from uncompressed images, although in practice images are made
available in compressed form for reasons such as transmission and storage
efficiency. Decompressing the image first incurs additional computing
resources, which motivates research into extracting features directly from the
compressed image. In this work, we propose to extract essential features for
text document analysis, namely the projection profile, run-histogram, and
entropy, directly from run-length compressed text documents. Experiments show
that the features are extracted directly from the compressed image without a
decompression stage, which reduces computing time, and that the extracted
feature values are exactly identical to those obtained from uncompressed
images.
Comment: Published by IEEE in Proceedings of ACPR-2013. arXiv admin note: text overlap with arXiv:1403.778
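To see why such features fall out of run-length data almost for free, consider the projection profiles: with each row stored as alternating white/black run lengths, the ink count per row is just the sum of the black runs. The sketch below assumes a simple encoding where every row starts with a white run (a common convention, not necessarily the paper's exact format).

```python
def profiles_from_rle(rle_rows, width):
    """Projection profiles straight from run-length compressed rows.

    `rle_rows` encodes each row as alternating run lengths, starting
    with a white run. The horizontal profile (ink per row) needs only
    the odd-indexed runs; the vertical profile is rebuilt by walking
    the runs, without reconstructing a pixel matrix.
    """
    horizontal = [sum(row[1::2]) for row in rle_rows]   # black runs only
    vertical = [0] * width
    for row in rle_rows:
        x, is_black = 0, False
        for run in row:
            if is_black:
                for c in range(x, x + run):             # columns this run covers
                    vertical[c] += 1
            x += run
            is_black = not is_black
    return horizontal, vertical
```

The horizontal profile in particular touches each run exactly once, which is where the claimed reduction in computing time comes from.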
Indic Handwritten Script Identification using Offline-Online Multimodal Deep Network
In this paper, we propose a novel approach to word-level Indic script
identification that uses only character-level data in the training stage; the
advantages of training on character-level data are outlined in Section I. Our
method uses a multimodal deep network that takes both the offline and online
modalities of the data as input, exploiting information from both modalities
jointly for the script identification task. We take handwritten data in either
modality as input, generate the opposite modality through intermodality
conversion, and feed this offline-online pair to the network. Hence, along
with the advantage of using information from both modalities, it serves as a
single framework for both offline and online script identification, removing
the need to design a separate identification module for each modality. A
further major contribution is a novel conditional multimodal fusion scheme
that combines the offline and online information while accounting for the true
origin of the data fed to the network, and thus combines the modalities
adaptively. Exhaustive experiments have been conducted on a dataset consisting
of English and six Indic scripts. Our framework outperforms frameworks based
on traditional classifiers with handcrafted features, as well as
deep-learning-based methods, by a clear margin. Extensive experiments also
show that training on character-level data alone achieves state-of-the-art
performance comparable to traditional training on word-level data in our
framework.
Comment: Accepted in Information Fusion, Elsevier
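The intuition behind origin-aware fusion can be shown with a toy gate: since one of the two modalities is always synthesized by intermodality conversion, the fusion should trust the genuine modality more. The fixed 0.7/0.3 weights below are purely hypothetical; the paper learns its conditional fusion rather than hard-coding it.

```python
import numpy as np

def conditional_fusion(offline_feat, online_feat, origin):
    """Origin-aware fusion of offline and online feature vectors.

    `origin` names the modality the data really came from ("offline"
    or "online"); the other vector was produced by intermodality
    conversion. A toy conditional scheme (hypothetical weights) trusts
    the real modality more than the converted one.
    """
    w_real, w_conv = 0.7, 0.3
    if origin == "offline":
        fused = w_real * offline_feat + w_conv * online_feat
    else:
        fused = w_real * online_feat + w_conv * offline_feat
    return fused
```

Conditioning on the data's origin is what lets a single network serve both offline and online inputs without two separate identification modules.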
A line-based representation for matching words in historical manuscripts
In this study, we propose a new method for retrieving and recognizing words in historical documents. We represent word images with a set of line segments and provide a criterion for word matching based on matching those lines. We carry out experiments on a benchmark dataset consisting of manuscripts by George Washington, as well as on Ottoman manuscripts.
TexT - Text Extractor Tool for Handwritten Document Transcription and Annotation
This paper presents a framework for semi-automatic transcription of
large-scale historical handwritten documents and proposes a simple,
user-friendly text extractor tool, TexT, for transcription. The proposed
approach provides quick and easy transcription of text using a
computer-assisted interactive technique. The algorithm finds multiple
occurrences of the marked text on the fly using a word spotting system. TexT
can also annotate handwritten text on the fly, automatically generating
ground-truth labels and dynamically adjusting and correcting user-generated
bounding box annotations so that each word is perfectly encapsulated. The user
can view the document and the found words in their original form or with
background noise removed for easier visualization of the transcription
results. The effectiveness of TexT is demonstrated on an archival manuscript
collection from a well-known publicly available dataset.