A Framework for Devanagari Script-based Captcha
Human Interactive Proofs (HIPs) are automatic reverse Turing tests designed
to distinguish between various groups of users. Completely Automated Public
Turing test to tell Computers and Humans Apart (CAPTCHA) is a HIP system that
distinguishes between humans and malicious computer programs. Many CAPTCHAs have
been proposed in the literature that are text-graphical based, audio-based,
puzzle-based and mathematical question-based. The design and implementation of
CAPTCHAs fall in the realm of Artificial Intelligence. We aim to utilize
CAPTCHAs as a tool to improve the security of Internet based applications. In
this paper we present a framework for a text-based CAPTCHA based on Devanagari
script which can exploit the difference in the reading proficiency between
humans and computer programs. Our selection of a Devanagari script-based CAPTCHA
is based on the fact that the script is used by a large number of Indian languages,
including Hindi, which is the third most spoken language. There is potential for
an exponential rise in the applications that are likely to be developed in that
script, thereby making it easy to secure Indian language based applications.
Comment: 10 pages, 8 Figures, CCSEA 2011 - First International Conference,
Chennai, July 15-17, 201
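A minimal sketch of the challenge/response core that a text-based Devanagari CAPTCHA of this kind might use. It is not the framework from the paper: the consonant-only alphabet, the function names make_challenge and check_response, and the use of NFC normalization are all illustrative assumptions, and rendering the challenge as a distorted image is left out.

```python
import secrets
import unicodedata

# Assumption: the challenge alphabet is the Devanagari consonants
# U+0915 (KA) through U+0939 (HA); a real system may choose differently.
CONSONANTS = [chr(cp) for cp in range(0x0915, 0x093A)]


def make_challenge(length=5):
    """Return a random string of Devanagari consonants to be rendered as an image."""
    return "".join(secrets.choice(CONSONANTS) for _ in range(length))


def check_response(challenge, response):
    """Compare the typed answer with the challenge after trimming and NFC normalization."""
    def norm(s):
        return unicodedata.normalize("NFC", s.strip())
    return norm(challenge) == norm(response)
```

Normalizing both strings before comparison is a design choice intended to tolerate input methods that emit different but canonically equivalent code point sequences.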
Word Searching in Scene Image and Video Frame in Multi-Script Scenario using Dynamic Shape Coding
Retrieval of text information from natural scene images and video frames is a
challenging task due to its inherent problems like complex character shapes,
low resolution, background noise, etc. Available OCR systems often fail to
retrieve such information in scene/video frames. Keyword spotting, an
alternative way to retrieve information, performs efficient text searching in
such scenarios. However, current word spotting techniques in scene/video images
are script-specific and they are mainly developed for Latin script. This paper
presents a novel word spotting framework using dynamic shape coding for text
retrieval in natural scene images and video frames. The framework is designed to
search for a query keyword across multiple scripts with the help of on-the-fly
script-wise keyword generation for the corresponding script. We have used a
two-stage word spotting approach using Hidden Markov Model (HMM) to detect the
translated keyword in a given text line by identifying the script of the line.
A novel unsupervised dynamic shape coding based scheme has been used to group
similar shape characters to avoid confusion and to improve text alignment.
Next, the hypotheses locations are verified to improve retrieval performance.
To evaluate the proposed system for searching keywords in natural scene images
and video frames, we have considered two popular Indic scripts, Bangla
(Bengali) and Devanagari, along with English. Inspired by the zone-wise
recognition approach in Indic scripts[1], zone-wise text information has been
used to improve the traditional word spotting performance in Indic scripts. For
our experiment, a dataset consisting of images of different scenes and video
frames in English, Bangla and Devanagari scripts was considered. The results
obtained showed the effectiveness of our proposed word spotting approach.
Comment: Multimedia Tools and Applications, Springe
A Technique for Character Segmentation in Middle zone of Handwritten Hindi words using Hybrid Approach
India is a country where people speak multiple languages and write in multiple scripts. Devanagari is one of the most popular scripts in India and is used to write the Hindi, Sanskrit, Sindhi, Marathi and Nepali languages. This research work is performed on the Hindi language. A large number of precious and essential documents are available in handwritten form and need to be converted into editable form. Optical Character Recognition (OCR) makes it easier to convert handwritten text into editable form. Character segmentation is an important phase of OCR, which segments the characters from handwritten words and thereby enhances the accuracy of the OCR system. In this paper a hybrid approach is used to segment words that contain single and multiple touching characters. The proposed system is tested on a dataset of handwritten words written by different writers. The dataset contains more than 300 handwritten words in the Hindi language. The accuracy of the proposed hybrid system is evaluated at 96%, which is better than that of existing techniques.
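For orientation, a minimal sketch of a common projection-profile baseline for splitting a Devanagari word into character candidates. It is not the hybrid approach of the paper, and it does not separate touching characters, which is the hard case the paper addresses. The input is assumed to be a non-empty binary image given as a list of rows with 1 for ink and 0 for background; the function names strip_header_line and split_columns are hypothetical.

```python
def strip_header_line(img):
    """Blank the row(s) with the most ink, assumed here to be the header line (shirorekha)."""
    row_ink = [sum(row) for row in img]
    peak = max(row_ink)
    return [[0] * len(row) if ink == peak else list(row)
            for row, ink in zip(img, row_ink)]


def split_columns(img):
    """Return (start, end) column ranges of ink, split at ink-free columns; end is exclusive."""
    width = len(img[0])
    col_ink = [sum(row[x] for row in img) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(col_ink):
        if ink and start is None:
            start = x                      # a new character candidate begins
        elif not ink and start is not None:
            segments.append((start, x))    # an empty column closes the candidate
            start = None
    if start is not None:
        segments.append((start, width))    # candidate runs to the right edge
    return segments


# Toy word: a full-width header line joining two character shapes.
word = [
    [1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 1, 0],
]
print(split_columns(word))                     # header line joins everything
print(split_columns(strip_header_line(word)))  # two candidates once it is removed
```

Removing the header line first matters because it connects all characters of a word, so a vertical projection of the untouched image finds no empty columns.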