2 research outputs found

    Zone Segmentation and Thinning based Algorithm for Segmentation of Devnagari Text

    Character segmentation of handwritten documents is a challenging research topic because of its diverse application environments. OCR can be used for automated processing and handling of forms, old corrupted reports, bank cheques, postal codes and structures. Segmenting a word into characters is one of the major challenges in optical character recognition. It is even more challenging for offline handwritten documents, and a further hurdle is the presence of broken, touching and overlapping characters in Devnagari script. In this paper, we introduce an algorithm that segments both broken and touching characters in Devnagari script, using a combination of zone segmentation and thinning-based techniques. We used 85 words in each of four categories: isolated characters, broken characters, touching characters, and characters that are both broken and touching. The average segmentation accuracy achieved for broken and touching characters is 96.2%.
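The abstract does not spell out how the zones are found. A common approach for Devnagari, assumed here and not confirmed by the paper, is to locate the headline (shirorekha) as the row with the most ink in the horizontal projection, treat the rows above it as the upper zone, and cut the region below it at empty columns of the vertical projection. The function names (`find_header_line`, `split_zones`, `split_characters`) and the `header_thickness` parameter are hypothetical. This minimal sketch works on a binary image given as a list of rows.

```python
from typing import List, Tuple

# Binary image as a list of rows: 1 = ink pixel, 0 = background.
Image = List[List[int]]


def find_header_line(img: Image) -> int:
    """Return the row with the most ink (assumed to be the shirorekha)."""
    profile = [sum(row) for row in img]  # horizontal projection
    return max(range(len(profile)), key=lambda r: profile[r])


def split_zones(img: Image, header_row: int, header_thickness: int = 1) -> Tuple[Image, Image]:
    """Split into the upper zone (above the headline) and the region below it."""
    upper = img[:header_row]
    below = img[header_row + header_thickness:]
    return upper, below


def split_characters(img: Image, header_row: int, header_thickness: int = 1) -> List[Tuple[int, int]]:
    """Cut the region below the headline at columns that contain no ink.

    Returns inclusive (start_col, end_col) spans, one per candidate character.
    """
    _, below = split_zones(img, header_row, header_thickness)
    width = len(img[0])
    col_ink = [sum(row[c] for row in below) for c in range(width)]  # vertical projection
    spans, start = [], None
    for c, ink in enumerate(col_ink):
        if ink and start is None:
            start = c  # entering a character
        elif not ink and start is not None:
            spans.append((start, c - 1))  # leaving a character
            start = None
    if start is not None:
        spans.append((start, width - 1))  # character touches the right edge
    return spans


if __name__ == "__main__":
    word = [
        [0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1],  # headline
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 1, 0],
        [0, 1, 1, 0, 0, 1, 0],
    ]
    row = find_header_line(word)
    print(row, split_characters(word, row))  # prints: 1 [(1, 2), (4, 5)]
```

Empty-column cuts only separate characters that do not touch below the headline. Touching characters need an additional step, which the paper reportedly handles with thinning-based techniques that this sketch does not show.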

    An Integrated Segmentation and Recognition Algorithm for Text in Video

    Recognizing text-line images extracted from video is difficult for two main reasons: 1) segmenting and recognizing merged (touching) characters; 2) segmenting and recognizing characters against complex backgrounds. To handle both cases at once, an integrated character segmentation and recognition algorithm is proposed. The method first binarizes the text-line image and estimates the text-line height from the horizontal projection of the binarized image. Second, connected components in the binary image that are wider than a threshold are segmented based on image analysis or character recognition, depending on how strongly the character strokes are merged. Third, connected components are selected and combined based on character recognition to produce candidate recognition results. Finally, a word graph is constructed from the candidate results, and a statistical language model is used to select the character recognition result from the word graph. Experimental results show that the integrated algorithm greatly reduces the recognition error rate for merged characters and for characters on complex backgrounds.
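The first and last steps can be illustrated with a small sketch. This is not the authors' implementation: the edge format, the bigram dictionary, the `lm_weight` and `unseen` parameters, the `frac` threshold, and the function names are all hypothetical. `estimate_line_height` estimates height from the horizontal projection of a binary image. `decode_word_graph` treats nodes as cut points between candidate character segments, gives each edge a recognizer log-probability, and keeps the path that maximizes recognizer score plus a weighted bigram language-model score.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical word-graph edge: (start_cut, end_cut, character, recognizer log-prob).
Edge = Tuple[int, int, str, float]


def estimate_line_height(img: List[List[int]], frac: float = 0.1) -> int:
    """Estimate text-line height from the horizontal projection.

    Rows whose ink count is at least `frac` times the peak count are treated
    as text rows; the height is the span from the first to the last such row.
    """
    profile = [sum(row) for row in img]
    peak = max(profile, default=0)
    if peak == 0:
        return 0
    rows = [r for r, v in enumerate(profile) if v >= frac * peak]
    return rows[-1] - rows[0] + 1


def decode_word_graph(
    num_nodes: int,
    edges: List[Edge],
    bigram: Dict[Tuple[str, str], float],
    lm_weight: float = 1.0,
    unseen: float = -10.0,
) -> Optional[str]:
    """Pick the best character sequence through a left-to-right word graph.

    Nodes 0..num_nodes-1 are cut points ordered left to right; every edge must
    satisfy start < end. A path scores the sum of recognizer log-probs plus
    lm_weight times bigram log-probs (`unseen` for missing pairs).
    """
    # best[node][last_char] = (score, chars); keyed by the last character so
    # the bigram model can be applied exactly.
    best: List[Dict[str, Tuple[float, List[str]]]] = [{} for _ in range(num_nodes)]
    best[0]["<s>"] = (0.0, [])
    outgoing: Dict[int, List[Edge]] = {}
    for edge in edges:
        outgoing.setdefault(edge[0], []).append(edge)
    for node in range(num_nodes):  # cut points are already in topological order
        for prev, (score, chars) in best[node].items():
            for _, end, ch, rec in outgoing.get(node, []):
                total = score + rec + lm_weight * bigram.get((prev, ch), unseen)
                if ch not in best[end] or total > best[end][ch][0]:
                    best[end][ch] = (total, chars + [ch])
    final = best[num_nodes - 1]
    if not final:
        return None  # no path reaches the last cut point
    _, chars = max(final.values(), key=lambda item: item[0])
    return "".join(chars)


if __name__ == "__main__":
    # One merged component: read whole as "m", or split at cut 1 into "r" + "n".
    edges = [(0, 2, "m", -1.0), (0, 1, "r", -0.3), (1, 2, "n", -0.3)]
    bigram = {("<s>", "m"): -0.5, ("<s>", "r"): -1.0, ("r", "n"): -1.0}
    print(decode_word_graph(3, edges, bigram))               # prints: m
    print(decode_word_graph(3, edges, bigram, lm_weight=0))  # prints: rn
```

In the toy example the recognizer alone slightly prefers reading the merged component as "rn", but the language model tips the decision to "m". Keying the dynamic-programming state by the last character is what lets a bigram model be applied exactly; a higher-order model would need a longer history in the state.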