562 research outputs found

    Zone Segmentation and Thinning based Algorithm for Segmentation of Devnagari Text

    Character segmentation of handwritten documents is a challenging research topic because of its diverse application environments. OCR can be used for the automated processing and handling of forms, old corrupted reports, bank cheques, postal codes and structures. Segmenting a word into characters is one of the major challenges in optical character recognition. It is even more challenging for offline handwritten documents, and a further hurdle is the presence of broken, touching and overlapped characters in Devnagari script. In this paper we introduce an algorithm that segments both broken and touching characters in Devnagari script. To segment these characters, the algorithm uses both zone segmentation and thinning-based techniques. We used 85 words each for isolated, broken, touching, and both broken and touching characters. The segmentation accuracy achieved for broken as well as touching characters is 96.2% on average.

    Multi-script handwritten character recognition: Using feature descriptors and machine learning

    Off-line Thai handwriting recognition in legal amount

    Thai handwriting in legal amounts is a challenging problem and a new field in handwriting recognition research. The focus of this thesis is to implement a Thai handwriting recognition system. A preliminary data set of Thai handwriting in legal amounts is designed. The samples in the data set are characters and words of Thai legal amounts and a set of legal amount phrases collected from a number of native Thai volunteers. In the preprocessing and recognition stages, techniques are introduced to improve character recognition rates. The characters are divided into two smaller subgroups by their writing levels, named the body and high groups. The recognition rates of both groups are increased by exploiting their distinguishing features. The writing level separation algorithms are implemented using the size and position of characters. Empirical experiments are set up to find the best combination of features for increasing the recognition rates. Traditional recognition systems are modified to give the accumulative top-3 ranked answers to cover the possible character classes. At the postprocessing level, lexicon matching algorithms are implemented to match the ranked characters with the legal amount words. These matched words are joined together to form possible choices of amounts. These amounts have their syntax checked in the last stage. Several syntax violations are caused by faulty character segmentation and recognition resulting from connected or broken characters. The anomalies in handwriting caused by these characters are mainly detected by their size and shape. During the recovery process, the possible word boundary patterns can be pre-defined and used to segment the hypothesis words. These words are identified by word recognition, and the results are joined with previously matched words to form the full amounts, which are checked against the syntax rules again.
    From 154 amounts written by 10 writers, the rejection rate is 14.9 percent with the recovery processes. The recognition rate for the accepted amounts is 100 percent.
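
The matching of top-3 ranked characters against a lexicon could be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the function name `match_lexicon`, the rank-sum cost, and the English stand-in words are all hypothetical, and the sketch assumes one candidate list per character position.

```python
from typing import List, Optional, Tuple


def match_lexicon(
    candidates: List[List[str]], lexicon: List[str]
) -> Optional[Tuple[str, int]]:
    """Return the lexicon word best supported by ranked character candidates.

    candidates[i] holds the recognizer's ranked guesses (best first) for
    position i. A word matches only if each of its characters appears in the
    candidate list at the same position. The cost is the sum of the ranks
    used (an assumed scoring rule); lower is better. None means no word
    matches, i.e. the input would be rejected or sent to recovery.
    """
    best: Optional[Tuple[str, int]] = None
    for word in lexicon:
        if len(word) != len(candidates):
            continue
        cost = 0
        for ch, ranked in zip(word, candidates):
            if ch not in ranked:
                break  # this word is not supported at this position
            cost += ranked.index(ch)
        else:
            # Reached only when every position matched.
            if best is None or cost < best[1]:
                best = (word, cost)
    return best


# Hypothetical stand-in lexicon; a real system would hold Thai amount words.
LEXICON = ["one", "two", "ten", "six"]
TOP3 = [["l", "t", "f"], ["w", "e", "v"], ["o", "n", "a"]]
result = match_lexicon(TOP3, LEXICON)
```

Keeping three candidates per position lets a word survive even when the top-ranked character is wrong, which is the motivation the abstract gives for the accumulative top-3 answers.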

    Research and Development of Feature Extraction from Myanmar Palm Leaf Manuscripts for the Myanmar Character Recognition System

    This paper proposes a handwriting OCR system for Myanmar palm-leaf manuscripts. Each text area in the Myanmar palm-leaf manuscript is segmented. The segmented character images must then be recognized and converted into Myanmar handwritten characters, which convey Myanmar's precious and invaluable historical information. This paper covers two essential steps: preprocessing and feature extraction. Preprocessing automatically extracts the relevant palm-leaf manuscript region from images taken by a camera and supplies enhanced images to the subsequent stages of Myanmar character recognition. A one-dimensional segmentation approach is used to crop the leaf area from the high-resolution image. Line count analysis is also performed to extract a region containing a sufficient number of lines. After that, line segmentation is carried out using an Object Frequency Histogram along the horizontal lines, which finds the optimal cut points between the lines. The same technique is then applied vertically to obtain each character or smallest group of characters. In total, 18 features are extracted to recognize the Myanmar palm-leaf manuscript characters. Although the experimental results are good, some difficulties related to connected components still need to be addressed.
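
The horizontal-then-vertical histogram segmentation described above can be illustrated with a plain projection profile. This is a minimal sketch that assumes a binarized image given as rows of 0/1 values (1 = ink). The paper's Object Frequency Histogram may be computed differently, and the names `segment_lines`, `segment_columns` and `min_ink` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def segment_lines(image: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Split an image into row bands using a horizontal projection profile.

    Returns (start_row, end_row_exclusive) for each run of consecutive rows
    whose ink count is at least min_ink.
    """
    profile = [sum(row) for row in image]  # ink pixels per row
    bands: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y  # a text band begins
        elif count < min_ink and start is not None:
            bands.append((start, y))  # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def segment_columns(
    image: Image, band: Tuple[int, int], min_ink: int = 1
) -> List[Tuple[int, int]]:
    """Apply the same profile vertically inside one row band."""
    top, bottom = band
    # Transpose the band so that its columns become rows.
    strip = [list(col) for col in zip(*image[top:bottom])]
    return segment_lines(strip, min_ink)


page = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
]
lines = segment_lines(page)
first_line_chars = segment_columns(page, lines[0])
```

A profile of this kind cannot split characters that touch, which is consistent with the connected-component difficulties the abstract mentions.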

    “Indicating a research gap” in computer science research article introductions by non-native English writers

    Writing a research article for a peer-reviewed publication is a complex process that involves daunting communication with referees, co-authors and editors. Publication writing is more challenging than ordinary communicative expression, yet prolific writers often pass through the process without much difficulty. Prolific writers draw on many writing strategies, one of which is highlighting the research gap. Most of the time, the research gaps highlighted are those related to the intended research niche and the intended study. Although the strategy has been used and taught in many research writing settings, it has been reported to be unpopular among non-native English research writers. Although many non-native English writers are aware of the importance of the research gap, not much is known about how this strategy is practiced. In view of the under-utilization of this strategy and the limited studies on it in non-native contexts, this paper investigates its use in 150 research article introductions in Computer Science disciplines written by academicians in Malaysian universities. The findings confirm that “indicating a research gap” is underutilized in the research articles in the corpus. In addition, this paper describes four ways in which this strategy is commonly used by non-native writers. The confirmation and authentic examples may be useful in the teaching and learning of research article writing.

    Separability versus prototypicality in handwritten word-image retrieval

    Hit lists are at the core of retrieval systems. The top ranks are important, especially if user feedback is used to train the system. Analysis of hit lists revealed counter-intuitive instances in the top ranks for good classifiers. In this study, we propose that two functions need to be optimised: (a) in order to reduce a massive set of instances to a likely subset among ten thousand or more classes, separability is required. However, the results need to be intuitive after ranking, reflecting (b) the prototypicality of instances. By optimising these requirements sequentially, the number of distracting images is strongly reduced, followed by nearest-centroid based instance ranking that retains an intuitive (low-edit distance) ranking. We show that in handwritten word-image retrieval, precision improvements of up to 35 percentage points can be achieved, yielding up to 100% top hit precision and 99% top-7 precision in data sets with 84 000 instances, while maintaining high recall performances. The method is conveniently implemented in a massive scale, continuously trainable retrieval engine, Monk. (C) 2013 Elsevier Ltd. All rights reserved
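
The sequential optimisation described above, separability first and prototypicality second, might look roughly like the sketch below. It is not the Monk implementation. The classifier scores are taken as given, and the threshold, the feature vectors, and the names `centroid` and `two_stage_rank` are assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def two_stage_rank(
    class_examples: List[Vector],
    candidates: List[Vector],
    scores: List[float],
    threshold: float,
) -> List[int]:
    """Return candidate indices ordered for the hit list.

    Stage 1 (separability): keep only candidates whose classifier score
    reaches the threshold, discarding distractors.
    Stage 2 (prototypicality): order the survivors by Euclidean distance
    to the centroid of known examples of the class, nearest first.
    """
    kept = [i for i, s in enumerate(scores) if s >= threshold]
    center = centroid(class_examples)
    return sorted(kept, key=lambda i: math.dist(candidates[i], center))


# Hypothetical 2-D features; real word-image features are much larger.
examples = [[0.0, 0.0], [2.0, 0.0]]
cands = [[1.0, 0.1], [5.0, 5.0], [1.0, 1.0], [0.9, 0.0]]
clf_scores = [0.9, 0.95, 0.8, 0.2]
hit_list = two_stage_rank(examples, cands, clf_scores, threshold=0.5)
```

In this toy input the candidate with the highest classifier score lies farthest from the centroid and drops to the bottom of the hit list, which mirrors the counter-intuitive top ranks the abstract reports for good classifiers.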

    Separability versus Prototypicality in Handwritten Word Retrieval

    User appreciation of a word-image retrieval system is based on the quality of a hit list for a query. Using support vector machines for ranking in large scale, handwritten document collections, we observed that many hit lists suffered from bad instances in the top ranks. An analysis of this problem revealed that two functions needed to be optimised concerning both separability and prototypicality. By ranking images in two stages, the number of distracting images is reduced, making the method very convenient for massive scale, continuously trainable retrieval engines. Instead of cumbersome SVM training, we present a nearest-centroid method and show that precision improvements of up to 35 percentage points can be achieved, yielding up to 100% precision in data sets with a large amount of instances, while maintaining high recall performances.