762 research outputs found

    Off-line Thai handwriting recognition in legal amount

    Recognizing Thai handwriting in legal amounts is a challenging problem and a new field within handwriting recognition research. The focus of this thesis is to implement a Thai handwriting recognition system. A preliminary data set of Thai handwriting in legal amounts is designed. The samples in the data set are characters and words of Thai legal amounts, together with a set of legal amount phrases collected from a number of native Thai volunteers. In the preprocessing and recognition stages, techniques are introduced to improve the character recognition rates. The characters are divided into two smaller subgroups by writing level, named the body and high groups. The recognition rates of both groups are increased by exploiting their distinguishing features. The writing level separation algorithms are implemented using the size and position of characters. Empirical experiments are set up to find the combination of features that best increases the recognition rates. Traditional recognition systems are modified to give the accumulative top-3 ranked answers to cover the possible character classes. In the postprocessing stage, lexicon matching algorithms are implemented to match the ranked characters with the legal amount words. These matched words are joined together to form possible choices of amounts. These amounts have their syntax checked in the last stage. Several syntax violations are caused by faulty character segmentation and recognition resulting from connected or broken characters. The anomalies in handwriting caused by these characters are mainly detected by their size and shape. During the recovery process, the possible word boundary patterns can be pre-defined and used to segment the hypothesis words. These words are identified by word recognition, and the results are joined with previously matched words to form the full amounts, which are checked against the syntax rules again. 
From 154 amounts written by 10 writers, the rejection rate is 14.9 percent with the recovery processes. The recognition rate for the accepted amounts is 100 percent.
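
The lexicon matching step described in this abstract can be illustrated with a minimal sketch. It is not the thesis's algorithm: the function name `match_lexicon`, the rank-sum scoring, and the Latin placeholder lexicon are all assumptions made for illustration, and it assumes the characters are already segmented so that each position carries a best-first list of up to three candidates.

```python
def match_lexicon(ranked, lexicon):
    """Return lexicon words consistent with ranked character candidates.

    ranked  : list of candidate lists, one per character position,
              each ordered best-first (e.g. the top-3 classifier answers).
    lexicon : iterable of allowed words (hypothetical placeholder here).
    """
    matches = []
    for word in lexicon:
        # A word can only match if it has one character per position.
        if len(word) != len(ranked):
            continue
        # Every character must appear among the candidates at its position.
        if all(ch in cands for ch, cands in zip(word, ranked)):
            # Assumed scoring: sum of candidate ranks (0 is best).
            score = sum(cands.index(ch) for ch, cands in zip(word, ranked))
            matches.append((score, word))
    # Lower score first; ties are broken alphabetically.
    return [word for _, word in sorted(matches)]


# Hypothetical example: three character positions, top-3 candidates each.
ranked = [["t", "f", "l"], ["w", "e", "o"], ["n", "o", "u"]]
lexicon = ["ten", "two", "six"]
print(match_lexicon(ranked, lexicon))
```

Keeping every consistent word, not only the best one, mirrors the abstract's idea of forming several possible amounts and letting a later syntax check decide between them.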

    Papers in Southeast Asian Linguistics No. 9: Language policy, language planning and sociolinguistics in South-East Asia


    Reading Early New Testament Manuscripts


    Multi-script handwritten character recognition: Using feature descriptors and machine learning


    Multi-font Numerals Recognition for Urdu Script based Languages

    Handwritten character recognition for Urdu script based languages is one of the most difficult tasks due to the complexities of the script. Urdu script based languages have not received much attention, even though this script is used by more than 1/6th of the population. The complexities of the script make the recognition process more complicated. The problem in handwritten numeral recognition is the shape similarity between handwritten numerals and the dual style for Urdu. This paper presents fuzzy rule based, HMM and hybrid approaches for the recognition of both Urdu and Arabic numerals in an unconstrained environment, from both online and offline domain for online input. Basically, the offline domain is used for preprocessing, i.e. normalization and slant normalization. The proposed system is tested and provides accuracy of 97.1

    A novel image matching approach for word spotting

    Word spotting has been adopted and used by various researchers as a complementary technique to Optical Character Recognition for document analysis and retrieval. The various applications of word spotting include document indexing, image retrieval and information filtering. The important factors in word spotting techniques are pre-processing, selection and extraction of proper features and image matching algorithms. The Correlation Similarity Measure (CORR) algorithm is considered to be a faster matching algorithm, originally defined for finding similarities between binary patterns. In the word spotting literature the CORR algorithm has been used successfully to compare the GSC binary features extracted from binary word images, i.e., Gradient, Structural and Concavity (GSC) features. However, the problem with this approach is that binarization of images leads to a loss of very useful information. Furthermore, before extracting GSC binary features the word images must be skew corrected and slant normalized, which is not only difficult but in some cases impossible in Arabic and modified Arabic scripts. We present a new approach in which the Correlation Similarity Measure (CORR) algorithm has been used innovatively to compare Gray-scale word images. In this approach, binarization of images, skew correction and slant normalization of word images are not required at all. The various features, i.e., projection profiles, word profiles and transitional features are extracted from the Gray-scale word images and converted into their binary equivalents, which are compared via CORR algorithm with greater speed and higher accuracy. The experiments have been conducted on Gray-scale versions of newly created handwritten databases of Pashto and Dari languages, written in modified Arabic scripts. For each of these languages we have used 4599 words relating to 21 different word classes collected from 219 writers. 
The average precision rates achieved for Pashto and Dari were 93.18% and 93.75%, respectively. The time taken for matching a pair of images was 1.43 milliseconds. In addition, we will present the handwritten databases for two well-known Indo-Iranian languages, i.e., Pashto and Dari. These are large databases which contain six types of data, i.e., Dates, Isolated Digits, Numeral Strings, Isolated Characters, Different Words and Special Symbols, written by native speakers of the corresponding languages.
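
To make the idea of comparing binary feature vectors by correlation concrete, here is a minimal sketch. The abstract does not give the CORR formula, so the definition below is an assumption: a commonly used correlation similarity for binary vectors, built from the four agreement counts and rescaled to the range 0 to 1. The function name `corr_similarity` and the fallback value for constant vectors are also illustrative choices.

```python
import math


def corr_similarity(a, b):
    """Correlation-style similarity between two equal-length binary vectors.

    Returns a value in [0, 1]: 1.0 for identical patterns, 0.0 for exact
    complements. The formula is an assumed form, not taken from the paper.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    s11 = s00 = s10 = s01 = 0
    for x, y in zip(a, b):
        if x and y:
            s11 += 1          # both bits set
        elif x:
            s10 += 1          # set only in a
        elif y:
            s01 += 1          # set only in b
        else:
            s00 += 1          # both bits clear
    denom = math.sqrt((s10 + s11) * (s01 + s00) * (s11 + s01) * (s00 + s10))
    if denom == 0:
        # Correlation is undefined when either vector is constant;
        # returning the neutral midpoint is an assumption of this sketch.
        return 0.5
    return 0.5 + (s11 * s00 - s10 * s01) / (2 * denom)


# Hypothetical binary feature vectors for two word images.
print(corr_similarity([1, 0, 1, 0], [1, 0, 1, 0]))
print(corr_similarity([1, 0, 1, 0], [0, 1, 0, 1]))
```

Because the measure needs only four counters and one square root per pair, it is cheap to evaluate, which is consistent with the abstract's emphasis on matching speed.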

    Further Typological Studies in Southeast Asian Languages


    Prototypes and Metaphorical Extensions: The Japanese Numeral Classifiers hiki and hatsu

    This study concerns the meaning of Japanese numeral classifiers (NCs) and, particularly, the elements which guide us to understand the metaphorical meanings they can convey. In the typological literature, as well as in studies of Japanese, the focus is almost entirely on NCs that refer to entities. NCs are generally characterised as being matched with a noun primarily based on semantic criteria such as the animacy, the physical characteristics, or the function of the referent concerned. However, in some languages, including Japanese, nouns allow a number of alternative NCs, so that it is considered that NCs are not automatically matched with a noun but rather with the referent that the noun refers to in the particular context in which it occurs. This study examines data from the Balanced Corpus of Contemporary Written Japanese, and focuses on two NCs as case studies: hiki, an entity NC, typically used to classify small, animate beings, and hatsu, an NC that is used to classify both entities and events that are typically explosive in nature. The study employs the framework of Prototype Theory, along with the theory of conceptual metaphor, and the theory of metonymy. The analysis of the data identified a number of semantic components for each of the target NCs; by drawing on these components, the speaker can subjectively add those meanings to modify the meaning of the referring noun or verb. Furthermore, the study revealed that the choice of NCs can be influenced by two factors. First, the choice of NC sometimes relates to the linguistic context in which the referring noun or verb occurs. For example, if a noun is used metaphorically, the NC is chosen to reinforce that metaphor, rather than to match with the actual referent. Second, the meaning of an NC itself can be used as a vehicle of metaphor to contribute meaning to that of the referring noun or verb concerned. 
Through the analysis, it has been identified that the extension of the range of referents of a single NC, beyond cases in which objectively observable characteristics are evident, occurs in two dimensions: (1) in terms of the typicality of referents and (2) across categories of referents (entities and events). Based on the findings, the study claims that, in both cases, non-literal factors account for extension in the range of referents of an NC in Japanese. Specifically, the non-literal devices of metaphor and metonymy appear to play a role in connecting an NC and its referent in the context in which extension of the use of that NC occurs.