
    Kannada Character Recognition System: A Review

    Intensive research has been done on optical character recognition (OCR), and a large number of articles have been published on this topic during the last few decades. Many commercial OCR systems are now available in the market, but most of these systems work for Roman, Chinese, Japanese and Arabic characters. There is not a sufficient number of works on Indian language character recognition, especially for Kannada script, among the 12 major scripts in India. This paper presents a review of existing work on printed Kannada script and its results. The characteristics of Kannada script and the Kannada Character Recognition (KCR) system are discussed in detail. Finally, fusion at the classifier level is proposed to increase the recognition accuracy. Comment: 12 pages, 8 figures

    Text Line Segmentation of Historical Documents: a Survey

    There is a huge number of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest. Comment: 25 pages, submitted version, to appear in International Journal on Document Analysis and Recognition; online version available at http://www.springerlink.com/content/k2813176280456k3

    Gene transfer by interspecific hybridization in bryophytes

    The role of hybridization in evolution has been debated for more than a century regarding bryophytes (mosses, liverworts, and hornworts) as well as most other organisms. Bryophytes have haplodiplontic life cycles with a dominant haploid generation. Hybridization in bryophytes involves fusion of gametes produced by haploid parental gametophytes of different species. The hybrid is thus the short-lived diploid sporophyte, which soon undergoes meiosis prior to forming a large number of haploid recombinant spores. In this study, two moss species (Homalothecium lutescens and H. sericeum) and three subspecies of the liverwort Marchantia polymorpha were investigated for evidence of gene transfer by hybridization. Firstly, we compared the morphology of gametophytes and sporophytes from allopatric and sympatric populations of H. lutescens and H. sericeum. Secondly, we used species-specific SNP markers to estimate the degree of genetic mixing in three generations (i.e., haploid maternal gametophytes, diploid sporophytes, and haploid sporelings) in samples from sympatric populations of H. lutescens and H. sericeum. Thirdly, we assessed fitness traits in relation to the degree of genetic admixture in sporophytes of H. lutescens and H. sericeum, including non-admixed, mildly and strongly admixed genotypes. Finally, we investigated the genome-wide scale phylogenetic relationship between the three subspecies of M. polymorpha to test the hypothesis that subsp. ruderalis has originated as a homoploid hybrid species between subsp. polymorpha and subsp. montivagans. Our study of Homalothecium shows that gametophytes from sympatric populations display intermediate morphology in a number of leaf characters, with the exception of leaf dimensions, which are strikingly smaller than those in allopatric populations. Most sporophytes with intermediate capsule inclination, initially classified as putative hybrids, did not display admixture of SNP markers. Many sporophytes appeared to be secondary hybrids, displaying asymmetrical admixture of SNP markers, except for five sporophytes, which were found to be primary hybrids. Admixture analyses using SNP markers identified 76 samples (17%) as mildly admixed and 17 samples (3.8%) as strongly admixed. Admixed samples represented all three generations and were found in all sympatric populations. Hybridization and introgression were bidirectional. Admixed sporophytes gave rise to viable recombinant spores and sporelings. Sporophytes with mildly admixed H. lutescens tended to show lower fitness, whereas sporophytes with mildly admixed H. sericeum showed signs of heterosis. Some strongly admixed sporophytes showed high spore counts, intermediate spore diameters and high spore germination rates. Genomic analysis showed three distinct taxa within the M. polymorpha complex, coinciding with the three generally accepted subspecies. All three possible topologies were frequent across the genome, but species tree analyses using M. paleacea as outgroup recovered an overall branching order where subsp. montivagans diverged first and subsp. ruderalis and subsp. polymorpha were placed as sister species. The high degree of inconsistent gene trees suggests frequent incomplete lineage sorting (ILS) and/or recent or intermittent introgression. Evidence for recent introgression was found in two samples of M. polymorpha. Remarkably, pseudo-chromosome 2 in subsp. montivagans differed by being more diverged than other parts of the genomes. This could either be explained by specific capture of chromosome 2 from an unknown related species through hybridization or by conservation of chromosome 2 despite intermittent or ongoing introgression affecting more permeable parts of the genomes. A higher degree of chromosomal rearrangement in pseudo-chromosome 2 of subsp. montivagans provides some evidence for the latter explanation. Our results show that gene transfer between lineages occurs in sympatric populations of both the Marchantia polymorpha complex and among the Homalothecium species. This supports the hypothesis that homoploid hybridization is more widespread among bryophytes than earlier assumed. Moreover, the population-level studies of sympatric populations of H. lutescens and H. sericeum demonstrate that they behave as true hybrid zones, where genetic material is transferred across species boundaries and secondarily backcrossed. The presence of hybrid zones has strong evolutionary implications because genetic material transferred across species boundaries can be directly subject to natural selection in the dominant haploid generation of the bryophyte life cycle, and can contribute to local adaptation, survival and speciation.

    Character recognition and information retrieval

    Presented are two technologies, character recognition and information retrieval, that are used for text processing. Character recognition translates text image data to a computer-coded format; information retrieval stores these data and provides efficient access to the text. The necessity of their eventual coupling is obvious. Their sequential application, though (with no manual intervention), has been considered impractical at best. Our experimentation exploits these two technologies in just this way. We identify problems with their combined use, and show that the technologies have come to a point where they can be applied in succession.

    Handwritten Devanagari numeral recognition

    Optical character recognition (OCR) plays a vital role in the modern world. OCR can help solve many complex problems and thus make human work easier. In OCR, a scanned digital image or handwritten text is given as the input to the system. OCR can be used in postal departments for sorting mail and in other offices. Much work has been done for the English alphabet, but nowadays Indian scripts are an active area of interest for researchers. Devanagari is one such Indian script. Research is ongoing on the recognition of its alphabet, but much less attention has been given to numerals. Here an attempt was made at the recognition of Devanagari numerals. The main part of any OCR system is feature extraction, because the more features are extracted, the higher the accuracy. Here two methods were used for feature extraction. The first was a moment-based method. There are many moment-based methods, but the Tchebichef moment was preferred because of its better image representation capability. The second method was based on contour curvature. The contour is a very important boundary feature used for finding similarity between shapes. After feature extraction, the extracted features have to be classified, and for this an Artificial Neural Network (ANN) was used. There are many classifiers, but the ANN was preferred because it is easy to handle and less error prone, and because its accuracy is much higher than that of other classifiers. The classification was done individually with the two extracted features, and finally the features were cascaded to increase the accuracy.
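The moment-based feature extraction and the feature cascade described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`tchebichef_basis`, `tchebichef_moments`, `cascade_features`) are hypothetical, the basis is built by Gram-Schmidt orthonormalization of monomials on a uniform grid (which yields the normalized discrete Tchebichef polynomials up to sign) rather than by the usual recurrence, and "cascading" is assumed here to mean simple concatenation of the two feature vectors.

```python
import math


def tchebichef_basis(n, order):
    """Orthonormal discrete polynomials of degree 0..order-1 on the grid 0..n-1.

    Assumption: Gram-Schmidt on monomials with uniform weight is used in place
    of the standard recurrence; requires n > 1 and order <= n.
    """
    xs = [2.0 * x / (n - 1) - 1.0 for x in range(n)]  # rescale to [-1, 1] for stability
    basis = []
    for k in range(order):
        v = [x ** k for x in xs]
        for b in basis:  # remove components along lower-degree polynomials
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis


def tchebichef_moments(image, order):
    """Moments T[p][q] = sum over y, x of t_p(y) * t_q(x) * image[y][x]."""
    rows, cols = len(image), len(image[0])
    by = tchebichef_basis(rows, order)
    bx = tchebichef_basis(cols, order)
    return [
        [
            sum(by[p][y] * bx[q][x] * image[y][x]
                for y in range(rows) for x in range(cols))
            for q in range(order)
        ]
        for p in range(order)
    ]


def cascade_features(moments, curvature_features):
    """Hypothetical cascade: flatten the moments and append the second feature set."""
    flat = [value for row in moments for value in row]
    return flat + list(curvature_features)
```

The cascaded vector would then be passed to whatever classifier is in use; the abstract names an ANN, but its architecture is not given, so none is sketched here.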

    Adaptive Algorithms for Automated Processing of Document Images

    Large scale document digitization projects continue to motivate interesting document understanding technologies such as script and language identification, page classification, segmentation and enhancement. Typically, however, solutions are still limited to narrow domains or regular formats such as books, forms, articles or letters and operate best on clean documents scanned in a controlled environment. More general collections of heterogeneous documents challenge the basic assumptions of state-of-the-art technology regarding quality, script, content and layout. Our work explores the use of adaptive algorithms for the automated analysis of noisy and complex document collections. We first propose, implement and evaluate an adaptive clutter detection and removal technique for complex binary documents. Our distance transform based technique aims to remove irregular and independent unwanted foreground content while leaving text content untouched. The novelty of this approach is in its determination of the best approximation to the clutter-content boundary with text-like structures. Second, we describe a page segmentation technique called Voronoi++ for complex layouts which builds upon the state-of-the-art method proposed by Kise [Kise1999]. Our approach does not assume structured text zones and is designed to handle multi-lingual text in both handwritten and printed form. Voronoi++ is a dynamically adaptive and contextually aware approach that considers components' separation features combined with Docstrum [O'Gorman1993] based angular and neighborhood features to form provisional zone hypotheses. These provisional zones are then verified based on the context built from local separation and high-level content features. Finally, our research proposes a generic model to segment and to recognize characters for any complex syllabic or non-syllabic script, using font-models. This concept is based on the fact that font files contain all the information necessary to render text and thus provide a model for how to decompose it. Instead of script-specific routines, this work is a step towards a generic character segmentation and recognition scheme for both Latin and non-Latin scripts.

    Feature Extraction Methods for Character Recognition

    Abstract not included.

    An End-to-End License Plate Localization and Recognition System

    An end-to-end license plate recognition (LPR) system is proposed. It is composed of pre-processing, detection, segmentation and character recognition to find and recognize plates from camera-based still images. The system utilizes connected component (CC) properties to quickly extract the license plate region. A novel two-stage CC filtering is utilized to address both shape and spatial relationship information to produce high precision and recall values for detection. Floating peaks and valleys (FPV) of projection profiles are used to cut the license plates into individual characters. A turning function based method is proposed to recognize each character quickly and accurately. It is further accelerated using a curvature histogram based support vector machine (SVM). The INFTY dataset is used to train the recognition system, and the MediaLab license plate dataset is used for testing. The proposed system achieved an 89.45% F-measure for detection and 87.33% accuracy for overall recognition rate, which is comparable to current state-of-the-art systems.
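The idea of cutting a plate into characters from its projection profile can be sketched as follows. This is a deliberately simplified stand-in, not the FPV method of the abstract: it cuts wherever the column-wise ink count falls below a fixed threshold, whereas the paper's floating peaks and valleys are not specified here. The function name `split_characters` and the parameter `min_ink` are assumptions made for illustration.

```python
def split_characters(binary, min_ink=1):
    """Return (start, end) column ranges of candidate characters.

    `binary` is a list of rows with 1 for ink and 0 for background.
    A column belongs to a character when its ink count is >= min_ink
    (assumed fixed threshold; the FPV method adapts its cut points).
    """
    cols = len(binary[0])
    # Vertical projection profile: amount of ink in each column.
    profile = [sum(row[x] for row in binary) for x in range(cols)]

    segments = []
    start = None
    for x, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = x                      # entering a character
        elif ink < min_ink and start is not None:
            segments.append((start, x))    # leaving a character (end is exclusive)
            start = None
    if start is not None:                  # character touching the right edge
        segments.append((start, cols))
    return segments
```

Each returned range can be used to slice the plate image column-wise before passing the crop to a character recognizer.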