207 research outputs found

    Applying Domain Knowledge to the Recognition of Handwritten Zip Codes

    Get PDF

    RAPID ANALYTICAL VERIFICATION OF HANDWRITTEN ALPHANUMERIC ADDRESS FIELDS

    Get PDF
    Microsoft, Motorola, Siemens, Hitachi, IAPR, NICI, IUF This paper presents a combination of fuzzy system and dynamic analytical model to deal with imprecise data derived from feature extraction in handwritten address images which are compared against postulated addresses for address verification. A dynamic building­number locator is able to locate and recognise the building­number, without knowing exactly where the building­number starts in the candidate address line. The overall system achieved a correct sorting rate of 72.9%, 27.1% rejection rate and 0.0% error rate on a blind test set of 450 cursive handwritten addresses.

    Recognition-based Approach of Numeral Extraction in Handwritten Chemistry Documents using Contextual Knowledge

    Get PDF
    International audienceThis paper presents a complete procedure that uses contextual and syntactic information to identify and recognize amount fields in the table regions of chemistry documents. The proposed method is composed of two main modules. Firstly, a structural analysis based on connected component (CC) dimensions and positions identifies some special symbols and clusters other CCs into three groups: fragment of characters, isolated characters or connected characters. Then, a specific processing is performed on each group of CCs. The fragment of characters are merged with the nearest character or string using geometric relationship based rules. The characters are sent to a recognition module to identify the numeral components. For the connected characters, the final decision on the string nature (numeric or non-numeric) is made based on a global score computed on the full string using the height regularity property and the recognition probabilities of its segmented fragments. Finally, a simple syntactic verification at table row level is conducted in order to correct eventual errors. The experimental tests are carried out on real-world chemistry documents provided by our industrial partner eNovalys. The obtained results show the effectiveness of the proposed system in extracting amount fields

    Applying Genetic Algorithm in Multi Language\u27s Characters Recognition

    Get PDF

    A discrete hidden Markov model for the recognition of handwritten Farsi words

    Get PDF
    Handwriting recognition systems (HRS) have been researched for more than 50 years. Designing a system to recognize specific words in a handwritten clean document is still a difficult task and the challenge is to achieve a high recognition rate. Previously, most of the research in the handwriting recognition domain was conducted on Chinese and Latin languages, while recently more people have shown an interest in the Indo-Iranian script recognition systems. In this thesis, we present an automatic handwriting recognition system for Farsi words. The system was trained, validated and tested on the CENPARMI Farsi Dataset, which was gathered during this research. CENPARMI's Farsi Dataset is unique in terms of its huge number of images (432,357 combined grayscale and binary), inclusion of all possible handwriting types (Dates, Words, Isolated Characters, Isolated Digits, Numeral Strings, Special Symbols, Documents), the variety of cursive styles, the number of writers (400) and the exclusive participation of Native Farsi speakers in the gathering of data. The words were first preprocessed. Concavity and Distribution features were extracted and the codebook was calculated by the vector quantization method. A Discrete Hidden Markov Model was chosen as the classifier because of the cursive nature of the Farsi script. Finally, encouraging recognition rates of98.76% and 96.02% have been obtained for the Training and Testing sets, respectivel

    Document preprocessing and fuzzy unsupervised character classification

    Get PDF
    This dissertation presents document preprocessing and fuzzy unsupervised character classification for automatically reading daily-received office documents that have complex layout structures, such as multiple columns and mixed-mode contents of texts, graphics and half-tone pictures. First, the block segmentation algorithm is performed based on a simple two-step run-length smoothing to decompose a document into single-mode blocks. Next, the block classification is performed based on the clustering rules to classify each block into one of the types such as text, horizontal or vertical lines, graphics, and pictures. The mean white-to-black transition is shown as an invariance for textual blocks, and is useful for block discrimination. A fuzzy model for unsupervised character classification is designed to improve the robustness, correctness, and speed of the character recognition system. The classification procedures are divided into two stages. The first stage separates the characters into seven typographical categories based on word structures of a text line. The second stage uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. A fuzzy model of unsupervised character classification, which is more natural in the representation of prototypes for character matching, is defined and the weighted fuzzy similarity measure is explored. The characteristics of the fuzzy model are discussed and used in speeding up the classification process. After classification, the character recognition procedure is simply applied on the limited versions of the fuzzy prototypes. To avoid information loss and extra distortion, an topography-based approach is proposed to apply directly on the fuzzy prototypes to extract the skeletons. First, a convolution by a bell-shaped function is performed to obtain a smooth surface. Second, the ridge points are extracted by rule-based topographic analysis of the structure. Third, a membership function is assigned to ridge points with values indicating the degrees of membership with respect to the skeleton of an object. Finally, the significant ridge points are linked to form strokes of skeleton, and the clues of eigenvalue variation are used to deal with degradation and preserve connectivity. Experimental results show that our algorithm can reduce the deformation of junction points and correctly extract the whole skeleton although a character is broken into pieces. For some characters merged together, the breaking candidates can be easily located by searching for the saddle points. A pruning algorithm is then applied on each breaking position. At last, a multiple context confirmation can be applied to increase the reliability of breaking hypotheses

    A novel approach to handwritten character recognition

    Get PDF
    A number of new techniques and approaches for off-line handwritten character recognition are presented which individually make significant advancements in the field. First. an outline-based vectorization algorithm is described which gives improved accuracy in producing vector representations of the pen strokes used to draw characters. Later. Vectorization and other types of preprocessing are criticized and an approach to recognition is suggested which avoids separate preprocessing stages by incorporating them into later stages. Apart from the increased speed of this approach. it allows more effective alteration of the character images since more is known about them at the later stages. It also allows the possibility of alterations being corrected if they are initially detrimental to recognition. A new feature measurement. the Radial Distance/Sector Area feature. is presented which is highly robust. tolerant to noise. distortion and style variation. and gives high accuracy results when used for training and testing in a statistical or neural classifier. A very powerful classifier is therefore obtained for recognizing correctly segmented characters. The segmentation task is explored in a simple system of integrated over-segmentation. Character classification and approximate dictionary checking. This can be extended to a full system for handprinted word recognition. In addition to the advancements made by these methods. a powerful new approach to handwritten character recognition is proposed as a direction for future research. This proposal combines the ideas and techniques developed in this thesis in a hierarchical network of classifier modules to achieve context-sensitive. off-line recognition of handwritten text. A new type of "intelligent" feedback is used to direct the search to contextually sensible classifications. A powerful adaptive segmentation system is proposed which. when used as the bottom layer in the hierarchical network. allows initially incorrect segmentations to be adjusted according to the hypotheses of the higher level context modules

    Document analysis at DFKI. - Part 1: Image analysis and text recognition

    Get PDF
    Document analysis is responsible for an essential progress in office automation. This paper is part of an overview about the combined research efforts in document analysis at the DFKI. Common to all document analysis projects is the global goal of providing a high level electronic representation of documents in terms of iconic, structural, textual, and semantic information. These symbolic document descriptions enable an "intelligent\u27; access to a document database. Currently there are three ongoing document analysis projects at DFKI: INCA, OMEGA, and PASCAL2000/PASCAL+. Though the projects pursue different goals in different application domains, they all share the same problems which have to be resolved with similar techniques. For that reason the activities in these projects are bundled to avoid redundant work. At DFKI we have divided the problem of document analysis into two main tasks, text recognition and text analysis, which themselves are divided into a set of subtasks. In a series of three research reports the work of the document analysis and office automation department at DFKI is presented. The first report discusses the problem of text recognition, the second that of text analysis. In a third report we describe our concept for a specialized document analysis knowledge representation language. The report in hand describes the activities dealing with the text recognition task. Text recognition covers the phase starting with capturing a document image up to identifying the written words. This comprises the following subtasks: preprocessing the pictorial information, segmenting into blocks, lines, words, and characters, classifying characters, and identifying the input words. For each subtask several competing solution algorithms, called specialists or knowledge sources, may exist. To efficiently control and organize these specialists an intelligent situation-based planning component is necessary, which is also described in this report. It should be mentioned that the planning component is also responsible to control the overall document analysis system instead of the text recognition phase onl

    Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform

    Get PDF
    In this research, off-line handwriting recognition system for Arabic alphabet is introduced. The system contains three main stages: preprocessing, segmentation and recognition stage. In the preprocessing stage, Radon transform was used in the design of algorithms for page, line and word skew correction as well as for word slant correction. In the segmentation stage, Hough transform approach was used for line extraction. For line to words and word to characters segmentation, a statistical method using mathematic representation of the lines and words binary image was used. Unlike most of current handwriting recognition system, our system simulates the human mechanism for image recognition, where images are encoded and saved in memory as groups according to their similarity to each other. Characters are decomposed into a coefficient vectors, using fast wavelet transform, then, vectors, that represent a character in different possible shapes, are saved as groups with one representative for each group. The recognition is achieved by comparing a vector of the character to be recognized with group representatives. Experiments showed that the proposed system is able to achieve the recognition task with 90.26% of accuracy. The system needs only 3.41 seconds a most to recognize a single character in a text of 15 lines where each line has 10 words on average
    • …
    corecore