    A Bottom Up Procedure for Text Line Segmentation of Latin Script

    In this paper we present a bottom-up procedure for segmenting text lines written or printed in the Latin script. The proposed method uses a combination of image morphology, feature extraction and a Gaussian mixture model to perform this task. The experimental results show the validity of the procedure. Comment: Accepted and presented at the IEEE International Conference on Advances in Computing, Communications and Informatics (ICACCI) 2017.
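    The abstract does not give the model details. As a purely hypothetical illustration of how a Gaussian mixture model can group connected components into text lines, the sketch below fits a one-dimensional mixture by expectation-maximisation to the vertical centroids of components and assigns each component to its most probable line. The choice of feature (y-centroid), the known number of lines `k`, and the names `fit_gmm_1d` and `assign_lines` are assumptions, not the paper's method.

```python
import math

def fit_gmm_1d(xs, k, iters=100):
    """Fit a k-component 1D Gaussian mixture to xs by expectation-maximisation.

    Returns (weights, means, variances). Means start at evenly spaced order
    statistics of the data; variances start at total variance / k^2 as a rough
    per-line spread (both are simple, illustrative initialisation choices).
    """
    data = sorted(xs)
    n = len(xs)
    mu = sum(data) / n
    means = [data[int((j + 0.5) * n / k)] for j in range(k)]
    total_var = sum((x - mu) ** 2 for x in data) / n
    variances = [max(total_var / k ** 2, 1.0)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point, in log space
        # so that far-away points do not underflow to zero probability.
        resp = []
        for x in xs:
            logp = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logp)
            p = [math.exp(lp - top) for lp in logp]
            s = sum(p)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate each component from its weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # component has collapsed; keep its previous parameters
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-3)  # floor avoids degenerate zero variance
    return weights, means, variances

def assign_lines(ys, k):
    """Label each y-centroid with the mixture component (text line) of highest posterior."""
    weights, means, variances = fit_gmm_1d(ys, k)
    labels = []
    for y in ys:
        scores = [math.log(w) - 0.5 * math.log(v) - (y - m) ** 2 / (2 * v)
                  for w, m, v in zip(weights, means, variances)]
        labels.append(max(range(k), key=lambda j: scores[j]))
    return labels, means
```

    For example, centroids clustered around rows 10, 40 and 70 should come back labelled as three separate lines. In practice the number of lines is not known in advance and would have to be chosen by some model-selection criterion.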

    Document preprocessing and fuzzy unsupervised character classification

    This dissertation presents document preprocessing and fuzzy unsupervised character classification for automatically reading office documents received daily that have complex layout structures, such as multiple columns and mixed-mode content combining text, graphics and half-tone pictures. First, a block segmentation algorithm based on a simple two-step run-length smoothing decomposes a document into single-mode blocks. Next, block classification based on clustering rules assigns each block to one of several types: text, horizontal or vertical lines, graphics, or pictures. The mean white-to-black transition is shown to be invariant for textual blocks and is useful for discriminating between blocks. A fuzzy model for unsupervised character classification is designed to improve the robustness, correctness and speed of the character recognition system. The classification procedure is divided into two stages. The first stage separates the characters into seven typographical categories based on the word structures of a text line. The second stage uses pattern matching to classify the characters in each category into a set of fuzzy prototypes based on a nonlinear weighted similarity function. A fuzzy model of unsupervised character classification, which represents prototypes for character matching more naturally, is defined, and the weighted fuzzy similarity measure is explored. The characteristics of the fuzzy model are discussed and used to speed up the classification process. After classification, the character recognition procedure only needs to be applied to the limited set of fuzzy prototypes. To avoid information loss and extra distortion, a topography-based approach is proposed that extracts skeletons directly from the fuzzy prototypes. First, a convolution with a bell-shaped function is performed to obtain a smooth surface. Second, the ridge points are extracted by rule-based topographic analysis of the structure. Third, each ridge point is assigned a membership value indicating its degree of membership in the skeleton of an object. Finally, the significant ridge points are linked to form skeleton strokes, and eigenvalue variation is used as a cue to handle degradation and preserve connectivity. Experimental results show that our algorithm reduces the deformation of junction points and correctly extracts the whole skeleton even when a character is broken into pieces. For characters that are merged together, candidate breaking points can be easily located by searching for saddle points. A pruning algorithm is then applied at each breaking position. Lastly, multiple-context confirmation can be applied to increase the reliability of the breaking hypotheses.
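    The abstract names two concrete operations without spelling them out. The sketch below assumes the common formulation of run-length smoothing: white runs no longer than a threshold are filled, rows and columns are smoothed separately, and the two results are combined with a logical AND. It also computes a mean white-to-black transition count, here assumed to be the average number of 0-to-1 changes per row of a block. The thresholds, the helper names (`smooth_runs_1d`, `rlsa`, `mean_wb_transitions`) and the transition definition are illustrative assumptions, not the dissertation's definitions.

```python
def smooth_runs_1d(row, threshold):
    """Fill white runs (0s) of length <= threshold that lie between black pixels (1s).

    Runs touching either end of the row are left unchanged.
    """
    out = list(row)
    n = len(row)
    i = 0
    while i < n:
        if row[i] == 0:
            j = i
            while j < n and row[j] == 0:
                j += 1
            # row[i:j] is a white run; fill it only if bounded by black on both sides
            if i > 0 and j < n and (j - i) <= threshold:
                for t in range(i, j):
                    out[t] = 1
            i = j
        else:
            i += 1
    return out

def rlsa(image, h_thresh, v_thresh):
    """Two-step run-length smoothing: smooth rows and columns, then AND the results."""
    horiz = [smooth_runs_1d(r, h_thresh) for r in image]
    cols = [smooth_runs_1d(list(c), v_thresh) for c in zip(*image)]
    vert = [list(r) for r in zip(*cols)]  # transpose columns back into rows
    return [[a & b for a, b in zip(hr, vr)] for hr, vr in zip(horiz, vert)]

def mean_wb_transitions(block):
    """Average number of white-to-black (0 -> 1) transitions per row of a binary block."""
    if not block:
        return 0.0
    total = 0
    for row in block:
        prev = 0  # treat the area left of the block as white
        for px in row:
            if prev == 0 and px == 1:
                total += 1
            prev = px
    return total / len(block)
```

    The horizontal pass merges characters into word and line blobs, the vertical pass merges lines into paragraphs, and the AND keeps only regions that both passes agree on, which tends to yield compact single-mode blocks.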

    Hidden Markov model and its application in document image analysis


    Junction Point Detection And Identification Of Broken Character In Touching Arabic Handwritten Text Using Overlapping Set Theory

    Touching characters are formed when two or more characters share the same space. Segmenting these touching characters is therefore a very challenging research topic, especially for degraded handwritten Arabic documents, and it is one of the key issues in recognizing handwritten Arabic text. To make recognition systems more effective, segmentation of touching handwritten Arabic characters is considered a very important research area. In this research, a new method is proposed that identifies the junction, or common point, of a touching Arabic word image by applying an overlapping (intersection) set operation. This helps trace the correct boundaries of the touching characters, identify broken characters and segment the touching handwritten text efficiently. The proposed method has been evaluated on touching handwritten Arabic characters taken from handwritten datasets, and the results show its efficiency. The proposed method is applicable to both degraded handwritten documents and printed documents.
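    The abstract does not describe how the sets are formed. As a hypothetical sketch of the intersection idea, suppose two tentative stroke sets `a` and `b` (pixel coordinates) have already been extracted from a touching component, with the remaining foreground pixels bridging them. Dilating each set by one pixel and intersecting the dilations with each other and with the foreground yields the pixels where the two strokes meet. The names `dilate` and `junction_points` and the one-pixel 8-neighbourhood dilation are assumptions, not the authors' procedure.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of a foreground pixel

def dilate(region: Set[Pixel]) -> Set[Pixel]:
    """Grow a pixel set by one step in the 8-neighbourhood (hypothetical helper)."""
    return {(r + dr, c + dc)
            for r, c in region
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)}

def junction_points(a: Set[Pixel], b: Set[Pixel], foreground: Set[Pixel]) -> Set[Pixel]:
    """Foreground pixels lying in the overlap (set intersection) of the dilated strokes a and b."""
    return dilate(a) & dilate(b) & foreground
```

    Removing the returned pixels from the foreground is one simple way to cut the touching characters apart at the junction; an empty result means the two strokes do not touch.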

    TORT3D: A MATLAB code to compute geometric tortuosity from 3D images of unconsolidated porous media

    Tortuosity is a parameter that plays a significant role in characterizing complex porous media and has a significant impact on many engineering and environmental processes and applications. Flow in porous media, diffusion of gases in complex pore structures and membrane flux in water desalination are examples of applications of this important micro-scale parameter. In this paper, an algorithm was developed and implemented as MATLAB code to compute tortuosity from three-dimensional images. The code reads a segmented image and finds all possible tortuous paths required to compute tortuosity. The code is user-friendly and computationally efficient, requiring a relatively short time to identify all possible connected paths between two boundaries of large images. The main idea of the algorithm is a guided search for connected paths in the void space of the image using the medial surface of the void space. Once all connected paths in a specific direction are identified, their average length is used to compute the tortuosity in that direction. Three-dimensional images of sand systems acquired using X-ray computed tomography were used to validate the algorithm. Tortuosity values were computed from three-dimensional images of nine different natural sand systems using the developed algorithm and compared with values predicted by models available in the literature. The findings indicate that the code can successfully compute tortuosity for any unconsolidated porous system irrespective of the shape (i.e., geometry) of its particles. © 2017 Elsevier B.V.
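    Geometric tortuosity is commonly defined as the ratio of the path length through the pore space to the straight-line length of the sample, tau = L_path / L_straight. The sketch below illustrates that ratio on a small binary 3D image; it is not TORT3D. Instead of the paper's medial-surface-guided search, it swaps in a plain 6-connected breadth-first search that measures, for every pore voxel on the inlet face, the shortest path to the outlet face, then averages those lengths. The image layout `img[z][y][x]` (1 = pore, 0 = solid) and the function name `tortuosity_z` are assumptions made for illustration.

```python
from collections import deque

def tortuosity_z(img):
    """Geometric tortuosity along z for a binary 3D image img[z][y][x] (1 = pore, 0 = solid).

    A 6-connected breadth-first search is run from every pore voxel on the outlet
    face (z = nz - 1), so dist holds each reachable pore voxel's shortest path
    length, in voxel steps, to the outlet. The result is the mean of those lengths
    over connected inlet voxels (z = 0) divided by the straight length nz - 1,
    or None if no inlet voxel is connected to the outlet.
    """
    nz, ny, nx = len(img), len(img[0]), len(img[0][0])
    if nz < 2:
        return None
    dist = {}
    queue = deque()
    for y in range(ny):
        for x in range(nx):
            if img[nz - 1][y][x]:
                dist[(nz - 1, y, x)] = 0
                queue.append((nz - 1, y, x))
    steps = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in steps:
            nzb, nyb, nxb = z + dz, y + dy, x + dx
            if (0 <= nzb < nz and 0 <= nyb < ny and 0 <= nxb < nx
                    and img[nzb][nyb][nxb] and (nzb, nyb, nxb) not in dist):
                dist[(nzb, nyb, nxb)] = dist[(z, y, x)] + 1
                queue.append((nzb, nyb, nxb))
    lengths = [dist[(0, y, x)] for y in range(ny) for x in range(nx) if (0, y, x) in dist]
    if not lengths:
        return None
    return sum(lengths) / len(lengths) / (nz - 1)
```

    A straight pore channel gives a tortuosity of 1.0, while a channel that must detour sideways gives a larger value. Because steps are counted on the voxel grid with 6-connectivity, diagonal paths are overestimated; a more careful implementation would use a finer distance metric.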