3 research outputs found

    Standardizing, Segmenting and Tenderizing Letters and Improving the Quality of Envelope Images to Extract Postal Addresses

    In most mechanized postal systems, envelopes are scanned according to the postal standard using mechanical instruments. In the standard format, the envelope image has no tilt, text lines run along the horizontal axis, and words are upright rather than oblique. This article presents a new algorithm for rotating, segmenting and tenderizing letters in order to standardize envelope images and improve their quality; its three stages can be used as successful pre-processing algorithms in all text identification systems. In the proposed algorithm, letters of any form and tilt introduced during scanning are rotated and standardized by a simple two-step procedure based on what is written on the envelope, without requiring calculation of the tilt angle. After standardization, the main regions of the image are identified using histogram information. Then a simple algorithm selects candidate points from the pixels belonging to the text on the envelope, and quality improvement and tenderization are applied to the main regions of the image. The advantages of the proposed algorithm include no need for additional mechanical equipment, less computation, simplicity, and consideration of the structure of the words on the envelope in all pre-processing phases. DOI: http://dx.doi.org/10.11591/ijece.v2i3.34

    Recognition of mathematical handwriting on whiteboards

    Automatic recognition of handwritten mathematics has seen significant improvements in the past decades. In particular, online recognition of mathematical formulae has seen a number of important advancements. In practice, however, most mathematics is still taught and developed on regular whiteboards, and offline recognition remains an open and challenging task in this area. In this thesis we develop methods to recognise mathematics from static images of handwritten expressions on whiteboards, while leveraging the strength of online recognition systems by transforming offline data into online information. Our approach is based on trajectory recovery techniques, which allow us to reconstruct the actual stroke information necessary for online recognition. To this end we develop a novel recognition process especially designed to deal with whiteboards by prudently extracting information from colour images. To evaluate our methods we use an online recogniser that is specifically trained for recognition of maths symbols. We present our experiments with images of varying quality and from varying sources. In particular, we have used our approach successfully in a set of experiments using Google Glass to capture images from whiteboards, in which we achieve highest accuracies of 88.03% and 84.54% for segmentation and recognition of mathematical symbols, respectively.

    Vers un système omni-langage de recherche de mots dans des bases de documents écrits homogènes [Towards an omni-language word retrieval system for homogeneous collections of written documents]

    The objective of our thesis is to build an omni-language word retrieval system for scanned documents. We work in the context where the content of the documents is homogeneous (as is the case for old documents, where the writing is often neat and by a single writer) and no prior knowledge about the document (the language, the writer, the writing style, the stamp, etc.) is available. With this system, users can freely and intuitively compose a query and retrieve words in homogeneous documents of any language, without first finding an occurrence of the word to search for. The key to our proposed system is the invariants, which are the writing pieces that appear most frequently in the collection of documents. For querying, the user selects and composes appropriate invariants to create the word to search for, through a visual interface. For retrieval, the invariants can also be used as structural descriptors to characterize word images. In this thesis we introduce our method for automatically extracting invariants from a document collection, our method for evaluating the quality of the invariants, and the applications of invariants in both the query-making process and the retrieval process.