49 research outputs found

    Text-Line and Word Detection in Document Images Using Optimization Methods (최적화 방법을 이용한 문서영상의 텍스트 라인 및 단어 검출법)

    Doctoral dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2015. Nam Ik Cho (조남익). Locating text-lines and segmenting words in a document image are important steps in many document image processing applications, such as optical character recognition, document rectification, layout analysis, and document image compression. This area has therefore been studied extensively, and the segmentation of machine-printed documents scanned by flatbed scanners has matured to some extent. Handwritten documents, however, remain challenging because their features are irregular and vary with the writer and the language. To address this problem, this dissertation presents new segmentation algorithms that extract text-lines and words from a document image, based on a new super-pixel representation method and a new energy minimization framework derived from its characteristics. The proposed algorithms are summarized as follows. First, this dissertation presents a text-line extraction algorithm for handwritten documents based on an energy minimization framework with a new super-pixel representation scheme. To handle documents in various languages, a language-independent text-line extraction algorithm is developed based on the super-pixel representation with normalized connected components (CCs). Owing to this normalization, the proposed method can estimate the states of super-pixels for a range of languages and writing styles. From the estimated states, an energy function is formulated whose minimization yields text-lines. Experimental results show that the proposed method achieves state-of-the-art performance on various handwritten databases. Second, a preprocessing method for historical documents, intended for text-line detection, is presented. Unlike modern handwritten documents, historical documents suffer from various types of degradation. 
To alleviate these problems, a preprocessing algorithm that includes robust binarization and noise removal is introduced in this dissertation. For the robust binarization of historical documents, global and local thresholding binarization methods are combined to deal with degradations such as stains and faint characters. The energy minimization framework is also modified to fit the characteristics of historical documents. Experimental results on two historical databases show that the proposed preprocessing method, combined with the text-line detection algorithm, achieves the best detection performance on severely degraded historical documents. Third, this dissertation presents a word segmentation algorithm based on a structured learning framework. The word segmentation problem is formulated as a labeling problem that assigns a label (intra-word or inter-word gap) to each gap between the characters in a given text-line. To address feature irregularities, especially in handwritten documents, the problem is formulated as a binary quadratic assignment problem that considers pairwise correlations between the gaps as well as the likelihoods of individual gaps, based on the proposed text-line extraction results. Although the formulation involves many parameters, all of them are estimated within the structured SVM framework, so that the proposed method works well regardless of writing style and written language, without user-defined parameters. 
Experimental results on the ICDAR 2009/2013 handwriting segmentation databases show that the proposed method achieves state-of-the-art performance on Latin-based and Indian languages.
Contents:
Abstract
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Text-line Detection of Document Images
  1.2 Word Segmentation of Document Images
  1.3 Summary of Contribution
2 Related Work
  2.1 Text-line Detection
  2.2 Word Segmentation
3 Text-line Detection of Handwritten Document Images based on Energy Minimization
  3.1 Proposed Approach for Text-line Detection
    3.1.1 State Estimation of a Document Image
    3.1.2 Problems with Under-segmented Super-pixels for Estimating States
    3.1.3 A New Super-pixel Representation Method based on CC Partitioning
    3.1.4 Cost Function for Text-line Segmentation
    3.1.5 Minimization of Cost Function
  3.2 Experimental Results of Various Handwritten Databases
    3.2.1 Evaluation Measure
    3.2.2 Parameter Selection
    3.2.3 Experiment on HIT-MW Database
    3.2.4 Experiment on ICDAR 2009/2013 Handwriting Segmentation Databases
    3.2.5 Experiment on IAM Handwriting Database
    3.2.6 Experiment on UMD Handwritten Arabic Database
    3.2.7 Limitations
4 Preprocessing Method of Historical Document for Text-line Detection
  4.1 Characteristics of Historical Documents
  4.2 A Combined Approach for the Binarization of Historical Documents
  4.3 Experimental Results of Text-line Detection for Historical Documents
    4.3.1 Evaluation Measure and Configurations
    4.3.2 George Washington Database
    4.3.3 ICDAR 2015 ANDAR Datasets
5 Word Segmentation Method for Handwritten Documents based on Structured Learning
  5.1 Proposed Approach for Word Segmentation
    5.1.1 Text-line Segmentation and Super-pixel Representation
    5.1.2 Proposed Energy Function for Word Segmentation
  5.2 Structured Learning Framework
    5.2.1 Feature Vector
    5.2.2 Parameter Estimation by Structured SVM
  5.3 Experimental Results
6 Conclusions
Bibliography
Abstract (Korean)
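The gap-labeling idea in the abstract above (assign each gap in a text-line an intra-word or inter-word label, scoring both individual gaps and pairs of gaps) can be illustrated with a toy binary quadratic energy. This is only a sketch under stated assumptions: the gap widths, the unary costs derived from them, the pairwise weight, and the exhaustive search are all hypothetical choices made for illustration. The dissertation learns its parameters with a structured SVM, which is not reproduced here.

```python
from itertools import product


def label_gaps(unary, pairwise):
    """Return the 0/1 labeling (1 = inter-word gap) with minimum energy.

    unary[i]         : cost of labeling gap i as inter-word (negative favors it)
    pairwise[(i, j)] : extra cost paid when gaps i and j are both inter-word
    Exhaustive search is used, so this is only practical for short lines.
    """
    n = len(unary)
    best_labels, best_energy = None, float("inf")
    for labels in product((0, 1), repeat=n):
        energy = sum(u * y for u, y in zip(unary, labels))
        energy += sum(w * labels[i] * labels[j] for (i, j), w in pairwise.items())
        if energy < best_energy:
            best_labels, best_energy = labels, energy
    return list(best_labels), best_energy


# Hypothetical gap widths (in pixels) along one text-line.
gap_widths = [3, 4, 15, 2, 14, 3]
mean_width = sum(gap_widths) / len(gap_widths)
# Assumed unary cost: gaps wider than the mean get a negative (favorable) cost.
unary = [mean_width - w for w in gap_widths]
# Assumed pairwise cost: discourage two adjacent gaps both being word breaks.
pairwise = {(i, i + 1): 2.0 for i in range(len(gap_widths) - 1)}

labels, energy = label_gaps(unary, pairwise)
print(labels)
```

With these hand-picked numbers the two wide gaps are labeled as word breaks. Real features and learned weights would replace the width-only costs used here.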

    An Optical Character Recognition Engine for Graphical Processing Units

    This dissertation investigates how to build an optical character recognition (OCR) engine for a graphical processing unit (GPU). I introduce basic concepts both for building an OCR engine and for programming on the GPU. I then describe the SegRec algorithm in detail and discuss my findings

    Character Recognition

    Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field

    Adaptive Methods for Robust Document Image Understanding

    A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding are a pressing need. We propose a generic framework for document image understanding systems, usable for practically any document type available in digital form. Following the introduced workflow, we turn our attention to each of the following processing stages: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation, and logical layout analysis. We review the state of the art in each area, identify current deficiencies, point out promising directions, and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions, with a special focus on generality, computational efficiency, and the exploitation of all available sources of information. More specifically, we introduce the following original methods: a fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot-metal typeset prints, a theoretically optimal solution for the document binarization problem from both the computational-complexity and the threshold-selection points of view, a layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm, and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The results show that a document understanding system combining these modules can robustly process a wide variety of documents with good overall accuracy

    Integrating Multiple Sketch Recognition Methods to Improve Accuracy and Speed

    Sketch recognition is the computer understanding of hand-drawn diagrams. Recognizing sketches instantaneously is necessary to build beautiful interfaces with real-time feedback. There are various techniques to quickly recognize sketches into ten or twenty classes. However, for much larger datasets of sketches from a large number of classes, these existing techniques can take an extended period of time to accurately classify an incoming sketch and require significant computational overhead. Thus, to make classification of large datasets feasible, we propose using multiple stages of recognition. In the initial stage, gesture-based feature values are calculated and the trained model is used to classify the incoming sketch. Sketches with an accuracy below a threshold value go through a second stage of geometric recognition techniques. In this second stage, the sketch is segmented and sent to shape-specific recognizers. The sketches are matched against predefined shape descriptions, and confidence values are calculated. The system outputs a list of classes that the sketch could be classified as, along with the accuracy and precision for each sketch. This process both significantly reduces the time taken to classify such large datasets of sketches and increases the accuracy and precision of the recognition
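The two-stage flow described in this abstract (a fast first stage, with a fallback to geometric recognizers when the first result falls below a threshold) can be sketched as a simple cascade. Everything concrete below is an assumption for illustration: the function names, the dictionary-based sketch representation, the stand-in recognizers, and the 0.8 threshold are hypothetical, and the per-sketch score is treated as a confidence value. The abstract's system also returns a list of candidate classes, whereas this sketch returns a single label.

```python
def classify_two_stage(sketch, fast_stage, slow_stage, threshold=0.8):
    """Run the fast stage first; fall back to the slow stage if it is unsure."""
    label, confidence = fast_stage(sketch)
    if confidence >= threshold:
        return label, confidence, "gesture"
    label, confidence = slow_stage(sketch)
    return label, confidence, "geometric"


def gesture_stage(sketch):
    # Stand-in for a trained gesture-feature model (hypothetical):
    # it is only confident about open strokes without corners.
    if sketch["corners"] == 0 and not sketch["closed"]:
        return "line", 0.95
    return "unknown", 0.30


def geometric_stage(sketch):
    # Stand-in for shape-specific recognizers that match a segmented
    # sketch against predefined shape descriptions (hypothetical).
    shapes = {3: "triangle", 4: "rectangle"}
    if sketch["closed"] and sketch["corners"] in shapes:
        return shapes[sketch["corners"]], 0.90
    return "unknown", 0.10


stroke = {"corners": 0, "closed": False}
box = {"corners": 4, "closed": True}
print(classify_two_stage(stroke, gesture_stage, geometric_stage))
print(classify_two_stage(box, gesture_stage, geometric_stage))
```

The design point is that the expensive stage only runs for inputs the cheap stage cannot settle, which is where the claimed time savings would come from.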

    Advances in Character Recognition

    This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. We hope that this book will serve as a reference source for academic research, for professionals working in the character recognition field, and for anyone interested in the subject

    Recognition of mathematical handwriting on whiteboards

    Automatic recognition of handwritten mathematics has improved significantly in the past decades. In particular, online recognition of mathematical formulae has seen a number of important advancements. In practice, however, most mathematics is still taught and developed on regular whiteboards, and offline recognition remains an open and challenging task in this area. In this thesis we develop methods to recognise mathematics from static images of handwritten expressions on whiteboards, while leveraging the strength of online recognition systems by transforming offline data into online information. Our approach is based on trajectory recovery techniques that allow us to reconstruct the stroke information necessary for online recognition. To this end we develop a novel recognition process designed specifically for whiteboards, which carefully extracts information from colour images. To evaluate our methods we use an online recogniser that is specifically trained to recognise maths symbols. We present experiments with images of varying quality and from varying sources. In particular, we have used our approach successfully in a set of experiments using Google Glass to capture images from whiteboards, in which we achieve highest accuracies of 88.03% and 84.54% for segmentation and recognition of mathematical symbols respectively

    Analyzing Handwritten and Transcribed Symbols in Disparate Corpora

    Cuneiform tablets are among the oldest textual artifacts. The script was in use for more than three millennia, and the tablets are comparable in amount and relevance to texts written in Latin or ancient Greek. These tablets are typically found in the Middle East and were written by pressing wedge-shaped impressions into wet clay. Motivated by the increased demand for computerized analysis of documents within the Digital Humanities, we develop the foundation for quantitative processing of cuneiform script. Acquiring a cuneiform tablet with a 3D scanner and manually creating line tracings yield two completely different representations of the same type of text source. Each representation is typically processed with its own tool-set, and the textual analysis is therefore limited to a certain type of digital representation. To homogenize these data sources, a unifying minimal wedge feature description is introduced. It is extracted by pattern matching and subsequent conflict resolution, as cuneiform is written densely with highly overlapping wedges. Similarity metrics for cuneiform signs based on distinct assumptions are presented. (i) An implicit model represents cuneiform signs using undirected mathematical graphs and measures the similarity of signs with graph kernels. (ii) An explicit model approaches the problem of recognition by an optimal assignment between the wedge configurations of two signs. Further, methods for spotting cuneiform script are developed, combining the feature descriptors for cuneiform wedges with prior work on segmentation-free word spotting using part-structured models. The ink-ball model is adapted by treating wedge feature descriptors as individual parts. The similarity metrics and the adapted spotting model are both evaluated on a real-world dataset, outperforming the state of the art in cuneiform sign similarity and spotting. 
To demonstrate the applicability of these methods for computational cuneiform analysis, a novel approach is presented for mining frequent constellations of wedges, resulting in spatial n-grams. Furthermore, a method for automated transliteration of tablets is evaluated by employing structured and sequential learning on a dataset of parallel sentences. Finally, the conclusion outlines how the presented methods enable the development of new tools and computational analyses, which are objective and reproducible, for quantitative processing of cuneiform script

    Text detection and recognition in natural images using computer vision techniques

    Text recognition in real-world images has attracted the attention of many researchers worldwide in recent years. The reason is the growth of low-cost products, such as mobile phones and Tablet PCs, that incorporate image capture devices and high processing capabilities. Against this background, this thesis presents a robust method to detect, localize, and recognize horizontal text in daytime images taken in real-world scenarios. The challenge is complex given the enormous variability of existing text and of the capture conditions in real environments. First, a review of the main works of recent years in the field of text recognition in natural images is presented. Next, a study is carried out of the features best suited to describing text as opposed to non-text objects. Typically, a system for recognizing text in images consists of two major stages. The first consists of detecting whether text exists in the image and localizing it as precisely as possible, minimizing both the amount of undetected text and the number of false positives. The second stage consists of recognizing the extracted text. The detection method proposed here is based on connected component analysis after applying a segmentation that combines a global method such as MSER with a local method, improving on state-of-the-art proposals by segmenting text even in complex situations such as blurred or very low-resolution images. The analysis of the extracted connected components is optimized by means of genetic algorithms. Unlike other systems, we propose a recursive method that restores objects that correspond to text but are initially discarded by mistake. In this way, the reliability of the detection is greatly improved. 
Although the proposed method is based on connected component analysis, this thesis also uses the idea behind texture-based methods to validate the detected text areas. Our method for recognizing text, in turn, is based on identifying each character and then applying a language model to correct misrecognized words by restricting the solution to a dictionary containing the set of possible terms. A new feature for recognizing characters is proposed, which we have named Direction Histogram (DH). It is based on computing the histogram of gradient directions at the edge pixels. This feature is compared with other state-of-the-art features, and the experimental results obtained on a challenging database show that our proposal is suitable, since it outperforms other state-of-the-art works. We also present a KNN-based fuzzy letter classification method, which makes it possible to separate characters erroneously connected during the segmentation stage. The proposed text recognition method is able to recognize not only words but also numbers and punctuation marks. Word recognition is carried out by means of a language model based on probabilistic inference and the British National Corpus, a comprehensive dictionary of modern British English, although the algorithm can easily be adapted for use with any other dictionary. The language model uses a modification of the forward algorithm used in Hidden Markov Models. To assess the performance of the proposed system, experimental results were obtained on several databases, which include images in different scenarios and situations. These databases have been used as a benchmark over the last decade by most researchers in the area of text recognition in natural images. 
The results show that the proposed system achieves performance similar to the state of the art in terms of localization, while surpassing it in terms of recognition. To show the applicability of the method proposed in this thesis, a system for detecting and recognizing the information contained in traffic panels, based on the developed algorithm, is also presented. The goal of this application is the automatic creation of inventories of the traffic panels of countries or regions, to facilitate the maintenance of vertical road signage, using images available from Google's Street View service. A database has been created for this application. We propose modeling traffic panels using visual appearance, instead of the classic solutions that use edges or geometric features, in order to detect the images in which traffic panels are present. The experimental results show the feasibility of the proposed system
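The Direction Histogram (DH) feature is described in this abstract only as a histogram of gradient directions computed at edge pixels. A minimal sketch of that idea follows. The central-difference gradient, the magnitude threshold used to decide what counts as an edge pixel, the 8 bins, and the normalization are all assumptions made for illustration, and the function name is hypothetical. The thesis may define each of these differently.

```python
import math


def direction_histogram(image, bins=8, edge_threshold=1.0):
    """Histogram of gradient directions over edge pixels of a 2-D grayscale image.

    image is a list of equal-length rows of numbers. A pixel counts as an
    edge pixel when its gradient magnitude reaches edge_threshold (assumed rule).
    """
    rows, cols = len(image), len(image[0])
    hist = [0.0] * bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            if math.hypot(gx, gy) < edge_threshold:
                continue  # flat region, not an edge pixel
            angle = math.atan2(gy, gx) % (2 * math.pi)
            hist[int(angle / (2 * math.pi) * bins) % bins] += 1
    total = sum(hist)
    # Normalize so the feature does not depend on how many edge pixels there are.
    return [h / total for h in hist] if total else hist


# Toy image with a single vertical edge: dark left half, bright right half.
toy_image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
dh = direction_histogram(toy_image)
print(dh)
```

For the toy image every edge pixel has a gradient pointing along the positive x axis, so all of the mass falls into the first bin.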