2,205 research outputs found

    Assessment of OCR Quality and Font Identification in Historical Documents

    Mass digitization of historical documents is a challenging problem for optical character recognition (OCR) tools. Issues include noisy backgrounds and faded text due to aging, border/marginal noise, bleed-through, skewing, warping, as well as irregular fonts and page layouts. As a result, OCR tools often produce a large number of spurious bounding boxes (BBs) in addition to those that correspond to words in the document. To improve the OCR output, in this thesis we develop machine-learning methods to assess the quality of historical documents and to label/tag documents with their page problems in the EEBO/ECCO collections, 45 million pages available through the Early Modern OCR Project at Texas A&M University. We present an iterative classification algorithm to automatically label BBs (i.e., as text or noise) based on their spatial distribution and geometry. The approach uses a rule-based classifier to generate initial text/noise labels for each BB, followed by an iterative classifier that refines the initial labels by incorporating local information about each BB: its spatial location, shape, and size. When evaluated on a dataset containing over 72,000 manually labeled BBs from 159 historical documents, the algorithm classifies BBs with 0.95 precision and 0.96 recall. Further evaluation on a collection of 6,775 documents with ground-truth transcriptions shows that the algorithm can also be used to predict document quality (0.7 correlation) and to improve OCR transcriptions in 85% of the cases. This thesis also aims to generate font metadata for historical documents. Knowledge of the font can help an OCR system produce highly accurate text transcriptions, but getting font information for 45 million documents is a daunting task. We present an active-learning-based font identification system that classifies document images by font. In active learning, a learner queries the human for labels on the examples it finds most informative. We capture the characteristics of the fonts using word-image features related to character width, angled strokes, and Zernike moments. To extract page-level features, we use a bag-of-words feature (BoF) model. A font classification model trained using BoF and active learning requires only 443 labeled instances to achieve 89.3% test accuracy.
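
    The following is a minimal, illustrative sketch of the two-stage idea described above: rule-based initial labels for each bounding box, followed by iterative refinement using neighbouring boxes. The thresholds, the neighbourhood size, and the height-similarity rule are assumptions for illustration, not the rules used in the thesis.

```python
"""Sketch: rule-based BB labels, then iterative refinement from spatial neighbours.
All thresholds and the neighbourhood rule below are illustrative assumptions."""
import numpy as np

def rule_based_labels(boxes):
    # boxes: (N, 4) array of (x, y, width, height)
    w, h = boxes[:, 2], boxes[:, 3]
    aspect = w / np.maximum(h, 1e-6)
    area = w * h
    # Label a box as text (1) if its size and aspect ratio look word-like, else noise (0).
    return ((area > 50) & (area < 50_000) & (aspect > 0.2) & (aspect < 15)).astype(int)

def iterative_refinement(boxes, labels, k=5, n_iter=10):
    centers = boxes[:, :2] + boxes[:, 2:] / 2.0
    heights = boxes[:, 3]
    for _ in range(n_iter):
        new_labels = labels.copy()
        for i in range(len(boxes)):
            # k nearest neighbours by spatial location
            d = np.linalg.norm(centers - centers[i], axis=1)
            nn = np.argsort(d)[1:k + 1]
            # Boxes surrounded by boxes of similar height tend to share their label.
            similar = np.abs(heights[nn] - heights[i]) < 0.5 * heights[i]
            votes = labels[nn][similar]
            if votes.size:
                new_labels[i] = int(votes.mean() >= 0.5)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

boxes = np.array([[10, 10, 40, 12], [55, 10, 38, 11], [300, 5, 3, 200], [98, 11, 42, 12]], float)
labels = iterative_refinement(boxes, rule_based_labels(boxes))
```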

    Evaluating automated and hybrid neural disambiguation for African historical named entities

    Documents detailing South African history contain ambiguous names. Names may be ambiguous because different people share the same name or because the same person is referred to by multiple different names. Thus, when searching for or attempting to extract information about a particular person, the name used may affect the results. This problem may be alleviated by using a Named Entity Disambiguation (NED) system to disambiguate names by linking them to a knowledge base. In recent years, transformer-based language models have led to improvements in NED systems. Furthermore, multilingual language models have shown the ability to learn concepts across languages, reducing the amount of training data required in low-resource languages. Thus, a multilingual language model-based NED system was developed to disambiguate people's names within a historical South African context using documents written in English and isiZulu from the Five Hundred Year Archive (FHYA). The multilingual language model-based system substantially improved on a probability-based baseline and achieved a micro F1-score of 0.726, and the entity linking component was able to link 81.9% of the mentions to the correct entity. However, the system's performance on documents written in isiZulu was significantly lower than on documents written in English. The system was therefore augmented with handcrafted rules, which yielded a small but significant improvement in performance compared to the unaugmented NED system.
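
    As a rough illustration of the linking step, the sketch below scores knowledge-base entity descriptions against a mention in context with a multilingual sentence encoder and returns the best match. The model name, the toy two-entry knowledge base, and the cosine-similarity ranking are assumptions for illustration; they are not the architecture developed in the dissertation.

```python
"""Sketch: link a mention in context to a knowledge-base entry with a
multilingual encoder. Model choice and toy KB are illustrative assumptions."""
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

knowledge_base = {
    "Q1": "Shaka kaSenzangakhona, founder of the Zulu Kingdom",
    "Q2": "Senzangakhona kaJama, chief of the Zulu clan and father of Shaka",
}

def link_mention(mention_in_context: str) -> str:
    """Return the knowledge-base id whose description best matches the mention."""
    ids = list(knowledge_base)
    mention_vec = encoder.encode(mention_in_context, convert_to_tensor=True)
    entity_vecs = encoder.encode([knowledge_base[i] for i in ids], convert_to_tensor=True)
    scores = util.cos_sim(mention_vec, entity_vecs)[0]
    return ids[int(scores.argmax())]

# The encoder is multilingual, so English and isiZulu contexts share one embedding space.
print(link_mention("uShaka wayeyinkosi eyasungula umbuso wamaZulu"))
```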

    An ant colony based model to optimize parameters in industrial vision

    Industrial vision is an efficient way to address quality control problems. It offers a wide variety of relevant operators for accomplishing control tasks in vision systems. However, installing these systems requires precise parameter tuning, which remains a very difficult exercise. Manual parameter adjustment can take a long time when precision is expected, because many operators must be revisited. To save time and gain precision, one solution is to automate this task using optimization approaches (mathematical models, population models, learning models, etc.). This paper proposes an Ant Colony Optimization (ACO) based model. The process considers each ant as a potential solution; through an interaction mechanism, the ants then converge to the optimal solution. The proposed model is illustrated on several image processing applications with very promising results, and compares favourably with other approaches.
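
    A compact sketch of the ant-colony idea for parameter tuning follows: each ant is a candidate parameter vector, and the colony is repeatedly re-sampled around the best-scoring ants so that "pheromone" concentrates on good regions. The quality function, bounds, and Gaussian sampling scheme are illustrative assumptions rather than the paper's exact model.

```python
"""Sketch: ant-colony-style tuning of vision-operator parameters.
Quality function, bounds, and sampling scheme are illustrative assumptions."""
import random

def tune_parameters(quality, bounds, n_ants=20, n_elite=5, n_iter=50, shrink=0.9):
    # Initialise the colony uniformly inside the parameter bounds.
    ants = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_ants)]
    width = [hi - lo for lo, hi in bounds]
    for _ in range(n_iter):
        ranked = sorted(ants, key=quality, reverse=True)[:n_elite]
        # New ants are drawn around elite solutions ("pheromone" concentrates there).
        ants = []
        for _ in range(n_ants):
            guide = random.choice(ranked)
            ants.append([min(max(random.gauss(g, 0.1 * w), lo), hi)
                         for g, w, (lo, hi) in zip(guide, width, bounds)])
        width = [w * shrink for w in width]  # gradually intensify the search
        ants.extend(ranked)                  # keep the elite solutions
    return max(ants, key=quality)

# Toy example: tune a binarisation threshold and a filter size against a scoring function.
best = tune_parameters(lambda p: -(p[0] - 0.4) ** 2 - (p[1] - 3.0) ** 2,
                       bounds=[(0.0, 1.0), (1.0, 9.0)])
```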

    Text Detection in Natural Scenes and Technical Diagrams with Convolutional Feature Learning and Cascaded Classification

    An enormous number of digital images are generated and stored every day. Understanding the text in these images is an important challenge with large impacts for academic, industrial and domestic applications. Recent studies address the difficulty of separating text targets from noise and background, all of which vary greatly in natural scenes. To tackle this problem, we develop a text detection system to analyze and utilize visual information in a data-driven, automatic and intelligent way. The proposed method incorporates features learned from data, including patch-based coarse-to-fine detection (Text-Conv), connected component extraction using region growing, and graph-based word segmentation (Word-Graph). Text-Conv is a sliding-window-based detector, with convolution masks learned using the Convolutional k-means algorithm (Coates et al., 2011). Unlike convolutional neural networks (CNNs), a single vector/layer of convolution mask responses is used to classify patches. An initial coarse detection considers both local and neighboring patch responses, followed by refinement using varying aspect ratios and rotations for a smaller local detection window. Different levels of visual detail from ground truth are utilized in each step, first using constraints on bounding box intersections, and then a combination of bounding box and pixel intersections. Combining masks from different Convolutional k-means initializations, e.g., seeded using random vectors and then support vectors, improves performance. The Word-Graph algorithm uses contextual information to improve word segmentation and to prune false character detections based on visual features and spatial context. Our system obtains pixel, character, and word detection f-measures of 93.14%, 90.26%, and 86.77%, respectively, on the ICDAR 2015 Robust Reading Focused Scene Text dataset, outperforming state-of-the-art systems and producing highly accurate text detection masks at the pixel level. To investigate the utility of our feature learning approach for other image types, we perform tests on 8-bit greyscale USPTO patent drawing diagram images. An ensemble of AdaBoost classifiers with different convolutional features (MetaBoost) is used to classify patches as text or background. The Tesseract OCR system is used to recognize characters in detected labels and to enhance performance. With appropriate pre-processing and post-processing, f-measures of 82% for part label location, and 73% for valid part label locations and strings, are obtained, which are the best obtained to date for the USPTO patent diagram dataset used in our experiments. In summary, an intelligent refinement of convolutional k-means-based feature learning and novel automatic classification methods are proposed for text detection, obtaining state-of-the-art results without the need for strong prior knowledge. Different ground-truth representations, along with features including edges, color, shape and spatial relationships, are used coherently to improve accuracy. Different variations of feature learning are explored, e.g., support vector-seeded clustering and MetaBoost, with results suggesting that increased diversity in learned features benefits convolution-based text detectors.
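
    To make the Convolutional k-means step concrete, the sketch below clusters contrast-normalised random patches and uses the centroids as convolution masks whose responses form a single feature vector per window, in the spirit of Text-Conv. Patch size, centroid count, and the normalisation are illustrative choices, not the paper's configuration.

```python
"""Sketch: learn convolution masks with (mini-batch) k-means over random patches,
then describe a window by its mask responses. Parameters are illustrative."""
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def learn_masks(images, patch=8, n_masks=64, n_samples=10000, seed=0):
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n_samples):
        img = images[rng.integers(len(images))]
        y = rng.integers(img.shape[0] - patch)
        x = rng.integers(img.shape[1] - patch)
        p = img[y:y + patch, x:x + patch].astype(float).ravel()
        p = (p - p.mean()) / (p.std() + 1e-8)  # contrast-normalise each patch
        patches.append(p)
    km = MiniBatchKMeans(n_clusters=n_masks, n_init=3, random_state=seed)
    km.fit(np.array(patches))
    return km.cluster_centers_.reshape(n_masks, patch, patch)

def mask_responses(window, masks):
    """Single feature vector of mask responses for one patch-sized window."""
    w = window.astype(float).ravel()
    w = (w - w.mean()) / (w.std() + 1e-8)
    return masks.reshape(len(masks), -1) @ w

images = [np.random.randint(0, 255, (64, 64)) for _ in range(10)]
masks = learn_masks(images)
feature = mask_responses(images[0][:8, :8], masks)  # one response vector per window
```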

    Artificial Intelligence based multi-agent control system

    Artificial Intelligence (AI) is a science that deals with the problem of having machines perform intelligent, complex actions with the aim of helping the human being. It is thus possible to assert that Artificial Intelligence brings into machines typical characteristics and abilities that were once limited to human intervention. In the field of AI there are several tasks that could ideally be delegated to machines, such as environment-aware perception, visual perception and complex decision making in various fields. Recent research trends in this field have produced remarkable advances, mainly in complex engineering systems such as multi-agent systems, networked systems, manufacturing, vehicular and transportation systems, and health care; in fact, a portion of these engineering systems is discussed in this PhD thesis, as most of them are typical fields of application for traditional control systems. The main purpose of this work is to present my recent research activities in the field of complex systems, bringing artificial intelligence methodologies into different environments such as telecommunication networks, transportation systems and health care for Personalized Medicine.
    The approaches designed and developed in the field of telecommunication networks are presented in Chapter 2, where a multi-agent reinforcement learning algorithm was designed to implement a model-free control approach in order to regulate and improve the level of user satisfaction. The research activities in the field of transportation systems are presented at the end of Chapter 2 and in Chapter 3, where two approaches, a Reinforcement Learning algorithm and a Deep Learning algorithm, were designed and developed to provide tailored travel solutions and automatic identification of transportation modes. Finally, the research activities performed in the field of Personalized Medicine are presented in Chapter 4, where a Deep Learning and Model Predictive Control based approach is presented to address the problem of controlling biological factors in diabetic patients.
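
    As a minimal sketch of the multi-agent, model-free reinforcement-learning idea from Chapter 2, the toy example below has two independent Q-learning agents adjust their shares of a link so that users stay satisfied. The reward, state encoding, and parameters are assumptions for illustration, not the controller developed in the thesis.

```python
"""Sketch: independent tabular Q-learning agents sharing a resource.
Toy environment, reward, and hyperparameters are illustrative assumptions."""
import random
from collections import defaultdict

class QAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = defaultdict(float)
        self.actions, self.alpha, self.gamma, self.eps = actions, alpha, gamma, eps

    def act(self, state):
        if random.random() < self.eps:
            return random.choice(self.actions)          # explore
        return max(self.actions, key=lambda a: self.q[(state, a)])  # exploit

    def learn(self, s, a, r, s_next):
        best_next = max(self.q[(s_next, a2)] for a2 in self.actions)
        self.q[(s, a)] += self.alpha * (r + self.gamma * best_next - self.q[(s, a)])

# Two agents share a link; each chooses to increase (+1) or decrease (-1) its share.
agents = [QAgent(actions=[-1, +1]) for _ in range(2)]
shares = [5, 5]
for step in range(1000):
    state = tuple(shares)
    actions = [ag.act(state) for ag in agents]
    shares = [min(max(s + a, 0), 10) for s, a in zip(shares, actions)]
    # Satisfaction grows with own share but collapses if the shared link is overloaded.
    rewards = [s if sum(shares) <= 10 else -5 for s in shares]
    for ag, a, r in zip(agents, actions, rewards):
        ag.learn(state, a, r, tuple(shares))
```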

    Handwritten Digit Recognition and Classification Using Machine Learning

    In this paper, multiple machine learning techniques for optical character recognition (OCR) of handwritten digits are examined, and a new accuracy level for recognition of the MNIST dataset is reported. The proposed framework involves three primary parts: image pre-processing, feature extraction and classification. This study strives to push the recognition accuracy for handwritten digits above 99%. As will be seen, pre-processing and feature extraction play crucial roles in reaching the highest accuracy in this experiment.
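
    A compact sketch of the three-stage pipeline described above (pre-processing, feature extraction, classification) is shown below. It uses scikit-learn's small built-in digits set as a stand-in for MNIST and an SVM classifier; the actual models and features examined in the paper may differ.

```python
"""Sketch: pre-processing, feature extraction, and classification for digit
recognition. Dataset and model choices are illustrative stand-ins."""
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

model = make_pipeline(
    StandardScaler(),          # pre-processing: normalise pixel intensities
    PCA(n_components=40),      # feature extraction: compact representation
    SVC(kernel="rbf", C=10),   # classification
)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```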