7 research outputs found

    Um estudo comparativo das abordagens de detecção e reconhecimento de texto para cenários de computação restrita

    Get PDF
    Orientadores: Ricardo da Silva Torres, Allan da Silva PintoDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Textos são elementos fundamentais para uma efetiva comunicação em nosso cotidiano. A mobilidade de pessoas e veículos em ambientes urbanos e a busca por um produto de interesse em uma prateleira de supermercado são exemplos de atividades em que o entendimento dos elementos textuais presentes no ambiente são essenciais para a execução da tarefa. Recentemente, diversos avanços na área de visão computacional têm sido reportados na literatura, com o desenvolvimento de algoritmos e métodos que objetivam reconhecer objetos e textos em cenas. Entretanto, a detecção e reconhecimento de textos são problemas considerados em aberto devido a diversos fatores que atuam como fontes de variabilidades durante a geração e captura de textos em cenas, o que podem impactar as taxas de detecção e reconhecimento de maneira significativa. Exemplo destes fatores incluem diferentes formas dos elementos textuais (e.g., circular ou em linha curva), estilos e tamanhos da fonte, textura, cor, variação de brilho e contraste, entre outros. Além disso, os recentes métodos considerados estado-da-arte, baseados em aprendizagem profunda, demandam altos custos de processamento computacional, o que dificulta a utilização de tais métodos em cenários de computação restritiva. Esta dissertação apresenta um estudo comparativo de técnicas de detecção e reconhecimento de texto, considerando tanto os métodos baseados em aprendizado profundo quanto os métodos que utilizam algoritmos clássicos de aprendizado de máquina. Esta dissertação também apresenta um método de fusão de caixas delimitadoras, baseado em programação genética (GP), desenvolvido para atuar tanto como uma etapa de pós-processamento, posterior a etapa de detecção, quanto para explorar a complementariedade dos algoritmos de detecção de texto investigados nesta dissertação. De acordo com o estudo comparativo apresentado neste trabalho, os métodos baseados em aprendizagem profunda são mais eficazes e menos eficientes, em comparação com os métodos clássicos da literatura e considerando as métricas adotadas. Além disso, o algoritmo de fusão proposto foi capaz de aprender informações complementares entre os métodos investigados nesta dissertação, o que resultou em uma melhora das taxas de precisão e revocação. Os experimentos foram conduzidos considerando os problemas de detecção de textos horizontais, verticais e de orientação arbitráriaAbstract: Texts are fundamental elements for effective communication in our daily lives. The mobility of people and vehicles in urban environments and the search for a product of interest on a supermarket shelf are examples of activities in which the understanding of the textual elements present in the environment is essential to succeed in such tasks. Recently, several advances in computer vision have been reported in the literature, with the development of algorithms and methods that aim to recognize objects and texts in scenes. However, text detection and recognition are still open problems due to several factors that act as sources of variability during scene text generation and capture, which can significantly impact detection and recognition rates of current algorithms. Examples of these factors include different shapes of textual elements (e.g., circular or curved), font styles and sizes, texture, color, brightness and contrast variation, among others. Besides, recent state-of-the-art methods based on deep learning demand high computational processing costs, which difficult their use in restricted computing scenarios. This dissertation presents a comparative study of text detection and recognition techniques, considering methods based on deep learning and methods that use classical machine learning algorithms. This dissertation also presents an algorithm for fusing bounding boxes, based on genetic programming (GP), developed to act as a post-processing step for a single text detector and to explore the complementarity of text detection algorithms investigated in this dissertation. According to the comparative study presented in this work, the methods based on deep learning are more effective and less efficient, in comparison to classic methods for text detection investigated in this work, considering the adopted metrics. Furthermore, the proposed GP-based fusion algorithm was able to learn complementary information from the methods investigated in this dissertation, which resulted in an improvement of precision and recall rates. The experiments were conducted considering text detection problems involving horizontal, vertical and arbitrary orientationsMestradoCiência da ComputaçãoMestre em Ciência da ComputaçãoCAPE

    Deep Learning for Scene Text Detection, Recognition, and Understanding

    Get PDF
    Detecting and recognizing texts in images is a long-standing task in computer vision. The goal of this task is to extract textual information from images and videos, such as recognizing license plates. Despite that the great progresses have been made in recent years, it still remains challenging due to the wide range of variations in text appearance. In this thesis, we aim to review the existing issues that hinder current Optical Character Recognition (OCR) development and explore potential solutions. Specifically, we first investigate the phenomenon of unfair comparisons between different OCR algorithms caused due to the lack of a consistent evaluation framework. Such an absence of a unified evaluation protocol leads to inconsistent and unreliable results, making it difficult to compare and improve upon existing methods. To tackle this issue, we design a new evaluation framework from the aspect of datasets, metrics, and models, enabling consistent and fair comparisons between OCR systems. Another issue existing in the field is the imbalanced distribution of training samples. In particular, the sample distribution largely depended on where and how the data was collected, and the resulting data bias may lead to poor performance and low generalizability on under-represented classes. To address this problem, we took the driving license plate recognition task as an example and proposed a text-to-image model that is able to synthesize photo-realistic text samples. By using this model, we synthesized more than one million samples to augment the training dataset, significantly improving the generalization capability of OCR models. Additionally, this thesis also explores the application of text vision question answering, which is a new and emerging research topic among the OCR community. This task challenges the OCR models to understand the relationships between the text and backgrounds and to answer the given questions. In this thesis, we propose to investigate evidence-based text VQA, which involves designing models that can provide reasonable evidence for their predictions, thus improving the generalization ability.Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 202
    corecore