
    Evaluating Object and Text Detectors under the Binary Classification Scenario: A Review

    With the rapidly growing volume of hateful speech presented with images on the Internet, it is necessary to detect hateful speech automatically. Because the hateful meme detection pipeline is computationally demanding, it is vital to separate text images from non-text images in order to speed up the multimodal hateful speech system. This study reviews recent developments in object and text detection architectures and categorizes them into one-stage and two-stage detectors to better compare accuracy and efficiency. Additionally, this study proposes two datasets as benchmarks for the binary classification scenario and uses them to evaluate two representative object detectors and two state-of-the-art text detectors on customized datasets with two types of text embedded in images. The results indicate that one-stage detectors do not necessarily achieve higher throughput than two-stage detectors, and that detector performance varies with the type of image text. This thesis can contribute to further evaluation of detectors in binary detection tasks.

    Deep Learning for Scene Text Detection, Recognition, and Understanding

    Detecting and recognizing text in images is a long-standing task in computer vision. The goal of this task is to extract textual information from images and videos, for example by recognizing license plates. Although great progress has been made in recent years, the task remains challenging due to the wide range of variations in text appearance. In this thesis, we review the existing issues that hinder current Optical Character Recognition (OCR) development and explore potential solutions. Specifically, we first investigate the unfair comparisons between different OCR algorithms caused by the lack of a consistent evaluation framework. The absence of a unified evaluation protocol leads to inconsistent and unreliable results, making it difficult to compare and improve upon existing methods. To tackle this issue, we design a new evaluation framework covering datasets, metrics, and models, enabling consistent and fair comparisons between OCR systems. Another issue in the field is the imbalanced distribution of training samples. In particular, the sample distribution depends largely on where and how the data was collected, and the resulting data bias may lead to poor performance and low generalizability on under-represented classes. To address this problem, we take the driving license plate recognition task as an example and propose a text-to-image model that is able to synthesize photo-realistic text samples. Using this model, we synthesized more than one million samples to augment the training dataset, significantly improving the generalization capability of OCR models. Additionally, this thesis explores text visual question answering (text VQA), a new and emerging research topic in the OCR community. This task challenges OCR models to understand the relationships between the text and its background and to answer the given questions. In this thesis, we propose to investigate evidence-based text VQA, which involves designing models that can provide reasonable evidence for their predictions, thus improving the generalization ability.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 202