Search CORE

21 research outputs found

Arabic cursive text recognition from natural scene images

Author: Ahmed SB
Naz S
Razzak MI
Yusof R
Publication venue: 'MDPI AG'
Publication date: 10/01/2019

© 2019 by the authors. This paper presents a comprehensive survey on Arabic cursive scene text recognition. Recent publications in this field show the interest of document image analysis researchers shifting from recognition of optical characters to recognition of characters appearing in natural images. Scene text recognition is a challenging problem because the text varies in font style, size, alignment, orientation, reflection, illumination, blurriness and background complexity. Among cursive scripts, Arabic scene text recognition is considered a more challenging problem due to joined writing, same character variations, a large number of ligatures, the number of baselines, etc. Surveys on Latin and Chinese script-based scene text recognition systems can be found, but the Arabic-like scene text recognition problem is yet to be addressed in detail. In this manuscript, a description is provided to highlight some of the latest techniques presented for text classification. The presented techniques following a deep learning architecture are equally suitable for the development of Arabic cursive scene text recognition systems. The issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide insight about cursive scene text to researchers

OPUS - University of Technology Sydney

RECOGNITION OF CHARACTER FROM VIDEO SUBTITLES

Author: H Venkatesh
Publication venue: 'Whioce Publishing Pte Ltd'
Publication date: 04/09/2018

An important task in content-based video indexing is to extract text information from videos. The challenges involved in text extraction and recognition are the variation of illumination across video frames containing text, text appearing on complex backgrounds, and differing font sizes. Character recognition of video subtitles is implemented using various image processing algorithms such as morphological operations, blob detection and histogram of oriented gradients. Segmentation, feature extraction and classification are the major steps of character recognition. Several experimental results are shown to demonstrate the performance of the proposed algorithm

Whioce Journals

Machine learning for ancient languages: a survey

Author: Androutsopoulos Ion
Assael Yannis
Bodel John
Dyer Chris
Freitas Nando de
Pavlopoulos John
Prag Jonathan
Senior Andrew
Sommerschield Thea
Stefanak Vanessa
Publication venue: MIT Press
Publication date: 10/08/2023

Ancient languages preserve the cultures and histories of the past. However, their study is fraught with difficulties, and experts must tackle a range of challenging text-based tasks, from deciphering lost languages to restoring damaged inscriptions, to determining the authorship of works of literature. Technological aids have long supported the study of ancient texts, but in recent years advances in artificial intelligence and machine learning have enabled analyses on a scale and in a detail that are reshaping the field of humanities, similarly to how microscopes and telescopes have contributed to the realm of science. This article aims to provide a comprehensive survey of published research using machine learning for the study of ancient texts written in any language, script, and medium, spanning over three and a half millennia of civilizations around the ancient world. To analyze the relevant literature, we introduce a taxonomy of tasks inspired by the steps involved in the study of ancient documents: digitization, restoration, attribution, linguistic analysis, textual criticism, translation, and decipherment. This work offers three major contributions: first, mapping the interdisciplinary field carved out by the synergy between the humanities and machine learning; second, highlighting how active collaboration between specialists from both fields is key to producing impactful and compelling scholarship; third, highlighting promising directions for future work in this field. Thus, this work promotes and supports the continued collaborative impetus between the humanities and machine learning

Oxford University Research Archive

Multi-script handwritten character recognition: Using feature descriptors and machine learning

Author: Surinta Olarik
Publication venue: 'University of Groningen Press'
Publication date: 01/01/2016

University of Groningen

Multi-script text versus non-text classification of regions in scene images

Author: Schomaker Lambert
Sriman Bowornrat
Publication venue: 'Elsevier BV'
Publication date: 01/07/2019

Text versus non-text region classification is an essential but difficult step in scene-image analysis due to the considerable shape complexity of text and background patterns. There exists a high probability of confusion between background elements and letter parts. This paper proposes a feature-based classification of image blocks using the color autocorrelation histogram (CAH) and the scale-invariant feature transform (SIFT) algorithm, yielding a combined scale and color-invariant feature suitable for scene-text classification. For the evaluation, features were extracted from different color spaces, applying color-histogram autocorrelation. The color features are adjoined with a SIFT descriptor. Parameter tuning is performed and evaluated. For the classification, a standard nearest-neighbor (1NN) and a support-vector machine (SVM) were compared. The proposed method appears to perform robustly and is especially suitable for Asian scripts such as Kannada and Thai, where urban scene-text fonts are characterized by a high curvature and salient color variations

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Advanced document data extraction techniques to improve supply chain performance

Author: Sharma Vikash
Publication date: 01/07/2021

In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies. The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. The Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model is a generator network that is implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bounding boxes. 
For text extraction from the bounding box, a novel data extraction framework was designed, consisting of various processes including XML processing in the case of an existing OCR engine, bounding box pre-processing, text clean up, OCR error correction, spell check, type check, pattern-based matching, and finally, a learning mechanism for automatizing future data extraction. Whichever fields the system can extract successfully are provided in key-value format. The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices that were collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks and later, a rule-based engine is used to extract relevant data. While the system’s methodology is robust, the companies surveyed were not satisfied with its accuracy. Thus, they sought out new, optimized solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing their clients' procurement costs. This data was fed into our system to get a deeper level of spend classification and categorisation. 
This helped the company to reduce its reliance on human effort and allowed for greater efficiency in comparison with the process of performing similar tasks manually using Excel sheets and Business Intelligence (BI) tools. The intention behind the development of this novel methodology was twofold. First, to test and develop a novel solution that does not depend on any specific OCR technology. Second, to increase the information extraction accuracy factor over that of existing methodologies. Finally, it evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information
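
The pattern-based matching step and the key-value output mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's framework: the field names, the regular expressions and the `extract_fields` helper are all hypothetical, chosen only to show how fields that match are returned and fields that do not match are simply left out.

```python
import re

# Hypothetical patterns for three invoice fields; a real system would need
# many more variants and locale-specific formats.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\b(\d{2}/\d{2}/\d{4})\b"),
    # \b keeps "Subtotal" from being mistaken for the total line.
    "total": re.compile(
        r"\btotal\s*[:\-]?\s*[£$€]?\s*(\d+(?:[.,]\d{2})?)", re.IGNORECASE
    ),
}


def extract_fields(ocr_text):
    """Return only the fields whose pattern matched, as a key-value dict."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        if match:
            fields[name] = match.group(1)
    return fields


sample = "INVOICE No: INV-2041\nDate 01/07/2021\nSubtotal 100.00\nTotal: £120.50"
print(extract_fields(sample))
```

Text with no recognisable total would yield a dictionary without the `total` key, which mirrors the abstract's statement that only successfully extracted fields are reported.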

Repository@Hull - Worktribe


Consequences of bi-literacy in bilingual individuals: in the healthy and neurologically impaired

Author: Balasubramanian Anusha
Publication date: 29/03/2019

Background. In the current global, cross-cultural scenario, being bilingual or multilingual is a norm rather than an exception. In such an environment an individual may be actively involved in reading and writing in all their languages in addition to speaking them. Regular use of two or more languages is termed bilingualism and being able to read and write in both of them is referred to as bi-literacy. Research indicates that bilingualism has an impact on language production and cognition, specifically executive functions. Given the impact of literacy and bilingualism, the reasonable question that arises is whether bi-literacy would offer an additional impact on language production and cognition. This becomes even more relevant in a multilingual, multi-cultural society such as India. We examined the impact of bi-literacy on oral language production (at word and connected speech level), comprehension and on non-verbal executive function measures in bi-literate bilingual healthy adults in an immigrant diaspora living in the UK. In addition to English, they were speakers of one of the South Indian languages (Kannada, Malayalam, Tamil and Telugu). The significance of bi-literacy among bilinguals assumes further importance in aphasia (language impairment due to brain damage). For those who have aphasia in one or more languages due to brain damage, the severity of impairment may be different in both languages, and the modalities of language may also be differentially affected. In particular, reading and writing may be impaired differently in the languages used by a bi/multilingual. Manifestations of reading impairments are also dependent on the nature of the script of the language being read [e.g., Raman & Weekes (2005) report differential dyslexia in a Turkish-English speaker who exhibited surface dyslexia in English and deep dysgraphia in Turkish]. 
Our study contributes to the field of bilingual aphasia by focusing specifically on reading, in contrast to the existing literature on aphasia in bilinguals, where the focus has predominantly been on language production and comprehension. Studying reading impairments provides a better understanding of how the reading impairments are manifested in the two languages, which will aid appropriate assessment and intervention. This research investigated the impact of bi-literacy in both populations (healthy adults and neurologically impaired) in two phases: Phase I (in UK) and Phase II (in India). Aim. Phase I investigated the impact of bi-literacy on oral language production (at word level and connected speech), comprehension and non-verbal executive function in bi-literate bilingual healthy adults. Phase II examined the reading impairments in two languages of bilingual persons with aphasia (BPWA). Methods. For Phase I, participants were thirty-four bi-literate bilingual healthy adults with English as their L2 and one of the Dravidian languages (Kannada, Malayalam, Tamil and Telugu) as their L1. We have used the term ‘print exposure’ as a proxy for literacy. They were divided into a high print exposure (HPE, n=22) and a low print exposure (LPE, n=12) group based on their performance on two tasks measuring L2 print exposure: a grammaticality judgement task and a sentence verification task. We also quantified their bilingual characteristics: proficiency, reading and writing characteristics and dominance. The groups were matched on years of education, age and gender. 
Participants completed a set of oral language production tasks in L2 (at word level), namely verbal fluency, word and non-word repetition; comprehension tasks in L2, namely a synonymy triplets task and a sentence comprehension task (Chapter 2); an oral narrative task in L2 (at connected speech level) (Chapter 3), followed by non-verbal executive function tasks tapping into inhibitory control (Spatial Stroop and Flanker tasks), working memory (visual n-back and auditory n-back) and task switching (colour-shape task) (Chapter 4). For Phase II, we characterized the reading abilities of four BPWA who spoke one of the Dravidian languages (Kannada, Tamil, Telugu) (alpha-syllabic) as their L1 and English (alphabetic) as their L2. We quantified their bilingual characteristics: proficiency, reading and writing characteristics and dominance. Subtests from the Psycholinguistic Assessment of Language Processing in Aphasia (PALPA; Kay, Lesser & Coltheart, 1992) were used to document the reading profile of BPWA in English and reading subtests from Reading Acquisition Profile (RAP-K; Rao, 1997) and words from Bilingual Aphasia Test-Hindi (BAT; Paradis & Libben, 1987) were used to document the reading profile of BPWA in Kannada and Hindi respectively. Findings. Based on the findings of Phase I (i.e., results from Chapters 2-4), we found prominent differences between HPE and LPE on comprehension measures (synonymy triplets and sentence comprehension tasks). This is in contrast to the results observed in monolingual adults, where semantics is less impacted by print exposure. Moreover, our predictions that HPE will result in better oral language production skills were borne out in specific conditions: semantic fluency and the non-word repetition task (at word level), and a higher number of words in the narrative, more verbs per utterance and fewer repetitions (at connected speech level). 
In addition, we found no direct link between print exposure (in L2) and non-verbal executive functions in bi-literate bilinguals, except for working memory (auditory N-back task). Another consistent pattern in our findings is a strong link between print exposure and semantic processing. The findings on the semantic tasks have been consistent across comprehension (synonymy triplets task and sentence comprehension task) and production (semantic fluency), favouring HPE. The findings from Phase II (Chapter 5) reveal differences in reading characteristics between the two languages (with different scripts) of the four BPWA. This research provides preliminary evidence that a script-related difference exists in the manifestation of dyslexia in bi-scriptal BPWA speaking a combination of alphabetic and alpha-syllabic languages. Conclusions. Our research contributes to the existing literature by highlighting the relationship between bi-literacy and language production, comprehension and non-verbal cognition, where bi-literacy seems to have a higher impact on language than on cognition. The findings, which run contrary to the literature on monolinguals and children, highlight the importance of considering the nuances of bilingual research and specifically challenge the notion that semantic comprehension is not significantly affected by literacy. In the neurologically impaired population, our research provides a comprehensive profiling of reading abilities in BPWA in the Indian population with languages having different scripts. Using this profiling and classification, we are able to affirm the findings previously reported in the literature emphasizing the importance of script in the assessment of reading abilities in BPWA. Such profiling and classification assist in the development of bilingual models of reading aloud and in classifying different types of reading impairments

Central Archive at the University of Reading

Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

Author: Majid Nishatul
Publication venue: 'IUScholarWorks'
Publication date: 01/08/2020

This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet to be solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make Bangla a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but also can be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it a challenge to recognize, demonstrate how it can be adapted to other scripts. The base of this design is a character spotting network which detects the location of different script elements (such as characters, diacritics) from an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla and this has been made publicly accessible to accelerate the research progress. 
Many other tools were developed and experiments were conducted to more rigorously validate this framework by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead
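
The step in which a transcript is formed from detected classes and their locations can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Detection` record, the left-to-right ordering by horizontal centre, the confidence cut-off, and the edit-distance-based accuracy are all hypothetical and are not taken from the dissertation, which does not define how its Character Recognition Accuracy is computed in this abstract.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str       # predicted script element (character or diacritic)
    x_center: float  # horizontal position in the word image, in pixels
    score: float     # detector confidence in [0, 1]


def build_transcript(detections, min_score=0.5):
    """Keep confident detections and read them left to right (assumed order)."""
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.label for d in kept)


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def character_accuracy(predicted, reference):
    """One possible accuracy definition: 1 - edit distance / reference length."""
    if not reference:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - edit_distance(predicted, reference) / len(reference))


spotted = [
    Detection("a", 20.0, 0.8),
    Detection("c", 5.0, 0.9),
    Detection("x", 12.0, 0.1),  # low confidence, dropped
    Detection("t", 40.0, 0.7),
]
print(build_transcript(spotted))
print(character_accuracy("cat", "cart"))
```

A purely horizontal ordering would not be enough for scripts with two-dimensional character arrangement such as Korean, so a real system would need a richer use of the location information.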

Boise State University - ScholarWorks

Text detection and recognition in natural images using computer vision techniques

Author: González Arroyo Álvaro
Publication date: 01/01/2013

Text recognition in real-world images has attracted the attention of many researchers around the world in recent years. The reason is the growth of low-cost products such as mobile phones and Tablet PCs that incorporate image-capture devices and high processing capabilities. Against this background, this thesis presents a robust method to detect, localize and recognize horizontal text in daytime images taken in real-world scenarios. The challenge is complex given the enormous variability of existing text and of capture conditions in real environments. First, a review is presented of the main works of recent years in the field of text recognition in natural images. Next, a study is carried out of the features best suited to describing text as opposed to non-text objects. Typically, a system for recognizing text in images consists of two main stages. The first consists of detecting whether text is present in the image and localizing it as precisely as possible, minimizing both the amount of undetected text and the number of false positives. The second stage consists of recognizing the extracted text. The detection method proposed here is based on connected-component analysis after applying a segmentation that combines a global method such as MSER with a local method, thereby improving on state-of-the-art proposals by segmenting text even in difficult situations such as blurred or very low-resolution images. The analysis of the extracted connected components is optimized by means of genetic algorithms. Unlike other systems, we propose a recursive method that restores text objects that were initially discarded in error. In this way, the reliability of detection is greatly improved. 
Although the proposed method is based on connected-component analysis, this thesis also uses the idea of texture-based methods to validate the detected text areas. Our method for recognizing text, in turn, is based on identifying each character and then applying a language model to correct misrecognized words, by restricting the solution to a dictionary containing the set of possible terms. A new feature for recognizing characters is proposed, which we have named Direction Histogram (DH). It is based on computing the histogram of gradient directions at edge pixels. This feature is compared with other state-of-the-art features, and the experimental results obtained on a challenging dataset show that our proposal is suitable, since it outperforms other state-of-the-art work. We also present a KNN-based fuzzy letter classification method, which makes it possible to separate characters that were erroneously connected during the segmentation stage. The proposed text recognition method is able to recognize not only words but also numbers and punctuation marks. Word recognition is carried out by means of a language model based on probabilistic inference and the British National Corpus, a comprehensive dictionary of modern British English, although the algorithm can easily be adapted for use with any other dictionary. The language model uses a modification of the forward algorithm used in Hidden Markov Models. To assess the performance of the proposed system, experimental results have been obtained on several datasets, which include images in different scenarios and situations. These datasets have been used as a benchmark over the last decade by most researchers in the area of text recognition in natural images. 
The results show that the proposed system achieves performance similar to the state of the art in terms of localization, while surpassing it in terms of recognition. To demonstrate the applicability of the method proposed in this thesis, a system is also presented for detecting and recognizing the information contained in traffic panels, based on the developed algorithm. The aim of this application is the automatic creation of traffic-panel inventories for countries or regions, to facilitate the maintenance of vertical road signage, using images available in Google's Street View service. A dataset has been created for this application. We propose modelling traffic panels using visual appearance, rather than the classic solutions that use edges or geometric features, in order to detect the images in which traffic panels are present. The experimental results show the feasibility of the proposed system
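
The Direction Histogram (DH) named in this abstract is described only as a histogram of gradient directions at edge pixels, so the following is a minimal sketch of that idea and not the thesis's implementation. The central-difference gradient, the magnitude threshold used to decide what counts as an edge pixel, the number of bins, and the function name `direction_histogram` are all assumptions made for illustration.

```python
import math


def direction_histogram(image, n_bins=8, edge_threshold=1.0):
    """Normalised histogram of gradient directions at (assumed) edge pixels.

    image: 2D list of grayscale intensities, rows of equal length.
    A pixel is treated as an edge pixel when its gradient magnitude
    reaches edge_threshold (an assumption, not the thesis's edge detector).
    """
    height, width = len(image), len(image[0])
    hist = [0.0] * n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the image gradient.
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            if math.hypot(gx, gy) < edge_threshold:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            bin_index = int(angle / (2 * math.pi) * n_bins) % n_bins
            hist[bin_index] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
print(direction_histogram(step))
```

For the step image all edge pixels have a gradient pointing along the positive x axis, so the whole mass falls into the first bin. A feature vector like this could then be fed to a classifier such as the KNN mentioned in the abstract.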

e_Buah - Biblioteca Digital de la Universidad de Alcalá

Text detection and recognition in natural images using computer vision techniques

Author: González Arroyo Álvaro
Publication date: 01/01/2013

e_Buah - Biblioteca Digital de la Universidad de Alcalá

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblioteca Digital de la Universidad de Alcalá