14 research outputs found

    Writer adaptation for offline text recognition: An exploration of neural network-based methods

    Handwriting recognition has seen significant success with the use of deep learning. However, a persistent shortcoming of neural networks is that they are not well-equipped to deal with shifting data distributions. In the field of handwritten text recognition (HTR), this shows itself in poor recognition accuracy for writers that are not similar to those seen during training. An ideal HTR model should adapt to new writing styles in order to handle the vast number of possible writing styles. In this paper, we explore how HTR models can be made writer adaptive by using only a handful of examples from a new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used as base models, using a ResNet backbone along with either an LSTM or Transformer sequence decoder. Using these base models, two methods are considered to make them writer adaptive: 1) model-agnostic meta-learning (MAML), an algorithm commonly used for tasks such as few-shot classification, and 2) writer codes, an idea originating from automatic speech recognition. Results show that an HTR-specific version of MAML known as MetaHTR improves performance compared to the baseline with a 1.4 to 2.0 improvement in word error rate (WER). The improvement due to writer adaptation is between 0.2 and 0.7 WER, where a deeper model seems to lend itself better to adaptation using MetaHTR than a shallower model. However, applying MetaHTR to larger HTR models or sentence-level HTR may become prohibitive due to its high computational and memory requirements. Lastly, writer codes based on learned features or Hinge statistical features did not lead to improved recognition performance. Comment: 21 pages including appendices, 6 figures, 10 tables

    Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation

    This paper presents a comprehensive evaluation of the Optical Character Recognition (OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model (LMM). We assess the model's performance across a range of OCR tasks, including scene text recognition, handwritten text recognition, handwritten mathematical expression recognition, table structure recognition, and information extraction from visually-rich documents. The evaluation reveals that GPT-4V performs well in recognizing and understanding Latin content, but struggles with multilingual scenarios and complex tasks. Specifically, it showed limitations when dealing with non-Latin languages and complex tasks such as handwritten mathematical expression recognition, table structure recognition, and end-to-end semantic entity recognition and pair extraction from document images. Based on these observations, we affirm the necessity and continued research value of specialized OCR models. In general, despite its versatility in handling diverse OCR tasks, GPT-4V does not outperform existing state-of-the-art OCR models. How to fully utilize pre-trained general-purpose LMMs such as GPT-4V for OCR downstream tasks remains an open problem. The study offers a critical reference for future research in OCR with LMMs. Evaluation pipeline and results are available at https://github.com/SCUT-DLVCLab/GPT-4V_OCR

    Noninvasive Dynamic Characterization of Swallowing Kinematics and Impairments in High Resolution Cervical Auscultation via Deep Learning

    Swallowing is a complex sensorimotor activity by which food and liquids are transferred from the oral cavity to the stomach. Swallowing requires coordination between multiple subsystems, which makes it subject to impairment secondary to a variety of medical or surgical conditions. Dysphagia refers to any swallowing disorder and is common in patients with head and neck cancer and neurological conditions such as stroke. Dysphagia affects nearly 9 million adults and causes more than 60,000 deaths yearly in the US. In this research, we utilize advanced signal processing techniques with sensor technology and deep learning methods to develop a noninvasive and widely available tool for the evaluation and diagnosis of swallowing problems. We investigate the use of modern spectral estimation methods in addition to convolutional recurrent neural networks to demarcate and localize the important swallowing physiological events that contribute to airway protection, based solely on signals collected from non-invasive sensors attached to the anterior neck. These events include the full swallowing activity, upper esophageal sphincter opening duration and maximal opening diameter, and aspiration. We believe that combining sensor technology and state-of-the-art deep learning architectures specialized in time series analysis will help achieve great advances for dysphagia detection and management in terms of non-invasiveness, portability, and availability. Such advances will enable patients to get continuous feedback about their swallowing outside standard clinical care settings, which will greatly facilitate their daily activities and enhance the quality of their lives

    Intelligent image recognition system based on convolutional neural networks

    Master's dissertation for the degree of "master" in the educational and scientific program "Integrated Information Systems" on the topic "Intelligent image recognition system based on convolutional neural networks". The dissertation contains 102 pages, 54 figures, 3 appendices, and 26 sources. Relevance: improving the accuracy of computer recognition of graphic images is an important topic for building modern information systems. The aim of the dissertation is to improve the efficiency of graphic image recognition systems and to advance computer vision technology. Object of study: the graphic image. Subject of study: an intelligent graphic image recognition system based on convolutional neural networks. The scientific novelty lies in increasing the efficiency of graphic image recognition by intelligent systems, namely by combining image pre-processing methods with minimization of system error. Publication of dissertation results: based on the results of the work, the following articles were published: Tkachenko M. S., Sokylskyi O. Ye. Usage of R-CNN in automatic positioning of objects through neural network analysis of graphic data. Tkachenko M. S., Sokylskyi O. Ye. Principles of organization of machine analysis procedure based on convolutional neural network architecture

    Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

    On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    Orthographic practices in SMS text messaging as a case signifying diachronic change in linguistic and semiotic resources

    From 1998, SMS text messaging diffused in the UK from an innovation associated with a small minority, mainly adolescents, to a method of written communication practised routinely by people of all ages and social profiles. From its earliest use, and continuing to the time of writing in 2015, SMS texting has attracted strong evaluation in public sphere commentary, often focused on its spelling. This thesis presents analysis of SMS orthographic choice as practised by a sample of adolescents and young adults in England, with data collected between 2000 and 2012. A three-level analytical framework attends to the textual evidence of SMS orthographic practices in situated use; respondents’ accounts of their choices of spelling in text messaging as a literacy practice; and the metadiscursive evaluation of text messaging spelling in situated interaction and in the public sphere. I present analysis of a variety of representations of SMS orthographic choice, including facsimile texts, electronic corpus data, questionnaire survey responses and transcripts of recorded interviews. This mixed-methods empirical approach enables a cross-verified, longitudinal perspective on respondents’ practices, and on the wider significance of SMS orthographic choice, as expressed in private and public commentary. I argue that the spelling used in SMS exemplifies features, patterns, and behaviours which are found in other forms of digitally-mediated interaction, and in previous and concurrent vernacular literacy practices. I present SMS text messaging as one of the intertextually-related forms of self-published written interaction which mark a diachronic shift towards re-regulated forms of orthographic convention, so disrupting attitudes to standard English spelling. I consider some implications represented by SMS spelling choice for the future of written conventions in standardised English, and for teaching and learning about spelling and literacy in formal educational settings