Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

Author: Majid Nishatul
Publication venue: 'IUScholarWorks'
Publication date: 01/08/2020

This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging unsolved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make Bangla a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but can also be used for almost any alphabetic writing system. The framework has been rigorously tested on Bangla, and experiments on the Korean script, whose two-dimensional arrangement of characters makes it a challenge to recognize, demonstrate how the framework can be adapted to other scripts. The base of this design is a character spotting network which detects the location of different script elements (such as characters and diacritics) in an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. This is one of the richest offline datasets currently available for Bangla and it has been made publicly accessible to accelerate research progress.
Many other tools were developed and experiments were conducted to more rigorously validate this framework by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead.
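
The abstract states that a transcript is formed from the detected classes based on their location. A minimal sketch of that idea follows, assuming each detection carries a label, a horizontal centre, and a confidence score, and that reading order is simply left to right. The `Detection` type, the `build_transcript` function, and the `min_score` threshold are hypothetical names for illustration, not the dissertation's implementation, which must also handle diacritics and two-dimensional layouts that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One spotted script element (hypothetical structure, for illustration)."""
    label: str       # predicted class, e.g. a character or diacritic
    x_center: float  # horizontal centre of the bounding box, in pixels
    score: float     # detector confidence in [0, 1]


def build_transcript(detections, min_score=0.5):
    """Drop low-confidence detections, then join labels in left-to-right order.

    Assumption: a single line of text read left to right; vertical position
    is ignored entirely.
    """
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.label for d in kept)


if __name__ == "__main__":
    example = [
        Detection("b", 30.0, 0.9),
        Detection("a", 10.0, 0.8),
        Detection("x", 20.0, 0.2),  # below the threshold, so it is dropped
        Detection("c", 50.0, 0.7),
    ]
    print(build_transcript(example))
```

Sorting on a single coordinate is the simplest possible ordering rule; a script with stacked or two-dimensional elements would need a richer rule that also uses vertical position.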

Boise State University - ScholarWorks

Modeling and training options for handwritten Arabic text recognition

Author: Ahmad Irfan
Publication date: 01/01/2016

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Deep learning based semantic textual similarity for applications in translation technology

Author: Ranasinghe Tharindu
Publication venue: University of Wolverhampton
Publication date: 01/01/2021

A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Semantic Textual Similarity (STS) measures the equivalence of meanings between two textual segments. It is a fundamental task for many natural language processing applications. In this study, we focus on employing STS in the context of translation technology. We start by developing models to estimate STS. We propose a new unsupervised vector aggregation-based STS method which relies on contextual word embeddings. We also propose a novel Siamese neural network based on efficient recurrent neural network units. We empirically evaluate various unsupervised and supervised STS methods, including these newly proposed methods in three different English STS datasets, two non-English datasets and a bio-medical STS dataset to list the best supervised and unsupervised STS methods. We then embed these STS methods in translation technology applications. Firstly we experiment with Translation Memory (TM) systems. We propose a novel TM matching and retrieval method based on STS methods that outperform current TM systems. We then utilise the developed STS architectures in translation Quality Estimation (QE). We show that the proposed methods are simple but outperform complex QE architectures and improve the state-of-the-art results. The implementations of these methods have been released as open source.
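
The abstract mentions an unsupervised STS method based on aggregating word vectors. One common baseline of that kind averages the word vectors of each segment and compares the two averages by cosine similarity. The sketch below shows that baseline only, under the assumption that word vectors are already available as plain lists of floats; the function names are hypothetical, and the thesis's own aggregation method may differ.

```python
import math


def mean_vector(vectors):
    """Average a non-empty list of equal-length word vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sts_score(embeddings_a, embeddings_b):
    """Similarity of two segments, each given as a list of word vectors."""
    return cosine_similarity(mean_vector(embeddings_a), mean_vector(embeddings_b))


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented purely for illustration.
    segment_a = [[1.0, 0.0], [1.0, 0.0]]
    segment_b = [[2.0, 0.0]]
    print(sts_score(segment_a, segment_b))
```

With contextual embeddings, the per-word vectors would come from a pretrained encoder rather than from a fixed lookup table; the aggregation and comparison steps stay the same.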

Wolverhampton Intellectual Repository and E-theses

The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction

Author: Stahlberg Felix
Publication venue: University of Cambridge
Publication date: 17/02/2020

With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, syntactic parsing, etc., but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent advances in contextual word embeddings like BERT boast state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser used to have little in common as systems were much more tailored towards the task at hand. At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements tasks often have in practice. This often goes along with a sharp break of deep learning methods with previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk for common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models that often do not play a role in vanilla end-to-end approaches and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems.
Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as derivation in a formal grammar. EPSRC grant EP/L027623/1; EPSRC Tier-2 capital grant EP/P020259/

Apollo (Cambridge)

SENTIMENT ANALYSIS FOR SPORTS FANATICISM IN ARABIC SOCIAL MEDIA TEXT

KFUPM ePrints

An Automatic Modern Standard Arabic Text Simplification System: A Corpus-Based Approach

Author: Khallaf Nouran Abdelrahman Ahmed
Publication date: 01/03/2023

This thesis brings together an overview of Text Readability (TR) in relation to Text Simplification (TS) with an application of both to Modern Standard Arabic (MSA). It will present our findings on using automatic TR and TS tools to teach MSA, along with challenges, limitations, and recommendations about enhancing the TR and TS models. Reading is one of the most vital tasks that provide language input for communication and comprehension skills. It has been shown that the use of long sentences, connected sentences, embedded phrases, passive voices, non-standard word orders, and infrequent words can increase the text difficulty for people with low literacy levels, as well as second language learners. The thesis compares the use of sentence embeddings of different types (fastText, mBERT, XLM-R and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. The 3-way CEFR (Common European Framework of Reference for Languages proficiency levels) classification reaches F-1 of 0.80 and 0.75 with Arabic-BERT and XLM-R, respectively, and the regression task reaches a Spearman correlation of 0.71. At the same time, the binary difficulty classifier reaches F-1 0.94, and the sentence-pair semantic similarity classifier reaches F-1 0.98. TS is an NLP task aiming to reduce the linguistic complexity of the text while maintaining its meaning and original information (Siddharthan, 2002; Camacho Collados, 2013; Saggion, 2017). The simplification study experimented using two approaches: (i) a classification approach and (ii) a generative approach. It then evaluated the effectiveness of these methods using the BERTScore (Zhang et al., 2020) evaluation metric. The simple sentences produced by the mT5 model achieved P 0.72, R 0.68 and F-1 0.70 via BERTScore, while combining Arabic-BERT and fastText achieved P 0.97, R 0.97 and F-1 0.97.
To reiterate, this research demonstrated the effectiveness of the implementation of a corpus-based method combined with extracting extensive linguistic features via the latest NLP techniques. It provided insights which can be of use in various Arabic corpus studies and NLP tasks such as translation for educational purposes.

White Rose E-theses Online

The role of nursing in multimorbidity care

Author: McParland Christopher Robert
Publication date: 01/01/2024

Background: Multimorbidity (the co-occurrence of two or more chronic conditions in the same person) affects around one in three persons, and it is strongly associated with a range of negative outcomes including worsening physical function, increased health care use, and premature death. Due to the way healthcare is provided to people with multimorbidity, treatment can become burdensome, fragmented and inefficient. In people with palliative conditions, multimorbidity is increasingly common. Better models of care are needed. Methods: A mixed-methods programme of research designed to inform the development of a nurse-led intervention for people with multimorbidity and palliative conditions. A mixed-methods systematic review explored nurse-led interventions for multimorbidity and their effects on outcomes. A cross-sectional study of 63,328 emergency department attenders explored the association between multimorbidity, complex multimorbidity (≥3 conditions affecting ≥3 body systems), and disease-burden on healthcare use and inpatient mortality. A focussed ethnographic study of people with multimorbidity and life-limiting conditions and their carers (n=12) explored the concept of treatment burden. Findings: Nurse-led interventions for people with multimorbidity generally focus on care coordination (i.e., case management or transitional care); patients view them positively, but they do not reliably reduce health care use or costs. Multimorbidity and complex multimorbidity were significantly associated with admission from the emergency department and reattendance within 30 and 90 days. The association was greater in those with more conditions. There was no association with inpatient mortality. People with multimorbidity and palliative conditions experienced treatment burden in a manner consistent with existing theoretical models.
This thesis also noted the effect of uncertainty on the balance between capacity and workload and proposes a model of how these concepts relate to one another. Discussion: This thesis addresses a gap in what is known about the role of nurses in providing care to the growing number of people with multimorbidity. A theory-based nurse-led intervention is proposed which prioritises managing treatment burden and uncertainty. Conclusions: Nursing in an age of multimorbidity necessitates a perspective shift which conceptualises chronic conditions as multiple overlapping phenomena situated within an individual. The role of the nurse should be to help patients navigate the complexity of living with multiple chronic conditions.

Glasgow Theses Service

Learning to represent, categorise and rank in community question answering

Author: Bogdanova Daria
Publication venue: Dublin City University. School of Computing
Publication date: 01/01/2018

The task of Question Answering (QA) is arguably one of the oldest tasks in Natural Language Processing, attracting high levels of interest from both industry and academia. However, most research has focused on factoid questions, e.g. "Who is the president of Ireland?" In contrast, research on answering non-factoid questions, such as manner, reason, difference and opinion questions, has been rather piecemeal. This was largely due to the absence of available labelled data for the task. This is changing, however, with the growing popularity of Community Question Answering (CQA) websites, such as Quora, Yahoo! Answers and the Stack Exchange family of forums. These websites provide natural labelled data allowing us to apply machine learning techniques. Most previous state-of-the-art approaches to the tasks of CQA-based question answering involved handcrafted features in combination with linear models. In this thesis we hypothesise that the use of handcrafted features can be avoided and the tasks can be approached with representation learning techniques, specifically deep learning. In the first part of this thesis we give an overview of deep learning in natural language processing and empirically evaluate our hypothesis on the task of detecting semantically equivalent questions, i.e. predicting if two questions can be answered by the same answer. In the second part of the thesis we address the task of answer ranking, i.e. determining how suitable an answer is for a given question. In order to determine the suitability of representation learning for the task of answer ranking, we provide a rigorous experimental evaluation of various neural architectures, based on feedforward, recurrent and convolutional neural networks, as well as their combinations. This thesis shows that deep learning is a very suitable approach to CQA-based QA, achieving state-of-the-art results on the two tasks we addressed.

Irish Universities

DCU Online Research Access Service