
    Grammatical Error Correction: A Survey of the State of the Art

    Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
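    As a concrete illustration of how GEC systems are typically scored (edit-level precision and recall combined into F0.5, which weights precision twice as heavily as recall, as in the M2 and ERRANT metrics the field uses), here is a minimal sketch; the edit spans below are hypothetical stand-ins for already-aligned system and reference edits:

```python
# Minimal sketch of edit-based GEC scoring, assuming system and gold edits
# are given as (start, end, correction) spans. Real scorers such as M2 or
# ERRANT also extract and align the edits automatically.

def f_beta(hyp_edits: set, gold_edits: set, beta: float = 0.5) -> float:
    """Score hypothesis edits against gold edits with F_beta (GEC uses
    F0.5, weighting precision twice as heavily as recall)."""
    tp = len(hyp_edits & gold_edits)
    if tp == 0:
        return 0.0
    precision = tp / len(hyp_edits)
    recall = tp / len(gold_edits)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Hypothetical example: one correct edit, one spurious edit, one missed edit.
gold = {(1, 2, "went"), (5, 5, "the")}
hyp = {(1, 2, "went"), (3, 4, "a")}
print(f"F0.5 = {f_beta(hyp, gold):.3f}")   # 0.500 for this toy case
```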

    From narrative descriptions to MedDRA: automagically encoding adverse drug reactions

    The collection of narrative spontaneous reports is an irreplaceable source for the prompt detection of suspected adverse drug reactions (ADRs). In this task, qualified domain experts manually review a huge number of narrative descriptions and then encode the texts according to the MedDRA standard terminology. The manual annotation of narrative documents with medical terminology is a subtle and expensive task, since the number of reports grows day by day. Natural Language Processing (NLP) applications can support the work of people responsible for pharmacovigilance. Our objective is to develop NLP algorithms and tools for the detection of ADR clinical terminology. Efficient applications can concretely improve the quality of the experts' revisions: NLP software can quickly analyze narrative texts and offer an encoding (i.e., a list of MedDRA terms) that the expert then revises and validates. MagiCoder, an NLP algorithm, is proposed for the automatic encoding of free-text descriptions into MedDRA terms. The MagiCoder procedure is efficient in terms of computational complexity. We tested MagiCoder in several experiments. In the first, we tested it on a large dataset of about 4500 manually revised reports, performing an automated comparison between the human and MagiCoder encodings. Moreover, we tested MagiCoder on a set of about 1800 reports, manually revised ex novo by domain experts, who also compared the automatic solutions with the gold reference standard. We also provide two initial experiments with reports written in English, giving initial evidence of the robustness of MagiCoder with respect to a change of language. For the current base version of MagiCoder, we measured an average recall and precision of and , respectively. From a practical point of view, MagiCoder reduces the time required for encoding ADR reports: pharmacologists only have to review and validate the MedDRA terms proposed by the application, instead of choosing the right terms among the roughly 70,000 low-level terms of MedDRA. Such an improvement in the efficiency of pharmacologists' work also has a relevant impact on the quality of the subsequent data analysis. We developed MagiCoder for the Italian pharmacovigilance language; however, our proposal is based on a general approach that depends neither on the language considered nor on the term dictionary.
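    The abstract does not describe MagiCoder's internals, so the following is only a naive illustration of the general idea it builds on: encoding free text against a term dictionary by longest-match lookup. This is not the MagiCoder algorithm, and the terms and codes below are hypothetical:

```python
# Purely illustrative sketch of dictionary-based encoding of a narrative
# ADR report into standard terms. NOT the actual MagiCoder procedure;
# the term-to-code mapping below is a hypothetical stand-in for MedDRA.

MEDDRA_TERMS = {
    "headache": "10019211",    # hypothetical codes
    "nausea": "10028813",
    "skin rash": "10040785",
}

def encode_report(narrative: str) -> list[tuple[str, str]]:
    """Return (term, code) pairs for every dictionary term found in the
    text, trying longer terms first so multi-word terms win over sub-words."""
    text = narrative.lower()
    hits = []
    for term in sorted(MEDDRA_TERMS, key=len, reverse=True):
        if term in text:
            hits.append((term, MEDDRA_TERMS[term]))
            text = text.replace(term, " ")   # avoid re-matching sub-terms
    return hits

print(encode_report("Patient reported a severe headache and skin rash."))
```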

    Decoding speech comprehension from continuous EEG recordings

    Human language is a remarkable manifestation of our cognitive abilities which is unique to our species. It is key to communication, but also to our faculty of generating complex thoughts. We organise, conceptualise, and share ideas through language. Neuroscience has shed considerable light on our understanding of how language is processed by the brain, although the exact neural organisation, structural or functional, underpinning this processing remains poorly understood. This project aims to employ new methodology to understand speech comprehension under naturalistic listening conditions. One achievement of this thesis lies in bringing evidence towards putative predictive processing mechanisms for language comprehension and confronting these with rule-based grammar processing. Namely, we looked on the one hand at cortical responses to information-theoretic measures that are relevant for predictive coding in the context of language processing, and on the other hand at responses to syntactic tree structures. We successfully recorded responses to linguistic features from continuous EEG recordings during naturalistic speech listening. The use of ecologically valid stimuli allowed us to embed neural responses in the context in which they naturally occur when hearing speech. This fostered the development of new analysis tools adapted to such experimental designs. Finally, we demonstrate the ability to decode comprehension from the EEG signals of participants with above-chance accuracy. This could be used as a better indicator of the severity and specificity of language disorders, and also to assess whether a patient in a vegetative state understands speech without the need for any behavioural response. Hence a primary outcome is our contribution to the neurobiology of language comprehension. Furthermore, our results pave the way for the development of a new range of diagnostic tools to measure the speech comprehension of patients with language impairment.
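    The abstract does not specify the decoding pipeline, so as a hedged sketch of what "decoding comprehension from EEG with above-chance accuracy" can look like in practice, here is a cross-validated linear classifier on random stand-in data:

```python
# Illustrative sketch of decoding a binary comprehension label from
# EEG-derived features with cross-validation. Not the thesis's actual
# pipeline; the data below are random stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 64))   # 120 trials x 64 EEG-derived features
y = rng.integers(0, 2, 120)          # 1 = comprehended, 0 = not

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)   # well above 0.5 = above chance
print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```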

    Psychophysiological indices of recognition memory

    It has recently been found that during recognition memory tests participants' pupils dilate more when they view old items compared to novel items. This thesis sought to replicate this novel "Pupil Old/New Effect" (PONE) and to determine its relationship to implicit and explicit mnemonic processes, the veracity of participants' responses, and the analogous Event-Related Potential (ERP) old/new effect. Across 9 experiments, pupil-size was measured with a video-based eye-tracker during a variety of recognition tasks, and, in the case of Experiment 8, with concurrent Electroencephalography (EEG). The main findings of this thesis are that:
    - the PONE occurs in a standard explicit test of recognition memory but not in "implicit" tests of either perceptual fluency or artificial grammar learning;
    - the PONE is present even when participants are asked to give false behavioural answers in a malingering task, or are asked not to respond at all;
    - the PONE is present when attention is divided both at learning and during recognition;
    - the PONE is accompanied by a posterior ERP old/new effect;
    - the PONE does not occur when participants are asked to read previously encountered words without making a recognition decision;
    - the PONE does not occur if participants preload an "old/new" response;
    - the PONE is not enhanced by repetition during learning.
    These findings are discussed in the context of current models of recognition memory and other psychophysiological indices of mnemonic processes. It is argued that together these findings suggest that the increase in pupil-size which occurs when participants encounter previously studied items is not under conscious control and may reflect primarily recollective processes associated with recognition memory.
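    At its core, the PONE contrast amounts to comparing pupil size on old versus new trials. A minimal sketch of such a comparison follows; it is not the thesis's actual analysis pipeline, and the data are random stand-ins:

```python
# Illustrative old/new pupillometry contrast: compare mean pupil size on
# "old" vs. "new" trials with an independent-samples t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pupil_old = 3.5 + 0.1 * rng.standard_normal(40)   # mm, 40 "old" trials
pupil_new = 3.4 + 0.1 * rng.standard_normal(40)   # mm, 40 "new" trials

t, p = stats.ttest_ind(pupil_old, pupil_new)
print(f"old vs new: t = {t:.2f}, p = {p:.3f}")    # PONE predicts old > new
```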

    Italian and German students' use of the verb get: a learner corpus analysis

    The thesis opens with a general description of corpora and of their usefulness in education for teachers and students, then examines learner corpora in particular, with special attention to studies on Italian and German learners, and finally presents an investigation of the English verb "get" and of how it is used by the two groups of university students.
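    A minimal sketch of the kind of frequency comparison such a learner-corpus study rests on, assuming two plain-text corpus samples (the texts below are hypothetical stand-ins for real learner corpora):

```python
# Illustrative sketch: relative frequency of the lemma GET
# (get/gets/got/gotten/getting) per 10,000 tokens in two corpora.
import re

GET_FORMS = re.compile(r"\b(get|gets|got|gotten|getting)\b", re.IGNORECASE)

def get_rate(text: str) -> float:
    """Occurrences of GET forms per 10,000 tokens."""
    tokens = re.findall(r"\b\w+\b", text)
    return 10_000 * len(GET_FORMS.findall(text)) / len(tokens)

italian_sample = "I got up late and tried to get the bus to the university."
german_sample = "We get many letters and I got one from my friend yesterday."
print(f"IT learners: {get_rate(italian_sample):.1f} per 10k tokens")
print(f"DE learners: {get_rate(german_sample):.1f} per 10k tokens")
```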

    Reversible stochastic attribute-value grammars

    A well-known question in linguistics is whether humans have two independent modules for language comprehension and language production. In computational linguistics, comprehension (parsing) and production (generation) have in recent history practically always been treated as two separate tasks, and hence two separate modules. The central claim of this dissertation is that parsing and generation can be performed on a computer by a single component, without performing worse than separate parsing and generation components. The underlying reasoning is that many preferences must be shared between production and comprehension, since otherwise it would not be possible to understand a sentence one has produced. To support this claim, a generator for Dutch was first developed. This generator was then integrated with an existing parser for Dutch. The dissertation shows that there is indeed no significant difference between the performance of the integrated module and that of separate comprehension and production components. To gain a better understanding of how the combined model works, so-called 'feature selection' is applied: a technique for identifying the most important properties that characterise a comprehensible and fluent sentence. The dissertation shows that this can be determined with a small number of mainly linguistically informed features.
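    As an illustration of the feature-selection idea described above (greedily adding the feature that most improves a cross-validated score), here is a sketch on synthetic data; the linear model and random features are stand-ins for the dissertation's actual grammar model and linguistic features:

```python
# Illustrative greedy forward feature selection: repeatedly add the
# feature that most improves a cross-validated score, stopping when no
# feature helps. Data and model are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))             # 200 sentences x 10 features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # fluent vs. not (synthetic)

selected: list[int] = []
remaining = set(range(X.shape[1]))
best_score = 0.0
while remaining:
    scored = [(cross_val_score(LogisticRegression(max_iter=1000),
                               X[:, selected + [j]], y, cv=5).mean(), j)
              for j in remaining]
    score, j = max(scored)
    if score <= best_score:        # stop when no feature improves the score
        break
    best_score = score
    selected.append(j)
    remaining.remove(j)

print("selected features:", selected, "score:", round(best_score, 3))
```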