Grammatical Error Correction: A Survey of the State of the Art
Grammatical Error Correction (GEC) is the task of automatically detecting and
correcting errors in text. The task not only includes the correction of
grammatical errors, such as missing prepositions and mismatched subject-verb
agreement, but also orthographic and semantic errors, such as misspellings and
word choice errors respectively. The field has seen significant progress in the
last decade, motivated in part by a series of five shared tasks, which drove
the development of rule-based methods, statistical classifiers, statistical
machine translation, and finally neural machine translation systems which
represent the current dominant state of the art. In this survey paper, we
condense the field into a single article and first outline some of the
linguistic challenges of the task, introduce the most popular datasets that are
available to researchers (for both English and other languages), and summarise
the various methods and techniques that have been developed with a particular
focus on artificial error generation. We next describe the many different
approaches to evaluation as well as concerns surrounding metric reliability,
especially in relation to subjective human judgements, before concluding with
an overview of recent progress and suggestions for future work and remaining
challenges. We hope that this survey will serve as a comprehensive resource for
researchers who are new to the field or who want to be kept apprised of recent
developments.
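As a hedged illustration of the task definition above, a toy rule-based corrector (of the kind that predates the statistical and neural systems the survey covers) might handle one agreement error, one preposition error, and one misspelling. The rules and example sentences here are invented for illustration and do not come from any surveyed system:

```python
import re

# Toy GEC rules covering the error types named in the abstract:
# subject-verb agreement, preposition choice, and a misspelling.
RULES = [
    (re.compile(r"\bhe go\b"), "he goes"),         # subject-verb agreement
    (re.compile(r"\bdepends of\b"), "depends on"),  # preposition choice
    (re.compile(r"\brecieve\b"), "receive"),        # orthographic (misspelling)
]

def correct(sentence: str) -> str:
    """Apply each rewrite rule in turn; real systems learn such mappings."""
    for pattern, replacement in RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence

print(correct("The result depends of whether he go home."))
# → "The result depends on whether he goes home."
```

Hand-written rules like these scale poorly, which is precisely why the field moved to classifiers and machine-translation-style systems.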
From narrative descriptions to MedDRA: automagically encoding adverse drug reactions
The collection of narrative spontaneous reports is an irreplaceable source for the prompt detection of suspected adverse drug reactions (ADRs). In this task, qualified domain experts manually revise a huge number of narrative descriptions and then encode the texts according to the standard MedDRA terminology. The manual annotation of narrative documents with medical terminology is a subtle and expensive task, since the number of reports grows day by day. Natural Language Processing (NLP) applications can support the work of people responsible for pharmacovigilance. Our objective is to develop NLP algorithms and tools for the detection of ADR clinical terminology. Efficient applications can concretely improve the quality of the experts' revisions: NLP software can quickly analyse narrative texts and propose an encoding (i.e., a list of MedDRA terms) that the expert then revises and validates. MagiCoder, an NLP algorithm, is proposed for the automatic encoding of free-text descriptions into MedDRA terms. The MagiCoder procedure is efficient in terms of computational complexity. We tested MagiCoder in several experiments. In the first, we tested it on a large dataset of about 4500 manually revised reports, performing an automated comparison between the human and MagiCoder encodings. Moreover, we tested MagiCoder on a set of about 1800 reports, manually revised ex novo by domain experts, who also compared the automatic solutions with the gold reference standard. We also provide two initial experiments with reports written in English, giving first evidence of the robustness of MagiCoder with respect to a change of language. For the current base version of MagiCoder, we measured an average recall and precision of and , respectively. From a practical point of view, MagiCoder reduces the time required for encoding ADR reports.
Pharmacologists have only to review and validate the MedDRA terms proposed by the application, instead of choosing the right terms among the roughly 70,000 Low Level Terms of MedDRA. This improvement in the efficiency of pharmacologists' work also has a relevant impact on the quality of the subsequent data analysis. We developed MagiCoder for the Italian pharmacovigilance language. However, our proposal is based on a general approach that depends neither on the language considered nor on the term dictionary.
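The core idea of proposing candidate terms from free text can be sketched, under assumptions, as dictionary-based matching. The tiny term list and the longest-match-first strategy below are illustrative assumptions, not the published MagiCoder algorithm:

```python
# Hypothetical surface-form -> preferred-term dictionary; real MedDRA has
# tens of thousands of Low Level Terms mapping to Preferred Terms.
TERM_DICT = {
    "nausea": "Nausea",
    "headache": "Headache",
    "skin rash": "Rash",
    "rash": "Rash",
}

def encode(report: str) -> list[str]:
    """Return candidate terms found in a free-text ADR report."""
    text = report.lower()
    found = []
    # Try longer surface forms first, so "skin rash" wins over "rash".
    for surface in sorted(TERM_DICT, key=len, reverse=True):
        if surface in text:
            term = TERM_DICT[surface]
            if term not in found:
                found.append(term)
            text = text.replace(surface, " ")  # consume the matched span
    return found

print(encode("Patient reports severe headache and a skin rash."))
# → ['Rash', 'Headache']
```

In the workflow the abstract describes, such a candidate list would be shown to the pharmacologist for revision and validation rather than accepted automatically.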
Decoding speech comprehension from continuous EEG recordings
Human language is a remarkable manifestation of our cognitive abilities and is unique to our species. It is key to communication, but also to our faculty for generating complex thoughts. We organise, conceptualise, and share ideas through language. Neuroscience has shed insightful light on our understanding of how language is processed by the brain, although the exact neural organisation, structural or functional, underpinning this processing remains poorly understood. This project aims to employ new methodology to understand speech comprehension during naturalistic listening conditions. One achievement of this thesis lies in bringing evidence towards putative predictive processing mechanisms for language comprehension and confronting those with rule-based grammar processing. Namely, we looked on the one hand at cortical responses to information-theoretic measures that are relevant for predictive coding in the context of language processing, and on the other hand at responses to syntactic tree structures. We successfully recorded responses to linguistic features from continuous EEG recordings during naturalistic speech listening. The use of ecologically valid stimuli allowed us to embed neural responses in the context in which they naturally occur when hearing speech. This fostered the development of new analysis tools adapted to such experimental designs. Finally, we demonstrate the ability to decode comprehension from the EEG signals of participants with above-chance accuracy. This could be used as a better indicator of the severity and specificity of language disorders, and also to assess whether a patient in a vegetative state understands speech without the need for any behavioural response. Hence a primary outcome is our contribution to the neurobiology of language comprehension. Furthermore, our results pave the way for the development of a new range of diagnostic tools to measure the speech comprehension of patients with language impairment.
Psychophysiological indices of recognition memory
It has recently been found that during recognition memory tests participants' pupils dilate more when they view old items compared to novel items. This thesis sought to replicate this novel "Pupil Old/New Effect" (PONE) and to determine its relationship to implicit and explicit mnemonic processes, the veracity of participants' responses, and the analogous Event-Related Potential (ERP) old/new effect. Across 9 experiments, pupil size was measured with a video-based eye-tracker during a variety of recognition tasks and, in the case of Experiment 8, with concurrent electroencephalography (EEG). The main findings of this thesis are that:
- the PONE occurs in a standard explicit test of recognition memory but not in "implicit" tests of either perceptual fluency or artificial grammar learning;
- the PONE is present even when participants are asked to give false behavioural answers in a malingering task, or are asked not to respond at all;
- the PONE is present when attention is divided both at learning and during recognition;
- the PONE is accompanied by a posterior ERP old/new effect;
- the PONE does not occur when participants are asked to read previously encountered words without making a recognition decision;
- the PONE does not occur if participants preload an "old/new" response;
- the PONE is not enhanced by repetition during learning.
These findings are discussed in the context of current models of recognition memory and other psychophysiological indices of mnemonic processes. It is argued that together these findings suggest that the increase in pupil size which occurs when participants encounter previously studied items is not under conscious control and may reflect primarily recollective processes associated with recognition memory.
Italian and German students' use of the verb get: a learner corpus analysis
This dissertation begins with a general description of corpora and of their usefulness in education for teachers and students, then examines learner corpora in particular, with special attention to studies on Italian and German learners, and finally presents an investigation of the English verb "get" and of how it is used by these two groups of university students.
Reversible stochastic attribute-value grammars
A well-known question in linguistics is whether humans have two independent modules for language comprehension and language production. In computational linguistics, comprehension (parsing) and production (generation) have in recent history almost always been treated as two separate tasks, and hence as separate modules. The central claim of this thesis is that parsing and generation can be performed on a computer by a single component, without performing worse than separate parsing and generation components. The underlying reasoning is that many preferences must be shared between production and comprehension, because otherwise it would not be possible to understand a sentence one has produced. To support this claim, a generator for Dutch was first developed. This generator was then integrated with an existing parser for Dutch. The thesis shows that there is indeed no significant difference between the performance of the integrated module and that of separate comprehension and production components. To gain a better understanding of how the combined model works, so-called "feature selection" is applied. This is a technique for identifying the most important properties that characterise a comprehensible and fluent sentence. The thesis shows that this can be determined with a small number of mainly linguistically informed features.
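The feature-selection technique mentioned in the summary can be sketched, under assumptions, as a greedy forward search that keeps adding the feature which most improves a score. The synthetic data and the trivial scoring rule below are illustrative assumptions, not the thesis's actual fluency or comprehensibility features:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
# Only features 0 and 3 actually carry signal for the target label.
y = (X[:, 0] + X[:, 3] + rng.normal(0.0, 0.1, n) > 0).astype(int)

def score(feats):
    """Accuracy of a trivial linear rule using only the given features."""
    if not feats:
        return 0.5
    pred = (X[:, feats].sum(axis=1) > 0).astype(int)
    return (pred == y).mean()

# Greedy forward selection: add the best remaining feature while it helps.
selected = []
improved = True
while improved:
    improved = False
    remaining = [f for f in range(p) if f not in selected]
    if not remaining:
        break
    best = max(remaining, key=lambda f: score(selected + [f]))
    if score(selected + [best]) > score(selected):
        selected.append(best)
        improved = True

print("selected features:", sorted(selected))  # typically [0, 3]
```

The appeal of the method, as in the thesis, is interpretability: the small surviving feature set identifies which properties actually drive the model's judgements.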