
    Grammatical Error Correction: A Survey of the State of the Art

    Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject-verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarise the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgements, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
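    As a concrete illustration of how GEC systems are typically scored (edit-level precision and recall combined into F0.5, which weights precision twice as heavily as recall, as in the M2 and ERRANT metrics the field uses), here is a minimal sketch; the edit spans below are hypothetical stand-ins for already-aligned system and reference edits:

```python
# Minimal sketch of edit-based GEC scoring, assuming system and gold edits
# are given as (start, end, correction) spans. Real scorers such as M2 or
# ERRANT also extract and align the edits automatically.

def f_beta(hyp_edits: set, gold_edits: set, beta: float = 0.5) -> float:
    """Score hypothesis edits against gold edits with F_beta (GEC uses
    F0.5, weighting precision twice as heavily as recall)."""
    tp = len(hyp_edits & gold_edits)
    if tp == 0:
        return 0.0
    precision = tp / len(hyp_edits)
    recall = tp / len(gold_edits)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Hypothetical example: one correct edit, one spurious edit, one missed edit.
gold = {(1, 2, "went"), (5, 5, "the")}
hyp = {(1, 2, "went"), (3, 4, "a")}
print(f"F0.5 = {f_beta(hyp, gold):.3f}")   # 0.500 for this toy case
```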

    From narrative descriptions to MedDRA: automagically encoding adverse drug reactions

    The collection of narrative spontaneous reports is an irreplaceable source for the prompt detection of suspected adverse drug reactions (ADRs). In this task, qualified domain experts manually review a huge number of narrative descriptions and then encode the texts according to the MedDRA standard terminology. The manual annotation of narrative documents with medical terminology is a subtle and expensive task, since the number of reports grows day by day. Natural Language Processing (NLP) applications can support the work of people responsible for pharmacovigilance. Our objective is to develop NLP algorithms and tools for the detection of ADR clinical terminology. Efficient applications can concretely improve the quality of the experts' revisions: NLP software can quickly analyze narrative texts and offer an encoding (i.e., a list of MedDRA terms) that the expert then revises and validates. MagiCoder, an NLP algorithm, is proposed for the automatic encoding of free-text descriptions into MedDRA terms. The MagiCoder procedure is efficient in terms of computational complexity. We tested MagiCoder in several experiments. In the first, we tested it on a large dataset of about 4500 manually revised reports, performing an automated comparison between the human and MagiCoder encodings. Moreover, we tested MagiCoder on a set of about 1800 reports, manually revised ex novo by domain experts, who also compared the automatic solutions with the gold reference standard. We also provide two initial experiments with reports written in English, giving initial evidence of the robustness of MagiCoder with respect to a change of language. For the current base version of MagiCoder, we measured an average recall and precision of and , respectively. From a practical point of view, MagiCoder reduces the time required for encoding ADR reports: pharmacologists only have to review and validate the MedDRA terms proposed by the application, instead of choosing the right terms among the roughly 70,000 low-level terms of MedDRA. Such an improvement in the efficiency of pharmacologists' work also has a relevant impact on the quality of the subsequent data analysis. We developed MagiCoder for the Italian pharmacovigilance language; however, our proposal is based on a general approach that depends neither on the language considered nor on the term dictionary.
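    The abstract does not describe MagiCoder's internals, so the following is only a naive illustration of the general idea it builds on: encoding free text against a term dictionary by longest-match lookup. This is not the MagiCoder algorithm, and the terms and codes below are hypothetical:

```python
# Purely illustrative sketch of dictionary-based encoding of a narrative
# ADR report into standard terms. NOT the actual MagiCoder procedure;
# the term-to-code mapping below is a hypothetical stand-in for MedDRA.

MEDDRA_TERMS = {
    "headache": "10019211",    # hypothetical codes
    "nausea": "10028813",
    "skin rash": "10040785",
}

def encode_report(narrative: str) -> list[tuple[str, str]]:
    """Return (term, code) pairs for every dictionary term found in the
    text, trying longer terms first so multi-word terms win over sub-words."""
    text = narrative.lower()
    hits = []
    for term in sorted(MEDDRA_TERMS, key=len, reverse=True):
        if term in text:
            hits.append((term, MEDDRA_TERMS[term]))
            text = text.replace(term, " ")   # avoid re-matching sub-terms
    return hits

print(encode_report("Patient reported a severe headache and skin rash."))
```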

    Decoding speech comprehension from continuous EEG recordings

    Human language is a remarkable manifestation of our cognitive abilities which is unique to our species. It is key to communication, but also to our faculty of generating complex thoughts. We organise, conceptualise, and share ideas through language. Neuroscience has shed considerable light on our understanding of how language is processed by the brain, although the exact neural organisation, structural or functional, underpinning this processing remains poorly understood. This project aims to employ new methodology to understand speech comprehension under naturalistic listening conditions. One achievement of this thesis lies in bringing evidence towards putative predictive processing mechanisms for language comprehension and confronting these with rule-based grammar processing. Namely, we looked on the one hand at cortical responses to information-theoretic measures that are relevant for predictive coding in the context of language processing, and on the other hand at responses to syntactic tree structures. We successfully recorded responses to linguistic features from continuous EEG recordings during naturalistic speech listening. The use of ecologically valid stimuli allowed us to embed neural responses in the context in which they naturally occur when hearing speech. This fostered the development of new analysis tools adapted to such experimental designs. Finally, we demonstrate the ability to decode comprehension from the EEG signals of participants with above-chance accuracy. This could be used as a better indicator of the severity and specificity of language disorders, and also to assess whether a patient in a vegetative state understands speech without the need for any behavioural response. Hence a primary outcome is our contribution to the neurobiology of language comprehension. Furthermore, our results pave the way for the development of a new range of diagnostic tools to measure the speech comprehension of patients with language impairment.
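    The abstract does not specify the decoding pipeline, so as a hedged sketch of what "decoding comprehension from EEG with above-chance accuracy" can look like in practice, here is a cross-validated linear classifier on random stand-in data:

```python
# Illustrative sketch of decoding a binary comprehension label from
# EEG-derived features with cross-validation. Not the thesis's actual
# pipeline; the data below are random stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 64))   # 120 trials x 64 EEG-derived features
y = rng.integers(0, 2, 120)          # 1 = comprehended, 0 = not

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)   # well above 0.5 = above chance
print(f"decoding accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```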

    Psychophysiological indices of recognition memory

    It has recently been found that during recognition memory tests participants' pupils dilate more when they view old items compared to novel items. This thesis sought to replicate this novel "Pupil Old/New Effect" (PONE) and to determine its relationship to implicit and explicit mnemonic processes, the veracity of participants' responses, and the analogous Event-Related Potential (ERP) old/new effect. Across 9 experiments, pupil-size was measured with a video-based eye-tracker during a variety of recognition tasks, and, in the case of Experiment 8, with concurrent Electroencephalography (EEG). The main findings of this thesis are that:
    - the PONE occurs in a standard explicit test of recognition memory but not in "implicit" tests of either perceptual fluency or artificial grammar learning;
    - the PONE is present even when participants are asked to give false behavioural answers in a malingering task, or are asked not to respond at all;
    - the PONE is present when attention is divided both at learning and during recognition;
    - the PONE is accompanied by a posterior ERP old/new effect;
    - the PONE does not occur when participants are asked to read previously encountered words without making a recognition decision;
    - the PONE does not occur if participants preload an "old/new" response;
    - the PONE is not enhanced by repetition during learning.
    These findings are discussed in the context of current models of recognition memory and other psychophysiological indices of mnemonic processes. It is argued that together these findings suggest that the increase in pupil-size which occurs when participants encounter previously studied items is not under conscious control and may reflect primarily recollective processes associated with recognition memory.
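    At its core, the PONE contrast amounts to comparing pupil size on old versus new trials. A minimal sketch of such a comparison follows; it is not the thesis's actual analysis pipeline, and the data are random stand-ins:

```python
# Illustrative old/new pupillometry contrast: compare mean pupil size on
# "old" vs. "new" trials with an independent-samples t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pupil_old = 3.5 + 0.1 * rng.standard_normal(40)   # mm, 40 "old" trials
pupil_new = 3.4 + 0.1 * rng.standard_normal(40)   # mm, 40 "new" trials

t, p = stats.ttest_ind(pupil_old, pupil_new)
print(f"old vs new: t = {t:.2f}, p = {p:.3f}")    # PONE predicts old > new
```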

    Italian and German students' use of the verb get: a learner corpus analysis

    The thesis opens with a general description of corpora and of their usefulness in education for teachers and students, then examines learner corpora in particular, with special attention to studies on Italian and German learners, and finally presents an investigation of the English verb "get" and of how it is used by the two groups of university students.
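    A minimal sketch of the kind of frequency comparison such a learner-corpus study rests on, assuming two plain-text corpus samples (the texts below are hypothetical stand-ins for real learner corpora):

```python
# Illustrative sketch: relative frequency of the lemma GET
# (get/gets/got/gotten/getting) per 10,000 tokens in two corpora.
import re

GET_FORMS = re.compile(r"\b(get|gets|got|gotten|getting)\b", re.IGNORECASE)

def get_rate(text: str) -> float:
    """Occurrences of GET forms per 10,000 tokens."""
    tokens = re.findall(r"\b\w+\b", text)
    return 10_000 * len(GET_FORMS.findall(text)) / len(tokens)

italian_sample = "I got up late and tried to get the bus to the university."
german_sample = "We get many letters and I got one from my friend yesterday."
print(f"IT learners: {get_rate(italian_sample):.1f} per 10k tokens")
print(f"DE learners: {get_rate(german_sample):.1f} per 10k tokens")
```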

    Reversible stochastic attribute-value grammars

    A well-known question in linguistics is whether humans have two independent modules for language comprehension and language production. In computational linguistics, comprehension (parsing) and production (generation) have in recent history practically always been treated as two separate tasks, and hence two separate modules. The central claim of this dissertation is that parsing and generation can be performed on a computer by a single component, without performing worse than separate parsing and generation components. The underlying reasoning is that many preferences must be shared between production and comprehension, since otherwise it would not be possible to understand a sentence one has produced. To support this claim, a generator for Dutch was first developed. This generator was then integrated with an existing parser for Dutch. The dissertation shows that there is indeed no significant difference between the performance of the integrated module and that of separate comprehension and production components. To gain a better understanding of how the combined model works, so-called 'feature selection' is applied: a technique for identifying the most important properties that characterise a comprehensible and fluent sentence. The dissertation shows that this can be determined with a small number of mainly linguistically informed features.
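    As an illustration of the feature-selection idea described above (greedily adding the feature that most improves a cross-validated score), here is a sketch on synthetic data; the linear model and random features are stand-ins for the dissertation's actual grammar model and linguistic features:

```python
# Illustrative greedy forward feature selection: repeatedly add the
# feature that most improves a cross-validated score, stopping when no
# feature helps. Data and model are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))             # 200 sentences x 10 features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # fluent vs. not (synthetic)

selected: list[int] = []
remaining = set(range(X.shape[1]))
best_score = 0.0
while remaining:
    scored = [(cross_val_score(LogisticRegression(max_iter=1000),
                               X[:, selected + [j]], y, cv=5).mean(), j)
              for j in remaining]
    score, j = max(scored)
    if score <= best_score:        # stop when no feature improves the score
        break
    best_score = score
    selected.append(j)
    remaining.remove(j)

print("selected features:", selected, "score:", round(best_score, 3))
```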