30 research outputs found
Generating Grammar Exercises
International audienceGrammar exercises for language learning fall into two distinct classes: those that are based on ''real life sentences'' extracted from existing documents or from the web; and those that seek to facilitate language acquisition by presenting the learner with exercises whose syntax is as simple as possible and whose vocabulary is restricted to that contained in the textbook being used. In this paper, we introduce a framework (called gramex) which permits generating the second type of grammar exercises. Using generation techniques, we show that a grammar can be used to semi-automatically generate grammar exercises which target a specific learning goal; are made of short, simple sentences; and whose vocabulary is restricted to that used in a given textbook
Lexical simplification for the systematic support of cognitive accessibility guidelines
The Internet has come a long way in recent years, contributing to the proliferation of
large volumes of digitally available information. Through user interfaces we can access
these contents, however, they are not accessible to everyone. The main users affected are
people with disabilities, who are already a considerable number, but accessibility barriers
affect a wide range of user groups and contexts of use in accessing digital information.
Some of these barriers are caused by language inaccessibility when texts contain long
sentences, unusual words and complex linguistic structures. These accessibility barriers
directly affect people with cognitive disabilities.
For the purpose of making textual content more accessible, there are initiatives such
as the Easy Reading guidelines, the Plain Language guidelines and some of the languagespecific
Web Content Accessibility Guidelines (WCAG). These guidelines provide documentation,
but do not specify methods for meeting the requirements implicit in these
guidelines in a systematic way. To obtain a solution, methods from the Natural Language
Processing (NLP) discipline can provide support for achieving compliance with the cognitive
accessibility guidelines for the language.
The task of text simplification aims at reducing the linguistic complexity of a text from
a syntactic and lexical perspective, the latter being the main focus of this Thesis. In this
sense, one solution space is to identify in a text which words are complex or uncommon,
and in the case that there were, to provide a more usual and simpler synonym, together
with a simple definition, all oriented to people with cognitive disabilities.
With this goal in mind, this Thesis presents the study, analysis, design and development
of an architecture, NLP methods, resources and tools for the lexical simplification of
texts for the Spanish language in a generic domain in the field of cognitive accessibility.
To achieve this, each of the steps present in the lexical simplification processes is studied,
together with methods for word sense disambiguation. As a contribution, different
types of word embedding are explored and created, supported by traditional and dynamic
embedding methods, such as transfer learning methods. In addition, since most of the
NLP methods require data for their operation, a resource in the framework of cognitive
accessibility is presented as a contribution.Internet ha avanzado mucho en los últimos años contribuyendo a la proliferación de
grandes volúmenes de información disponible digitalmente. A través de interfaces de
usuario podemos acceder a estos contenidos, sin embargo, estos no son accesibles a todas
las personas. Los usuarios afectados principalmente son las personas con discapacidad
siendo ya un número considerable, pero las barreras de accesibilidad afectan a un gran
rango de grupos de usuarios y contextos de uso en el acceso a la información digital. Algunas
de estas barreras son causadas por la inaccesibilidad al lenguaje cuando los textos
contienen oraciones largas, palabras inusuales y estructuras lingüísticas complejas. Estas
barreras de accesibilidad afectan directamente a las personas con discapacidad cognitiva.
Con el fin de hacer el contenido textual más accesible, existen iniciativas como las
pautas de Lectura Fácil, las pautas de Lenguaje Claro y algunas de las pautas de Accesibilidad
al Contenido en la Web (WCAG) específicas para el lenguaje. Estas pautas
proporcionan documentación, pero no especifican métodos para cumplir con los requisitos
implícitos en estas pautas de manera sistemática. Para obtener una solución, los
métodos de la disciplina del Procesamiento del Lenguaje Natural (PLN) pueden dar un
soporte para alcanzar la conformidad con las pautas de accesibilidad cognitiva relativas al
lenguaje
La tarea de la simplificación de textos del PLN tiene como objetivo reducir la complejidad
lingüística de un texto desde una perspectiva sintáctica y léxica, siendo esta última
el enfoque principal de esta Tesis. En este sentido, un espacio de solución es identificar
en un texto qué palabras son complejas o poco comunes, y en el caso de que sí hubiera,
proporcionar un sinónimo más usual y sencillo, junto con una definición sencilla, todo
ello orientado a las personas con discapacidad cognitiva.
Con tal meta, en esta Tesis, se presenta el estudio, análisis, diseño y desarrollo de
una arquitectura, métodos PLN, recursos y herramientas para la simplificación léxica de
textos para el idioma español en un dominio genérico en el ámbito de la accesibilidad
cognitiva. Para lograr esto, se estudia cada uno de los pasos presentes en los procesos
de simplificación léxica, junto con métodos para la desambiguación del sentido de las
palabras. Como contribución, diferentes tipos de word embedding son explorados y creados,
apoyados por métodos embedding tradicionales y dinámicos, como son los métodos
de transfer learning. Además, debido a que gran parte de los métodos PLN requieren
datos para su funcionamiento, se presenta como contribución un recurso en el marco de
la accesibilidad cognitiva.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: José Antonio Macías Iglesias.- Secretario: Israel González Carrasco.- Vocal: Raquel Hervás Ballestero
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-‐it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted into its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-‐it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-‐it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted into its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-‐it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Peer reviewe
Recommended from our members
The Roles of Language Models and Hierarchical Models in Neural Sequence-to-Sequence Prediction
With the advent of deep learning, research in many areas of machine learning is converging towards the same set of methods and models. For example, long short-term memory networks are not only popular for various tasks in natural language processing (NLP) such as speech recognition, machine translation, handwriting recognition, syntactic parsing, etc., but they are also applicable to seemingly unrelated fields such as robot control, time series prediction, and bioinformatics. Recent advances in contextual word embeddings like BERT boast with achieving state-of-the-art results on 11 NLP tasks with the same model. Before deep learning, a speech recognizer and a syntactic parser used to have little in common as systems were much more tailored towards the task at hand.
At the core of this development is the tendency to view each task as yet another data mapping problem, neglecting the particular characteristics and (soft) requirements tasks often have in practice. This often goes along with a sharp break of deep learning methods with previous research in the specific area. This work can be understood as an antithesis to this paradigm. We show how traditional symbolic statistical machine translation models can still improve neural machine translation (NMT) while reducing the risk for common pathologies of NMT such as hallucinations and neologisms. Other external symbolic models such as spell checkers and morphology databases help neural grammatical error correction. We also focus on language models that often do not play a role in vanilla end-to-end approaches and apply them in different ways to word reordering, grammatical error correction, low-resource NMT, and document-level NMT. Finally, we demonstrate the benefit of hierarchical models in sequence-to-sequence prediction. Hand-engineered covering grammars are effective in preventing catastrophic errors in neural text normalization systems. Our operation sequence model for interpretable NMT represents translation as a series of actions that modify the translation state, and can also be seen as derivation in a formal grammar.EPSRC grant EP/L027623/1
EPSRC Tier-2 capital grant EP/P020259/