Towards a rule-based Spanish to Spanish sign language translation: from written forms to phonological representations
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Tecnología Electrónica y de las Comunicaciones. Date of defense: November 2014.
This thesis addresses several aspects of automatic translation from Castilian
Spanish to Spanish Sign Language (LSE), two typologically distant languages that lack
sufficient linguistic resources to support statistical approaches to translation. For this reason,
it adopts a rule-based approach grounded in contrastive grammatical studies of both languages.
The architecture follows the analysis, transfer and generation model.
Transfer is performed at the level of grammatical functions, as delivered by a Spanish
dependency parser, which avoids the complexities of a deeper analysis.
The bilingual base lexicon is obtained from the Diccionario normativo de la lengua de
signos española (DILSE-III), which pairs Spanish lemmas with the SEA (Sistema de
escritura alfabética) representations of the corresponding signs. The lexicon is
extended in two ways: by taking advantage of the difference in flexibility between
the part-of-speech systems of Spanish and LSE, and by exploiting lexical semantic
relations such as synonymy, hyponymy and meronymy.
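The abstract does not say exactly how related lemmas fill gaps in the lexicon. One plausible reading, sketched here in Python as an assumption rather than as the thesis's method, is that a Spanish lemma without a DILSE-III entry borrows the sign of a related lemma that has one, preferring synonyms over a more general (hypernym) sign. The lemmas, glosses, relation labels and priority order are hypothetical toy data, and meronymy is not modelled.

```python
from typing import Dict, List, Tuple

# Hypothetical base lexicon: Spanish lemma -> LSE gloss (illustrative, not DILSE-III data).
BASE: Dict[str, str] = {"coche": "COCHE", "perro": "PERRO", "casa": "CASA"}

# Hypothetical relation triples: (relation, uncovered lemma, covered lemma).
RELATIONS: List[Tuple[str, str, str]] = [
    ("hyponym_of", "caniche", "perro"),  # a poodle is a kind of dog
    ("synonym", "automóvil", "coche"),
    ("synonym", "can", "perro"),
]

# Lower number = preferred: an exact synonym beats a more general sign.
PRIORITY = {"synonym": 0, "hyponym_of": 1}

def extend_lexicon(base: Dict[str, str],
                   relations: List[Tuple[str, str, str]]) -> Dict[str, Tuple[str, str]]:
    """Give uncovered lemmas the gloss of a related, covered lemma.

    Returns lemma -> (gloss, source), where source is "base" or the relation used.
    """
    extended = {lemma: (gloss, "base") for lemma, gloss in base.items()}
    # Process relations in priority order; setdefault keeps the first mapping found,
    # so base entries and higher-priority relations are never overwritten.
    for relation, new, known in sorted(relations, key=lambda r: PRIORITY[r[0]]):
        if known in base:
            extended.setdefault(new, (base[known], relation))
    return extended

lexicon = extend_lexicon(BASE, RELATIONS)
print(lexicon["automóvil"])  # ('COCHE', 'synonym')
print(lexicon["caniche"])    # ('PERRO', 'hyponym_of')
```

Recording the relation used alongside each gloss lets a later stage treat approximate matches differently from exact ones.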
During structural transfer, some nodes of the dependency analysis are transformed,
others are removed, and new nodes are inserted. Some classifier predicates are
generated in this phase. The surface order of signs is obtained by topologically
sorting the graph of precedence relations between signs. Pairs of signs in a
head-dependent relation, or sharing the same head, are examined to determine
whether their relative order is marked. The system is evaluated at this stage and
compared with several statistical translation models. On a test corpus of glosses, the
rule-based approach performs best, with a BLEU (Bilingual Evaluation Understudy) score
of 0.30 and a TER (Translation Error Rate) of 42%. A linguistically oriented error analysis is provided.
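Obtaining the surface order by topologically sorting a precedence graph can be sketched in Python with Kahn's algorithm. The constraint pairs stand in for the pairwise decisions described above (head-dependent pairs and pairs sharing a head whose order is marked). The glosses, the constraints, the rule of breaking ties between unconstrained signs by their source-sentence position, and the requirement that each gloss occur only once are all assumptions of this sketch.

```python
import heapq
from typing import Dict, List, Tuple

def order_signs(signs: List[str], precedes: List[Tuple[str, str]]) -> List[str]:
    """Topologically sort signs under 'a precedes b' constraints (Kahn's algorithm).

    Signs whose relative order is unconstrained come out in source order.
    Raises ValueError if the constraints contain a cycle.
    """
    position = {sign: i for i, sign in enumerate(signs)}
    successors: Dict[str, List[str]] = {s: [] for s in signs}
    indegree = {s: 0 for s in signs}
    for before, after in precedes:
        successors[before].append(after)
        indegree[after] += 1

    # Min-heap of signs with no pending predecessors, keyed by source position.
    ready = [(position[s], s) for s in signs if indegree[s] == 0]
    heapq.heapify(ready)
    result: List[str] = []
    while ready:
        _, sign = heapq.heappop(ready)
        result.append(sign)
        for nxt in successors[sign]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (position[nxt], nxt))
    if len(result) != len(signs):
        raise ValueError("precedence constraints contain a cycle")
    return result

# Hypothetical example: Spanish "Juan compró un coche ayer", glossed in source order.
glosses = ["JUAN", "COMPRAR", "COCHE", "AYER"]
constraints = [("AYER", "JUAN"), ("JUAN", "COCHE"), ("JUAN", "COMPRAR"), ("COCHE", "COMPRAR")]
print(order_signs(glosses, constraints))  # ['AYER', 'JUAN', 'COCHE', 'COMPRAR']
```

A cycle would mean two contradictory ordering decisions, so the sketch reports it instead of silently emitting a partial order.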
Finally, in the morphological generation phase, glosses with morphological annotations
are replaced by HamNoSys (Hamburg Sign Language Notation System) phonological
representations produced by a computational morphology; these representations are used
for avatar-based animation synthesis. The implemented computational morphology
uses inflection, introflection and suppletion to model a significant fragment
of LSE morphology. The phenomena implemented include
deictics, nominal plural, aspect marking, verbal agreement, adjectival modification and
degree.
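The abstract names three mechanisms for morphological generation: inflection, introflection and suppletion. The toy Python sketch below shows one way a generator might dispatch among them for an annotated gloss. `SignForm`, the glosses, the feature labels `pl`, `durative` and `neg`, and every form value are hypothetical placeholders (real output would be HamNoSys strings), and the order in which the strategies are tried is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple

@dataclass(frozen=True)
class SignForm:
    """Hypothetical, simplified stand-in for a HamNoSys transcription.

    One field per parameter; the values used below are invented placeholders.
    """
    handshape: str
    location: str
    movement: str
    repetitions: int = 1

# Hypothetical lexicon of base forms, keyed by gloss.
LEXICON: Dict[str, SignForm] = {
    "CASA": SignForm("flat-B", "neutral-space", "arc-down"),
    "TRABAJAR": SignForm("fist", "neutral-space", "tap"),
}

# Hypothetical suppletion table: (gloss, feature) -> unrelated stored form.
SUPPLETIVE: Dict[Tuple[str, str], SignForm] = {
    ("TRABAJAR", "neg"): SignForm("open-5", "chest", "sweep-out"),
}

def realize(gloss: str, feature: str = "") -> SignForm:
    """Map an annotated gloss such as CASA+pl to a (toy) phonological form."""
    # 1. Suppletion: the whole form is replaced by a stored alternative.
    if (gloss, feature) in SUPPLETIVE:
        return SUPPLETIVE[(gloss, feature)]
    base = LEXICON[gloss]
    # 2. Introflection: a parameter inside the sign is modified.
    if feature == "durative":
        return replace(base, movement=base.movement + "+circular")
    # 3. Inflection: material is added to the base, modelled here as extra repetitions.
    if feature == "pl":
        return replace(base, repetitions=base.repetitions + 2)
    return base

print(realize("CASA", "pl"))
print(realize("TRABAJAR", "neg"))
```

Because `SignForm` is frozen, each rule returns a new form via `dataclasses.replace` and the stored lexicon entries are never mutated.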
Lexicography of coronavirus-related neologisms
This volume brings together contributions by international experts on COVID-19-related neologisms and their lexicographic processing and representation. The papers analyze new words, new meanings of existing words, and new multiword units: where they come from, how they are transmitted across languages (or how they differ between them), and how their use and meaning are reflected in dictionaries of all kinds. Recent trends in as many as ten languages are considered, covering general and specialized language, monolingual and bilingual dictionaries, and printed and online dictionaries.
Lexicography of Coronavirus-related Neologisms
This volume brings together contributions by international experts on COVID-19-related neologisms and their lexicographic processing and representation. The papers analyze new words, new meanings of existing words, and new multiword units in as many as ten languages, covering both specialized and general language, monolingual and bilingual dictionaries, and printed and online dictionaries.
Native and non-native processing of morphologically complex words in Italian
The present work focuses on the organization of the mental lexicon in native and non-native speakers and investigates whether words are connected in the mind according to morphological criteria, i.e., through a network of associations that is established whenever form and meaning co-occur. Psycholinguistic research on native lexical access has shown that morphology does underlie the organization of the mental lexicon, although the locus of this level of organization remains debated. Research in second language acquisition, by contrast, has only recently turned to these issues, and its findings so far remain contested. Specifically, the debate centers on whether native and non-native speakers share the same processing systems. According to recent proposals (Heyer & Clahsen 2015), they do not, and L2 processing is affected more by formal than by morphological criteria. In this light, the present work assesses the impact of formal characteristics on native and non-native lexical access, focusing on the processing of formally transparent versus non-transparent words in Italian. Two morphological phenomena are investigated in four psycholinguistic experiments that combine a lexical decision task with the masked priming paradigm. Experiments 1 and 2 compare the processing of allomorphic and non-allomorphic derivatives to investigate whether formal alterations impair recognition of the relationship between two morphologically related words. Experiments 3 and 4 focus on the lack of base autonomy in so-called bound stems, i.e., stems that cannot occur in isolation, and aim to determine whether free and bound stems are processed differently.
The results of Experiments 1 and 2 indicate that allomorphic variation does not affect the associations established among related words in native speakers, in line with the predictions of usage-based perspectives on language. Non-native speakers, on the other hand, appear to be more pervasively affected by the phonological/orthographic properties of words, but not to the point that transparent morphological relations can be reduced to mere form overlap between morphological relatives. Likewise, stem autonomy did not affect how native speakers process words containing bound and free stems, at least under certain conditions, suggesting that boundedness does not influence the establishment of morphological relationships among words. Non-native speakers, however, were sensitive to whether the stem can occur in isolation, which suggests that free bases may be more salient morphological units for them, whereas bound stems seem to be linked more closely to orthographically similar strings. Taken together, the findings suggest a model of the native mental lexicon based on words and on the morphological schemas that emerge from the relationships established among them, despite phonological variation and stem boundedness. While it is unclear whether such a system of connections and schemas is equally strong in the non-native lexicon, morphological relationships still appear to drive lexical organization. Crucially, however, this organization is modulated by form, as shown by the effects of phonological variation and lack of base autonomy.
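In masked priming with lexical decision, facilitation is commonly summarized as the difference between mean response latencies after unrelated and after related primes. The Python sketch below computes that raw difference per condition over correct trials only. The trial records and condition labels are invented, and the abstract does not describe the study's actual statistical analysis, which this descriptive measure is not meant to reproduce.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

def priming_effects(trials: List[dict]) -> Dict[str, float]:
    """Return {condition: mean RT(unrelated) - mean RT(related)}, correct trials only.

    A positive value means related primes speeded lexical decisions (facilitation).
    """
    rts = defaultdict(lambda: {"related": [], "unrelated": []})
    for t in trials:
        if t["correct"]:  # errors are excluded from latency analyses
            rts[t["condition"]][t["prime"]].append(t["rt"])
    return {cond: mean(d["unrelated"]) - mean(d["related"]) for cond, d in rts.items()}

# Invented toy trials (RTs in ms), for illustration only; not data from the study.
toy = [
    {"condition": "allomorphic", "prime": "related", "rt": 580, "correct": True},
    {"condition": "allomorphic", "prime": "related", "rt": 600, "correct": True},
    {"condition": "allomorphic", "prime": "related", "rt": 900, "correct": False},
    {"condition": "allomorphic", "prime": "unrelated", "rt": 620, "correct": True},
    {"condition": "allomorphic", "prime": "unrelated", "rt": 640, "correct": True},
    {"condition": "non-allomorphic", "prime": "related", "rt": 590, "correct": True},
    {"condition": "non-allomorphic", "prime": "unrelated", "rt": 650, "correct": True},
]
print(priming_effects(toy))  # allomorphic: 40 ms, non-allomorphic: 60 ms
```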
The concept of 'Genetic Modification' in a Descriptive Translation Study (DTS) of an English-Spanish corpus of Popular Science Books on Genetic Engineering: Denominative Variation, Semantic Prosody and Ideological Aspects of Translation Strategies
The overall aim is to examine the concept of 'genetic modification' through three
linguistic phenomena: denominative variation, semantic prosody, and the ideological aspects of the main
translation strategies. To study denominative variation, two technical terms, 'DNA' and 'gene/s', and two
semi-technical terms, 'food/s' and 'crop/s', were selected. To study semantic prosody, the
concordances of 'genetic' + N and 'genetically' + Adj were analyzed. Comparing the denominative
variants and semantic prosodies in an English-Spanish parallel corpus on genetic engineering yields
findings on the ideological aspects of the main translation strategies found in the corpus.
Departamento de Filología Inglesa
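Extracting 'genetic' + N and 'genetically' + Adj concordances can be approximated with a keyword-in-context (KWIC) search. This Python sketch uses only regular expressions, so it captures the next word without checking its part of speech; deciding whether the collocates carry a positive or negative semantic prosody remains the analyst's task. The sample text is invented.

```python
import re
from collections import Counter
from typing import List

def right_collocates(text: str, node: str) -> Counter:
    """Count the word immediately to the right of `node` (case-insensitive)."""
    pattern = re.compile(rf"\b{re.escape(node)}\s+([a-z]+(?:-[a-z]+)*)", re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))

def kwic(text: str, node: str, width: int = 25) -> List[str]:
    """Keyword-in-context lines with `width` characters of context on each side."""
    lines = []
    for m in re.finditer(rf"\b{re.escape(node)}\b", text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

# Invented sample sentences, for illustration only.
sample = ("Genetic engineering raises hopes. Critics fear genetic contamination, "
          "while genetically modified crops spread. A genetic test is cheap.")
print(right_collocates(sample, "genetic"))
print(right_collocates(sample, "genetically"))
for line in kwic(sample, "genetic"):
    print(line)
```

The word boundary after the node keeps `genetic` from matching inside `genetically`, so the two patterns can be counted separately.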
The lexeme in descriptive and theoretical morphology
After dominating for about a century since Baudouin de Courtenay introduced it at the end of the nineteenth century, the morpheme is increasingly being replaced by the lexeme in contemporary descriptive and theoretical morphology. The notion of a lexeme is usually associated with the work of P. H. Matthews (1972, 1974), who characterizes it as a lexical entity abstracting over individual inflected words. Over the last three decades, the lexeme has become a cornerstone of much work in both inflectional morphology and word formation (or, as it is increasingly called, lexeme formation). The papers in the present volume take stock of the descriptive and theoretical usefulness of the lexeme, but also address many of the challenges faced by classical lexeme-based theories of morphology.
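As an illustration of the lexeme as an abstraction over inflected words, the Python sketch below models a lexeme as a name plus a paradigm mapping feature sets to word-forms, so that sing, sings, sang and sung are all realizations of the single lexeme SING. The class and the feature labels are hypothetical and are not taken from any contribution in the volume.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

Features = FrozenSet[str]

@dataclass
class Lexeme:
    """A lexeme: one lexical entity abstracting over its inflected word-forms."""
    name: str      # citation label, e.g. SING
    category: str  # word class, e.g. "verb"
    paradigm: Dict[Features, str] = field(default_factory=dict)

    def add(self, form: str, *features: str) -> None:
        """Record the word-form that realizes a given set of features."""
        self.paradigm[frozenset(features)] = form

    def realize(self, *features: str) -> str:
        """Return the word-form for a feature set (the order of features is irrelevant)."""
        return self.paradigm[frozenset(features)]

sing = Lexeme("SING", "verb")
sing.add("sing", "present")
sing.add("sings", "present", "3sg")
sing.add("sang", "past")
sing.add("sung", "past_participle")
print(sing.realize("past"))        # sang
print(sing.realize("3sg", "present"))  # sings
```

Nothing in the paradigm requires the forms to share a segmentable morpheme, which is why ablaut forms like sang and sung fit as naturally as sings.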