
    Examining inter-sentential influences on predicted verb subcategorization

    This study investigated the influences of prior discourse context and cumulative syntactic priming on readers' predictions for verb subcategorizations. An additional aim was to determine whether cumulative syntactic priming has the same degree of influence following coherent discourse contexts as when following a series of unrelated sentences. Participants (N = 40) read sentences using a self-paced, sentence-by-sentence procedure. Half of these sentences comprised a coherent discourse context intended to increase the expectation for a sentential complement (S) completion. The other half consisted of scrambled sentences. The trials in both conditions varied according to the proportion of verbs that resolved to an S (either 6S or 2S). Following each condition, participants read temporarily ambiguous sentences that resolved to an S. Reading times across the disambiguating and postdisambiguating regions were measured. No significant main effects or interactions were found for either region. However, the lack of significant findings for these analyses may have been due to low power. In a follow-up analysis, data from each gender were analyzed separately. For the data contributed by males, there were no significant findings. For the data contributed by females, the effect of coherence was significant (by participants but not by items) across the postdisambiguating region, and there was a marginally significant interaction (p = .05) between coherence and frequency across this region, suggesting that discourse-level information may differentially influence the local sentence processing of female and male participants.
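
    The follow-up analyses described above split the data by gender and test coherence and frequency within each subset. The sketch below is a minimal illustration (not the author's analysis code) of the by-participants (F1) analysis implied by the design: a 2 x 2 repeated-measures ANOVA with coherence (coherent vs. scrambled) and frequency bias (6S vs. 2S) as within-subject factors, run on postdisambiguating-region reading times. The file and column names are hypothetical.

```python
# Hypothetical F1 analysis sketch; "self_paced_reading.csv", "subject",
# "coherence", "frequency" and "rt_postdisambig" are assumed names.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

trials = pd.read_csv("self_paced_reading.csv")  # one row per trial

# Aggregate over items so each participant contributes one mean RT per design cell.
by_subject = (trials
              .groupby(["subject", "coherence", "frequency"], as_index=False)
              ["rt_postdisambig"].mean())

f1 = AnovaRM(by_subject, depvar="rt_postdisambig", subject="subject",
             within=["coherence", "frequency"]).fit()
print(f1)  # the by-items (F2) analysis repeats this with items as the grouping unit
```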

    Individual Differences and Instructed Second Language Acquisition: Insights from Intelligent Computer Assisted Language Learning

    The present dissertation focuses on the role of cognitive individual difference factors in the acquisition of second language vocabulary in the context of intelligent computer assisted language learning (ICALL). The aim was to examine the association of working memory and declarative memory with the learning of English phrasal verbs in a web-based ICALL-mediated experiment. Following a pretest-posttest design, 127 adult learners of English were assigned to two instructional conditions, namely meaning-focused and form-focused conditions. Learners in both conditions read news texts on the web for about two weeks; learners in the form-focused condition additionally interacted with the texts by selecting multiple-choice options. The results showed that both working memory and declarative memory were predictive of vocabulary acquisition. However, only the working memory effect was modulated by the instructional context, with the effect being found exclusively in the form-focused condition, thus suggesting the presence of an aptitude-treatment interaction. Finally, findings also revealed that learning during treatment in the form-focused group was nonlinear, and that paying attention to form and meaning simultaneously impeded global reading comprehension for intermediate, but not advanced, learners. From a theoretical perspective, the findings provide evidence to suggest that individual differences in both working memory and declarative memory affect the acquisition of lexical knowledge in ICALL-supported contexts. Methodologically, the current study illustrates the advantages of conducting interdisciplinary work between ICALL and second language acquisition by allowing for the collection of experimental data through a web-based, all-encompassing ICALL system. Overall, the present dissertation represents an initial attempt at characterizing who is likely to benefit from ICALL-based interventions.
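
    The aptitude-treatment interaction reported above amounts to a working memory effect that appears only in the form-focused condition. A minimal sketch of how such an interaction could be tested on pretest-posttest data is shown below; it is an illustration rather than the study's actual analysis, and the file and column names ("icall_study.csv", "pretest", "posttest", "working_memory", "declarative_memory", "condition") are assumptions.

```python
# Hypothetical gain-score regression with a working memory x condition interaction.
import pandas as pd
import statsmodels.formula.api as smf

learners = pd.read_csv("icall_study.csv")
learners["gain"] = learners["posttest"] - learners["pretest"]

# Declarative memory enters as a main effect; working memory interacts with condition.
model = smf.ols("gain ~ working_memory * C(condition) + declarative_memory",
                data=learners).fit()
print(model.summary())  # a reliable interaction term indicates an aptitude-treatment interaction
```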

    Open-source resources and standards for Arabic word structure analysis: Fine grained morphological analysis of Arabic text corpora

    Morphological analyzers are preprocessors for text analysis, and many text analytics applications need them to perform their tasks. The aim of this thesis is to develop standards, tools and resources that widen the scope of Arabic word structure analysis, particularly morphological analysis, to process Arabic text corpora of different domains, formats and genres, of both vowelized and non-vowelized text. We want to morphologically tag our Arabic corpus, but evaluation of existing morphological analyzers has highlighted shortcomings and shown that more research is required. Tag assignment is significantly more complex for Arabic than for many languages. The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis, particularly for probabilistic taggers which require training data, if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA – Tagger is a fine-grained morphological analyzer which mainly depends on linguistic information extracted from traditional Arabic grammar books and on prior-knowledge, broad-coverage lexical resources: the SALMA – ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA – Tag Set is a theory standard for encoding, which captures long-established traditional fine-grained morphological features of Arabic in a notation format intended to be compact yet transparent. The SALMA – Tagger has been used to lemmatize the 176-million-word Arabic Internet Corpus. It has been proposed as a language-engineering toolkit for Arabic lexicography and for phonetically annotating the Qur’an by syllable and primary stress information, as well as for fine-grained morphological tagging.
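
    The key design point above is that a word receives a subtag for each of its parts rather than a single word-level tag. The sketch below illustrates that idea with a simple data structure; it is not the SALMA tag set or analyzer, and the example word, segmentation and subtag labels are hypothetical.

```python
# Illustrative per-morpheme subtag structure (not the SALMA notation).
from dataclasses import dataclass, field
from typing import List

@dataclass
class MorphPart:
    surface: str  # the morpheme as it appears in the word
    part: str     # "proclitic" | "prefix" | "stem" | "suffix" | "enclitic"
    subtag: str   # fine-grained morphological subtag for this part

@dataclass
class AnalyzedWord:
    word: str
    parts: List[MorphPart] = field(default_factory=list)

    def full_tag(self) -> str:
        # The word-level tag is the concatenation of the per-part subtags.
        return "+".join(p.subtag for p in self.parts)

# Hypothetical analysis of wa-kitaabu-hum ("and their book"): proclitic + stem + enclitic.
w = AnalyzedWord("wakitaabuhum", [
    MorphPart("wa", "proclitic", "CONJ"),
    MorphPart("kitaabu", "stem", "NOUN.MASC.SG.NOM"),
    MorphPart("hum", "enclitic", "PRON.3MP"),
])
print(w.full_tag())  # CONJ+NOUN.MASC.SG.NOM+PRON.3MP
```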

    Use and Evaluation of Controlled Languages in Industrial Environments and Feasibility Study for the Implementation of Machine Translation

    This research is part of the doctoral studies program "La traducción y la sociedad del conocimiento" at the University of Valencia; in particular, the area of research is translation technology, terminology and localisation. The dissertation arises from the need to establish a research methodology and to provide empirical results on the development, implementation and evaluation of controlled languages in technical documentation, and on their effect on both the original texts and the translations of these documents. The aim has therefore been to develop a methodology to assess the impact of controlled languages on the production of technical documentation in industrial contexts, and more specifically in technical documentation for vehicles. The impact has taken the form of improved automatic translatability, a concept discussed at length in Chapter 4, as well as improved quality of the target texts.

    Digital Classical Philology

    The buzzwords “Information Society” and “Age of Access” suggest that information is now universally accessible without any form of hindrance. Indeed, the German constitution calls for all citizens to have open access to information. Yet in reality, there are multifarious hurdles to information access, whether physical, economic, intellectual, linguistic, political, or technical. Thus, while new methods and practices for making information accessible arise on a daily basis, we are nevertheless confronted by limitations to information access in various domains. This new book series assembles academics and professionals in various fields in order to illuminate the various dimensions of information's inaccessibility. While the series discusses principles and techniques for transcending the hurdles to information access, it also addresses necessary boundaries to accessibility. This book describes the state of the art of digital philology with a focus on ancient Greek and Latin. It addresses problems such as accessibility of information about Greek and Latin sources, data entry, collection and analysis of Classical texts, and describes the fundamental role of libraries in building digital catalogs and developing machine-readable citation systems.

    Iterated learning framework for unsupervised part-of-speech induction

    Computational approaches to linguistic analysis have been used for more than half a century. The main tools come from the field of Natural Language Processing (NLP) and are based on rule-based or corpora-based (supervised) methods. Despite the undeniable success of supervised learning methods in NLP, they have two main drawbacks: on the practical side, it is expensive to produce the manual annotation (or the rules) required, and it is not easy to find annotators for less common languages. A theoretical disadvantage is that the computational analysis produced is tied to a specific theory or annotation scheme. Unsupervised methods offer the possibility to expand our analyses into more resource-poor languages, and to move beyond the conventional linguistic theories. They are a way of observing patterns and regularities emerging directly from the data and can provide new linguistic insights. In this thesis I explore unsupervised methods for inducing parts of speech across languages. I discuss the challenges in the evaluation of unsupervised learning and, at the same time, by looking at the historical evolution of part-of-speech systems, I make the case that the compartmentalised, traditional pipeline approach of NLP is not ideal for the task. I present a generative Bayesian system that makes it easy to incorporate multiple diverse features, spanning different levels of linguistic structure, like morphology, lexical distribution, syntactic dependencies and word alignment information, that allow for the examination of cross-linguistic patterns. I test the system using features provided by unsupervised systems in a pipeline mode (where the output of one system is the input to another) and show that the performance of the baseline (distributional) model increases significantly, reaching and in some cases surpassing the performance of state-of-the-art part-of-speech induction systems. I then turn to the unsupervised systems that provided these sources of information (morphology, dependencies, word alignment) and examine the way that part-of-speech information influences their inference. Having established a bi-directional relationship between each system and my part-of-speech inducer, I describe an iterated learning method, where each component system is trained using the output of the other system in each iteration. The iterated learning method improves the performance of both component systems in each task. Finally, using this iterated learning framework, and by using parts of speech as the central component, I produce chains of linguistic structure induction that combine all the component systems to offer a more holistic view of NLP. To show the potential of this multi-level system, I demonstrate its use ‘in the wild’. I describe the creation of a vastly multilingual parallel corpus based on 100 translations of the Bible in a diverse set of languages. Using the multi-level induction system, I induce cross-lingual clusters, and provide some qualitative results of my approach. I show that it is possible to discover similarities between languages that correspond to ‘hidden’ morphological, syntactic or semantic elements.
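
    The iterated learning method described above alternates training between the part-of-speech inducer and the systems that supply its features. The following sketch shows the shape of that loop for one pair of systems (part-of-speech induction and dependency induction); the interfaces are hypothetical and stand in for the thesis's actual components.

```python
# Sketch of iterated learning between two unsupervised systems (hypothetical interfaces).
def iterated_learning(corpus, pos_inducer, dep_inducer, iterations=5):
    pos_tags, dependencies = None, None
    for _ in range(iterations):
        # Induce parts of speech, conditioning on the latest dependency analyses (if any).
        pos_tags = pos_inducer.train(corpus, extra_features=dependencies)
        # Induce dependencies, conditioning on the freshly induced parts of speech.
        dependencies = dep_inducer.train(corpus, extra_features=pos_tags)
    return pos_tags, dependencies
```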

    Inquiries into the lexicon-syntax relations in Basque

    Index:- Foreword. B. Oyharçabal.- Morphosyntactic disambiguation and shallow parsing in computational processing in Basque. I. Aduriz, A. Díaz de Ilarraza.- The transitivity of borrowed verbs in Basque: an outline. X. Alberdi.- Patrixa: a unification-based parser for Basque and its application to the automatic analysis of verbs. I. Aldezabal, M. J. Aranzabe, A. Atutxa, K. Gojenola, K. Sarasola.- Learning argument/adjunct distinction for Basque. I. Aldezabal, M. J. Aranzabe, K. Gojenola, K. Sarasola, A. Atutxa.- Analyzing verbal subcategorization aimed at its computational application. I. Aldezabal, P. Goenaga.- Automatic extraction of verb patterns from “hauta-lanerako euskal hiztegia”. J. M. Arriola, X. Artola, A. Soroa.- The case of an enlightening, provoking and admirable Basque derivational suffix with implications for the theory of argument structure. X. Artiagoitia.- Verb-deriving processes in Basque. J. C. Odriozola.- Lexical causatives and causative alternation in Basque. B. Oyharçabal.- Causation and semantic control: diagnosis of incorrect use in minorized languages. I. Zabala.- Subject index.- Contributions.

    Developing a unified feature-based model for L2 lexical and syntactic processing

    Research on lexical processing shows that lexical representations of L2 speakers are less developed, so frequency and vocabulary size affect the way they use lexical information. Specifically, reduced access to lexical features hinders the processing system of L2 speakers from working efficiently, having an impact on their ability to build syntactic structures in a native-like manner. The present research project aims to construct and test a unified model that explains how lexical and sentence processing interact. First, it develops and validates a productive vocabulary task for L2 Italian to measure vocabulary size. The task, called I-Lex, is based on the existing LEX30 for English, and uses frequency to determine lexical knowledge. Then, adopting the formalism of Head-Driven Phrase Structure Grammar, a framework that associates all the information relevant to the grammar with the lexicon, the research project develops a model that explains the effects of lexical access on syntactic processing. The model is tested in two empirical studies on L2 speakers of Italian. The first study, using an Oral Elicited Imitation task and the I-Lex productive vocabulary task, investigates the effects of frequency and vocabulary size on cleft sentences. The second study, using the same productive vocabulary task and a Self-paced Reading task, investigates frequency and vocabulary effects on relative clauses. The results reveal that frequency and vocabulary size interact with the ability of L2 speakers to process both cleft and relative clauses, providing evidence that accessing lexical features is a crucial stage in processing syntactic structures. Based on the results, a feature-based lexical network model is constructed. The model describes how lexical access and the activation of structural links between words can be captured using the same set of lexical features. In the last chapter, the model is applied to the results of the two studies.
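
    The claim that lexical access and structural link building can be captured with the same set of lexical features can be made concrete with a toy unification check: a word is represented as a feature bundle, and a link is activated when another word's features satisfy one of its requirements. The sketch below is an illustration of that idea, not the author's model; the feature names and entries are invented.

```python
# Toy feature-based linking sketch (hypothetical features, not the thesis model).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class LexicalEntry:
    form: str
    features: Dict[str, str]            # e.g. {"cat": "V", "num": "sg"}
    requirements: List[Dict[str, str]]  # feature bundles this word must combine with

def satisfies(required: Dict[str, str], available: Dict[str, str]) -> bool:
    # A requirement is met when every required feature value is matched.
    return all(available.get(k) == v for k, v in required.items())

def link(head: LexicalEntry, dependent: LexicalEntry) -> bool:
    # A structural link is activated if any requirement of the head is met
    # by the dependent's feature bundle.
    return any(satisfies(req, dependent.features) for req in head.requirements)

# Hypothetical entries: a singular verb looking for a singular nominative subject.
sees = LexicalEntry("sees", {"cat": "V", "num": "sg"},
                    [{"cat": "N", "num": "sg", "case": "nom"}])
cat = LexicalEntry("cat", {"cat": "N", "num": "sg", "case": "nom"}, [])
print(link(sees, cat))  # True: the subject link can be activated
```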

    On how "the motion of the stars" changed the language of science : a corpus-based study of deverbal nominalizations in astronomy texts from 1700 to 1900

    This doctoral thesis analyzes deverbal nominalizations formed through suffixation in astronomy texts written in English in the eighteenth and nineteenth centuries. The corpus material for this study was taken from the Corpus of English Texts on Astronomy (CETA) (Moskowich et al., 2012). The corpus contains two texts per decade, and each sample text contains around 10,000 words, which makes a total of 400,000 analyzable words. The main aim of this work is to study nominalizations as scientific discourse markers in late Modern English. Several social changes that took place in early Modern Europe severely affected approaches to science, and this had a direct effect on its language. To carry out the analysis, a typology of nominalizations acknowledging formal and functional features was created and a set of independent variables was formulated: on the one hand, extralinguistic variables included chronology, sex of the author, place of education and text type; on the other hand, intralinguistic variables dealt with the structure of nominalizations and their noun phrases and included suffix use, etymology, modifiers, possessive constructions, the inclusion of agents and circumstances, and syntactic function. These variables were first applied to the total number of nominalizations found in the corpus (8,446) and then to the four typologies created.
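
    Identifying deverbal nominalizations by their suffixes and counting them per variable is the basic corpus operation behind the figures above. The sketch below shows a minimal version of that step; the suffix list, tokenization and sample sentence are illustrative and do not reproduce the CETA pipeline or its typology.

```python
# Minimal, illustrative suffix-based nominalization count (not the CETA pipeline).
import re
from collections import Counter

SUFFIXES = ("ation", "ment", "ance", "ence", "ing")  # illustrative subset

def nominalization_suffixes(tokens):
    pattern = re.compile(r"\w+(%s)$" % "|".join(SUFFIXES))
    for tok in tokens:
        m = pattern.match(tok.lower())
        if m:
            yield m.group(1)

text = "The observation of the motion and the measurement of its duration"
tokens = re.findall(r"[A-Za-z]+", text)
print(Counter(nominalization_suffixes(tokens)))  # Counter({'ation': 2, 'ment': 1})
```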