
    Cross-lingual Coreference Resolution of Pronouns

    This work is, to our knowledge, a first attempt at a machine learning approach to cross-lingual coreference resolution, i.e. coreference resolution (CR) performed on a bitext. Focusing on CR of English pronouns, we leverage language differences and enrich the feature set of a standard monolingual CR system for English with features extracted from the Czech side of the bitext. Our work also includes a supervised pronoun aligner that outperforms a GIZA++ baseline in terms of both intrinsic evaluation and evaluation on CR. The final cross-lingual CR system outperforms both a monolingual CR system and a cross-lingual projection system.

    Teaching English-Spanish contrastive analysis through translation

    For the past few years, the Department of Modern Languages at the University of León (Spain) has been offering a course on English-Spanish Contrastive Linguistics focusing on explicit instruction in L1-L2 differences. The directionality is from English into Spanish, since our students are native speakers of Spanish enrolled in an English degree. This has proved to be helpful in raising learners' awareness of the one-to-many relationships between meaning and form across language boundaries and also in improving their translation performance. The course makes use of a corpus-based methodology to develop a number of tasks that highlight the grammatical areas that are particularly problematic from an English-Spanish perspective and may hinder the improvement of our students' translation skills. After a short theoretical introduction to the field of Contrastive Linguistics, a top-down functional approach is employed to deal with textual contrasts and contrasts at various lexico-grammatical levels. The main purposes of this ‘contrast-through-translation’ course are to offer students the opportunity a) to explore the semantic and pragmatic cross-linguistic relationships between English and Spanish at different levels of analysis, and b) to improve both their understanding of the source language (English) and their accuracy in the production of the target language (Spanish).

    Corpora for Computational Linguistics

    Since the mid-1990s, corpora have become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed.

    On L1 Attrition and Prosody in Pronominal Anaphora Resolution

    This thesis is a collection of four studies on pronominal anaphora resolution with a focus on first language (L1) attrition and prosody. In Study I, we explored the temporariness of attrition effects on anaphora resolution in L1 Italian speakers who moved to Sweden after puberty (i.e., late bilinguals). An experimental group of 20 late Italian-Swedish bilinguals and a control group of 21 Italian monolinguals completed a self-paced interpretation task twice, and we measured response preferences and response times. In Study II, we investigated how L1 Italian and L1 Swedish speakers use pause features and prominence cues to resolve globally ambiguous anaphora sentences, and whether their patterns in the use of prosody mirror the divergent coreference patterns in the two languages. Twenty-eight L1 Italian speakers and 28 L1 Swedish speakers completed a speech production task, in which we analyzed the inter-clausal pause length and the pronoun’s degree of prosodic prominence, and a control interpretation task, in which we considered response preferences. Study III represents a continuation of Study II, since we examined a group of 18 late Italian-Swedish bilinguals, who completed the same experimental tasks as in Study II. Study IV is a theoretical investigation, in which we discussed previous inconsistent findings on anaphora resolution in light of the interplay between hierarchical structure and linear order of a sentence. The results of the four studies suggest, first, that anaphora resolution may also affect null pronouns, and that task-learning effects should be taken into account for further research on L1 re-immersion. Second, they suggest that inter-clausal pause and prosodic prominence of pronouns are likely to break the canonical coreference pattern, both in a null subject language and in a non-null subject language. Third, the findings also reveal that L1 attrition affects prominence patterns and pause features in pronoun resolution. In particular, the longer the residence in the foreign language (FL) environment, the higher the probability that late bilinguals adapt to the FL patterns when they use prosody to resolve anaphora sentences. Fourth, both monolinguals and bilinguals are sensitive to the interplay between hierarchical structure and linear order of anaphora. However, they employ different strategies to interpret an anaphora sentence in which hierarchical structure and linear order favor different antecedents. The implications of the findings are discussed in light of the role of processing and cross-linguistic influence (CLI) in L1 attrition, as well as in light of the use of prosodic cues to resolve an anaphoric reference, both in relation to the Null Subject Parameter and in relation to L1 attrition.

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria. Egyptian Government

    Limitations and possibilities of machine translation: a case study

    Master's dissertation - Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Letras/Inglês e Literatura Correspondente. This work presents the results of a case study on the translation of the English pronoun it into Portuguese. It also gives a brief overview of the development of machine translation from its beginnings to the present. A parallel corpus of approximately forty-five thousand words in the source and target languages was collected. An annotation scheme developed specifically for the purposes of this study was used to classify the 305 occurrences of the pronoun it. The elements of the annotation are syntactic function, type of antecedent, and processing strategy, all of which are discussed in this dissertation. The results are compared with translations produced by commercial machine translation systems, using the solutions offered by human translators in the corpus as a benchmark. Suggestions are made for possible corpus-based improvements to existing systems. Some aspects of the corpus approach are compared with the principles of current approaches to machine translation, in an attempt to enrich the discussion of current trends in this area.

    Reasoning about alternative forms is costly: The processing of null and overt pronouns in Italian using pupillary responses

    Different words generally have different meanings. However, some words seemingly share similar meanings. One example is null and overt pronouns in Italian, which both refer to an individual in the discourse. Is the interpretation and processing of a form affected by the existence of another form with a similar meaning? With a pupillary response study, we show that null and overt pronouns are processed differently. Specifically, null pronouns are found to be less costly to process than overt pronouns. We argue that this difference is caused by an additional reasoning step that is needed to process marked overt pronouns but not unmarked null pronouns. A comparison with data from Dutch, a language with overt but no null pronouns, demonstrates that Italian pronouns are processed differently from Dutch pronouns. These findings suggest that the processing of a marked form is influenced by alternative forms within the same language, making its processing costly.

    The Actres Project: Using corpora to assess English-Spanish translation

    Assessing translation quality is generally seen as a difficult and elusive task because of the inadequacy of the tools available. The aim of this paper is to demonstrate the usefulness of a corpus-based contrastive methodology developed at the University of León (Spain) for identifying instances of translationese. The ACTRES project functional framework draws on the work by Bondarko (1991) and Chesterman (1998) and has been designed for translation-oriented cross-linguistic analysis (Rabadán et al. 2004). The long-term study focuses on those semantic areas that are typically problematic for our language pair (modality, quantification, modification, aspectuality, etc.). The contrast features a two-step procedure: contrasting typical ways of expressing similar meanings in English and Spanish, and spotting differences between original Spanish and translated Spanish. First, empirical data are extracted from two monolingual ‘comparable’ corpora, The Bank of English and the Corpus de Referencia del Español Actual (CREA); secondly, these results are compared with data from a custom-made translation corpus containing English original texts and their corresponding Spanish translations. The three sets of results provide different types of useful information: i) the resources available (or their absence) in each of the languages to express a given meaning and their relative centrality, ii) the solutions favored by translators to bridge the crosslinguistic disparities and/or gaps, and iii) the erroneous or non-existent uses and structures transferred from the source language into the target language. Translation practice, translator training and translation quality assessment (TQA) are the main areas that can benefit from this type of research.