529 research outputs found

    Anaphora resolution for Arabic machine translation: a case study of nafs

    PhD Thesis. In the age of the internet, email, and social media there is an increasing need for processing online information, for example, to support education and business. This has led to the rapid development of natural language processing technologies such as computational linguistics, information retrieval, and data mining. As a branch of computational linguistics, anaphora resolution has attracted much interest. This is reflected in the large number of papers on the topic published in journals such as Computational Linguistics. Mitkov (2002) and Ji et al. (2005) have argued that the overall quality of anaphora resolution systems remains low, despite practical advances in the area, and that major challenges include dealing with real-world knowledge and accurate parsing. This thesis investigates the following research question: can an algorithm be found for the resolution of the anaphor nafs in Arabic text which is accurate to at least 90%, scales linearly with text size, and requires a minimum of knowledge resources? A resolution algorithm intended to satisfy these criteria is proposed. Testing on a corpus of contemporary Arabic shows that it does indeed satisfy the criteria. Egyptian Government

    Review of coreference resolution in English and Persian

    Coreference resolution (CR) is one of the most challenging areas of natural language processing. This task seeks to identify all textual references to the same real-world entity. Research in this field is divided into coreference resolution and anaphora resolution. Due to its application in textual comprehension and its utility in other tasks such as information extraction systems, document summarization, and machine translation, this field has attracted considerable interest. Consequently, it has a significant effect on the quality of these systems. This article reviews the existing corpora and evaluation metrics in this field. Then, an overview of the coreference algorithms, from rule-based methods to the latest deep learning techniques, is provided. Finally, coreference resolution and pronoun resolution systems in Persian are investigated. Comment: 44 pages, 11 figures, 5 tables

    A Survey on Semantic Processing Techniques

    Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyze five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missed in the published version due to the publication policies. Please contact Prof. Erik Cambria for details

    Aprendizagem à distância de anáfora em inglês e espanhol como línguas estrangeiras

    This doctoral thesis investigates the distance learning of anaphora in English and Spanish as foreign languages. It analyses how native speakers of Portuguese, learners of English or Spanish, understand and produce anaphora with nominal antecedents in written texts and how different distance learning modalities can contribute to the learning of this discursive mechanism. In total, 11 articles were written and distributed in 4 sections. The first section focuses on investigating ambiguity resolution based on an online questionnaire distributed to learners and native speakers of Portuguese, English, and Spanish. While the first paper presents a pilot study conducted in Portugal, the second includes data from Brazil, and the third was written after the data collection was completed. In the questionnaires, it was possible to control several variables to analyse how speakers resolved anaphoric ambiguity. The second section reviews the literature on the teaching and learning of anaphora, the theories and methods focused on language teaching, and the different teaching modalities. These studies allowed the conceptual elaboration of the experiment carried out later. Finally, the third section of the thesis presents the experiment carried out, which consisted in offering a course on anaphora in synchronous and asynchronous distance learning modalities, with monitoring of learning over time. The first article explains how the course was planned; the second presents the groups’ results in the comprehension tests; and the third evaluates the course qualitatively. The fourth section presents the new learner corpora, BRANEN and BRANES, and the analysis of the anaphoric relations produced by the students over four tests (a pre-test, an intermediate test, an immediately final test, and a retention test after one month). 
The thesis ends with a synopsis of the results obtained, their discussion, and a conclusion looking towards future lines of research.

    Towards Multilingual Coreference Resolution

    The current work investigates the problems that occur when coreference resolution is considered as a multilingual task. We assess the issues that arise when a framework using the mention-pair coreference resolution model and memory-based learning for the resolution process is used. Along the way, we revise three essential subtasks of coreference resolution: mention detection, mention head detection and feature selection. For each of these aspects we propose various multilingual solutions, including heuristic, rule-based and machine learning methods. We carry out a detailed analysis that includes eight different languages (Arabic, Catalan, Chinese, Dutch, English, German, Italian and Spanish) for which datasets were provided by the only two multilingual shared tasks on coreference resolution held so far: SemEval-2 and CoNLL-2012. Our investigation shows that, although complex, the coreference resolution task can be targeted in a multilingual and even language-independent way. We proposed machine learning methods for each of the subtasks that are affected by the transition, then evaluated them and compared them to the performance of rule-based and heuristic approaches. Our results confirmed that machine learning provides the needed flexibility for the multilingual task and that the minimal requirement for a language-independent system is a part-of-speech annotation layer provided for each of the approached languages. We also showed that the performance of the system can be improved by introducing other layers of linguistic annotations, such as syntactic parses (in the form of either constituency or dependency parses), named entity information, predicate argument structure, etc. Additionally, we discuss the problems occurring in the proposed approaches and suggest possibilities for their improvement.

    Linguistics parameters for zero anaphora resolution

    Master's dissertation, Natural Language Processing and Human Language Technology, Univ. do Algarve, 2009. This dissertation describes and proposes a set of linguistically motivated rules for zero anaphora resolution in the context of a natural language processing chain developed for Portuguese. Some languages, like Portuguese, allow noun phrase (NP) deletion (or zeroing) in several syntactic contexts in order to avoid the redundancy that would result from repetition of previously mentioned words. The co-reference relation between the zeroed element and its antecedent (or previous mention) in the discourse is here called zero anaphora (Mitkov, 2002). In Computational Linguistics, zero anaphora resolution may be viewed as a subtask of anaphora resolution and has an essential role in various Natural Language Processing applications such as information extraction, automatic abstracting, dialog systems, machine translation and question answering. The main goal of this dissertation is to describe the grammatical rules imposing subject NP deletion and referential constraints in Brazilian Portuguese, in order to allow a correct identification of the antecedent of the deleted subject NP. Some of these rules were then formalized into the Xerox Incremental Parser or XIP (Ait-Mokhtar et al., 2002: 121-144) in order to constitute a module of the Portuguese grammar (Mamede et al. 2010) developed at Spoken Language Laboratory (L2F). Using this rule-based approach we expected to improve the performance of the Portuguese grammar, namely by producing better dependency structures with (reconstructed) zeroed NPs for the syntactic-semantic interface. 
Because of the complexity of the task, the scope of this dissertation had to be limited to: (a) subject NP deletion; (b) within sentence boundaries; and (c) with an explicit antecedent; besides, (d) rules were formalized based solely on the results of the shallow parser (or chunks), that is, with minimal syntactic (and no semantic) knowledge. A corpus of different text genres was manually annotated for zero anaphors and other zero-shaped, usually indefinite, subjects. The rule-based approach is evaluated and results are presented and discussed.

    On L1 Attrition and Prosody in Pronominal Anaphora Resolution

    This thesis is a collection of four studies on pronominal anaphora resolution with a focus on first language (L1) attrition and prosody. In Study I, we explored the temporariness of attrition effects on anaphora resolution in L1 Italian speakers who moved to Sweden after puberty (i.e., late bilinguals). An experimental group of 20 late Italian-Swedish bilinguals and a control group of 21 Italian monolinguals completed a self-paced interpretation task twice, and we measured response preferences and response times. In Study II, we investigated how L1 Italian and L1 Swedish speakers use pause features and prominence cues to resolve globally ambiguous anaphora sentences, and whether their patterns in the use of prosody mirror the divergent coreference patterns in the two languages. 28 L1 Italian speakers and 28 L1 Swedish speakers completed a speech production task, in which we analyzed the inter-clausal pause length and the pronoun’s degree of prosodic prominence, and a control interpretation task, in which we considered response preferences. Study III represents a continuation of Study II, since we examined a group of 18 late Italian-Swedish bilinguals, who completed the same experimental tasks as in Study II. Study IV is a theoretical investigation, in which we discussed previous inconsistent findings on anaphora resolution in light of the interplay between hierarchical structure and linear order of a sentence. The results of the four studies suggest, first, that anaphora resolution may also affect null pronouns, and that task-learning effects should be taken into account for further research on L1 re-immersion. Second, they suggest that inter-clausal pause and prosodic prominence of pronouns are likely to break the canonical coreference pattern, both in a null subject language and in a non-null subject language. Third, the findings also reveal that L1 attrition affects prominence patterns and pause features in pronoun resolution. 
In particular, the longer the residence in the foreign language (FL) environment, the higher the probability that late bilinguals adapt to the FL patterns when they use prosody to resolve anaphora sentences. Fourth, both monolinguals and bilinguals are sensitive to the interplay between hierarchical structure and linear order of anaphora. However, they employ different strategies to interpret an anaphora sentence in which hierarchical structure and linear order favor different antecedents. The implications of the findings are discussed in light of the role of processing and cross-linguistic influence (CLI) in L1 attrition, as well as in light of the use of prosodic cues to resolve an anaphoric reference, both in relation to the Null Subject Parameter and in relation to L1 attrition.

    Anaphoric resolution of zero pronouns in Chinese in translation and reading comprehension

    The primary aim of the thesis is to investigate some of the processes of reading Chinese text by means of comparing and analysing approximately 100 parallel translations of four texts from Chinese to English. The translations are answers to A Level examination questions. The focus of the investigation is interpretation of the zero pronoun, a common phenomenon in Chinese, which often requires explicitation when translated into English. The secondary aim is to show how translation gives evidence of comprehension, as shown by the variation in interpretation of zero pronouns. The thesis reviews relevant psycholinguistic research into reading, particularly reading of Chinese text. This is followed by reviews of relevant research into translation as a reading activity, and a discussion of its role in language teaching and testing. The core of the thesis is the discussion of the zero pronoun in Chinese, including discussion of anaphoric choice - the writer's decision on when to use zero in preference to an explicit anaphoric form - and of anaphoric resolution - how a reader decides what a zero pronoun refers to. Anaphoric resolution may be problematic for less experienced readers of Chinese owing to its lack of rich morphological inflection which, in other languages, provides the reader with information. Some of the key ideas on anaphoric choice and resolution are then applied to the analysis of the data in the parallel translations. It would appear that factors in Chinese texts which have an effect on comprehending zero pronouns are antecedent distance, topic persistence, abstraction, multiplicity of arguments and the meaning of the verb. Characteristics of the reader which may affect comprehension of the zero pronoun include personal schemata which may lead to elaborative inferences. 
On the basis of the data I suggest that mark schemes could be devised on a scalar system encompassing optimal solution, proximal solution and nonsolution, which might help to solve the problem of variability in marking translation. A by-product of the thesis, and an avenue for further research, is the apparent close relationship between idea units, clause length, punctuation breaks and antecedent distance in Chinese texts and saccade length and working memory capacity in the reader of Chinese.

    Intelligent text processing to help readers with autism

    © 2018, Springer International Publishing AG. Autistic Spectrum Disorder (ASD) is a neurodevelopmental disorder which has a life-long impact on the lives of people diagnosed with the condition. In many cases, people with ASD are unable to derive the gist or meaning of written documents due to their inability to process complex sentences, understand non-literal text, and understand uncommon and technical terms. This paper presents FIRST, an innovative project which developed language technology (LT) to make documents more accessible to people with ASD. The project has produced a powerful editor which enables carers of people with ASD to prepare texts suitable for this population. Assessment of the texts generated using the editor showed that they are not less readable than those generated more slowly as a result of onerous unaided conversion and were significantly more readable than the originals. Evaluation of the tool shows that it can have a positive impact on the lives of people with ASD. Published version