7 research outputs found

    Semantic Modification of the Mitkov Algorithm for Anaphora Resolution

    The article addresses a modern algorithm for pronominal anaphora resolution. Anaphora resolution should be considered within a wider range of problems related to language ambiguity resolution, for instance entity recognition, reference analysis and, in the general case, semantic analysis of natural language text. It follows that anaphora resolution is possible only at the semantic level of natural language analysis. The main purpose of this work is to develop semantic heuristics for finding the most probable antecedent of an anaphor by analysing the sentence context. The proposed algorithm gives an improvement of about 5% over the standard Mitkov algorithm.

    Evaluating parts-of-speech taggers for use in a text-to-scene conversion system

    This paper presents parts-of-speech tagging as a first step towards an autonomous text-to-scene conversion system. It categorizes some freely available taggers, according to the techniques used by each in order to automatically identify word-classes. In addition, the performance of each identified tagger is verified experimentally. The SUSANNE corpus is used for testing and reveals the complexity of working with different tagsets, resulting in substantially lower accuracies in our tests than in those reported by the developers of each tagger. The taggers are then grouped to form a voting system to attempt to raise accuracies, but in no cases do the combined results improve upon the individual accuracies. Additionally a new metric, agreement, is tentatively proposed as an indication of confidence in the output of a group of taggers where such output cannot be validated
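    The voting step described in this abstract can be illustrated with a minimal sketch. It assumes every tagger's output has already been mapped to one shared tagset and aligned token by token. The function name `vote` and the definition of agreement used here (the share of taggers that chose the winning tag) are illustrative assumptions, not the paper's own definitions.

```python
from collections import Counter


def vote(tag_sequences):
    """Combine per-token tags from several taggers by majority vote.

    tag_sequences: one list of tags per tagger, all of equal length and
    already mapped to a common tagset (an assumption of this sketch).
    Returns a list of (winning_tag, agreement) pairs, where agreement is
    the fraction of taggers that proposed the winning tag.
    """
    if not tag_sequences:
        return []
    length = len(tag_sequences[0])
    if any(len(seq) != length for seq in tag_sequences):
        raise ValueError("all taggers must tag the same number of tokens")

    results = []
    # zip(*...) regroups the data so each tuple holds one token's tags.
    for token_tags in zip(*tag_sequences):
        winner, votes = Counter(token_tags).most_common(1)[0]
        results.append((winner, votes / len(token_tags)))
    return results
```

    Because agreement is computed without reference tags, a score of this kind can be reported even when the output cannot be validated against a gold standard.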

    Anaphora Resolution for Biomedical Literature by Exploiting Multiple Resources

    A New, Fully Automatic Version of Mitkov's Knowledge-Poor Pronoun Resolution Method

    This paper describes a new, advanced and completely revamped version of Mitkov's knowledge-poor approach to pronoun resolution. In contrast to most anaphora resolution approaches, the new system, referred to as MARS, operates in fully automatic mode. It benefits from purpose-built programs for identifying occurrences of non-nominal anaphora (including pleonastic pronouns) and for recognition of animacy, and employs genetic algorithms to achieve optimal performance. The paper features extensive evaluation and discusses important evaluation issues in anaphora resolution

    Linguistics parameters for zero anaphora resolution

    Master's dissertation, Natural Language Processing and Human Language Technology, Univ. do Algarve, 2009. This dissertation describes and proposes a set of linguistically motivated rules for zero anaphora resolution in the context of a natural language processing chain developed for Portuguese. Some languages, like Portuguese, allow noun phrase (NP) deletion (or zeroing) in several syntactic contexts in order to avoid the redundancy that would result from repetition of previously mentioned words. The co-reference relation between the zeroed element and its antecedent (or previous mention) in the discourse is here called zero anaphora (Mitkov, 2002). In Computational Linguistics, zero anaphora resolution may be viewed as a subtask of anaphora resolution and has an essential role in various Natural Language Processing applications such as information extraction, automatic abstracting, dialog systems, machine translation and question answering. The main goal of this dissertation is to describe the grammatical rules imposing subject NP deletion and referential constraints in Brazilian Portuguese, in order to allow a correct identification of the antecedent of the deleted subject NP. Some of these rules were then formalized into the Xerox Incremental Parser or XIP (Ait-Mokhtar et al., 2002: 121-144) in order to constitute a module of the Portuguese grammar (Mamede et al. 2010) developed at Spoken Language Laboratory (L2F). Using this rule-based approach we expected to improve the performance of the Portuguese grammar, namely by producing better dependency structures with (reconstructed) zeroed NPs for the syntactic-semantic interface. Because of the complexity of the task, the scope of this dissertation had to be limited to: (a) subject NP deletion; (b) within sentence boundaries; and (c) with an explicit antecedent. In addition, (d) rules were formalized based solely on the results of the shallow parser (or chunks), that is, with minimal syntactic (and no semantic) knowledge. A corpus of different text genres was manually annotated for zero anaphors and other zero-shaped, usually indefinite, subjects. The rule-based approach is evaluated and results are presented and discussed

    Advances in automatic terminology processing: methodology and applications in focus

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. The information and knowledge era, in which we are living, creates challenges in many fields, and terminology is not an exception. The challenges include an exponential growth in the number of specialised documents that are available, in which terms are presented, and the number of newly introduced concepts and terms, which are already beyond our (manual) capacity. A promising solution to this ‘information overload’ would be to employ automatic or semi-automatic procedures to enable individuals and/or small groups to efficiently build high quality terminologies from their own resources which closely reflect their individual objectives and viewpoints. Automatic terminology processing (ATP) techniques have already proved to be quite reliable, and can save human time in terminology processing. However, they are not without weaknesses, one of which is that these techniques often consider terms to be independent lexical units satisfying some criteria, when terms are, in fact, integral parts of a coherent system (a terminology). This observation is supported by the discussion of the notion of terms and terminology and the review of existing approaches in ATP presented in this thesis. In order to overcome the aforementioned weakness, we propose a novel methodology in ATP which is able to extract a terminology as a whole. The proposed methodology is based on knowledge patterns automatically extracted from glossaries, which we considered to be valuable, but overlooked resources. These automatically identified knowledge patterns are used to extract terms, their relations and descriptions from corpora. The extracted information can facilitate the construction of a terminology as a coherent system. 
The study also aims to discuss applications of ATP, and describes an experiment in which ATP is integrated into a new NLP application: multiple-choice test item generation. The successful integration of the system shows that ATP is a viable technology, and should be exploited more by other NLP applications

    The Genitive Ratio and its Applications

    The genitive ratio (GR) is a novel method of classifying nouns as animate, concrete or abstract. English has two genitive (possessive) constructions: possessive-s (the boy's head) and possessive-of (the head of the boy). There is compelling evidence that preference for possessive-s is strongly influenced by the possessor's animacy. A corpus analysis that counts each genitive construction in three conditions (definite, indefinite and no article) confirms that occurrences of possessive-s decline as the animacy hierarchy progresses from animate through concrete to abstract. A computer program (Animyser) is developed to obtain results-counts from phrase-searches of Wikipedia that provide multiple genitive ratios for any target noun. Key ratios are identified and algorithms developed, with specific applications achieving classification accuracies of over 80%. The algorithms, based on logistic regression, produce a score of relative animacy that can be applied to individual nouns or to texts. The genitive ratio is a tool with potential applications in any research domain where the relative animacy of language might be significant. Three such applications exemplify that. Combining GR analysis with other factors might enhance established co-reference (anaphora) resolution algorithms. In sentences formed from pairings of animate with concrete or abstract nouns, the animate noun is usually salient, more likely to be the grammatical subject or thematic agent, and to co-refer with a succeeding pronoun or noun-phrase. Two experiments, online sentence production and corpus-based, demonstrate that the GR algorithm reliably predicts the salient noun. Replication of the online experiment in Italian suggests that the GR might be applied to other languages by using English as a 'bridge'. In a mental health context, studies have indicated that Alzheimer's patients' language becomes progressively more concrete; depressed patients' language more abstract. 
Analysis of sample texts suggests that the GR might monitor the prognosis of both illnesses, facilitating timely clinical interventions
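
    As a rough illustration of the idea behind the genitive ratio, the sketch below computes a single ratio from two phrase-search counts. The formula (the share of possessive-s among all genitive occurrences of a noun) and the name `genitive_ratio` are assumptions made for illustration. The abstract does not give the exact definition, and the thesis derives several ratios per noun and feeds them into logistic regression, which this sketch does not attempt.

```python
def genitive_ratio(s_count, of_count):
    """Share of possessive-s among a noun's genitive occurrences.

    s_count:  occurrences of the possessive-s form (e.g. "the boy's ...")
    of_count: occurrences of the possessive-of form (e.g. "... of the boy")
    Returns a value in [0, 1], or None when the noun has no genitive
    occurrences at all. This formula is an assumed simplification.
    """
    if s_count < 0 or of_count < 0:
        raise ValueError("counts must be non-negative")
    total = s_count + of_count
    if total == 0:
        return None  # no evidence either way
    return s_count / total
```

    Under this assumed definition, a higher value would point towards the animate end of the hierarchy, in line with the abstract's observation that possessive-s declines from animate through concrete to abstract nouns.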