33 research outputs found

    Using Zero Anaphora Resolution to Improve Text Categorization

    Gender and Animacy Knowledge Discovery from Web-Scale N-Grams for Unsupervised Person Mention Detection

    PACLIC 23 / City University of Hong Kong / 3-5 December 2009

    Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction

    In this paper we analyze the effectiveness of using linguistic knowledge from coreference and anaphora resolution to improve the performance of supervised keyphrase extraction. To verify the impact of these features, we define a baseline keyphrase extraction system and evaluate its performance on a standard dataset using different machine learning algorithms. We then consider new sets of features by adding combinations of the linguistic features we propose, and we evaluate the new performance of the system. We also use anaphora and coreference resolution to transform the documents, trying to simulate the cohesion process performed by the human mind. We found that our approach has a slightly positive impact on the performance of automatic keyphrase extraction, in particular when considering the ranking of the results.

    Unsupervised learning of contextual role knowledge for coreference resolution

    Journal Article. We present a coreference resolver called BABAR that uses contextual role knowledge to evaluate possible antecedents for an anaphor. BABAR uses information extraction patterns to identify contextual roles and creates four contextual role knowledge sources using unsupervised learning. These knowledge sources determine whether the contexts surrounding an anaphor and antecedent are compatible. BABAR applies a Dempster-Shafer probabilistic model to make resolutions based on evidence from the contextual role knowledge sources as well as general knowledge sources. Experiments in two domains showed that the contextual role knowledge improved coreference performance, especially on pronouns.

    Detecting Bridge Anaphora

    The paper addresses one of the most important issues in natural language processing (NLP), namely the automated recognition of semantic relations (in this case, bridge anaphora). We propose to recognize this type of relation automatically, as accurately as possible, in a literary corpus (the novel Quo Vadis), given the impressive diversity and complexity of relations between entities. Furthermore, we defined and classified the bridge anaphora relations based on annotation conventions. To achieve the main goal, we developed a computational instrument, BAT (Bridge Anaphora Tool), which is currently still in a test version and therefore open to improvement. This study is intended to help specialists and researchers in natural language processing and linguists in particular, though not exclusively.

    Benchmarking natural-language parsers for biological applications using dependency graphs

    BACKGROUND: Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. RESULTS: Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. CONCLUSION: Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques.

    Review of coreference resolution in English and Persian

    Coreference resolution (CR) is one of the most challenging areas of natural language processing. This task seeks to identify all textual references to the same real-world entity. Research in this field is divided into coreference resolution and anaphora resolution. Due to its application in textual comprehension and its utility in other tasks such as information extraction systems, document summarization, and machine translation, this field has attracted considerable interest. Consequently, it has a significant effect on the quality of these systems. This article reviews the existing corpora and evaluation metrics in this field. Then, an overview of the coreference algorithms, from rule-based methods to the latest deep learning techniques, is provided. Finally, coreference resolution and pronoun resolution systems in Persian are investigated. Comment: 44 pages, 11 figures, 5 tables

    A Coreference Resolution System for the Analysis of Slovene Texts and Its Potential Applications (Sistem za razreševanje koreferenc pri analizi slovenskih besedil in možnosti njegove uporabe)

    Coreference resolution is an important part of language technologies, but this technology has not yet been developed for Slovene. There are various types of coreference; the article focuses mainly on anaphora involving personal pronouns. Seven mutually complementary resolution methods were used, the most important of which is based on activation-based methods. The first results are promising, but a more detailed analysis of performance will require a corpus with annotated examples. Coreference resolution was also used in the question answering system Piflar, which can thereby answer more questions because it manages to replace personal pronouns. At the same time, Piflar was extended with other additions, e.g. answering questions about individual clause constituents and responding to declarative sentences, and the generation of long answers to yes/no questions was also improved. Coreference resolution also improved the performance of the machine translation system Presis, namely in determining the gender of personal pronouns and in disambiguating attributive clauses.

    Pattern Based Information Extraction System in Business News Articles

    Business news journals provide a rich resource of business events, which enable domain experts to further understand the spatio-temporal changes that occur among a set of firms and people. However, extracting structured data from a journal resource that is text-based and unstructured is a non-trivial challenge. This project designs and implements a Business Information Extraction System, which combines advanced natural language processing (NLP) tools and knowledge-based extraction patterns to automatically process and extract information about target business events from news journals. The performance evaluation of the proposed system suggests that IE techniques work well for business event extraction and that it is promising to apply the technique to extract more types of business events. Master of Science in Information Science