6 research outputs found

    A Knowledge-Driven Approach to Classifying Object and Attribute Coreferences in Opinion Mining

    Classifying and resolving coreferences of objects (e.g., product names) and attributes (e.g., product aspects) in opinionated reviews is crucial for improving opinion mining performance. However, the task is challenging, as one often needs domain-specific knowledge (e.g., iPad is a tablet and has the aspect resolution) to identify coreferences in opinionated reviews. Moreover, compiling a handcrafted, curated domain-specific knowledge base for each domain is very time-consuming and arduous. This paper proposes an approach to automatically mine and leverage domain-specific knowledge for classifying object and attribute coreferences. The approach extracts domain-specific knowledge from unlabeled review data and trains a knowledge-aware neural coreference classification model that leverages (useful) domain knowledge together with general commonsense knowledge for the task. Experimental evaluation on real-world datasets covering five domains (product types) shows the effectiveness of the approach. Comment: Accepted to Proceedings of EMNLP 2020 (Findings).
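    As a rough illustration of the kind of model the abstract describes, the sketch below (not the authors' code) scores a mention pair by concatenating its contextual embeddings with a small vector of domain-knowledge indicator features; the feature layout and dimensions are assumptions.

```python
# Minimal sketch, not the authors' model: a knowledge-aware mention-pair
# classifier. `kb_feats` stands in for indicator features derived from a
# domain knowledge base mined from unlabeled reviews plus a commonsense KB
# (e.g., "antecedent is a product type of the anaphor"); its size is assumed.
import torch
import torch.nn as nn

class KnowledgeAwarePairClassifier(nn.Module):
    def __init__(self, mention_dim: int = 768, kb_feat_dim: int = 8, hidden: int = 256):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * mention_dim + kb_feat_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden, 1),
        )

    def forward(self, anaphor_emb, antecedent_emb, kb_feats):
        # Concatenate the two mention representations with the knowledge
        # features and produce one coreference logit per mention pair.
        x = torch.cat([anaphor_emb, antecedent_emb, kb_feats], dim=-1)
        return self.scorer(x).squeeze(-1)

# Hypothetical usage with random tensors standing in for encoder outputs.
model = KnowledgeAwarePairClassifier()
logits = model(torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 8))
```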

    SMDDH: Singleton Mention detection using Deep Learning in Hindi Text

    Mention detection is an important component of a coreference resolution system; it identifies mentions such as names, nominals, and pronominals. These mentions can be either coreferential mentions or singleton (non-coreferential) mentions. Coreferential mentions are mentions in a text that refer to the same real-world entity, whereas singleton mentions occur only once and do not participate in coreference because they are not mentioned again in the following text. Filtering out these singleton mentions can substantially improve the performance of a coreference resolution system. This paper proposes a singleton mention detection module for Hindi text based on a fully connected network and a convolutional neural network. The model utilizes a few hand-crafted features, context information, and word embeddings. A coreference-annotated Hindi dataset comprising 3.6K sentences and 78K tokens is used for the task. The experimental results obtained are excellent in terms of precision, recall, and F-measure.
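    A minimal sketch of how such a singleton detector could be wired up, assuming a 1-D CNN over context word embeddings combined with hand-crafted features; this is not the SMDDH implementation, and all dimensions are placeholders.

```python
# Illustrative sketch only, not the SMDDH code: a 1-D CNN over the word
# embeddings of the mention's context window, max-pooled and concatenated
# with a hand-crafted feature vector, followed by fully connected layers
# that decide singleton vs. coreferential. All sizes are placeholders.
import torch
import torch.nn as nn

class SingletonDetector(nn.Module):
    def __init__(self, emb_dim=300, n_filters=64, kernel=3, handcrafted_dim=12):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=kernel, padding=1)
        self.fc = nn.Sequential(
            nn.Linear(n_filters + handcrafted_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 2),  # two classes: singleton, coreferential
        )

    def forward(self, context_embs, handcrafted):
        # context_embs: (batch, window_len, emb_dim) word embeddings.
        h = torch.relu(self.conv(context_embs.transpose(1, 2)))
        pooled = torch.max(h, dim=2).values  # max-over-time pooling
        return self.fc(torch.cat([pooled, handcrafted], dim=-1))

detector = SingletonDetector()
scores = detector(torch.randn(8, 20, 300), torch.randn(8, 12))  # (8, 2) logits
```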

    ELECTRA for Neural Coreference Resolution in Italian

    In recent years, Neural Language Models have changed every field of Natural Language Processing. In this scenario, coreference resolution has been among the least considered tasks, especially in languages other than English. This work proposes a coreference resolution system for Italian based on a neural end-to-end architecture that integrates the ELECTRA language model and is trained on OntoCorefIT, a novel Italian dataset built starting from OntoNotes. Although some approaches for Italian have been proposed in the last decade, to the best of our knowledge this is the first neural coreference resolver aimed specifically at Italian. The performance of the system is evaluated with respect to three different metrics and is also assessed by replacing ELECTRA with the widely used BERT language model, whose usage has proven effective for coreference resolution in English. A qualitative analysis has also been conducted, showing how different grammatical categories affect performance in an inflectional, morphologically rich language like Italian. The overall results show the effectiveness of the proposed solution, providing a baseline for future developments of this line of research in Italian.
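    The encoder swap described above (ELECTRA vs. BERT) can be illustrated with the Hugging Face transformers AutoModel interface; the sketch below is not the paper's system, and the checkpoint names are publicly available Italian models chosen for illustration, not necessarily those used by the authors.

```python
# Sketch, not the paper's system: obtaining token-level contextual states
# from either an ELECTRA or a BERT checkpoint via Hugging Face AutoModel,
# i.e. the encoder on which a span-based end-to-end coreference head would
# sit (the span scorer itself is omitted). Checkpoint names are illustrative.
import torch
from transformers import AutoModel, AutoTokenizer

def encode(checkpoint: str, text: str) -> torch.Tensor:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).last_hidden_state  # (1, seq_len, hidden)

text = "Maria ha detto che lei sarebbe arrivata presto."
electra_states = encode("dbmdz/electra-base-italian-xxl-cased-discriminator", text)
bert_states = encode("dbmdz/bert-base-italian-cased", text)
```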

    Tex2kor: Sequence-to-Sequence Coreference Resolution for Basque

    Coreference resolution is the task of identifying which mentions in a text refer to the same real-world entity. In this work, we present a novel sequence-to-sequence approach to coreference resolution, based on the Transformer neural architecture. To limit the length of the sequences used to train the Transformer, we created an algorithm that splits and merges the labeled documents. As our goal is coreference resolution for Basque, we added data augmentation techniques and BPE segmentation to the approach to build our tex2kor system. The system, which extracts coreference chains from raw text, obtains an F1 score of 37.14 on the CoNLL metric. Thus, although it does not improve on the state-of-the-art results for coreference resolution in Basque, this work presents a new general approach to the task.
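    A small sketch of the sequence-to-sequence framing and the sentence-packing idea mentioned in the abstract, under assumptions (it is not the tex2kor code): the target sequence inlines cluster ids around mentions, and sentences are greedily merged into chunks below the Transformer's length limit.

```python
# Sketch under assumptions, not the tex2kor code: (1) build a target
# sequence that inlines mention boundaries and cluster ids into the input
# tokens, and (2) greedily pack tokenized sentences into chunks that fit a
# maximum sequence length. The bracket notation is invented for illustration.

def target_sequence(tokens, mentions):
    """mentions: list of (start, end, cluster_id) spans, end exclusive."""
    out = []
    for i, tok in enumerate(tokens):
        for start, _, cid in mentions:
            if i == start:
                out.append(f"[{cid}")
        out.append(tok)
        for _, end, cid in mentions:
            if i == end - 1:
                out.append(f"{cid}]")
    return out

def pack_sentences(sentences, max_len):
    """Greedily merge tokenized sentences into chunks of at most max_len tokens."""
    chunks, current = [], []
    for sent in sentences:
        if current and len(current) + len(sent) > max_len:
            chunks.append(current)
            current = []
        current = current + sent
    if current:
        chunks.append(current)
    return chunks

src = ["Miren", "etorri", "da", ".", "Bera", "pozik", "dago", "."]
tgt = target_sequence(src, [(0, 1, 0), (4, 5, 0)])
# tgt == ['[0', 'Miren', '0]', 'etorri', 'da', '.', '[0', 'Bera', '0]', 'pozik', 'dago', '.']
```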