6 research outputs found
A Knowledge-Driven Approach to Classifying Object and Attribute Coreferences in Opinion Mining
Classifying and resolving coreferences of objects (e.g., product names) and
attributes (e.g., product aspects) in opinionated reviews is crucial for
improving the opinion mining performance. However, the task is challenging as
one often needs to consider domain-specific knowledge (e.g., iPad is a tablet
and has aspect resolution) to identify coreferences in opinionated reviews.
Also, compiling a handcrafted and curated domain-specific knowledge base for
each domain is very time consuming and arduous. This paper proposes an approach
to automatically mine and leverage domain-specific knowledge for classifying
objects and attribute coreferences. The approach extracts domain-specific
knowledge from unlabeled review data and trains a knowledgeaware neural
coreference classification model to leverage (useful) domain knowledge together
with general commonsense knowledge for the task. Experimental evaluation on
realworld datasets involving five domains (product types) shows the
effectiveness of the approach.Comment: Accepted to Proceedings of EMNLP 2020 (Findings
SMDDH: Singleton Mention detection using Deep Learning in Hindi Text
Mention detection is an important component of coreference resolution system,
where mentions such as name, nominal, and pronominals are identified. These
mentions can be purely coreferential mentions or singleton mentions
(non-coreferential mentions). Coreferential mentions are those mentions in a
text that refer to the same entities in a real world. Whereas, singleton
mentions are mentioned only once in the text and do not participate in the
coreference as they are not mentioned again in the following text. Filtering of
these singleton mentions can substantially improve the performance of a
coreference resolution process. This paper proposes a singleton mention
detection module based on a fully connected network and a Convolutional neural
network for Hindi text. This model utilizes a few hand-crafted features and
context information, and word embedding for words. The coreference annotated
Hindi dataset comprising of 3.6K sentences, and 78K tokens are used for the
task. In terms of Precision, Recall, and F-measure, the experimental findings
obtained are excellent
ELECTRA for Neural Coreference Resolution in Italian
In recent years, the impact of Neural Language Models has changed every field of Natural Language Processing. In this scenario, coreference resolution has been among the least considered task, especially in language other than English. This work proposes a coreference resolution system for Italian, based on a neural end-to-end architecture integrating ELECTRA language model and trained on OntoCorefIT, a novel Italian dataset built starting from OntoNotes. Even if some approaches for Italian have been proposed in the last decade, to the best of our knowledge, this is the first neural coreference resolver aimed specifically to Italian. The performance of the system is evaluated with respect to three different metrics and also assessed by replacing ELECTRA with the widely-used BERT language model, since its usage has proven to be effective in the coreference resolution task in English. A qualitative analysis has also been conducted, showing how different grammatical categories affect performance in an inflectional and morphological-rich language like Italian. The overall results have shown the effectiveness of the proposed solution, providing a baseline for future developments of this line of research in Italian
Tex2kor: sekuentziatik sekuentziarako euskararako korreferentzia-ebazpena
[EU]Korreferentzia-ebazpena testuko bi aipamenek mundu errealeko entitate bera erreferentziatzen dutela identi katzeari deritzo. Lan honetan, korreferentzia-ebazpena sekuentziatik sekuentziara lantzeko hurbilpen berri bat aurkezten da. Sekuentziatik sekuentziarako ataza burutzeko Transformer arkitektura neuronala erabili da. Transformerrak ikasketarako darabiltzan sekuentzien luzera mugatzeko, dokumentu etiketatuak zatitu eta elkartzeko algoritmo bat sortu da. Euskararako korreferentzia-ebazpena helburu izanik, euskararako emaitzak hobetzeko datu gehikuntzako teknikak eta BPE segmentazioa gehitu zaizkio hurbilpenari eta tex2kor sistema eraiki dugu. Testu hutsetik korreferentzia-kateak eskuratzeko sistemak, CoNLL metrikan 37,14 puntuko F1 balioa lortu du. Honenbestez, euskararako korreferentzia-ebazpenerako zeuden emaitzak hobetzerik lortu ez den arren, korreferentzia-ebazpena lantzeko hurbilpen orokor berri bat aurkeztu da.[EN]Coreference resolution is the task of identifying the mentions that refer to the same real world entity. In this work, we present a novel sequence to sequence approach for coreference resolution, for which we use a Transformer. To limit the length of the sequences for the training of the Transformer, we create an algorithm to divide and merge the labeled documents. As our aim is the coreference resolution for Basque, we added some data augmentation techniques and BPE segmentation to build our tex2kor system. The system which converts raw text into coreference-chains, gets F1 37.14 points on CoNLL metric. Therefore, although we did not improve the results of the state of the art system for coreference resolution for Basque, we present a new general approach for coreference resolution