Key Phrase Extraction of Lightly Filtered Broadcast News
This paper explores the impact of light filtering on automatic key phrase
extraction (AKE) applied to Broadcast News (BN). Key phrases are words and
expressions that best characterize the content of a document. Key phrases are
often used to index the document or as features in further processing. This
makes improvements in AKE accuracy particularly important. We hypothesized that
filtering out marginally relevant sentences from a document would improve AKE
accuracy. Our experiments confirmed this hypothesis. Elimination of as little
as 10% of the document sentences led to a 2% improvement in AKE precision and
recall. Our AKE system is built on the MAUI toolkit, which follows a supervised
learning approach. We trained and tested our AKE method on a gold standard of
8 BN programs containing 110 manually annotated news stories. The experiments
were conducted within a Multimedia Monitoring Solution (MMS) system for TV and
radio news/programs, running daily and monitoring 12 TV and 4 radio channels.

Comment: In 15th International Conference on Text, Speech and Dialogue (TSD 2012)
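The light-filtering step described in the abstract can be sketched as follows. The scoring heuristic (average document-frequency of a sentence's words) is an assumption for illustration only, since the abstract does not specify the paper's actual relevance criterion:

```python
from collections import Counter

def light_filter(sentences, drop_frac=0.10):
    """Drop the fraction of sentences least relevant to the document.

    Relevance here is a toy proxy (the average document-frequency of a
    sentence's words); the paper's actual filtering criterion is not
    stated in the abstract.
    """
    doc_words = Counter(w.lower() for s in sentences for w in s.split())

    def score(sentence):
        words = [w.lower() for w in sentence.split()]
        return sum(doc_words[w] for w in words) / max(len(words), 1)

    scores = [score(s) for s in sentences]
    n_drop = int(len(sentences) * drop_frac)
    # Indices of the n_drop lowest-scoring (most marginal) sentences.
    drop = set(sorted(range(len(sentences)), key=scores.__getitem__)[:n_drop])
    return [s for i, s in enumerate(sentences) if i not in drop]
```

With `drop_frac=0.10` this removes the 10% of sentences that share the fewest frequent words with the rest of the document, mirroring the "elimination of as little as 10% of the document sentences" reported above.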
Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation
The proliferation of harmful content on social media affects a large part of the user community. Several approaches have therefore emerged to control this phenomenon automatically, but it remains a quite challenging task. In this paper, we explore offensive language as a particular case of harmful content and focus our study on the analysis of keywords in available datasets composed of offensive tweets. Thus, we aim to identify relevant words in those datasets and analyze how they can affect model learning. For keyword extraction, we propose an unsupervised hybrid approach which combines the multi-head self-attention of BERT with reasoning on a word graph. The attention mechanism captures relationships among words in context while a language model is learned. These relationships are then used to generate a graph, from which we identify the most relevant words using eigenvector centrality. Experiments were performed by means of two mechanisms. On the one hand, we used an information retrieval system to evaluate the impact of the keywords in recovering offensive tweets from a dataset. On the other hand, we evaluated a keyword-based model for offensive language detection. Results highlight some points to consider when training models with available datasets.

Peña-Sarracén, G. L. D. L.; Rosso, P. (2021). Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation. Personal and Ubiquitous Computing, 1-13. https://doi.org/10.1007/s00779-021-01605-5
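A minimal sketch of the graph-plus-centrality step, assuming a precomputed word-to-word attention matrix (e.g. BERT self-attention averaged over heads). This illustrates the ranking idea only, not the authors' full pipeline:

```python
import numpy as np

def rank_keywords(words, attention, top_k=3, iters=100):
    """Rank words by eigenvector centrality of an attention-derived graph.

    `attention` is a hypothetical (n, n) matrix of pairwise attention
    weights; the real system derives these relationships from BERT's
    multi-head self-attention.
    """
    # Symmetrize attention into an undirected weighted adjacency matrix.
    adj = (attention + attention.T) / 2.0
    np.fill_diagonal(adj, 0.0)
    # Power iteration: the dominant eigenvector of the adjacency matrix
    # assigns each word its eigenvector centrality.
    v = np.ones(len(words))
    for _ in range(iters):
        v = adj @ v
        v /= np.linalg.norm(v)
    order = np.argsort(-v)
    return [words[i] for i in order[:top_k]]
```

Symmetrizing the attention matrix makes the graph undirected, so a word scores highly when it both attends to and is attended to by other central words.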