5 research outputs found

Single document keywords extraction in Bahasa Indonesia using phrase chunking

Author: Nurwidyantoro Arif
Trisna I Nyoman Prayana
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/08/2020

Keywords help readers to understand the idea of a document quickly. Unfortunately, considerable time and effort are often needed to come up with a good set of keywords manually. This research focused on generating keywords from a document automatically using phrase chunking. Firstly, we collected part-of-speech patterns from a collection of documents. Secondly, we used those patterns to extract candidate keywords from the abstract and the content of a document. Finally, keywords were selected from the candidates based on the number of words in the keyword phrases and some scenarios involving candidate reduction and sorting. We evaluated the result of each scenario using precision, recall, and F-measure. The experiment results show: i) shorter-phrase keywords with string reduction, extracted from the abstract and sorted by frequency, provide the highest score; ii) in every proposed scenario, extracting keywords from the abstract always gives a better result; iii) using shorter-phrase patterns in keyword extraction gives a better score than using all phrase patterns; iv) sorting scenarios based on the product of candidate frequencies and the weights of the phrase patterns offer better results.
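The three steps described in this abstract (pattern matching, candidate selection, evaluation) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag names `NN` and `JJ`, the contents of `PATTERNS`, and the assumption that the input is already POS-tagged are all hypothetical, and the string-reduction and pattern-weight scenarios from the paper are not reproduced.

```python
from collections import Counter

# Hypothetical POS patterns; the paper collects its patterns from a corpus.
PATTERNS = {("NN",), ("NN", "NN"), ("NN", "JJ")}


def extract_candidates(tagged, patterns=PATTERNS):
    """Return every contiguous word span whose POS sequence matches a pattern."""
    max_len = max(len(p) for p in patterns)
    candidates = []
    for start in range(len(tagged)):
        for length in range(1, max_len + 1):
            span = tagged[start:start + length]
            if len(span) < length:  # ran past the end of the document
                break
            tags = tuple(tag for _, tag in span)
            if tags in patterns:
                candidates.append(" ".join(word.lower() for word, _ in span))
    return candidates


def rank_by_frequency(candidates, top_k):
    """Sort candidates by descending frequency, ties broken alphabetically."""
    counts = Counter(candidates)
    ranked = sorted(counts, key=lambda c: (-counts[c], c))
    return ranked[:top_k]


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F-measure against gold keywords."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy, hand-tagged input (an assumption; a real run would use a POS tagger).
tagged = [("ekstraksi", "NN"), ("kata", "NN"), ("kunci", "NN"),
          ("otomatis", "JJ"), ("dan", "CC"), ("kata", "NN"), ("kunci", "NN")]
top = rank_by_frequency(extract_candidates(tagged), top_k=3)
precision, recall, f1 = precision_recall_f1(top, {"kata kunci", "ekstraksi"})
```

Restricting `PATTERNS` to shorter tag sequences is one way to mimic the "shorter-phrase" scenarios the abstract mentions.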

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

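Integrating word embeddings into supervised and unsupervised methods for automatic keyword extraction (original title: Intégration des plongements de mots dans les méthodes, supervisées et non supervisées, d'extraction automatique de mots clés)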

Author: Hary Razakasoa
Michel Rajoelina
Mothe Josiane
Ramiandrisoa Faneva
Publication venue: Veille Stratégique Scientifique et Technologique (VSST)
Publication date: 01/01/2018

Word embeddings have been used successfully in a variety of applications in natural language processing and information retrieval. This paper analyzes the impact of integrating word embeddings into supervised and unsupervised methods for automatic keyword extraction. Graph-based methods (unsupervised) and methods based on ensembles of decision trees (supervised) are widely used and studied because of their performance, so we focus on them. We considered Word2Vec [24], a word embedding method, and evaluated the impact of integrating word embeddings on two datasets that are standard references in the literature. We showed that there is no significant difference in the results when word embeddings are integrated into the unsupervised graph-based methods. For the supervised methods based on ensembles of decision trees, integrating word embeddings significantly improves the results for three of the four methods we tested. This article extends the papers [25, 26], which considered only unsupervised methods.

Open Archive Toulouse Archive Ouverte

Instruments and Tools to Identify Radical Textual Content

Author: Juršėnas Alfonsas
Mandravickaitė Justina
Mothe Josiane
Okon Guenter
Schweer Thomas
Ullah Md Zia
Publication venue: MDPI
Publication date: 01/01/2022

The Internet and social networks are increasingly becoming a medium of extremist propaganda. On homepages, in forums or chats, extremists spread their ideologies and world views, which are often contrary to the basic liberal democratic values of the European Union. It is not uncommon that violence is used against those of different faiths, those who think differently, and members of social minorities. This paper presents a set of instruments and tools developed to help investigators better address hybrid security threats, i.e., threats that combine physical and cyber attacks. These tools have been designed and developed to support security authorities in identifying extremist propaganda on the Internet and classifying it in terms of its degree of danger. This concerns both extremist content on freely accessible Internet pages and content in closed chats. We illustrate the functionalities of the tools through an example related to radicalisation detection; the data used here are just a few tweets, propaganda emails, and darknet posts. This work was supported by the EU-funded PREVISION (Prediction and Visual Intelligence for Security Intelligence) project.

Scientific Publications of the University of Toulouse II Le Mirail

Directory of Open Access Journals

Repository@Napier

Automatic keyphrase extraction using graph-based methods

Author: Boudin Florian
Kim Su Nam
Ramiandrisoa Faneva
Wan Xiaojun
Publication venue: HAL CCSD
Publication date: 01/01/2018

This paper analyses various unsupervised automatic keyphrase extraction methods based on graphs, as well as the impact of word embeddings. Evaluation is made on three datasets. We show that there is no difference between using word embeddings and not using them.
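For readers unfamiliar with graph-based keyphrase extraction, the following Python sketch shows one common variant of the idea: words become nodes in a co-occurrence graph and are ranked with a PageRank-style iteration. It is an illustration only, not one of the methods evaluated in the paper; the function names, the `window` size, and the damping value are assumptions. A word-embedding variant would typically weight edges by vector similarity, which this unweighted sketch omits.

```python
from collections import defaultdict


def build_graph(words, window=1):
    """Undirected co-occurrence graph: words within `window` positions are linked."""
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:  # skip self-loops
                graph[w].add(words[j])
                graph[words[j]].add(w)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Unweighted PageRank by fixed-count power iteration."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_score = {}
        for n in nodes:
            # Each neighbour m shares its score equally among its own neighbours.
            incoming = sum(score[m] / len(graph[m]) for m in graph[n])
            new_score[n] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


# Toy token sequence (an assumption; real input would be filtered by POS first).
words = "graph keyphrase extraction graph method keyphrase graph".split()
scores = pagerank(build_graph(words))
ranked = sorted(scores, key=lambda w: (-scores[w], w))
```

In this toy input, `graph` and `keyphrase` each co-occur with three distinct words, so they score higher than `extraction` and `method`.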

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte