5 research outputs found

    Single document keywords extraction in Bahasa Indonesia using phrase chunking

    Keywords help readers understand the idea of a document quickly. Unfortunately, considerable time and effort are often needed to come up with a good set of keywords manually. This research focused on generating keywords from a document automatically using phrase chunking. First, we collected part-of-speech patterns from a collection of documents. Second, we used those patterns to extract candidate keywords from the abstract and the content of a document. Finally, keywords were selected from the candidates based on the number of words in the keyword phrases and on several scenarios involving candidate reduction and sorting. We evaluated the result of each scenario using precision, recall, and F-measure. The experimental results show that: i) shorter-phrase keywords with string reduction, extracted from the abstract and sorted by frequency, provide the highest score; ii) in every proposed scenario, extracting keywords from the abstract gives a better result than extracting them from the content; iii) using shorter-phrase patterns in keyword extraction gives a better score than using all phrase patterns; iv) sorting scenarios based on the product of candidate frequency and phrase-pattern weight offer better results.
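
The sorting and evaluation steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `(phrase, pattern)` input format, the pattern weights, and the exact-match evaluation are all assumptions made for the example.

```python
from collections import Counter


def rank_candidates(candidates, pattern_weights, top_k=5):
    """Rank candidate phrases by frequency multiplied by pattern weight.

    candidates: list of (phrase, pos_pattern) pairs (hypothetical format).
    pattern_weights: dict mapping a POS pattern to a weight (hypothetical values).
    """
    freq = Counter(phrase for phrase, _ in candidates)
    pattern_of = {phrase: pattern for phrase, pattern in candidates}
    scores = {
        phrase: freq[phrase] * pattern_weights.get(pattern_of[phrase], 0.0)
        for phrase in freq
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda phrase: (-scores[phrase], phrase))
    return ranked[:top_k]


def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F-measure over keyword sets."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


candidates = [
    ("phrase chunking", "NN NN"),
    ("phrase chunking", "NN NN"),
    ("keyword", "NN"),
    ("automatic keyword extraction", "JJ NN NN"),
]
weights = {"NN NN": 1.0, "NN": 0.5, "JJ NN NN": 0.8}
top = rank_candidates(candidates, weights, top_k=2)
metrics = precision_recall_f1(top, ["phrase chunking", "keyword"])
```

Here the gold keywords are invented for the example; in practice they would be the author-assigned keywords of each document.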

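    Intégration des plongements de mots dans les méthodes, supervisées et non supervisées, d'extraction automatique de mots clés [Integrating word embeddings into supervised and unsupervised methods for automatic keyword extraction]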

    Word embeddings have been used successfully in various applications in natural language processing and information retrieval. This paper analyzes the impact of integrating word embeddings into supervised and unsupervised methods for automatic keyword extraction. Graph-based methods, on the unsupervised side, and methods based on ensembles of decision trees, on the supervised side, are widely used and studied because of their performance, so we focus on them. We considered Word2Vec [24], a word embedding method, and evaluated the impact of integrating word embeddings on two datasets that are benchmarks in the literature. We showed that there is no significant difference in the results when word embeddings are integrated into graph-based unsupervised methods. For supervised methods based on ensembles of decision trees, integrating word embeddings significantly improves the results for three of the four methods we tested. This article is an extension of articles [25, 26], which addressed only unsupervised methods.

    Instruments and Tools to Identify Radical Textual Content

    The Internet and social networks are increasingly becoming a medium for extremist propaganda. On homepages, in forums, or in chats, extremists spread their ideologies and world views, which are often contrary to the basic liberal democratic values of the European Union. It is not uncommon for violence to be used against those of different faiths, those who think differently, and members of social minorities. This paper presents a set of instruments and tools developed to help investigators better address hybrid security threats, i.e., threats that combine physical and cyber attacks. These tools have been designed and developed to support security authorities in identifying extremist propaganda on the Internet and classifying it in terms of its degree of danger. This concerns both extremist content on freely accessible Internet pages and content in closed chats. We illustrate the functionalities of the tools through an example related to radicalisation detection; the data used here are just a few tweets, propaganda emails, and darknet posts. This work was supported by the EU-funded PREVISION (Prediction and Visual Intelligence for Security Intelligence) project.

    Automatic keyphrase extraction using graph-based methods

    This paper analyses various unsupervised, graph-based methods for automatic keyphrase extraction, as well as the impact of word embeddings. Evaluation is carried out on three datasets. We show that there is no difference in results between using word embeddings and not using them.
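
As an illustration of what a graph-based method looks like, the sketch below builds a word co-occurrence graph and ranks words with a PageRank-style iteration, in the spirit of TextRank. It is a generic example under stated assumptions, not one of the specific methods evaluated in the paper: the function names, the window size, the damping factor, and the toy token list are all chosen for the example, and no word embeddings are involved.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Build an undirected graph linking words that occur within `window` tokens."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Score nodes with a fixed number of PageRank power iterations."""
    nodes = list(graph)
    score = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour shares its score equally among its own neighbours.
            incoming = sum(score[m] / len(graph[m]) for m in graph[node])
            updated[node] = (1 - damping) / len(nodes) + damping * incoming
        score = updated
    return score


tokens = "graph method keyphrase graph embedding graph".split()
scores = pagerank(cooccurrence_graph(tokens))
best = max(scores, key=scores.get)
```

In a full pipeline the tokens would first be filtered by part of speech, and adjacent high-scoring words would be merged into multi-word keyphrases; both steps are omitted here.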