20 research outputs found

    A novel, Language-Independent Keyword Extraction method

    Obtaining the most representative set of words in a document is an important task, since it characterizes the document and simplifies search and classification. This paper presents a novel method, called LIKE, that automatically extracts keywords from a document regardless of the language it is written in. It uses a three-stage process: the first stage identifies the most representative terms, the second stage builds a suitable numeric representation of those terms, and the third stage uses a feed-forward neural network to obtain a predictive model. To measure the efficacy of LIKE, the articles published by the Workshop of Computer Science Researchers (WICC) over 14 years (1999-2012) were used. The results show that LIKE performs better than KEA, one of the most widely cited methods in the literature on this topic.
    X Workshop bases de datos y minería de datos. Red de Universidades con Carreras en Informática (RedUNCI)

    Suggesting new words to extract keywords from title and abstract

    Keywords appear in most research papers, but not in all of them: some papers contain no keywords. Keywords are the words or phrases that accurately reflect the content of a research paper, and they act as a compact summary of what the research covers. Well-chosen keywords increase the chance that an article is found and that it reaches the readers it should reach. They matter because they attract highly specialized and influential researchers, who read what matches their interests but cannot read everything. In this paper, we extract new keywords by proposing a set of cue words, chosen because they occur frequently in research across multiple disciplines of computer science. Our system takes a number of words (as many as specified in the program) that come before each proposed word and treats them as new keywords. The system proved effective at finding keywords that correspond, to some extent, with the keywords chosen by the paper's author.
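
    The extraction step described above can be illustrated with a minimal sketch. The function name `keywords_before_cues`, the cue words, the tokenization rule, and the `window` parameter are all assumptions made for illustration; the abstract does not specify the actual cue-word list or how the program is configured.

```python
import re


def keywords_before_cues(text, cue_words, window=2):
    """Return the `window` words that precede each cue word in `text`.

    Hypothetical helper: the cue list and window size are illustrative
    assumptions, not values taken from the paper.
    """
    # Lowercase and keep alphabetic tokens (hyphens allowed inside a token).
    tokens = re.findall(r"[A-Za-z][A-Za-z-]*", text.lower())
    cues = {c.lower() for c in cue_words}
    found = []
    for i, tok in enumerate(tokens):
        if tok in cues and i > 0:
            # Take up to `window` tokens immediately before the cue word.
            phrase = " ".join(tokens[max(0, i - window):i])
            if phrase not in found:
                found.append(phrase)
    return found


sample = "We propose a neural network model and a graph ranking algorithm for text."
result = keywords_before_cues(sample, ["model", "algorithm"], window=2)
print(result)  # ['neural network', 'graph ranking']
```

    A fixed window is the simplest reading of "a number of words that come before the proposed words"; a real system would likely also filter stop words from the extracted phrases.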

    Exploring differential topic models for comparative summarization of scientific papers

    This paper investigates differential topic models (dTM) for summarizing the differences among document groups. Starting from a simple probabilistic generative model, we propose dTM-SAGE, which explicitly models how group-specific word distributions deviate from a background word distribution, indicating how words are used differently across document groups. This makes it more effective at capturing the unique characteristics needed to compare document groups. To generate dTM-based comparative summaries, we propose two sentence scoring methods that measure the discriminative capacity of a sentence. Experimental results on a dataset of scientific papers show that our dTM-based comparative summarization methods significantly outperform the generic baselines and the state-of-the-art comparative summarization methods under ROUGE metrics.

    Effectively Grouping Named Entities From Click-Through Data Into Clusters Of Generated Keywords

    Many studies show that named entities are closely related to users' search behaviors, which has recently increased interest in studying named entities in search logs. This paper addresses the problem of forming fine-grained semantic clusters of named entities within a broad domain such as “company”, and of generating keywords for each cluster to help users interpret the semantic information embedded in it. Using contexts, URLs and session IDs as features of named entities, the three-phase approach proposed in this paper first disambiguates named entities according to these features. It then weights the features with a novel measurement, calculates the semantic similarity between named entities in the weighted feature space, and clusters the named entities accordingly. Finally, keywords for the clusters are generated using a text-oriented graph ranking algorithm. Each phase of the proposed approach solves problems that are not addressed in existing work, and experimental results on real click-through data demonstrate the effectiveness of the approach.

    Finding influential users of web event in social media

    Users of social media influence the evolution of a Web event to different degrees. Finding influential users could benefit information services such as recommendation and market analysis. However, most existing methods rely only on users' social networks or on user behaviors, and ignore the role of the content that users contribute. In fact, a Web event evolves with both user behaviors and content. This paper proposes an approach that finds influential users by extracting a user behavior network and an association network of the words within the content, and then applies the PageRank and HITS algorithms to the integration of the two networks to calculate user influence. The proposed approach is effective on several real-world datasets.
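
    As a rough illustration of the ranking step, the sketch below runs a plain PageRank power iteration over a small directed graph. The user names, the edges, the damping factor of 0.85, and the iteration count are assumptions for illustration; the abstract does not say how the behavior and word-association networks are integrated, so this shows only generic PageRank on a single graph, not the paper's method.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node without out-links spreads its rank evenly over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical behavior network: an edge (a, b) means user a reposted user b.
edges = [("alice", "carol"), ("bob", "carol"), ("carol", "alice"), ("dave", "carol")]
scores = pagerank(edges)
most_influential = max(scores, key=scores.get)
print(most_influential)  # carol
```

    HITS differs in that it keeps two scores per node (hub and authority) and updates them alternately, which is one reason the two algorithms can rank the same users differently.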

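    Integrating word embeddings into supervised and unsupervised methods for automatic keyword extraction (Intégration des plongements de mots dans les méthodes, supervisées et non supervisées, d'extraction automatique de mots clés)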

    Word embeddings have been used successfully in various applications in natural language processing and information retrieval. This paper analyzes the impact of integrating word embeddings into supervised and unsupervised methods for automatic keyword extraction. Graph-based methods (unsupervised) and methods based on ensembles of decision trees (supervised) are widely used and studied because of their performance, so we focus on them. We considered Word2Vec [24], a word embedding method, and evaluated the impact of integrating word embeddings on two datasets that are benchmarks in the literature. We showed that there is no significant difference in the results when word embeddings are integrated into unsupervised graph-based methods. For supervised methods based on ensembles of decision trees, integrating word embeddings significantly improves the results for three of the four methods we tested. This article is an extension of the articles [25, 26], which considered only unsupervised methods.

    Exploiting extensible background knowledge for clustering-based automatic keyphrase extraction

    Keyphrases are single- or multi-word phrases that describe the essential content of a document. Keyphrase extraction methods often use an external knowledge source such as WordNet to obtain relation information about terms and thus improve their results, but a single knowledge source is often limited. This is known as the coverage limitation problem. In this paper, we introduce SemCluster, a clustering-based unsupervised keyphrase extraction method that addresses the coverage limitation problem with an extensible approach that integrates an internal ontology (i.e., WordNet) with other knowledge sources to gain wider background knowledge. SemCluster is evaluated against three unsupervised methods, TextRank, ExpandRank, and KeyCluster, using the F1-measure. The evaluation results demonstrate that SemCluster has better accuracy and computational efficiency, and is more robust when dealing with documents from different domains.
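
    For readers unfamiliar with the evaluation metric, the F1-measure is the harmonic mean of precision and recall. The sketch below computes it for one document using exact matching after lowercasing; the matching rule, the function name `keyphrase_f1`, and the example phrases are assumptions for illustration, since the abstract does not describe how extracted keyphrases are matched against the reference set.

```python
def keyphrase_f1(predicted, gold):
    """F1 between predicted and gold keyphrases, using exact lowercase matching."""
    pred = {p.lower().strip() for p in predicted}
    ref = {g.lower().strip() for g in gold}
    true_positives = len(pred & ref)
    if true_positives == 0:
        # Also covers empty inputs, where precision or recall is undefined.
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical output of an extractor versus author-assigned keyphrases.
predicted = ["keyphrase extraction", "WordNet", "clustering", "ontology"]
gold = ["Keyphrase extraction", "clustering", "background knowledge"]
score = keyphrase_f1(predicted, gold)
print(round(score, 4))  # 0.5714
```

    Here two of four predictions are correct (precision 0.5) and two of three gold phrases are recovered (recall 2/3), giving F1 = 4/7.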