
    Bipartite graph for topic extraction

    This article presents a bipartite graph propagation method for unsupervised machine learning tasks such as topic extraction and clustering. We introduce the objectives and hypotheses that motivate the use of a graph-based method, and we give the intuition behind the proposed Bipartite Graph Propagation Algorithm. The contribution of this study is a new method that makes it easier to use heuristic knowledge to discover topics in textual data than is possible within the traditional mathematical formalism based on Latent Dirichlet Allocation (LDA). Initial experiments show that our Bipartite Graph Propagation algorithm returns good results in a static context (offline algorithm). Our current research focuses on large amounts of data and dynamic contexts (online algorithm).

    São Paulo Research Foundation (FAPESP) (proj. number 2011/23689-9)
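
    The abstract does not spell out the propagation rules, so the following is only a minimal sketch under stated assumptions. It is not the authors' algorithm. Documents and words form the two node sets of a bipartite graph. The heuristic knowledge is assumed to be a hypothetical list of seed words per topic. Topic scores are passed back and forth between the two sides by simple averaging, in the style of label propagation, and seed words stay clamped to their topic. The function name `propagate_topics` and the `iterations` parameter are illustrative.

```python
from collections import defaultdict


def propagate_topics(docs, seeds, iterations=10):
    """Hypothetical sketch of topic propagation on a document-word bipartite graph.

    docs:  dict doc_id -> non-empty list of words (each word is an edge to a word node)
    seeds: dict topic -> set of seed words (assumed form of the heuristic knowledge)
    Returns dict doc_id -> dict topic -> score, normalized to sum to 1 when nonzero.
    """
    topics = sorted(seeds)
    seed_words = set().union(*seeds.values())
    # Word side of the bipartite graph: word -> documents containing it.
    word_docs = defaultdict(list)
    for d, words in docs.items():
        for w in words:
            word_docs[w].append(d)
    # Seed words start (and stay) at 1.0 for their topic; all other scores start at 0.0.
    word_score = {w: {t: float(w in seeds[t]) for t in topics} for w in word_docs}
    doc_score = {}
    for _ in range(iterations):
        # Word -> document step: each document takes the mean score of its words.
        for d, words in docs.items():
            doc_score[d] = {t: sum(word_score[w][t] for w in words) / len(words) for t in topics}
        # Document -> word step: each non-seed word takes the mean score of its documents.
        for w, ds in word_docs.items():
            if w in seed_words:
                continue  # clamp seed words to their initial labels
            word_score[w] = {t: sum(doc_score[d][t] for d in ds) / len(ds) for t in topics}
    # Normalize each document's topic scores.
    result = {}
    for d, s in doc_score.items():
        total = sum(s.values())
        result[d] = {t: (v / total if total else 0.0) for t, v in s.items()}
    return result


if __name__ == "__main__":
    docs = {
        "d1": ["goal", "match", "team"],
        "d2": ["vote", "election", "party"],
        "d3": ["team", "party", "fans"],
    }
    seeds = {"sports": {"goal", "match"}, "politics": {"vote", "election"}}
    for doc, scores in propagate_topics(docs, seeds).items():
        print(doc, scores)
```

    In this toy input, "team" and "party" are unlabeled, so their scores come only from the documents they connect to. The mixed document `d3` therefore ends up split between the two topics. Clamping the seeds is a design choice here: it stops the heuristic knowledge from being diluted as scores spread.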

    A Framework for Comparing Groups of Documents

    We present a general framework for comparing multiple groups of documents. A bipartite graph model is proposed in which document groups form one node set and the comparison criteria form the other. Using this model, we present basic algorithms to extract insights into similarities and differences among the document groups. Finally, we demonstrate the versatility of our framework through an analysis of NSF funding programs for basic research.

    Comment: 6 pages; 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP '15)
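
    The abstract does not specify how edges are weighted or which algorithms are used, so the sketch below rests on hypothetical choices. Group nodes sit on one side and criterion nodes (here, plain terms) on the other. An edge weight is the fraction of a group's documents that contain the term. Groups are compared by the cosine similarity of their edge-weight vectors, and their differences are the criteria whose weights diverge most. The names `build_bipartite`, `cosine_similarity`, and `top_differences` are illustrative.

```python
import math


def build_bipartite(groups, criteria):
    """Hypothetical bipartite model: group nodes on one side, criterion nodes on the other.

    groups:   dict group_name -> non-empty list of documents, each a set of terms
    criteria: list of terms used as comparison criteria
    Edge weight (group, criterion) = fraction of the group's documents containing the term.
    """
    return {
        g: {c: sum(c in doc for doc in docs) / len(docs) for c in criteria}
        for g, docs in groups.items()
    }


def cosine_similarity(graph, a, b):
    """Similarity of two groups, computed from their edge weights to the shared criteria."""
    u, v = graph[a], graph[b]
    dot = sum(u[c] * v[c] for c in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_differences(graph, a, b, k=2):
    """Criteria whose edge weights differ most (in absolute value) between groups a and b."""
    diff = {c: graph[a][c] - graph[b][c] for c in graph[a]}
    return sorted(diff, key=lambda c: abs(diff[c]), reverse=True)[:k]


if __name__ == "__main__":
    groups = {
        "physics": [{"quantum", "laser"}, {"quantum", "materials"}],
        "biology": [{"genome", "cell"}, {"cell", "materials"}],
    }
    criteria = ["quantum", "laser", "materials", "genome", "cell"]
    g = build_bipartite(groups, criteria)
    print(cosine_similarity(g, "physics", "biology"))
    print(top_differences(g, "physics", "biology"))
```

    The groups are never compared directly. Every comparison goes through their links to the shared criterion nodes, so adding a new criterion or a new group only adds edges to the bipartite graph.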

    Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches

    We implemented three recently proposed approaches to the identification of overlapping and hierarchical substructures in graphs and applied the corresponding algorithms to a network of 492 information-science papers coupled via their cited sources. The thematic substructures and overlaps produced by the three hierarchical clustering algorithms were compared to a content-based categorisation, which we based on the interpretation of titles and keywords. We defined sets of papers dealing with three topics located at different levels of aggregation: h-index, webometrics, and bibliometrics. We identified these topics with branches in the dendrograms produced by the three clustering algorithms and compared the overlapping topics they detected with one another and with the three predefined paper sets. We discuss the advantages and drawbacks of applying the three approaches to paper networks in research fields.

    Comment: 18 pages, 9 figures
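
    The abstract does not say how branches were matched to the predefined paper sets. As one plausible way to do that comparison, the sketch below assumes the Jaccard index, with branches represented as sets of paper IDs. It also lists the papers that an overlapping clustering assigns to more than one branch. The function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of paper IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matching_branch(reference, branches):
    """Return (branch_name, score) for the branch most similar to a reference paper set."""
    return max(
        ((name, jaccard(reference, members)) for name, members in branches.items()),
        key=lambda pair: pair[1],
    )


def overlapping_papers(branches):
    """Papers assigned to more than one branch (possible only with overlapping clusterings)."""
    counts = {}
    for members in branches.values():
        for paper in members:
            counts[paper] = counts.get(paper, 0) + 1
    return {p for p, n in counts.items() if n > 1}


if __name__ == "__main__":
    branches = {"branch_a": {1, 2, 3, 4}, "branch_b": {4, 5, 6}}
    h_index_papers = {1, 2, 3}  # hypothetical predefined topic set
    print(best_matching_branch(h_index_papers, branches))
    print(overlapping_papers(branches))
```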

    Overlapping Community Detection Optimization and Nash Equilibrium

    Community detection in graphs and social networks is the focus of many algorithms. Recent methods that optimize the so-called modularity function proceed by maximizing relations within communities while minimizing relations between communities. However, because the problem is NP-complete, these algorithms are heuristics that do not guarantee an optimum. In this paper, we introduce a new algorithm along with a function that takes an approximate solution and modifies it in order to reach an optimum. This reassignment function is treated as a 'potential function' and provides a necessary condition for asserting that the computed optimum is indeed a Nash Equilibrium. We also use this function to show partitioning and overlapping communities simultaneously, two detection and visualization modes of great value in revealing interesting features of a social network. Our approach is successfully illustrated through several experiments on real unipartite, multipartite, and directed graphs from medium- and large-sized datasets.

    Comment: Submitted to KD
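
    The abstract does not define the authors' reassignment function. The sketch below swaps in a standard best-response (local-moving) heuristic to illustrate the potential-game idea. It uses Newman modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), where A is the adjacency matrix, k_i the degree of node i, and m the number of edges. Each node repeatedly moves to the neighbouring community that most increases Q. A node's gain from a move equals the change in Q, so Q acts as a potential. Once a full round produces no move, no node can improve Q unilaterally, at least among the moves considered, which is an equilibrium in that restricted sense. The sketch handles only partitions, not overlapping communities, and the function names are hypothetical.

```python
def modularity(adj, comm):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    adj:  dict node -> set of neighbours (each edge listed in both directions)
    comm: dict node -> community label
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in adj:
        for j in adj:
            if comm[i] == comm[j]:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m


def best_response(adj, comm, max_rounds=100):
    """Repeatedly move each node to the neighbouring community that most increases Q.

    Stops when a full round makes no move: no node can then raise Q by a
    unilateral move into a neighbouring community (a local optimum of Q).
    """
    comm = dict(comm)
    for _ in range(max_rounds):
        moved = False
        for node in adj:
            current = comm[node]
            best_label, best_q = current, modularity(adj, comm)
            # Candidate strategies: the communities of this node's neighbours.
            for label in {comm[n] for n in adj[node]} - {current}:
                comm[node] = label
                q = modularity(adj, comm)
                if q > best_q + 1e-12:
                    best_label, best_q = label, q
            comm[node] = best_label
            moved = moved or best_label != current
        if not moved:
            break
    return comm


if __name__ == "__main__":
    # Two triangles joined by the single edge 2-3.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    start = {n: n for n in adj}  # every node in its own community
    result = best_response(adj, start)
    print(result, modularity(adj, result))
```

    Recomputing Q from scratch for every candidate move keeps the code short but costs O(n²) per evaluation. Practical implementations update Q incrementally from community degree sums instead.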