4,970 research outputs found
Bipartite graph for topic extraction
This article presents a bipartite graph propagation method that can be applied to different unsupervised machine learning tasks, such as topic extraction and clustering. We introduce the objectives and hypotheses that motivate the use of graph-based methods, and we give the intuition behind the proposed Bipartite Graph Propagation Algorithm. The contribution of this study is a new method that allows heuristic knowledge to be used to discover topics in textual data more easily than is possible in the traditional mathematical formalism based on Latent Dirichlet Allocation (LDA). Initial experiments demonstrate that our Bipartite Graph Propagation algorithm returns good results in a static context (offline algorithm). Our research now focuses on large amounts of data and a dynamic context (online algorithm).
São Paulo Research Foundation (FAPESP) (proj. number 2011/23689-9)
A Framework for Comparing Groups of Documents
We present a general framework for comparing multiple groups of documents. A
bipartite graph model is proposed where document groups are represented as one
node set and the comparison criteria are represented as the other node set.
Using this model, we present basic algorithms to extract insights into
similarities and differences among the document groups. Finally, we demonstrate
the versatility of our framework through an analysis of NSF funding programs
for basic research.
Comment: 6 pages; 2015 Conference on Empirical Methods in Natural Language
Processing (EMNLP '15)
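The bipartite model described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the edge weights or the comparison algorithms, so everything below is an assumption made for illustration: each document group is stored as a mapping from criterion nodes to edge weights, and `group_similarity` is a hypothetical helper that compares two groups by cosine similarity over those weights.

```python
import math


def group_similarity(weights, g1, g2):
    """Cosine similarity between two group nodes of a weighted bipartite graph.

    `weights[group][criterion]` is the weight of the edge between a document
    group and a comparison criterion (an assumed representation, not the
    paper's own data structure).
    """
    a, b = weights[g1], weights[g2]
    # Only criteria connected to both groups contribute to the dot product.
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy data: three groups, three criteria.
weights = {
    "group_a": {"topic_1": 3.0, "topic_2": 4.0},
    "group_b": {"topic_1": 3.0, "topic_2": 4.0},
    "group_c": {"topic_3": 5.0},
}
```

Groups that share no criterion get a similarity of zero, which is one simple way to surface differences as well as similarities.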
Identifying Overlapping and Hierarchical Thematic Structures in Networks of Scholarly Papers: A Comparison of Three Approaches
We implemented three recently proposed approaches to the identification of
overlapping and hierarchical substructures in graphs and applied the
corresponding algorithms to a network of 492 information-science papers coupled
via their cited sources. The thematic substructures obtained and overlaps
produced by the three hierarchical cluster algorithms were compared to a
content-based categorisation, which we based on the interpretation of titles
and keywords. We defined sets of papers dealing with three topics located on
different levels of aggregation: h-index, webometrics, and bibliometrics. We
identified these topics with branches in the dendrograms produced by the three
cluster algorithms and compared the overlapping topics they detected with one
another and with the three pre-defined paper sets. We discuss the advantages
and drawbacks of applying the three approaches to paper networks in research
fields.
Comment: 18 pages, 9 figures
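Coupling papers via their cited sources is commonly called bibliographic coupling. As a rough sketch of how such a network could be built, assuming each paper is given as a set of cited-source identifiers and that the edge weight is the raw number of shared sources (the abstract does not say how the authors weight or normalise links), a hypothetical `coupling_network` helper might look like this:

```python
from itertools import combinations


def coupling_network(references):
    """Build a bibliographic-coupling network.

    `references` maps a paper ID to the set of sources it cites. Two papers
    are linked when they share at least one cited source; the edge weight is
    the number of shared sources (an assumed, unnormalised weighting).
    """
    edges = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical toy input: p1 and p2 share two sources, p3 shares none.
refs = {
    "p1": {"s1", "s2", "s3"},
    "p2": {"s2", "s3"},
    "p3": {"s4"},
}
network = coupling_network(refs)
```

The resulting weighted edge list is the kind of input a hierarchical or overlapping clustering algorithm would then operate on.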
Overlapping Community Detection Optimization and Nash Equilibrium
Community detection using both graphs and social networks is the focus of
many algorithms. Recent methods aimed at optimizing the so-called modularity
function proceed by maximizing relations within communities while minimizing
inter-community relations.
However, given the NP-completeness of the problem, these algorithms are
heuristics that do not guarantee an optimum. In this paper, we introduce a new
algorithm along with a function that takes an approximate solution and modifies
it in order to reach an optimum. This reassignment function is considered a
'potential function' and becomes a necessary condition for asserting that the
computed optimum is indeed a Nash Equilibrium. We also use this function to
simultaneously show partitioning and overlapping communities, two detection and
visualization modes of great value in revealing interesting features of a
social network. Our approach is successfully illustrated through several
experiments on real unipartite, multipartite or directed graphs drawn from
medium- and large-sized datasets.
Comment: Submitted to KD
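For readers unfamiliar with the modularity function mentioned in this abstract, the following minimal sketch computes the standard Newman modularity of a partition, assuming an undirected, unweighted graph given as an edge list without self-loops. It illustrates only the objective being optimised; it is not the reassignment function or the algorithm proposed in the paper, and the names used are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of (L_c / m) - (D_c / (2 m))^2, where m is the
    number of edges, L_c the number of edges inside c, and D_c the total
    degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    intra = defaultdict(int)  # edges with both endpoints in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    degree_sum = defaultdict(int)  # total degree per community
    for node, d in degree.items():
        degree_sum[community[node]] += d
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
```

Splitting the graph into its two triangles scores higher than putting every node in one community, which matches the idea of maximizing relations within communities while minimizing relations between them.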