10,196 research outputs found
Concept Extraction and Clustering for Topic Digital Library Construction
This paper introduces a new approach to building a
topic digital library using concept extraction and
document clustering. First, documents in a specific
domain are automatically collected by a document
classification approach. Then, the keywords of each
document are extracted using a machine learning
approach. The keywords are used to cluster the
document subset. The clustering result is the taxonomy
of the subset. Lastly, the taxonomy is manually
adjusted into a hierarchical structure for user
navigation. The topic digital library is constructed
by combining full-text retrieval with the hierarchical
navigation function.
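The keyword extraction and clustering steps described in this abstract can be illustrated with a minimal sketch. It rests on assumptions that are not in the source: TF-IDF ranking stands in for the paper's machine learning keyword extractor, and a greedy Jaccard-overlap grouping stands in for its clustering method. All function names, the stopword list, and the threshold are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def extract_keywords(docs, top_k=3):
    """Return the top_k TF-IDF terms per document.

    Assumption: TF-IDF is a stand-in for the learned extractor in the abstract.
    """
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    keywords = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()}
        # Sort by descending score, then alphabetically, so ties are deterministic.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(set(ranked[:top_k]))
    return keywords


def jaccard(a, b):
    """Jaccard similarity of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_keywords(keywords, threshold=0.2):
    """Greedy grouping: a document joins the first cluster whose first member
    shares enough keywords with it; otherwise it starts a new cluster."""
    clusters = []  # each cluster is a list of document indices
    for i, kw in enumerate(keywords):
        for cluster in clusters:
            if jaccard(kw, keywords[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

The resulting flat clusters would correspond to the first-pass taxonomy; the manual adjustment into a navigation hierarchy described in the abstract is not modelled here.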
Analysis of group evolution prediction in complex networks
In a world in which acceptance by and identification with social
communities are highly desired, the ability to predict the evolution of groups over
time appears to be a vital but very complex research problem. Therefore, we
propose a new, adaptable, generic and multi-stage method for Group Evolution
Prediction (GEP) in complex networks that facilitates reasoning about the
future states of the recently discovered groups. The precise GEP modularity
enabled us to carry out extensive and versatile empirical studies on many
real-world complex / social networks to analyze the impact of numerous setups
and parameters like time window type and size, group detection method,
evolution chain length, prediction models, etc. Additionally, many new
predictive features reflecting the group state at a given time have been
identified and tested. Some other research problems like enriching learning
evolution chains with external data have been analyzed as well.
On the Stability of Community Detection Algorithms on Longitudinal Citation Data
There are fundamental differences between citation networks and other classes
of graphs. In particular, given that citation networks are directed and
acyclic, methods developed primarily for use with undirected social network
data may face obstacles. This is particularly true for the dynamic development
of community structure in citation networks. Namely, it is neither clear when
it is appropriate to employ existing community detection approaches nor is it
clear how to choose among existing approaches. Using simulated data, we attempt
to clarify the conditions under which one should use existing methods and which
of these algorithms is appropriate in a given context. We hope this paper will
serve as both a useful guidepost and an encouragement to those interested in
the development of more targeted approaches for use with longitudinal citation
data.
Comment: 17 pages, 7 figures, presenting at Applications of Social Network
Analysis 2009, ETH Zurich. Edit, August 17, 2009: updated abstract, figures,
text clarification
Taxonomy and clustering in collaborative systems: the case of the on-line encyclopedia Wikipedia
In this paper we investigate the nature and structure of the relation between
imposed classifications and real clustering in a particular case of a
scale-free network given by the on-line encyclopedia Wikipedia. We find a
statistical similarity in the distributions of community sizes both by using
the top-down approach of the categories division present in the archive and in
the bottom-up procedure of community detection given by an algorithm based on
the spectral properties of the graph. Regardless of the statistically similar
behaviour, the two methods provide a rather different division of the articles,
thereby signaling that the nature and presence of power laws is a general
feature for these systems and cannot be used as a benchmark to evaluate the
suitability of a clustering method.
Comment: 5 pages, 3 figures, epl2 style