Static and Dynamic Aspects of Scientific Collaboration Networks
Collaboration networks arise when we map the connections between scientists
that are formed through joint publications. These networks thus display the
social structure of academia, and also allow conclusions about the structure of
scientific knowledge. Using the computer science publication database DBLP, we
compile relations between authors and publications as graphs and proceed with
examining and quantifying collaborative relations with graph-based methods. We
review standard properties of the network and rank authors and publications by
centrality. Additionally, we detect communities with modularity-based
clustering and compare the resulting clusters to a ground-truth based on
conferences and thus topical similarity. In a second part, we are the first to
combine DBLP network data with data from the Dagstuhl Seminars: We investigate
whether seminars of this kind, as social and academic events designed to
connect researchers, leave a visible track in the structure of the
collaboration network. Our results suggest that such single events are not
influential enough to change the network structure significantly. However, the
network structure seems to influence a participant's decision to accept or
decline an invitation.
Comment: ASONAM 2012: IEEE/ACM International Conference on Advances in Social
Networks Analysis and Mining
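As an illustration of the kind of graph compilation described above, the following minimal sketch builds a co-authorship graph from publication records and ranks authors by degree centrality. The author names and publication records are hypothetical toy data, not DBLP data, and the ranking shown is only one of the centrality measures such a study might use:

```python
from collections import defaultdict
from itertools import combinations

def coauthor_graph(papers):
    """Build an undirected co-authorship graph: one edge per author pair,
    weighted by the number of joint publications."""
    weight = defaultdict(int)
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            weight[(a, b)] += 1
    return weight

def degree_centrality(weight):
    """Rank authors by their number of distinct co-authors."""
    degree = defaultdict(int)
    for a, b in weight:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: -item[1])

# Toy publication records: each entry is the author list of one paper.
papers = [["Ada", "Bela"], ["Ada", "Chen", "Bela"], ["Chen", "Dev"]]
print(degree_centrality(coauthor_graph(papers)))
# → [('Chen', 3), ('Ada', 2), ('Bela', 2), ('Dev', 1)]
```

The edge weights (numbers of joint papers) can then serve as input to modularity-based clustering of the kind the abstract describes.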
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
surrounding network clustering are increasingly common, a precise understanding
of the relationship between different cluster quality metrics is still lacking. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters.
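For concreteness, two of the stand-alone quality metrics discussed above can be computed as follows. This is a pure-Python sketch on a toy graph, not the paper's evaluation code:

```python
def modularity(adj, communities):
    """Newman modularity: sum over communities of
    (internal edges / m) - (community degree / 2m)^2."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # number of edges
    q = 0.0
    for comm in communities:
        internal = sum(1 for u in comm for v in adj[u] if v in comm) / 2
        degree = sum(len(adj[u]) for u in comm)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

def conductance(adj, comm):
    """Edges leaving the community divided by the smaller side's volume."""
    cut = sum(1 for u in comm for v in adj[u] if v not in comm)
    volume = sum(len(adj[u]) for u in comm)
    total = sum(len(nbrs) for nbrs in adj.values())
    return cut / min(volume, total - volume)

# Toy graph: two triangles joined by a single bridge edge (2-3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
parts = [{0, 1, 2}, {3, 4, 5}]
print(round(modularity(adj, parts), 3))       # → 0.357
print(round(conductance(adj, {0, 1, 2}), 3))  # → 0.143
```

Low conductance and high modularity agree on this toy partition; the paper's point is that such agreement cannot be taken for granted on real data.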
A smart local moving algorithm for large-scale modularity-based community detection
We introduce a new algorithm for modularity-based community detection in
large networks. The algorithm, which we refer to as a smart local moving
algorithm, takes advantage of a well-known local moving heuristic that is also
used by other algorithms. Compared with these other algorithms, our proposed
algorithm uses the local moving heuristic in a more sophisticated way. Based on
an analysis of a diverse set of networks, we show that our smart local moving
algorithm identifies community structures with higher modularity values than
other algorithms for large-scale modularity optimization, including the
popular 'Louvain algorithm' introduced by Blondel et al. (2008). The
computational efficiency of our algorithm makes it possible to perform
community detection in networks with tens of millions of nodes and hundreds of
millions of edges. Our smart local moving algorithm also performs well in small
and medium-sized networks. In short computing times, it identifies community
structures with modularity values as high as, or almost as high as, the
highest values reported in the literature, and sometimes even higher than the
highest values found in the literature.
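The well-known local moving heuristic that this family of algorithms builds on can be sketched as follows. This is a simplified single-level version, not the authors' smart local moving implementation; the multi-level aggregation and the smarter restart strategies that distinguish the actual algorithm are omitted:

```python
def local_moving(adj):
    """One level of the local moving heuristic: start from singleton
    communities and repeatedly move each node to the neighbouring
    community with the largest modularity gain, until no move helps."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    comm = {u: u for u in adj}                # node -> community label
    comm_deg = {u: len(adj[u]) for u in adj}  # total degree per community
    improved = True
    while improved:
        improved = False
        for u in adj:
            k_u = len(adj[u])
            links = {}                        # edges from u to each community
            for v in adj[u]:
                links[comm[v]] = links.get(comm[v], 0) + 1
            old = comm[u]
            comm_deg[old] -= k_u              # take u out of its community
            # gain of joining community c: k_{u,c}/m - k_u * deg_c / (2 m^2)
            best_c = old
            best_gain = links.get(old, 0) / m - k_u * comm_deg[old] / (2 * m * m)
            for c, k_uc in links.items():
                gain = k_uc / m - k_u * comm_deg[c] / (2 * m * m)
                if gain > best_gain + 1e-12:
                    best_c, best_gain = c, gain
            comm[u] = best_c
            comm_deg[best_c] += k_u
            if best_c != old:
                improved = True
    return comm

# Two triangles joined by one edge: the heuristic recovers the triangles.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = local_moving(adj)
groups = {}
for u, c in labels.items():
    groups.setdefault(c, []).append(u)
print(sorted(sorted(g) for g in groups.values()))  # → [[0, 1, 2], [3, 4, 5]]
```

Louvain alternates such local moving with graph aggregation; the smart local moving variant additionally re-applies local moving within the communities before aggregating.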
A Constrained Power Method for Community Detection in Complex Networks
For an undirected complex network made up of vertices and edges, we developed a fast computing algorithm that divides vertices into different groups by maximizing the standard “modularity” measure of the resulting partitions. The algorithm is based on a simple constrained power method, which maximizes a quadratic objective function while satisfying given linear constraints. We evaluated the performance of the algorithm and compared it with a number of state-of-the-art solutions. The new algorithm achieved both high optimization quality and fast running speed, and thus provides a practical tool for community detection and network structure analysis.
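The abstract gives no implementation details, so the following is only a hedged illustration of the general idea: a power iteration that maximizes the quadratic form x·Bx of the modularity matrix B = A − kkᵀ/2m subject to the linear constraint x·1 = 0 (enforced by projection each step), followed by a sign split of the vertices. This is in the spirit of Newman's leading-eigenvector method, not a reconstruction of the paper's algorithm:

```python
import random

def constrained_power_split(adj, iters=500, seed=1):
    """Power iteration on the (diagonally shifted) modularity matrix
    B = A - k k^T / 2m, projected onto the subspace x . 1 = 0,
    followed by bisection of the vertices by the sign of the iterate."""
    nodes = sorted(adj)
    index = {u: i for i, u in enumerate(nodes)}
    k = [len(adj[u]) for u in nodes]
    two_m = sum(k)
    shift = two_m  # diagonal shift keeps the relevant eigenvalues positive
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in nodes]
    for _ in range(iters):
        kx = sum(k[i] * x[i] for i in range(len(nodes))) / two_m
        y = [sum(x[index[v]] for v in adj[u]) - k[i] * kx + shift * x[i]
             for i, u in enumerate(nodes)]
        mean = sum(y) / len(y)
        y = [v - mean for v in y]  # project onto the constraint x . 1 = 0
        norm = max(sum(v * v for v in y) ** 0.5, 1e-12)
        x = [v / norm for v in y]
    return ({u for u in nodes if x[index[u]] >= 0},
            {u for u in nodes if x[index[u]] < 0})

# Two triangles joined by a bridge: the sign split recovers the triangles.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
a, b = constrained_power_split(adj)
print(sorted(map(sorted, (a, b))))  # → [[0, 1, 2], [3, 4, 5]]
```

The projection step is what makes this a constrained power method: each iterate stays feasible for the linear constraint while the quadratic objective grows.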
A new methodology for constructing a publication-level classification system of science
Classifying journals or publications into research areas is an essential
element of many bibliometric analyses. Classification usually takes place at
the level of journals, where the Web of Science subject categories are the most
popular classification system. However, journal-level classification systems
have two important limitations: They offer only a limited amount of detail, and
they have difficulties with multidisciplinary journals. To avoid these
limitations, we introduce a new methodology for constructing classification
systems at the level of individual publications. In the proposed methodology,
publications are clustered into research areas based on citation relations. The
methodology is able to deal with very large numbers of publications. We present
an application in which a classification system is produced that includes
almost ten million publications. Based on an extensive analysis of this
classification system, we discuss the strengths and the limitations of the
proposed methodology. Important strengths are the transparency and relative
simplicity of the methodology and its fairly modest computing and memory
requirements. The main limitation of the methodology is its exclusive reliance
on direct citation relations between publications. The accuracy of the
methodology can probably be increased by also taking into account other types
of relations, for instance based on bibliographic coupling.
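As a small illustration of the first step of clustering publications on direct citation relations, the sketch below turns a directed citation list into a symmetric relatedness graph. Weighting each link by the inverse of the citing publication's reference count is an illustrative normalization choice, not necessarily the one used in the proposed methodology, and the publication identifiers are hypothetical:

```python
from collections import defaultdict

def relatedness_from_citations(citations):
    """Convert directed (citing, cited) pairs into a symmetric,
    weighted relatedness graph between publications."""
    n_refs = defaultdict(int)
    for citing, _ in citations:
        n_refs[citing] += 1
    relatedness = defaultdict(float)
    for citing, cited in citations:
        edge = tuple(sorted((citing, cited)))
        relatedness[edge] += 1.0 / n_refs[citing]  # normalize by reference count
    return dict(relatedness)

# Hypothetical citation records (citing -> cited).
citations = [("p1", "p2"), ("p1", "p3"), ("p4", "p2")]
print(relatedness_from_citations(citations))
# → {('p1', 'p2'): 0.5, ('p1', 'p3'): 0.5, ('p2', 'p4'): 1.0}
```

Such a relatedness graph can then be fed to a modularity-style clustering algorithm to obtain publication-level research areas.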
Multilevel refinement based on neighborhood similarity
The multilevel graph partitioning strategy aims to reduce the computational cost of the partitioning algorithm by applying it to a coarsened version of the original graph. This strategy is very useful when large-scale networks are analyzed. To improve the multilevel solution, refinement algorithms have been used in the uncoarsening phase. Typical refinement algorithms exploit network properties, for example minimum cut or modularity, but they do not exploit features of domain-specific networks. For instance, in social networks, partitions with a high clustering coefficient or high similarity between vertices indicate a better solution. In this paper, we propose a refinement algorithm (RSim) that is based on neighborhood similarity. We compare RSim with two algorithms from the literature and one baseline strategy on twelve real networks. Results indicate that RSim is competitive with methods evaluated on general domains, but on social networks it surpasses the competing refinement algorithms.
Funding: CNPq (grant 151836-/2013-2); FAPESP (grants 2011/22749-8, 11/20451-1, and 2013/12191-5); CAPES.
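The notion of neighborhood similarity can be made concrete with Jaccard similarity between vertex neighborhoods. The toy refinement pass below, which reassigns each node to the community of its most similar neighbor, is only a sketch in the spirit of similarity-based refinement, not the RSim algorithm itself:

```python
def jaccard(adj, u, v):
    """Neighborhood similarity: intersection over union of neighbor sets."""
    nu, nv = set(adj[u]), set(adj[v])
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0

def refine(adj, comm):
    """Toy refinement pass: assign each node the community of its
    most similar neighbor (by Jaccard similarity of neighborhoods)."""
    refined = dict(comm)
    for u in adj:
        if adj[u]:
            best = max(adj[u], key=lambda v: jaccard(adj, u, v))
            refined[u] = comm[best]
    return refined

# Two triangles joined by a bridge: the triangle partition is stable
# under this pass, since within-triangle neighborhoods overlap most.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
comm = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(refine(adj, comm) == comm)  # → True
```

In a multilevel setting such a pass would run during uncoarsening, nudging the projected partition toward higher vertex similarity within communities.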