311 research outputs found
Cluster Analysis Based on Bipartite Network
Clustering data has a wide range of applications and has attracted considerable attention in data mining and artificial intelligence. However, it is difficult to find a set of clusters that best fits the natural partitions without any class information. This paper proposes a method for detecting the optimal cluster number while the data are partitioned into clusters by the FCM (fuzzy c-means) algorithm. It overcomes a drawback of the FCM algorithm, which requires the cluster number c to be defined in advance. The method converts the fuzzy clustering result into a weighted bipartite network and then detects the optimal cluster number with an improved bipartite modularity. Experimental results on artificial and real data sets show the validity of the proposed method.
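As a rough illustration of the first two steps named in this abstract, the sketch below runs a plain fuzzy c-means and then reads the membership matrix as weighted point-to-cluster edges. It is only a sketch under assumptions: the function names, the toy data, and the default fuzzifier m = 2 are hypothetical, and the paper's improved bipartite modularity is not reproduced because the abstract does not define it.

```python
import random

def memberships(points, centers, m):
    """Fuzzy membership u[k][i] of point k in cluster i (each row sums to 1)."""
    u = []
    for x in points:
        # Euclidean distance to every center, clamped to avoid division by zero.
        d = [max(sum((a - b) ** 2 for a, b in zip(x, ctr)) ** 0.5, 1e-12)
             for ctr in centers]
        u.append([1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in d)
                  for di in d])
    return u

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard FCM with a fixed cluster number c (the input the paper wants to avoid guessing)."""
    rng = random.Random(seed)
    centers = rng.sample(points, c)  # initialise centers from the data
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]  # fuzzified weights for cluster i
            total = sum(w)
            centers.append(tuple(
                sum(wk * x[t] for wk, x in zip(w, points)) / total
                for t in range(dim)))
    return centers, memberships(points, centers, m)

def to_bipartite_edges(u):
    """Read the membership matrix as weighted edges between points and clusters."""
    return [(("point", k), ("cluster", i), w)
            for k, row in enumerate(u) for i, w in enumerate(row)]

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(points, c=2)
edges = to_bipartite_edges(u)
```

Under this reading, one would repeat the run for several candidate values of c and score each resulting bipartite network. The scoring function itself would have to come from the paper.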
Link communities reveal multiscale complexity in networks
Networks have become a key approach to understanding systems of interacting
objects, unifying the study of diverse phenomena including biological organisms
and human society. One crucial step when studying the structure and dynamics of
networks is to identify communities: groups of related nodes that correspond to
functional subunits such as protein complexes or social spheres. Communities in
networks often overlap such that nodes simultaneously belong to several groups.
Meanwhile, many networks are known to possess hierarchical organization, where
communities are recursively grouped into a hierarchical structure. However, the
fact that many real networks have communities with pervasive overlap, where
each and every node belongs to more than one group, has the consequence that a
global hierarchy of nodes cannot capture the relationships between overlapping
groups. Here we reinvent communities as groups of links rather than nodes and
show that this unorthodox approach successfully reconciles the antagonistic
organizing principles of overlapping communities and hierarchy. In contrast to
the existing literature, which has entirely focused on grouping nodes, link
communities naturally incorporate overlap while revealing hierarchical
organization. We find relevant link communities in many networks, including
major biological networks such as protein-protein interaction and metabolic
networks, and show that a large social network contains hierarchically
organized community structures spanning inner-city to regional scales while
maintaining pervasive overlap. Our results imply that link communities are
fundamental building blocks that reveal overlap and hierarchical organization
in networks to be two aspects of the same phenomenon. Comment: Main text and supplementary informatio
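The abstract does not say how links are grouped. One common way to make the idea concrete, sketched here as an assumption rather than as a summary of the abstract, is to score two links that share a node by the Jaccard overlap of the inclusive neighborhoods of their other endpoints, and then cluster links hierarchically on that score. The function name and the toy graph are hypothetical.

```python
def link_similarity(adj, i, j, k):
    """Similarity of links (i, k) and (j, k), which share the node k.

    adj maps each node to the set of its neighbors. The score is the
    Jaccard index of the inclusive neighborhoods of i and j.
    """
    if k not in adj[i] or k not in adj[j]:
        raise ValueError("both links must exist and share node k")
    ni = adj[i] | {i}  # inclusive neighborhood of i
    nj = adj[j] | {j}  # inclusive neighborhood of j
    return len(ni & nj) / len(ni | nj)

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
inside = link_similarity(adj, 0, 1, 2)   # two links of the triangle
across = link_similarity(adj, 0, 3, 2)   # a triangle link and the pendant link
```

Because node 2 sits on links of both kinds, a clustering of links can place it in two groups at once, which is the overlap the abstract describes.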
Linking Datasets on Organizations Using Half A Billion Open Collaborated Records
Scholars studying organizations often work with multiple datasets lacking
shared unique identifiers or covariates. In such situations, researchers may
turn to approximate string matching methods to combine datasets. String
matching, although useful, faces fundamental challenges. Even when two strings
appear similar to humans, fuzzy matching often does not work because it fails
to adapt to the informativeness of the character combinations presented. Worse,
many entities have multiple names that are dissimilar (e.g., "Fannie Mae" and
"Federal National Mortgage Association"), a case where string matching has
little hope of succeeding. This paper introduces data from a prominent
employment-related networking site (LinkedIn) as a tool to address these
problems. We propose interconnected approaches to leveraging the massive amount
of information from LinkedIn regarding organizational name-to-name links. The
first approach builds a machine learning model for predicting matches from
character strings, treating the trillions of user-contributed organizational
name pairs as a training corpus: this approach constructs a string matching
metric that explicitly maximizes match probabilities. A second approach
identifies relationships between organization names using network
representations of the LinkedIn data. A third approach combines the first and
second. We document substantial improvements over fuzzy matching in
applications, making all methods accessible in open-source software
("LinkOrgs")
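To illustrate the failure mode described in this abstract, the sketch below scores name pairs with a generic character-level similarity from Python's standard library. It is a baseline assumed for illustration only, not the LinkOrgs method, and the helper name and the "Apple" pair are hypothetical.

```python
from difflib import SequenceMatcher

def fuzzy_score(a, b):
    """Character-level similarity in [0, 1]; a generic fuzzy-matching baseline."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Near-duplicate spellings score high.
near = fuzzy_score("Apple Inc.", "Apple Inc")

# Two names for the same organization, taken from the abstract, score low
# because they share few characters in sequence.
alias = fuzzy_score("Fannie Mae", "Federal National Mortgage Association")
```

A threshold that accepts the first pair would reject the second, which is why the abstract turns to name-to-name links contributed by users instead of string similarity alone.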
Network and attribute‐based clustering of tennis players and tournaments
This paper aims at targeting some relevant issues for clustering tennis players and
tournaments: (i) it considers players, tournaments and the relation between them;
(ii) the relation is taken into account in the fuzzy clustering model based on the
Partitioning Around Medoids (PAM) algorithm through spatial constraints; (iii) the
attributes of the players and of the tournaments are of different nature, qualitative
and quantitative. The proposal is novel for the methodology used, a spatial fuzzy
clustering model for players and for tournaments (based on related attributes), where
the spatial penalty term in each clustering model depends on the relation between
players and tournaments described in the adjacency matrix. The proposed model is
compared with a bipartite players-tournament complex network model (the Degree-
Corrected Stochastic Blockmodel) that considers only the relation between players
and tournaments, described in the adjacency matrix, to obtain communities on each
side of the bipartite network. An application on data taken from the ATP official
website with regards to the draws of the tournaments, and from the sport statistics
website Wheelo ratings for the performance data of players and tournaments, shows
the performances of the proposed clustering model
Drawing impossible boundaries: field delineation of Social Network Science
"Big" digital behavioral data increasingly allows large-scale and high-resolution analyses of the behavior and performance of persons or aggregated identities in whole fields. Often the desired system of study is only a subset of a larger database. The task of drawing a field boundary is complicated because socio-cultural systems are highly overlapping. Here, I propose a sociologically enhanced information retrieval method to delineate fields that is based on the reproductive mechanism of fields, able to account for field heterogeneity, and generally applicable outside scientometric contexts as well, e.g., in social media. The method is demonstrated in a delineation of the multidisciplinary and very heterogeneous Social Network Science field using the Web of Science database. The field consists of 25,760 publications and has a historical dimension (1916-2012). This set has high face validity and exhibits expected statistical properties like systemic growth and power law size distributions. Data is clean and disambiguated. The dataset with 45,580 author names and 23,026 linguistic concepts is publicly available and intended to enable high-quality analyses of an evolving complex socio-cultural system.
Semisupervised Community Detection by Voltage Drops
Many applications show that semisupervised community detection is an important topic, and it has attracted considerable attention in the study of complex networks. In this paper, based on the notion of voltage drops and discrete potential theory, a simple and fast semisupervised community detection algorithm is proposed. Label propagation through discrete potential transmission is accomplished by using voltage drops. The complexity of the proposal is O(V + E) for a sparse network with V vertices and E edges. The voltage value obtained for a vertex clearly reflects the relationship between the vertex and a community. Experimental results on four real networks and three benchmarks indicate that the proposed algorithm is effective and flexible. Furthermore, the algorithm is easily applied to graph-based machine learning methods.
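The general idea of voltage-based labeling can be sketched as follows, assuming one labeled seed node per community. This is a generic relaxation, not the paper's algorithm: the seeds are held at voltages 1 and 0, every other node repeatedly takes the average voltage of its neighbors, and nodes are then split at 0.5. The function name, the number of sweeps, and the toy graph are hypothetical. Each sweep touches every vertex and edge once, so it costs O(V + E).

```python
def voltage_labels(adj, source, sink, sweeps=200):
    """Relax node voltages with the source fixed at 1 and the sink fixed at 0.

    adj maps each node to a list of its neighbors (undirected graph).
    """
    volt = {v: 0.5 for v in adj}
    volt[source], volt[sink] = 1.0, 0.0
    for _ in range(sweeps):
        for v, nbrs in adj.items():
            if v == source or v == sink or not nbrs:
                continue  # seed voltages stay fixed; isolated nodes keep 0.5
            # Kirchhoff's law with unit resistances: average of the neighbors.
            volt[v] = sum(volt[n] for n in nbrs) / len(nbrs)
    return volt

# Hypothetical toy graph: triangles 0-1-2 and 3-4-5 joined by the edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
volt = voltage_labels(adj, source=0, sink=5)
community_a = sorted(v for v in adj if volt[v] > 0.5)
```

The largest voltage drop falls across the bridge edge 2-3, so thresholding at 0.5 separates the two triangles.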
Characterization of complex networks: A survey of measurements
Each complex network (or class of networks) presents specific topological
features which characterize its connectivity and highly influence the dynamics
of processes executed on the network. The analysis, discrimination, and
synthesis of complex networks therefore rely on the use of measurements capable
of expressing the most relevant topological features. This article presents a
survey of such measurements. It includes general considerations about complex
network characterization, a brief review of the principal models, and the
presentation of the main existing measurements. Important related issues
covered in this work comprise the representation of the evolution of complex
networks in terms of trajectories in several measurement spaces, the analysis
of the correlations between some of the most traditional measurements,
perturbation analysis, as well as the use of multivariate statistics for
feature selection and network classification. Depending on the network and the
analysis task one has in mind, a specific set of features may be chosen. It is
hoped that the present survey will help the proper application and
interpretation of measurements. Comment: A working manuscript with 78 pages, 32 figures. Suggestions of
measurements for inclusion are welcomed by the author