311 research outputs found

    Cluster Analysis Based on Bipartite Network

    Get PDF
    Clustering data has a wide range of applications and has attracted considerable attention in data mining and artificial intelligence. However it is difficult to find a set of clusters that best fits natural partitions without any class information. In this paper, a method for detecting the optimal cluster number is proposed. The optimal cluster number can be obtained by the proposal, while partitioning the data into clusters by FCM (Fuzzy c-means) algorithm. It overcomes the drawback of FCM algorithm which needs to define the cluster number c in advance. The method works by converting the fuzzy cluster result into a weighted bipartite network and then the optimal cluster number can be detected by the improved bipartite modularity. The experimental results on artificial and real data sets show the validity of the proposed method

    Link communities reveal multiscale complexity in networks

    Full text link
    Networks have become a key approach to understanding systems of interacting objects, unifying the study of diverse phenomena including biological organisms and human society. One crucial step when studying the structure and dynamics of networks is to identify communities: groups of related nodes that correspond to functional subunits such as protein complexes or social spheres. Communities in networks often overlap such that nodes simultaneously belong to several groups. Meanwhile, many networks are known to possess hierarchical organization, where communities are recursively grouped into a hierarchical structure. However, the fact that many real networks have communities with pervasive overlap, where each and every node belongs to more than one group, has the consequence that a global hierarchy of nodes cannot capture the relationships between overlapping groups. Here we reinvent communities as groups of links rather than nodes and show that this unorthodox approach successfully reconciles the antagonistic organizing principles of overlapping communities and hierarchy. In contrast to the existing literature, which has entirely focused on grouping nodes, link communities naturally incorporate overlap while revealing hierarchical organization. We find relevant link communities in many networks, including major biological networks such as protein-protein interaction and metabolic networks, and show that a large social network contains hierarchically organized community structures spanning inner-city to regional scales while maintaining pervasive overlap. Our results imply that link communities are fundamental building blocks that reveal overlap and hierarchical organization in networks to be two aspects of the same phenomenon.Comment: Main text and supplementary informatio

    Linking Datasets on Organizations Using Half A Billion Open Collaborated Records

    Full text link
    Scholars studying organizations often work with multiple datasets lacking shared unique identifiers or covariates. In such situations, researchers may turn to approximate string matching methods to combine datasets. String matching, although useful, faces fundamental challenges. Even when two strings appear similar to humans, fuzzy matching often does not work because it fails to adapt to the informativeness of the character combinations presented. Worse, many entities have multiple names that are dissimilar (e.g., "Fannie Mae" and "Federal National Mortgage Association"), a case where string matching has little hope of succeeding. This paper introduces data from a prominent employment-related networking site (LinkedIn) as a tool to address these problems. We propose interconnected approaches to leveraging the massive amount of information from LinkedIn regarding organizational name-to-name links. The first approach builds a machine learning model for predicting matches from character strings, treating the trillions of user-contributed organizational name pairs as a training corpus: this approach constructs a string matching metric that explicitly maximizes match probabilities. A second approach identifies relationships between organization names using network representations of the LinkedIn data. A third approach combines the first and second. We document substantial improvements over fuzzy matching in applications, making all methods accessible in open-source software ("LinkOrgs")

    Network and attribute‐based clustering of tennis players and tournaments

    Get PDF
    This paper aims at targeting some relevant issues for clustering tennis players and tournaments: (i) it considers players, tournaments and the relation between them; (ii) the relation is taken into account in the fuzzy clustering model based on the Partitioning Around Medoids (PAM) algorithm through spatial constraints; (iii) the attributes of the players and of the tournaments are of different nature, qualitative and quantitative. The proposal is novel for the methodology used, a spatial Fuzzy clustering model for players and for tournaments (based on related attributes), where the spatial penalty term in each clustering model depends on the relation between players and tournaments described in the adjacency matrix. The proposed model is compared with a bipartite players-tournament complex network model (the Degree- Corrected Stochastic Blockmodel) that considers only the relation between players and tournaments, described in the adjacency matrix, to obtain communities on each side of the bipartite network. An application on data taken from the ATP official website with regards to the draws of the tournaments, and from the sport statistics website Wheelo ratings for the performance data of players and tournaments, shows the performances of the proposed clustering model

    Drawing impossible boundaries: field delineation of Social Network Science

    Get PDF
    "Big" digital behavioral data increasingly allows large-scale and high-resolution analyses of the behavior and performance of persons or aggregated identities in whole fields. Often the desired system of study is only a subset of a larger database. The task of drawing a field boundary is complicated because socio-cultural systems are highly overlapping. Here, I propose a sociologically enhanced information retrieval method to delineate fields that is based on the reproductive mechanism of fields, able to account for field heterogeneity, and generally applicable also outside scientometric, e.g., in social media, contexts. The method is demonstrated in a delineation of the multidisciplinary and very heterogeneous Social Network Science field using the Web of Science database. The field consists of 25,760 publications and has a historical dimension (1916-2012). This set has high face validity and exhibits expected statistical properties like systemic growth and power law size distributions. Data is clean and disambiguated. The dataset with 45,580 author names and 23,026 linguistic concepts is publically available and supposed to enable high-quality analyses of an evolving complex socio-cultural system

    Semisupervised Community Detection by Voltage Drops

    Get PDF
    Many applications show that semisupervised community detection is one of the important topics and has attracted considerable attention in the study of complex network. In this paper, based on notion of voltage drops and discrete potential theory, a simple and fast semisupervised community detection algorithm is proposed. The label propagation through discrete potential transmission is accomplished by using voltage drops. The complexity of the proposal is OV+E for the sparse network with V vertices and E edges. The obtained voltage value of a vertex can be reflected clearly in the relationship between the vertex and community. The experimental results on four real networks and three benchmarks indicate that the proposed algorithm is effective and flexible. Furthermore, this algorithm is easily applied to graph-based machine learning methods

    Characterization of complex networks: A survey of measurements

    Full text link
    Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of measurements capable of expressing the most relevant topological features. This article presents a survey of such measurements. It includes general considerations about complex network characterization, a brief review of the principal models, and the presentation of the main existing measurements. Important related issues covered in this work comprise the representation of the evolution of complex networks in terms of trajectories in several measurement spaces, the analysis of the correlations between some of the most traditional measurements, perturbation analysis, as well as the use of multivariate statistics for feature selection and network classification. Depending on the network and the analysis task one has in mind, a specific set of features may be chosen. It is hoped that the present survey will help the proper application and interpretation of measurements.Comment: A working manuscript with 78 pages, 32 figures. Suggestions of measurements for inclusion are welcomed by the author
    corecore