4,882 research outputs found

    Considerations about multistep community detection

    Full text link
    The problem and implications of community detection in networks have raised a huge attention, for its important applications in both natural and social sciences. A number of algorithms has been developed to solve this problem, addressing either speed optimization or the quality of the partitions calculated. In this paper we propose a multi-step procedure bridging the fastest, but less accurate algorithms (coarse clustering), with the slowest, most effective ones (refinement). By adopting heuristic ranking of the nodes, and classifying a fraction of them as `critical', a refinement step can be restricted to this subset of the network, thus saving computational time. Preliminary numerical results are discussed, showing improvement of the final partition.Comment: 12 page

    An automatic method to generate domain-specific investigator networks using PubMed abstracts

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Collaboration among investigators has become critical to scientific research. This includes ad hoc collaboration established through personal contacts as well as formal consortia established by funding agencies. Continued growth in online resources for scientific research and communication has promoted the development of highly networked research communities. Extending these networks globally requires identifying additional investigators in a given domain, profiling their research interests, and collecting current contact information. We present a novel strategy for building investigator networks dynamically and producing detailed investigator profiles using data available in PubMed abstracts.</p> <p>Results</p> <p>We developed a novel strategy to obtain detailed investigator information by automatically parsing the affiliation string in PubMed records. We illustrated the results by using a published literature database in human genome epidemiology (HuGE Pub Lit) as a test case. Our parsing strategy extracted country information from 92.1% of the affiliation strings in a random sample of PubMed records and in 97.0% of HuGE records, with accuracies of 94.0% and 91.0%, respectively. Institution information was parsed from 91.3% of the general PubMed records (accuracy 86.8%) and from 94.2% of HuGE PubMed records (accuracy 87.0). We demonstrated the application of our approach to dynamic creation of investigator networks by creating a prototype information system containing a large database of PubMed abstracts relevant to human genome epidemiology (HuGE Pub Lit), indexed using PubMed medical subject headings converted to Unified Medical Language System concepts. Our method was able to identify 70–90% of the investigators/collaborators in three different human genetics fields; it also successfully identified 9 of 10 genetics investigators within the PREBIC network, an existing preterm birth research network.</p> <p>Conclusion</p> <p>We successfully created a web-based prototype capable of creating domain-specific investigator networks based on an application that accurately generates detailed investigator profiles from PubMed abstracts combined with robust standard vocabularies. This approach could be used for other biomedical fields to efficiently establish domain-specific investigator networks.</p

    Outlier Edge Detection Using Random Graph Generation Models and Applications

    Get PDF
    Outliers are samples that are generated by different mechanisms from other normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has merely focused on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, which can be defined as the induced graph that contains two end nodes of an edge, their neighboring nodes and the edges that link these nodes, contains critical information to detect outlier edges. We evaluated the proposed algorithms by injecting outlier edges into some real-world graph data. Experiment results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. Further more, the proposed algorithms are not limited in the area of outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques.Comment: 14 pages, 5 figures, journal pape

    Network 'small-world-ness': a quantitative method for determining canonical network equivalence

    Get PDF
    Background: Many technological, biological, social, and information networks fall into the broad class of 'small-world' networks: they have tightly interconnected clusters of nodes, and a shortest mean path length that is similar to a matched random graph (same number of nodes and edges). This semi-quantitative definition leads to a categorical distinction ('small/not-small') rather than a quantitative, continuous grading of networks, and can lead to uncertainty about a network's small-world status. Moreover, systems described by small-world networks are often studied using an equivalent canonical network model-the Watts-Strogatz (WS) model. However, the process of establishing an equivalent WS model is imprecise and there is a pressing need to discover ways in which this equivalence may be quantified. Methodology/Principal Findings: We defined a precise measure of 'small-world-ness' S based on the trade off between high local clustering and short path length. A network is now deemed a 'small-world' if S. 1-an assertion which may be tested statistically. We then examined the behavior of S on a large data-set of real-world systems. We found that all these systems were linked by a linear relationship between their S values and the network size n. Moreover, we show a method for assigning a unique Watts-Strogatz (WS) model to any real-world network, and show analytically that the WS models associated with our sample of networks also show linearity between S and n. Linearity between S and n is not, however, inevitable, and neither is S maximal for an arbitrary network of given size. Linearity may, however, be explained by a common limiting growth process. Conclusions/Significance: We have shown how the notion of a small-world network may be quantified. Several key properties of the metric are described and the use of WS canonical models is placed on a more secure footing

    Seeds Buffering for Information Spreading Processes

    Full text link
    Seeding strategies for influence maximization in social networks have been studied for more than a decade. They have mainly relied on the activation of all resources (seeds) simultaneously in the beginning; yet, it has been shown that sequential seeding strategies are commonly better. This research focuses on studying sequential seeding with buffering, which is an extension to basic sequential seeding concept. The proposed method avoids choosing nodes that will be activated through the natural diffusion process, which is leading to better use of the budget for activating seed nodes in the social influence process. This approach was compared with sequential seeding without buffering and single stage seeding. The results on both real and artificial social networks confirm that the buffer-based consecutive seeding is a good trade-off between the final coverage and the time to reach it. It performs significantly better than its rivals for a fixed budget. The gain is obtained by dynamic rankings and the ability to detect network areas with nodes that are not yet activated and have high potential of activating their neighbours.Comment: Jankowski, J., Br\'odka, P., Michalski, R., & Kazienko, P. (2017, September). Seeds Buffering for Information Spreading Processes. In International Conference on Social Informatics (pp. 628-641). Springe

    The Routing of Complex Contagion in Kleinberg's Small-World Networks

    Full text link
    In Kleinberg's small-world network model, strong ties are modeled as deterministic edges in the underlying base grid and weak ties are modeled as random edges connecting remote nodes. The probability of connecting a node uu with node vv through a weak tie is proportional to 1/uvα1/|uv|^\alpha, where uv|uv| is the grid distance between uu and vv and α0\alpha\ge 0 is the parameter of the model. Complex contagion refers to the propagation mechanism in a network where each node is activated only after k2k \ge 2 neighbors of the node are activated. In this paper, we propose the concept of routing of complex contagion (or complex routing), where we can activate one node at one time step with the goal of activating the targeted node in the end. We consider decentralized routing scheme where only the weak ties from the activated nodes are revealed. We study the routing time of complex contagion and compare the result with simple routing and complex diffusion (the diffusion of complex contagion, where all nodes that could be activated are activated immediately in the same step with the goal of activating all nodes in the end). We show that for decentralized complex routing, the routing time is lower bounded by a polynomial in nn (the number of nodes in the network) for all range of α\alpha both in expectation and with high probability (in particular, Ω(n1α+2)\Omega(n^{\frac{1}{\alpha+2}}) for α2\alpha \le 2 and Ω(nα2(α+2))\Omega(n^{\frac{\alpha}{2(\alpha+2)}}) for α>2\alpha > 2 in expectation), while the routing time of simple contagion has polylogarithmic upper bound when α=2\alpha = 2. Our results indicate that complex routing is harder than complex diffusion and the routing time of complex contagion differs exponentially compared to simple contagion at sweetspot.Comment: Conference version will appear in COCOON 201

    Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints

    Get PDF
    Algorithms for detecting communities in complex networks are generally unsupervised, relying solely on the structure of the network. However, these methods can often fail to uncover meaningful groupings that reflect the underlying communities in the data, particularly when those structures are highly overlapping. One way to improve the usefulness of these algorithms is by incorporating additional background information, which can be used as a source of constraints to direct the community detection process. In this work, we explore the potential of semi-supervised strategies to improve algorithms for finding overlapping communities in networks. Specifically, we propose a new method, based on label propagation, for finding communities using a limited number of pairwise constraints. Evaluations on synthetic and real-world datasets demonstrate the potential of this approach for uncovering meaningful community structures in cases where each node can potentially belong to more than one community.Comment: Fix table

    Structural Properties of Ego Networks

    Full text link
    The structure of real-world social networks in large part determines the evolution of social phenomena, including opinion formation, diffusion of information and influence, and the spread of disease. Globally, network structure is characterized by features such as degree distribution, degree assortativity, and clustering coefficient. However, information about global structure is usually not available to each vertex. Instead, each vertex's knowledge is generally limited to the locally observable portion of the network consisting of the subgraph over its immediate neighbors. Such subgraphs, known as ego networks, have properties that can differ substantially from those of the global network. In this paper, we study the structural properties of ego networks and show how they relate to the global properties of networks from which they are derived. Through empirical comparisons and mathematical derivations, we show that structural features, similar to static attributes, suffer from paradoxes. We quantify the differences between global information about network structure and local estimates. This knowledge allows us to better identify and correct the biases arising from incomplete local information.Comment: Accepted by SBP 2015, to appear in the proceeding

    The Parameterized Complexity of Centrality Improvement in Networks

    Full text link
    The centrality of a vertex v in a network intuitively captures how important v is for communication in the network. The task of improving the centrality of a vertex has many applications, as a higher centrality often implies a larger impact on the network or less transportation or administration cost. In this work we study the parameterized complexity of the NP-complete problems Closeness Improvement and Betweenness Improvement in which we ask to improve a given vertex' closeness or betweenness centrality by a given amount through adding a given number of edges to the network. Herein, the closeness of a vertex v sums the multiplicative inverses of distances of other vertices to v and the betweenness sums for each pair of vertices the fraction of shortest paths going through v. Unfortunately, for the natural parameter "number of edges to add" we obtain hardness results, even in rather restricted cases. On the positive side, we also give an island of tractability for the parameter measuring the vertex deletion distance to cluster graphs
    corecore