1,255 research outputs found

    Mixture models and exploratory analysis in networks

    Networks are widely used in the biological, physical, and social sciences as a concise mathematical representation of the topology of systems of interacting components. Understanding the structure of these networks is one of the outstanding challenges in the study of complex systems. Here we describe a general technique for detecting structural features in large-scale network data which works by dividing the nodes of a network into classes such that the members of each class have similar patterns of connection to other nodes. Using the machinery of probabilistic mixture models and the expectation-maximization algorithm, we show that it is possible to detect, without prior knowledge of what we are looking for, a very broad range of types of structure in networks. We give a number of examples demonstrating how the method can be used to shed light on the properties of real-world networks, including social and information networks. Comment: 8 pages, 4 figures, two new examples in this version plus minor corrections.
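The abstract above names the ingredients (a probabilistic mixture model fitted by expectation-maximization) but not the model itself. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each class `r` has a weight `pi[r]` and a distribution `theta[r]` over link targets, and that a node's links are drawn independently from its class's distribution. The function name `em_mixture`, the smoothing constant `eps`, and the toy graph are all hypothetical.

```python
import math
import random

def em_mixture(adj, c, iters=50, seed=0, eps=1e-9):
    """Fit a c-class mixture to a network given as adjacency lists.

    adj[i] lists the nodes that node i links to. Returns (pi, theta, q):
    pi[r] is the weight of class r, theta[r][j] is the probability that a
    link from a class-r node lands on node j, and q[i][r] is the posterior
    probability that node i belongs to class r.
    """
    n = len(adj)
    rng = random.Random(seed)
    pi = [1.0 / c] * c
    # Random positive start for theta, each row normalised to sum to 1.
    theta = []
    for _ in range(c):
        row = [rng.random() + 0.5 for _ in range(n)]
        total = sum(row)
        theta.append([x / total for x in row])
    q = [[1.0 / c] * c for _ in range(n)]

    for _ in range(iters):
        # E-step: class posteriors per node, computed in log space.
        for i in range(n):
            logw = [math.log(pi[r]) + sum(math.log(theta[r][j]) for j in adj[i])
                    for r in range(c)]
            top = max(logw)
            w = [math.exp(x - top) for x in logw]
            total = sum(w)
            q[i] = [x / total for x in w]
        # M-step: re-estimate class weights and link-target distributions.
        # eps (an assumption of this sketch) keeps every value strictly positive.
        for r in range(c):
            pi[r] = (sum(q[i][r] for i in range(n)) + eps) / (n + c * eps)
            row = [eps] * n
            for i in range(n):
                for j in adj[i]:
                    row[j] += q[i][r]
            total = sum(row)
            theta[r] = [x / total for x in row]
    return pi, theta, q

# Toy input (hypothetical): two triangles joined by the single edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
pi, theta, q = em_mixture(adj, c=2)
```

Because the classes are defined only by where their members' links go, the same loop can in principle pick up both densely connected groups and other connection patterns. Which classes emerge depends on the random start, so several seeds would normally be compared.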

    Semi-Supervised Overlapping Community Finding based on Label Propagation with Pairwise Constraints

    Algorithms for detecting communities in complex networks are generally unsupervised, relying solely on the structure of the network. However, these methods can often fail to uncover meaningful groupings that reflect the underlying communities in the data, particularly when those structures are highly overlapping. One way to improve the usefulness of these algorithms is by incorporating additional background information, which can be used as a source of constraints to direct the community detection process. In this work, we explore the potential of semi-supervised strategies to improve algorithms for finding overlapping communities in networks. Specifically, we propose a new method, based on label propagation, for finding communities using a limited number of pairwise constraints. Evaluations on synthetic and real-world datasets demonstrate the potential of this approach for uncovering meaningful community structures in cases where each node can potentially belong to more than one community. Comment: Fix table

    Assessing the association between oral hygiene and preterm birth by quantitative light-induced fluorescence

    The aim of this study was to investigate the purported link between oral hygiene and preterm birth by using image analysis tools to quantify dental plaque biofilm. Volunteers (n = 91) attending an antenatal clinic were identified as those considered to be “at high risk” of preterm delivery (i.e., a previous history of idiopathic preterm delivery, case group) or those who were not considered to be at risk (control group). The women had images of their anterior teeth captured using quantitative light-induced fluorescence (QLF). These images were analysed to calculate the amount of red fluorescent plaque (ΔR%) and percentage of plaque coverage. QLF showed little difference in ΔR% between the two groups, 65.00% case versus 68.70% control, whereas there was 19.29% difference with regard to the mean plaque coverage, 25.50% case versus 20.58% control. A logistic regression model showed a significant association between plaque coverage and case/control status (P = 0.031), controlling for other potential predictor variables, namely, smoking status, maternal age, and body mass index (BMI).
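The abstract does not say how the 19.29% plaque-coverage difference was computed. The reported group means reproduce it if it is read as a relative difference taken against the case-group mean; that reading is an assumption, checked here as a worked calculation:

```latex
\[
\frac{25.50 - 20.58}{25.50} = \frac{4.92}{25.50} \approx 0.1929 = 19.29\%
\]
```

The absolute gap between the two groups is therefore 4.92 percentage points of plaque coverage.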

    Community Structure in Time-Dependent, Multiscale, and Multiplex Networks

    Network science is an interdisciplinary endeavor, with methods and applications drawn from across the natural, social, and information sciences. A prominent problem in network science is the algorithmic detection of tightly-connected groups of nodes known as communities. We developed a generalized framework of network quality functions that allowed us to study the community structure of arbitrary multislice networks, which are combinations of individual networks coupled through links that connect each node in one network slice to itself in other slices. This framework allows one to study community structure in a very general setting encompassing networks that evolve over time, have multiple types of links (multiplexity), and have multiple scales. Comment: 31 pages, 3 figures, 1 table. Includes main text and supporting material. This is the accepted version of the manuscript (the definitive version appeared in Science), with typographical corrections included here.

    Distributed Community Detection in Dynamic Graphs

    Inspired by the increasing interest in self-organizing social opportunistic networks, we investigate the problem of distributed detection of unknown communities in dynamic random graphs. As a formal framework, we consider the dynamic version of the well-studied Planted Bisection Model $\mathcal{G}(n,p,q)$, where the node set $[n]$ of the network is partitioned into two unknown communities and, at every time step, each possible edge $(u,v)$ is active with probability $p$ if both nodes belong to the same community, while it is active with probability $q$ (with $q \ll p$) otherwise. We also consider a time-Markovian generalization of this model. We propose a distributed protocol based on the popular Label Propagation Algorithm and prove that, when the ratio $p/q$ is larger than $n^{b}$ (for an arbitrarily small constant $b>0$), the protocol finds the right "planted" partition in $O(\log n)$ time even when the snapshots of the dynamic graph are sparse and disconnected (i.e., in the case $p=\Theta(1/n)$). Comment: Version I
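The model in the abstract above is specified precisely enough to simulate: a fresh edge set is drawn at every time step, with probability p inside a community and q across. The sketch below pairs that sampler with a plain synchronous label-propagation round. It is an illustration only and not the protocol analysed in the paper. The fixed community layout (first half versus second half of the nodes), the tie-breaking rule (smallest label wins), and the function names are all assumptions.

```python
import random
from collections import Counter

def community(u, n):
    """Assumed layout: nodes 0..n//2-1 form one community, the rest the other."""
    return 0 if u < n // 2 else 1

def sample_snapshot(n, p, q, rng):
    """Draw one time step: a pair is active w.p. p (same side) or q (across)."""
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if community(u, n) == community(v, n) else q
            if rng.random() < prob:
                edges.append((u, v))
    return edges

def propagate(labels, edges):
    """One synchronous round: each node adopts its neighbours' most common label."""
    seen = [Counter() for _ in labels]
    for u, v in edges:
        seen[u][labels[v]] += 1
        seen[v][labels[u]] += 1
    new_labels = []
    for u, counts in enumerate(seen):
        if not counts:
            # Isolated in this snapshot: keep the current label.
            new_labels.append(labels[u])
        else:
            best = max(counts.values())
            # Ties are broken towards the smallest label (an assumption).
            new_labels.append(min(l for l, k in counts.items() if k == best))
    return new_labels

def run(n, p, q, rounds, seed=0):
    """Start from unique labels and propagate over freshly sampled snapshots."""
    rng = random.Random(seed)
    labels = list(range(n))
    for _ in range(rounds):
        labels = propagate(labels, sample_snapshot(n, p, q, rng))
    return labels

# Degenerate check: with p = 1 and q = 0 every snapshot is two disjoint cliques.
labels = run(8, 1.0, 0.0, rounds=2)
```

In the sparse regime the abstract highlights, p = Θ(1/n), a single snapshot is mostly disconnected, so any agreement has to build up across many rounds rather than within one.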

    Distance, dissimilarity index, and network community structure

    We address the question of finding the community structure of a complex network. In an earlier effort [H. Zhou, Phys. Rev. E (2003)], the concept of network random walking was introduced and a distance measure defined. Here we calculate, based on this distance measure, the dissimilarity index between nearest-neighboring vertices of a network and design an algorithm to partition these vertices into communities that are hierarchically organized. Each community is characterized by an upper and a lower dissimilarity threshold. The algorithm is applied to several artificial and real-world networks, and excellent results are obtained. In the case of artificially generated random modular networks, this method outperforms the algorithm based on the concept of edge betweenness centrality. For yeast's protein-protein interaction network, we are able to identify many clusters that have well defined biological functions. Comment: 10 pages, 7 figures, REVTeX4 format

    Exploiting Resolution-based Representations for MaxSAT Solving

    Most recent MaxSAT algorithms rely on a succession of calls to a SAT solver in order to find an optimal solution. In particular, several algorithms take advantage of the ability of SAT solvers to identify unsatisfiable subformulas. Usually, these MaxSAT algorithms perform better when small unsatisfiable subformulas are found early. However, this is not the case in many problem instances, since the whole formula is given to the SAT solver in each call. In this paper, we propose to partition the MaxSAT formula using a resolution-based graph representation. Partitions are then iteratively joined by using a proximity measure extracted from the graph representation of the formula. The algorithm ends when only one partition remains and the optimal solution is found. Experimental results show that this new approach further enhances a state-of-the-art MaxSAT solver to optimally solve a larger set of industrial problem instances.

    Managing clustering effects and learning effects in the design and analysis of multicentre randomised trials: a survey to establish current practice.

    BACKGROUND: Patient outcomes can depend on the treating centre, or health professional, delivering the intervention. A health professional's skill in delivery improves with experience, meaning that outcomes may be associated with learning. Considering differences in intervention delivery at trial design will ensure that any appropriate adjustments can be made during analysis. This work aimed to establish practice for the allowance of clustering and learning effects in the design and analysis of randomised multicentre trials. METHODS: A survey that drew upon quotes from existing guidelines, references to relevant publications and example trial scenarios was delivered. UK Clinical Research Collaboration Registered Clinical Trials Units were invited to participate. RESULTS: Forty-four Units participated (N = 50). Clustering was managed through design by stratification, more commonly by centre than by treatment provider. Managing learning by design through defining a minimum expertise level for treatment provider was common (89%). One-third reported experience in expertise-based designs. The majority of Units had adjusted for clustering during analysis, although approaches varied. Analysis of learning was rarely performed for the main analysis (n = 1), although it was explored by other means. The insight behind the approaches used within and reasons for, or against, alternative approaches were provided. CONCLUSIONS: Widespread awareness of challenges in designing and analysing multicentre trials is identified. Approaches used, and opinions on these, vary both across and within Units, indicating that approaches are dependent on the type of trial. Agreeing principles to guide trial design and analysis across a range of realistic clinical scenarios should be considered.