12,044 research outputs found

    Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

    Full text link
    The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real world applications such as spam detection, uncovering of the hidden thematic structure of the Web and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. and the partitioning of the set of vertices into a high degree and a low degree subset respectively as in the Alon, Yuster and Zwick work treating each set appropriately. We obtain a running time O(m+m3/2Δlogntϵ2)O \left(m + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right) and an ϵ\epsilon approximation (multiplicative error), where nn is the number of vertices, mm the number of edges and Δ\Delta the maximum number of triangles an edge is contained. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage O(m1/2logn+m3/2Δlogntϵ2)O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right) and a constant number of passes (three) over the graph stream. We apply our methods in various networks with several millions of edges and we obtain excellent results. Finally, we propose a random projection based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010

    Triadic Measures on Graphs: The Power of Wedge Sampling

    Full text link
    Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of a graph. Some of the most useful graph metrics, especially those measuring social cohesion, are based on triangles. Despite the importance of these triadic measures, associated algorithms can be extremely expensive. We propose a new method based on wedge sampling. This versatile technique allows for the fast and accurate approximation of all current variants of clustering coefficients and enables rapid uniform sampling of the triangles of a graph. Our methods come with provable and practical time-approximation tradeoffs for all computations. We provide extensive results that show our methods are orders of magnitude faster than the state-of-the-art, while providing nearly the accuracy of full enumeration. Our results will enable more wide-scale adoption of triadic measures for analysis of extremely large graphs, as demonstrated on several real-world examples

    Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs

    Full text link
    Graphs are used to model interactions in a variety of contexts, and there is a growing need to quickly assess the structure of such graphs. Some of the most useful graph metrics are based on triangles, such as those measuring social cohesion. Algorithms to compute them can be extremely expensive, even for moderately-sized graphs with only millions of edges. Previous work has considered node and edge sampling; in contrast, we consider wedge sampling, which provides faster and more accurate approximations than competing techniques. Additionally, wedge sampling enables estimation local clustering coefficients, degree-wise clustering coefficients, uniform triangle sampling, and directed triangle counts. Our methods come with provable and practical probabilistic error estimates for all computations. We provide extensive results that show our methods are both more accurate and faster than state-of-the-art alternatives.Comment: Full version of SDM 2013 paper "Triadic Measures on Graphs: The Power of Wedge Sampling" (arxiv:1202.5230

    Design and Analysis of an Estimation of Distribution Approximation Algorithm for Single Machine Scheduling in Uncertain Environments

    Full text link
    In the current work we introduce a novel estimation of distribution algorithm to tackle a hard combinatorial optimization problem, namely the single-machine scheduling problem, with uncertain delivery times. The majority of the existing research coping with optimization problems in uncertain environment aims at finding a single sufficiently robust solution so that random noise and unpredictable circumstances would have the least possible detrimental effect on the quality of the solution. The measures of robustness are usually based on various kinds of empirically designed averaging techniques. In contrast to the previous work, our algorithm aims at finding a collection of robust schedules that allow for a more informative decision making. The notion of robustness is measured quantitatively in terms of the classical mathematical notion of a norm on a vector space. We provide a theoretical insight into the relationship between the properties of the probability distribution over the uncertain delivery times and the robustness quality of the schedules produced by the algorithm after a polynomial runtime in terms of approximation ratios
    corecore