1,512 research outputs found

    Exact Single-Source SimRank Computation on Large Graphs

    Full text link
    SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-kk SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than 10610^6 nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-kk SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time.Comment: ACM SIGMOD 202

    Query Complexity of Approximate Nash Equilibria

    Full text link
    We study the query complexity of approximate notions of Nash equilibrium in games with a large number of players nn. Our main result states that for nn-player binary-action games and for constant ε\varepsilon, the query complexity of an ε\varepsilon-well-supported Nash equilibrium is exponential in nn. One of the consequences of this result is an exponential lower bound on the rate of convergence of adaptive dynamics to approxiamte Nash equilibrium

    Approximating the Held-Karp Bound for Metric TSP in Nearly Linear Time

    Full text link
    We give a nearly linear time randomized approximation scheme for the Held-Karp bound [Held and Karp, 1970] for metric TSP. Formally, given an undirected edge-weighted graph GG on mm edges and ϵ>0\epsilon > 0, the algorithm outputs in O(mlog4n/ϵ2)O(m \log^4n /\epsilon^2) time, with high probability, a (1+ϵ)(1+\epsilon)-approximation to the Held-Karp bound on the metric TSP instance induced by the shortest path metric on GG. The algorithm can also be used to output a corresponding solution to the Subtour Elimination LP. We substantially improve upon the O(m2log2(m)/ϵ2)O(m^2 \log^2(m)/\epsilon^2) running time achieved previously by Garg and Khandekar. The LP solution can be used to obtain a fast randomized (32+ϵ)\big(\frac{3}{2} + \epsilon\big)-approximation for metric TSP which improves upon the running time of previous implementations of Christofides' algorithm

    A Relational Gradient Descent Algorithm For Support Vector Machine Training

    Full text link
    We consider gradient descent like algorithms for Support Vector Machine (SVM) training when the data is in relational form. The gradient of the SVM objective can not be efficiently computed by known techniques as it suffers from the ``subtraction problem''. We first show that the subtraction problem can not be surmounted by showing that computing any constant approximation of the gradient of the SVM objective function is #P\#P-hard, even for acyclic joins. We, however, circumvent the subtraction problem by restricting our attention to stable instances, which intuitively are instances where a nearly optimal solution remains nearly optimal if the points are perturbed slightly. We give an efficient algorithm that computes a ``pseudo-gradient'' that guarantees convergence for stable instances at a rate comparable to that achieved by using the actual gradient. We believe that our results suggest that this sort of stability the analysis would likely yield useful insight in the context of designing algorithms on relational data for other learning problems in which the subtraction problem arises

    Rapid Sampling for Visualizations with Ordering Guarantees

    Get PDF
    Visualizations are frequently used as a means to understand trends and gather insights from datasets, but often take a long time to generate. In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual proper- ties of interest to analysts. Our primary focus will be on sampling algorithms that preserve the visual property of ordering; our techniques will also apply to some other visual properties. For instance, our algorithms can be used to generate an approximate visualization of a bar chart very rapidly, where the comparisons between any two bars are correct. We formally show that our sampling algorithms are generally applicable and provably optimal in theory, in that they do not take more samples than necessary to generate the visualizations with ordering guarantees. They also work well in practice, correctly ordering output groups while taking orders of magnitude fewer samples and much less time than conventional sampling schemes.Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.

    Coresets for Relational Data and The Applications

    Full text link
    A coreset is a small set that can approximately preserve the structure of the original input data set. Therefore we can run our algorithm on a coreset so as to reduce the total computational complexity. Conventional coreset techniques assume that the input data set is available to process explicitly. However, this assumption may not hold in real-world scenarios. In this paper, we consider the problem of coresets construction over relational data. Namely, the data is decoupled into several relational tables, and it could be very expensive to directly materialize the data matrix by joining the tables. We propose a novel approach called ``aggregation tree with pseudo-cube'' that can build a coreset from bottom to up. Moreover, our approach can neatly circumvent several troublesome issues of relational learning problems [Khamis et al., PODS 2019]. Under some mild assumptions, we show that our coreset approach can be applied for the machine learning tasks, such as clustering, logistic regression and SVM
    corecore