1,512 research outputs found
Exact Single-Source SimRank Computation on Large Graphs
SimRank is a popular measurement for evaluating the node-to-node similarities
based on the graph topology. In recent years, single-source and top- SimRank
queries have received increasing attention due to their applications in web
mining, social network analysis, and spam detection. However, a fundamental
obstacle in studying SimRank has been the lack of ground truths. The only exact
algorithm, Power Method, is computationally infeasible on graphs with more than
nodes. Consequently, no existing work has evaluated the actual
trade-offs between query time and accuracy on large real-world graphs. In this
paper, we present ExactSim, the first algorithm that computes the exact
single-source and top- SimRank results on large graphs. With high
probability, this algorithm produces ground truths with a rigorous theoretical
guarantee. We conduct extensive experiments on real-world datasets to
demonstrate the efficiency of ExactSim. The results show that ExactSim provides
the ground truth for any single-source SimRank query with a precision up to 7
decimal places within a reasonable query time.Comment: ACM SIGMOD 202
Query Complexity of Approximate Nash Equilibria
We study the query complexity of approximate notions of Nash equilibrium in
games with a large number of players . Our main result states that for
-player binary-action games and for constant , the query
complexity of an -well-supported Nash equilibrium is exponential
in . One of the consequences of this result is an exponential lower bound on
the rate of convergence of adaptive dynamics to approxiamte Nash equilibrium
Approximating the Held-Karp Bound for Metric TSP in Nearly Linear Time
We give a nearly linear time randomized approximation scheme for the
Held-Karp bound [Held and Karp, 1970] for metric TSP. Formally, given an
undirected edge-weighted graph on edges and , the
algorithm outputs in time, with high probability, a
-approximation to the Held-Karp bound on the metric TSP instance
induced by the shortest path metric on . The algorithm can also be used to
output a corresponding solution to the Subtour Elimination LP. We substantially
improve upon the running time achieved previously
by Garg and Khandekar. The LP solution can be used to obtain a fast randomized
-approximation for metric TSP which improves
upon the running time of previous implementations of Christofides' algorithm
A Relational Gradient Descent Algorithm For Support Vector Machine Training
We consider gradient descent like algorithms for Support Vector Machine (SVM)
training when the data is in relational form. The gradient of the SVM objective
can not be efficiently computed by known techniques as it suffers from the
``subtraction problem''. We first show that the subtraction problem can not be
surmounted by showing that computing any constant approximation of the gradient
of the SVM objective function is -hard, even for acyclic joins. We,
however, circumvent the subtraction problem by restricting our attention to
stable instances, which intuitively are instances where a nearly optimal
solution remains nearly optimal if the points are perturbed slightly. We give
an efficient algorithm that computes a ``pseudo-gradient'' that guarantees
convergence for stable instances at a rate comparable to that achieved by using
the actual gradient. We believe that our results suggest that this sort of
stability the analysis would likely yield useful insight in the context of
designing algorithms on relational data for other learning problems in which
the subtraction problem arises
Rapid Sampling for Visualizations with Ordering Guarantees
Visualizations are frequently used as a means to understand trends and gather
insights from datasets, but often take a long time to generate. In this paper,
we focus on the problem of rapidly generating approximate visualizations while
preserving crucial visual proper- ties of interest to analysts. Our primary
focus will be on sampling algorithms that preserve the visual property of
ordering; our techniques will also apply to some other visual properties. For
instance, our algorithms can be used to generate an approximate visualization
of a bar chart very rapidly, where the comparisons between any two bars are
correct. We formally show that our sampling algorithms are generally applicable
and provably optimal in theory, in that they do not take more samples than
necessary to generate the visualizations with ordering guarantees. They also
work well in practice, correctly ordering output groups while taking orders of
magnitude fewer samples and much less time than conventional sampling schemes.Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.
Coresets for Relational Data and The Applications
A coreset is a small set that can approximately preserve the structure of the
original input data set. Therefore we can run our algorithm on a coreset so as
to reduce the total computational complexity. Conventional coreset techniques
assume that the input data set is available to process explicitly. However,
this assumption may not hold in real-world scenarios. In this paper, we
consider the problem of coresets construction over relational data. Namely, the
data is decoupled into several relational tables, and it could be very
expensive to directly materialize the data matrix by joining the tables. We
propose a novel approach called ``aggregation tree with pseudo-cube'' that can
build a coreset from bottom to up. Moreover, our approach can neatly circumvent
several troublesome issues of relational learning problems [Khamis et al., PODS
2019]. Under some mild assumptions, we show that our coreset approach can be
applied for the machine learning tasks, such as clustering, logistic regression
and SVM
- …