1,512 research outputs found

Exact Single-Source SimRank Computation on Large Graphs

Author: Du Xiaoyong
Wang Hanzhi
Wei Zhewei
Wen Ji-Rong
Yuan Ye
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 18/06/2020

SimRank is a popular measurement for evaluating the node-to-node similarities based on the graph topology. In recent years, single-source and top-$k$ SimRank queries have received increasing attention due to their applications in web mining, social network analysis, and spam detection. However, a fundamental obstacle in studying SimRank has been the lack of ground truths. The only exact algorithm, Power Method, is computationally infeasible on graphs with more than $10^6$ nodes. Consequently, no existing work has evaluated the actual trade-offs between query time and accuracy on large real-world graphs. In this paper, we present ExactSim, the first algorithm that computes the exact single-source and top-$k$ SimRank results on large graphs. With high probability, this algorithm produces ground truths with a rigorous theoretical guarantee. We conduct extensive experiments on real-world datasets to demonstrate the efficiency of ExactSim. The results show that ExactSim provides the ground truth for any single-source SimRank query with a precision up to 7 decimal places within a reasonable query time. Comment: ACM SIGMOD 202
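For orientation, the Power Method mentioned in the abstract can be sketched as a direct iteration of the standard SimRank recurrence: a node has similarity 1 with itself, and two distinct nodes have similarity equal to a decay factor times the average similarity of their in-neighbor pairs. This is a minimal illustration, not ExactSim and not the authors' code; the function name, the adjacency format, the decay factor `c=0.6`, and the iteration count are all assumptions chosen for the example. The all-pairs table it stores is what makes this approach impractical on very large graphs.

```python
def simrank_power_method(in_neighbors, c=0.6, iterations=10):
    """All-pairs SimRank by naive fixed-point iteration.

    in_neighbors: dict mapping each node to a list of its in-neighbors
    (hypothetical input format). c is the decay factor (assumed value).
    """
    nodes = list(in_neighbors)
    # Start from the identity: each node is fully similar only to itself.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}
    for _ in range(iterations):
        new_sim = {u: {} for u in nodes}
        for u in nodes:
            for v in nodes:
                if u == v:
                    new_sim[u][v] = 1.0
                    continue
                iu, iv = in_neighbors[u], in_neighbors[v]
                if not iu or not iv:
                    # A node without in-neighbors has similarity 0 to others.
                    new_sim[u][v] = 0.0
                    continue
                # Average similarity over all pairs of in-neighbors.
                total = sum(sim[a][b] for a in iu for b in iv)
                new_sim[u][v] = c * total / (len(iu) * len(iv))
        sim = new_sim
    return sim


# Tiny example: node "a" points to both "b" and "c".
graph = {"a": [], "b": ["a"], "c": ["a"]}
scores = simrank_power_method(graph)
print(scores["b"]["c"])
```

Because "b" and "c" share their only in-neighbor, their score equals the decay factor, while "a" has no in-neighbors and scores 0 against every other node.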

arXiv.org e-Print Archive

Crossref

Query Complexity of Approximate Nash Equilibria

Author: Babichenko Yakov
Publication date: 01/06/2014

We study the query complexity of approximate notions of Nash equilibrium in games with a large number of players $n$. Our main result states that for $n$-player binary-action games and for constant $\varepsilon$, the query complexity of an $\varepsilon$-well-supported Nash equilibrium is exponential in $n$. One of the consequences of this result is an exponential lower bound on the rate of convergence of adaptive dynamics to approximate Nash equilibrium

arXiv.org e-Print Archive

CiteSeerX

Caltech Authors

Approximating the Held-Karp Bound for Metric TSP in Nearly Linear Time

Author: Chekuri Chandra
Quanrud Kent
Publication date: 13/10/2017

We give a nearly linear time randomized approximation scheme for the Held-Karp bound [Held and Karp, 1970] for metric TSP. Formally, given an undirected edge-weighted graph $G$ with $m$ edges and $\epsilon > 0$, the algorithm outputs in $O(m \log^4 n / \epsilon^2)$ time, with high probability, a $(1+\epsilon)$-approximation to the Held-Karp bound on the metric TSP instance induced by the shortest path metric on $G$. The algorithm can also be used to output a corresponding solution to the Subtour Elimination LP. We substantially improve upon the $O(m^2 \log^2(m) / \epsilon^2)$ running time achieved previously by Garg and Khandekar. The LP solution can be used to obtain a fast randomized $\big(\frac{3}{2} + \epsilon\big)$-approximation for metric TSP which improves upon the running time of previous implementations of Christofides' algorithm

arXiv.org e-Print Archive

Crossref

A Relational Gradient Descent Algorithm For Support Vector Machine Training

Author: Abo-Khamis Mahmoud
Im Sungjin
Moseley Benjamin
Pruhs Kirk
Samadian Alireza
Publication date: 11/05/2020

We consider gradient-descent-like algorithms for Support Vector Machine (SVM) training when the data is in relational form. The gradient of the SVM objective cannot be efficiently computed by known techniques as it suffers from the "subtraction problem". We first show that the subtraction problem cannot be surmounted by showing that computing any constant approximation of the gradient of the SVM objective function is $\#P$-hard, even for acyclic joins. We, however, circumvent the subtraction problem by restricting our attention to stable instances, which intuitively are instances where a nearly optimal solution remains nearly optimal if the points are perturbed slightly. We give an efficient algorithm that computes a "pseudo-gradient" that guarantees convergence for stable instances at a rate comparable to that achieved by using the actual gradient. We believe that our results suggest that this sort of stability analysis would likely yield useful insight in the context of designing algorithms on relational data for other learning problems in which the subtraction problem arises

arXiv.org e-Print Archive

Crossref

Rapid Sampling for Visualizations with Ordering Guarantees

Author: Blais Eric
Indyk Piotr
Kim Albert
Madden Sam
Parameswaran Aditya
Rubinfeld Ronitt
Publication date: 09/12/2014

Visualizations are frequently used as a means to understand trends and gather insights from datasets, but often take a long time to generate. In this paper, we focus on the problem of rapidly generating approximate visualizations while preserving crucial visual properties of interest to analysts. Our primary focus will be on sampling algorithms that preserve the visual property of ordering; our techniques will also apply to some other visual properties. For instance, our algorithms can be used to generate an approximate visualization of a bar chart very rapidly, where the comparisons between any two bars are correct. We formally show that our sampling algorithms are generally applicable and provably optimal in theory, in that they do not take more samples than necessary to generate the visualizations with ordering guarantees. They also work well in practice, correctly ordering output groups while taking orders of magnitude fewer samples and much less time than conventional sampling schemes. Comment: Tech Report. 17 pages. Condensed version to appear in VLDB Vol. 8 No.
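The idea of sampling only until the bar ordering is settled can be illustrated with a generic confidence-interval sketch. This is not the paper's algorithm: it is a simplified scheme, assuming every value lies in [0, 1], that draws one sample per group per round and stops once Hoeffding intervals around the running means no longer overlap. The function name, the `samplers` interface, and the default `delta` are hypothetical, and the interval width includes a union bound over groups and rounds so that the overall failure probability stays below `delta`.

```python
import math


def order_groups_by_sampling(samplers, delta=0.05, max_rounds=100_000):
    """Return (group indices sorted by estimated mean, rounds used).

    samplers: list of zero-argument callables, each returning one sample
    in [0, 1] from its group (hypothetical interface).
    """
    k = len(samplers)
    sums = [0.0] * k
    for m in range(1, max_rounds + 1):
        # Draw one more sample from every group.
        for i, draw in enumerate(samplers):
            sums[i] += draw()
        means = [s / m for s in sums]
        # Hoeffding half-width with failure budget 6*delta / (pi^2 * k * m^2)
        # per group and round, which sums to at most delta overall.
        eps = math.sqrt(
            math.log(math.pi ** 2 * k * m ** 2 / (3 * delta)) / (2 * m)
        )
        ranked = sorted(range(k), key=lambda i: means[i])
        # Stop when every adjacent pair of intervals is disjoint.
        if all(
            means[ranked[j + 1]] - means[ranked[j]] > 2 * eps
            for j in range(k - 1)
        ):
            return ranked, m
    raise RuntimeError("intervals did not separate within max_rounds")


# Deterministic stand-ins for three groups with well-separated means.
order, rounds = order_groups_by_sampling([lambda: 0.8, lambda: 0.2, lambda: 0.5])
print(order, rounds)
```

Groups whose means are close need many more rounds before their intervals separate, which is why the number of samples depends on the gaps between bars rather than on the dataset size.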

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

PubMed Central

eScholarship - University of California

Coresets for Relational Data and The Applications

Author: Chen Jiaxiang
Ding Hu
Huang Ruomin
Yang Qingyuan
Publication date: 09/10/2022

A coreset is a small set that can approximately preserve the structure of the original input data set. Therefore we can run our algorithm on a coreset so as to reduce the total computational complexity. Conventional coreset techniques assume that the input data set is available to process explicitly. However, this assumption may not hold in real-world scenarios. In this paper, we consider the problem of coreset construction over relational data. Namely, the data is decoupled into several relational tables, and it could be very expensive to directly materialize the data matrix by joining the tables. We propose a novel approach called "aggregation tree with pseudo-cube" that can build a coreset from the bottom up. Moreover, our approach can neatly circumvent several troublesome issues of relational learning problems [Khamis et al., PODS 2019]. Under some mild assumptions, we show that our coreset approach can be applied to machine learning tasks such as clustering, logistic regression and SVM

arXiv.org e-Print Archive