2,159 research outputs found

    Fast Similarity Sketching

    Full text link
    We consider the Similarity Sketching problem: Given a universe [u]={0,,u1}[u]= \{0,\ldots,u-1\} we want a random function SS mapping subsets A[u]A\subseteq [u] into vectors S(A)S(A) of size tt, such that similarity is preserved. More precisely: Given sets A,B[u]A,B\subseteq [u], define Xi=[S(A)[i]=S(B)[i]]X_i=[S(A)[i]= S(B)[i]] and X=i[t]XiX=\sum_{i\in [t]}X_i. We want to have E[X]=tJ(A,B)E[X]=t\cdot J(A,B), where J(A,B)=AB/ABJ(A,B)=|A\cap B|/|A\cup B| and furthermore to have strong concentration guarantees (i.e. Chernoff-style bounds) for XX. This is a fundamental problem which has found numerous applications in data mining, large-scale classification, computer vision, similarity search, etc. via the classic MinHash algorithm. The vectors S(A)S(A) are also called sketches. The seminal t×t\timesMinHash algorithm uses tt random hash functions h1,,hth_1,\ldots, h_t, and stores (minaAh1(A),,minaAht(A))\left(\min_{a\in A}h_1(A),\ldots, \min_{a\in A}h_t(A)\right) as the sketch of AA. The main drawback of MinHash is, however, its O(tA)O(t\cdot |A|) running time, and finding a sketch with similar properties and faster running time has been the subject of several papers. Addressing this, Li et al. [NIPS'12] introduced one permutation hashing (OPH), which creates a sketch of size tt in O(t+A)O(t + |A|) time, but with the drawback that possibly some of the tt entries are "empty" when A=O(t)|A| = O(t). One could argue that sketching is not necessary in this case, however the desire in most applications is to have one sketching procedure that works for sets of all sizes. Therefore, filling out these empty entries is the subject of several follow-up papers initiated by Shrivastava and Li [ICML'14]. However, these "densification" schemes fail to provide good concentration bounds exactly in the case A=O(t)|A| = O(t), where they are needed. (continued...

    Dynamic Algorithms for Graph Coloring

    Get PDF
    We design fast dynamic algorithms for proper vertex and edge colorings in a graph undergoing edge insertions and deletions. In the static setting, there are simple linear time algorithms for (Δ+1)(\Delta+1)- vertex coloring and (2Δ1)(2\Delta-1)-edge coloring in a graph with maximum degree Δ\Delta. It is natural to ask if we can efficiently maintain such colorings in the dynamic setting as well. We get the following three results. (1) We present a randomized algorithm which maintains a (Δ+1)(\Delta+1)-vertex coloring with O(logΔ)O(\log \Delta) expected amortized update time. (2) We present a deterministic algorithm which maintains a (1+o(1))Δ(1+o(1))\Delta-vertex coloring with O(polylogΔ)O(\text{poly} \log \Delta) amortized update time. (3) We present a simple, deterministic algorithm which maintains a (2Δ1)(2\Delta-1)-edge coloring with O(logΔ)O(\log \Delta) worst-case update time. This improves the recent O(Δ)O(\Delta)-edge coloring algorithm with O~(Δ)\tilde{O}(\sqrt{\Delta}) worst-case update time by Barenboim and Maimon.Comment: To appear in SODA 201

    A Circuit-Based Approach to Efficient Enumeration

    Get PDF
    We study the problem of enumerating the satisfying valuations of a circuit while bounding the delay, i.e., the time needed to compute each successive valuation. We focus on the class of structured d-DNNF circuits originally introduced in knowledge compilation, a sub-area of artificial intelligence. We propose an algorithm for these circuits that enumerates valuations with linear preprocessing and delay linear in the Hamming weight of each valuation. Moreover, valuations of constant Hamming weight can be enumerated with linear preprocessing and constant delay. Our results yield a framework for efficient enumeration that applies to all problems whose solutions can be compiled to structured d-DNNFs. In particular, we use it to recapture classical results in database theory, for factorized database representations and for MSO evaluation. This gives an independent proof of constant-delay enumeration for MSO formulae with first-order free variables on bounded-treewidth structures

    Two new results about quantum exact learning

    Get PDF
    We present two new results about exact learning by quantum computers. First, we show how to exactly learn a kk-Fourier-sparse nn-bit Boolean function from O(k1.5(logk)2)O(k^{1.5}(\log k)^2) uniform quantum examples for that function. This improves over the bound of Θ~(kn)\widetilde{\Theta}(kn) uniformly random classical examples (Haviv and Regev, CCC'15). Our main tool is an improvement of Chang's lemma for the special case of sparse functions. Second, we show that if a concept class C\mathcal{C} can be exactly learned using QQ quantum membership queries, then it can also be learned using O(Q2logQlogC)O\left(\frac{Q^2}{\log Q}\log|\mathcal{C}|\right) classical membership queries. This improves the previous-best simulation result (Servedio and Gortler, SICOMP'04) by a logQ\log Q-factor.Comment: v3: 21 pages. Small corrections and clarification
    corecore