
    The streaming $k$-mismatch problem

    We consider the streaming complexity of a fundamental task in approximate pattern matching: the $k$-mismatch problem. It asks to compute Hamming distances between a pattern of length $n$ and all length-$n$ substrings of a text for which the Hamming distance does not exceed a given threshold $k$. In our problem formulation, we report not only the Hamming distance but also, on demand, the full \emph{mismatch information}, that is, the list of mismatched pairs of symbols and their indices. The twin challenges of streaming pattern matching derive from the need both to achieve small working space and to guarantee that every arriving input symbol is processed quickly. We present a streaming algorithm for the $k$-mismatch problem which uses $O(k\log{n}\log\frac{n}{k})$ bits of space and spends \ourcomplexity time on each symbol of the input stream, which consists of the pattern followed by the text. The running time almost matches the classic offline solution and the space usage is within a logarithmic factor of optimal. Our new algorithm therefore effectively resolves and also extends an open problem first posed in FOCS'09. En route to this solution, we also give a deterministic $O(k(\log\frac{n}{k} + \log|\Sigma|))$-bit encoding of all the alignments with Hamming distance at most $k$ of a length-$n$ pattern within a text of length $O(n)$. This secondary result provides an optimal solution to a natural communication complexity problem which may be of independent interest. Comment: 27 pages
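
To make the problem statement above concrete, here is a minimal sketch of the $k$-mismatch task solved by brute force. It is an offline, quadratic-time baseline and not the streaming algorithm from the abstract. The function name `k_mismatch` and the `(offset, pattern symbol, text symbol)` layout of the mismatch information are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical layout: (offset within the pattern, pattern symbol, text symbol).
Mismatch = Tuple[int, str, str]


def k_mismatch(pattern: str, text: str, k: int) -> List[Tuple[int, List[Mismatch]]]:
    """Naive offline baseline: for every alignment whose Hamming distance
    is at most k, return its start position and its mismatch information."""
    n = len(pattern)
    results = []
    for start in range(len(text) - n + 1):
        mismatches: List[Mismatch] = []
        for j in range(n):
            if pattern[j] != text[start + j]:
                mismatches.append((j, pattern[j], text[start + j]))
                if len(mismatches) > k:
                    break  # threshold exceeded, this alignment is not reported
        if len(mismatches) <= k:
            results.append((start, mismatches))
    return results


hits = k_mismatch("abc", "abcabd", k=1)
for start, info in hits:
    print(start, len(info), info)
```

The Hamming distance of each reported alignment is the length of its mismatch list. This baseline stores the whole pattern and text, which is exactly what a streaming algorithm has to avoid.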

    Classification and Galois conjugacy of Hamming maps

    We show that for each d>0 the d-dimensional Hamming graph H(d,q) has an orientably regular surface embedding if and only if q is a prime power p^e. If q>2 there are up to isomorphism \phi(q-1)/e such maps, all constructed as Cayley maps for a d-dimensional vector space over the field of order q. We show that for each such pair d, q the corresponding Belyi pairs are conjugate under the action of the absolute Galois group, and we determine their minimal field of definition. We also classify the orientably regular embeddings of merged Hamming graphs for q>3.

    Clustering in Hilbert space of a quantum optimization problem

    The solution space of many classical optimization problems breaks up into clusters which are extensively distant from one another in the Hamming metric. Here, we show that an analogous quantum clustering phenomenon takes place in the ground state subspace of a certain quantum optimization problem. This involves extending the notion of clustering to Hilbert space, where the classical Hamming distance is not immediately useful. Quantum clusters correspond to macroscopically distinct subspaces of the full quantum ground state space which grow with the system size. We explicitly demonstrate that such clusters arise in the solution space of random quantum satisfiability (3-QSAT) at its satisfiability transition. We estimate both the number of these clusters and their internal entropy. The former is given by the number of hardcore dimer coverings of the core of the interaction graph, while the latter is related to the underconstrained degrees of freedom not touched by the dimers. We additionally provide new numerical evidence suggesting that the 3-QSAT satisfiability transition may coincide with the product satisfiability transition, which would imply the absence of an intermediate entangled satisfiable phase. Comment: 11 pages, 6 figures

    Optimal Las Vegas Locality Sensitive Data Structures

    We show that approximate similarity (near neighbour) search can be solved in high dimensions with performance matching state of the art (data independent) Locality Sensitive Hashing, but with a guarantee of no false negatives. Specifically, we give two data structures for common problems. For $c$-approximate near neighbour in Hamming space we get query time $dn^{1/c+o(1)}$ and space $dn^{1+1/c+o(1)}$ matching that of \cite{indyk1998approximate} and answering a long standing open question from~\cite{indyk2000dimensionality} and~\cite{pagh2016locality} in the affirmative. By means of a new deterministic reduction from $\ell_1$ to Hamming we also solve $\ell_1$ and $\ell_2$ with query time $d^2n^{1/c+o(1)}$ and space $d^2 n^{1+1/c+o(1)}$. For $(s_1,s_2)$-approximate Jaccard similarity we get query time $dn^{\rho+o(1)}$ and space $dn^{1+\rho+o(1)}$, $\rho=\log\frac{1+s_1}{2s_1}\big/\log\frac{1+s_2}{2s_2}$, when sets have equal size, matching the performance of~\cite{tobias2016}. The algorithms are based on space partitions, as with classic LSH, but we construct these using a combination of brute force, tensoring, perfect hashing and splitter functions \`a la~\cite{naor1995splitters}. We also show a new dimensionality reduction lemma with 1-sided error.
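
As a small worked example of the Jaccard exponent quoted above, the sketch below evaluates $\rho=\log\frac{1+s_1}{2s_1}\big/\log\frac{1+s_2}{2s_2}$ for given similarity thresholds. The function name `jaccard_rho` and the sample thresholds are illustrative assumptions, and the check $0 < s_2 < s_1 < 1$ is added here as the usual near/far convention rather than taken from the abstract.

```python
import math


def jaccard_rho(s1: float, s2: float) -> float:
    """Exponent rho = log((1+s1)/(2*s1)) / log((1+s2)/(2*s2)).

    Assumes 0 < s2 < s1 < 1 (s1: 'near' similarity, s2: 'far' similarity),
    which keeps both logarithms positive and rho strictly between 0 and 1.
    """
    if not (0.0 < s2 < s1 < 1.0):
        raise ValueError("expected 0 < s2 < s1 < 1")
    return math.log((1 + s1) / (2 * s1)) / math.log((1 + s2) / (2 * s2))


# Illustrative thresholds only; they are not taken from the paper.
print(jaccard_rho(0.5, 0.25))
```

A smaller $\rho$ means a smaller exponent in both the query time $dn^{\rho+o(1)}$ and the space $dn^{1+\rho+o(1)}$, so widening the gap between $s_1$ and $s_2$ makes the search cheaper.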

    Transfer Adversarial Hashing for Hamming Space Retrieval

    Hashing is widely applied to large-scale image retrieval owing to its storage and retrieval efficiency. Existing work on deep hashing assumes that the database in the target domain is identically distributed with the training set in the source domain. This paper relaxes this assumption to a transfer retrieval setting, which allows the database and the training set to come from different but relevant domains. However, the transfer retrieval setting introduces two technical difficulties: first, the hash model trained on the source domain cannot work well on the target domain because of the large distribution gap; second, the domain gap makes it difficult to concentrate the database points within a small Hamming ball. As a consequence, transfer retrieval performance within Hamming Radius 2 degrades significantly in existing hashing methods. This paper presents Transfer Adversarial Hashing (TAH), a new hybrid deep architecture that incorporates a pairwise $t$-distribution cross-entropy loss to learn concentrated hash codes and an adversarial network to align the data distributions between the source and target domains. TAH can generate compact transfer hash codes for efficient image retrieval on both source and target domains. Comprehensive experiments validate that TAH yields state-of-the-art Hamming space retrieval performance on standard datasets.
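
To illustrate what retrieval "within Hamming Radius 2" means in the abstract above, the sketch below returns every database code whose Hamming distance to a query code is at most 2. It is a plain linear scan over integer-encoded binary codes, not the TAH model. The names `hamming` and `within_radius` and the toy 4-bit codes are assumptions made for illustration.

```python
from typing import List


def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")


def within_radius(query: int, database: List[int], radius: int = 2) -> List[int]:
    """Indices of database codes inside the Hamming ball around the query."""
    return [i for i, code in enumerate(database) if hamming(query, code) <= radius]


# Toy 4-bit codes, hypothetical data.
database = [0b1010, 0b1000, 0b0101, 0b1111]
print(within_radius(0b1010, database))
```

Retrieval of this kind only works well when relevant items are concentrated close to the query in Hamming space, which is the property the abstract says degrades under a domain gap.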