Fingerprint databases for theorems
We discuss the advantages of searchable, collaborative, language-independent
databases of mathematical results, indexed by "fingerprints" of small and
canonical data. Our motivating example is Neil Sloane's massively influential
On-Line Encyclopedia of Integer Sequences. We hope to encourage the greater
mathematical community to search for the appropriate fingerprints within each
discipline, and to compile fingerprint databases of results wherever possible.
The benefits of these databases are broad - advancing the state of knowledge,
enhancing experimental mathematics, enabling researchers to discover unexpected
connections between areas, and even improving the refereeing process for
journal publication.
Comment: to appear in Notices of the AM
A new problem in string searching
We describe a substring search problem that arises in group presentation
simplification processes. We suggest a two-level searching model: skip and
match levels. We give two timestamp algorithms which skip searching parts of
the text where there are no matches at all and prove their correctness. At the
match level, we consider Harrison signature, Karp-Rabin fingerprint, Bloom
filter and automata based matching algorithms and present experimental
performance figures.
Comment: To appear in Proceedings Fifth Annual International Symposium on
Algorithms and Computation (ISAAC'94), Lecture Notes in Computer Scienc
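As a rough illustration of the Karp-Rabin fingerprint mentioned at the match level, the following is a minimal sketch of fingerprint-based substring search. It is not the paper's timestamp algorithm or its two-level model; the function name `karp_rabin_find` and the choices of `base` and `mod` are assumptions made for this example.

```python
def karp_rabin_find(text: str, pattern: str,
                    base: int = 256, mod: int = (1 << 61) - 1) -> list[int]:
    """Return every start index at which `pattern` occurs in `text`.

    Illustrative sketch only: `base` and `mod` are arbitrary choices.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character inside a window of length m.
    high = pow(base, m - 1, mod)

    # Fingerprints of the pattern and of the first window of the text.
    p_hash = w_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod

    hits = []
    for i in range(n - m + 1):
        # Equal fingerprints are only a hint; compare the characters
        # explicitly so that a collision never yields a false match.
        if w_hash == p_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:
            # Slide the window: drop text[i], append text[i + m].
            w_hash = ((w_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


if __name__ == "__main__":
    print(karp_rabin_find("abracadabra", "abra"))
```

The explicit character comparison makes the search exact; dropping it would turn the sketch into a probabilistic filter that can report false positives.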
Optimal Substring-Equality Queries with Applications to Sparse Text Indexing
We consider the problem of encoding a string of length from an integer
alphabet of size so that access and substring equality queries (that
is, determining the equality of any two substrings) can be answered
efficiently. Any uniquely-decodable encoding supporting access must take
bits. We describe a new data
structure matching this lower bound when while supporting
both queries in optimal time. Furthermore, we show that the string can
be overwritten in-place with this structure. The redundancy of
bits and the constant query time break exponentially a lower bound that is
known to hold in the read-only model. Using our new string representation, we
obtain the first in-place subquadratic (indeed, even sublinear in some cases)
algorithms for several string-processing problems in the restore model: the
input string is rewritable and must be restored before the computation
terminates. In particular, we describe the first in-place subquadratic Monte
Carlo solutions to the sparse suffix sorting, sparse LCP array construction,
and suffix selection problems. With the sole exception of suffix selection, our
algorithms are also the first running in sublinear time for small enough sets
of input suffixes. Combining these solutions, we obtain the first
sublinear-time Monte Carlo algorithm for building the sparse suffix tree in
compact space. We also show how to derandomize our algorithms using small
space. This leads to the first Las Vegas in-place algorithm computing the full
LCP array in time and to the first Las Vegas in-place algorithms
solving the sparse suffix sorting and sparse LCP array construction problems in
time. Running times of these Las Vegas
algorithms hold in the worst case with high probability.
Comment: Refactored according to TALG's reviews. New w.h.p. bounds and Las
Vegas algorithm
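To make the notion of a substring-equality query concrete, here is a minimal Monte Carlo sketch based on prefix fingerprints. It stores a full array of prefix hashes, so it uses linear extra space and is not the in-place structure the abstract describes; the class name `PrefixFingerprints` and the constants `base` and `mod` are assumptions made for this example.

```python
class PrefixFingerprints:
    """Answer 'are s[i:i+len] and s[j:j+len] equal?' by comparing hashes.

    Illustrative sketch only. Equal substrings always compare equal;
    unequal substrings may collide with small probability (Monte Carlo).
    """

    def __init__(self, s: str, base: int = 911382323,
                 mod: int = (1 << 61) - 1) -> None:
        self.mod = mod
        self.pref = [0]   # pref[t] = fingerprint of s[:t]
        self.pw = [1]     # pw[t]   = base**t mod mod
        for ch in s:
            self.pref.append((self.pref[-1] * base + ord(ch) + 1) % mod)
            self.pw.append(self.pw[-1] * base % mod)

    def fingerprint(self, i: int, length: int) -> int:
        """Fingerprint of s[i:i+length], computed in constant time."""
        return (self.pref[i + length]
                - self.pref[i] * self.pw[length]) % self.mod

    def probably_equal(self, i: int, j: int, length: int) -> bool:
        """True if the two substrings have the same fingerprint."""
        return self.fingerprint(i, length) == self.fingerprint(j, length)


if __name__ == "__main__":
    fp = PrefixFingerprints("banana")
    print(fp.probably_equal(1, 3, 3))  # compares "ana" with "ana"
    print(fp.probably_equal(0, 1, 3))  # compares "ban" with "ana"
```

After linear-time preprocessing, each query costs a constant number of arithmetic operations, which is the behaviour the abstract's queries aim for with far less space.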
Existentially Restricted Quantified Constraint Satisfaction
The quantified constraint satisfaction problem (QCSP) is a powerful framework
for modelling computational problems. The general intractability of the QCSP
has motivated the pursuit of restricted cases that avoid its maximal
complexity. In this paper, we introduce and study a new model for investigating
QCSP complexity in which the types of constraints given by the existentially
quantified variables are restricted. Our primary technical contribution is the
development and application of a general technology for proving positive
results on parameterizations of the model, of inclusion in the complexity class
coNP.
A Tight Lower Bound for Counting Hamiltonian Cycles via Matrix Rank
For even , the matchings connectivity matrix encodes which
pairs of perfect matchings on vertices form a single cycle. Cygan et al.
(STOC 2013) showed that the rank of over is
and used this to give an
time algorithm for counting Hamiltonian cycles modulo on graphs of
pathwidth . The same authors complemented their algorithm by an
essentially tight lower bound under the Strong Exponential Time Hypothesis
(SETH). This bound crucially relied on a large permutation submatrix within
, which enabled a "pattern propagation" commonly used in previous
related lower bounds, as initiated by Lokshtanov et al. (SODA 2011).
We present a new technique for a similar pattern propagation when only a
black-box lower bound on the asymptotic rank of is given; no
stronger structural insights such as the existence of large permutation
submatrices in are needed. Given appropriate rank bounds, our
technique yields lower bounds for counting Hamiltonian cycles (also modulo
fixed primes ) parameterized by pathwidth.
To apply this technique, we prove that the rank of over the
rationals is . We also show that the rank of
over is for any prime
and even for some primes.
As a consequence, we obtain that Hamiltonian cycles cannot be counted in time
for any unless SETH fails. This
bound is tight due to a time algorithm by Bodlaender et
al. (ICALP 2013). Under SETH, we also obtain that Hamiltonian cycles cannot be
counted modulo primes in time , indicating
that the modulus can affect the complexity in intricate ways.
Comment: improved lower bounds modulo primes, improved figures, to appear in
SODA 201
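For very small instances, the matchings connectivity matrix described above can be built by brute force. The sketch below enumerates all perfect matchings on an even number of vertices, marks the pairs whose union forms a single cycle, and computes the rank over the two-element field by Gaussian elimination. All function names are assumptions made for this example, and the enumeration grows far too quickly to be useful beyond tiny sizes.

```python
def perfect_matchings(vertices: list[int]):
    """Yield every perfect matching of the complete graph on `vertices`."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def forms_single_cycle(m1, m2, k: int) -> bool:
    """True if the union of matchings m1 and m2 is one cycle on all k vertices."""
    mate1, mate2 = {}, {}
    for a, b in m1:
        mate1[a], mate1[b] = b, a
    for a, b in m2:
        mate2[a], mate2[b] = b, a
    # Walk from vertex 0, alternating an m1 edge and an m2 edge, and
    # count the vertices on the component that contains vertex 0.
    v, visited = 0, 0
    while True:
        v = mate2[mate1[v]]
        visited += 2
        if v == 0:
            break
    return visited == k


def matchings_connectivity_matrix(k: int) -> list[list[int]]:
    """0/1 matrix indexed by the perfect matchings on k vertices (k even)."""
    ms = list(perfect_matchings(list(range(k))))
    return [[1 if forms_single_cycle(a, b, k) else 0 for b in ms] for a in ms]


def rank_gf2(matrix: list[list[int]]) -> int:
    """Rank of a 0/1 matrix over the two-element field."""
    rows = [int("".join(map(str, row)), 2) for row in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit serves as the pivot column
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


if __name__ == "__main__":
    for k in (4, 6):
        m = matchings_connectivity_matrix(k)
        print(k, len(m), rank_gf2(m))
```

On four vertices there are three perfect matchings, any two distinct ones form a 4-cycle, and the resulting 3-by-3 matrix has rank 2 over the two-element field.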
Optimal Active Social Network De-anonymization Using Information Thresholds
In this paper, de-anonymizing internet users by actively querying their group
memberships in social networks is considered. In this problem, an anonymous
victim visits the attacker's website, and the attacker uses the victim's
browser history to query her social media activity for the purpose of
de-anonymization using the minimum number of queries. A stochastic model of the
problem is considered where the attacker has partial prior knowledge of the
group membership graph and receives noisy responses to its real-time queries.
The victim's identity is assumed to be chosen randomly based on a given
distribution which models the users' risk of visiting the malicious website. A
de-anonymization algorithm is proposed which operates based on information
thresholds and its performance both in the finite and asymptotically large
social network regimes is analyzed. Furthermore, a converse result is provided
which proves the optimality of the proposed attack strategy.
Towards Provably Invisible Network Flow Fingerprints
Network traffic analysis reveals important information even when messages are
encrypted. We consider active traffic analysis via flow fingerprinting by
invisibly embedding information into packet timings of flows. In particular,
assume Alice wishes to embed fingerprints into flows of a set of network input
links, whose packet timings are modeled by Poisson processes, without being
detected by a watchful adversary Willie. Bob, who receives the set of
fingerprinted flows after they pass through the network modeled as a collection
of independent and parallel queues, wishes to extract Alice's embedded
fingerprints to infer the connection between input and output links of the
network. We consider two scenarios: 1) Alice embeds fingerprints in all of the
flows; 2) Alice embeds fingerprints in each flow independently with probability
. Assuming that the flow rates are equal, we calculate the maximum number of
flows in which Alice can invisibly embed fingerprints while having those
fingerprints successfully decoded by Bob. Then, we extend the construction and
analysis to the case where flow rates are distinct, and discuss the extension
of the network model.