1,440 research outputs found
Efficient and Effective Query Auto-Completion
Query Auto-Completion (QAC) is an ubiquitous feature of modern textual search
systems, suggesting possible ways of completing the query being typed by the
user. Efficiency is crucial to make the system have a real-time responsiveness
when operating in the million-scale search space. Prior work has extensively
advocated the use of a trie data structure for fast prefix-search operations in
compact space. However, searching by prefix has little discovery power in that
only completions that are prefixed by the query are returned. This may impact
negatively the effectiveness of the QAC system, with a consequent monetary loss
for real applications like Web Search Engines and eCommerce. In this work we
describe the implementation that empowers a new QAC system at eBay, and discuss
its efficiency/effectiveness in relation to other approaches at the
state-of-the-art. The solution is based on the combination of an inverted index
with succinct data structures, a much less explored direction in the
literature. This system is replacing the previous implementation based on
Apache SOLR that was not always able to meet the required
service-level-agreement.Comment: Published in SIGIR 202
Incremental construction of minimal acyclic finite-state automata
In this paper, we describe a new method for constructing minimal,
deterministic, acyclic finite-state automata from a set of strings. Traditional
methods consist of two phases: the first to construct a trie, the second one to
minimize it. Our approach is to construct a minimal automaton in a single phase
by adding new strings one by one and minimizing the resulting automaton
on-the-fly. We present a general algorithm as well as a specialization that
relies upon the lexicographical ordering of the input strings.Comment: 14 pages, 7 figure
The Complexity of Order Type Isomorphism
The order type of a point set in maps each -tuple of points to
its orientation (e.g., clockwise or counterclockwise in ). Two point sets
and have the same order type if there exists a mapping from to
for which every -tuple of and the
corresponding tuple in have the same
orientation. In this paper we investigate the complexity of determining whether
two point sets have the same order type. We provide an algorithm for
this task, thereby improving upon the algorithm
of Goodman and Pollack (1983). The algorithm uses only order type queries and
also works for abstract order types (or acyclic oriented matroids). Our
algorithm is optimal, both in the abstract setting and for realizable points
sets if the algorithm only uses order type queries.Comment: Preliminary version of paper to appear at ACM-SIAM Symposium on
Discrete Algorithms (SODA14
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
We study the approximate string matching and regular expression matching
problem for the case when the text to be searched is compressed with the
Ziv-Lempel adaptive dictionary compression schemes. We present a time-space
trade-off that leads to algorithms improving the previously known complexities
for both problems. In particular, we significantly improve the space bounds,
which in practical applications are likely to be a bottleneck
- …