    Efficient and Effective Query Auto-Completion

    Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial for the system to respond in real time when operating in a million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact space. However, searching by prefix has little discovery power, in that only completions that are prefixed by the query are returned. This may negatively impact the effectiveness of the QAC system, with a consequent monetary loss for real applications like Web search engines and eCommerce. In this work we describe the implementation that powers a new QAC system at eBay, and discuss its efficiency and effectiveness in relation to other state-of-the-art approaches. The solution is based on the combination of an inverted index with succinct data structures, a much less explored direction in the literature. This system is replacing the previous implementation based on Apache SOLR, which was not always able to meet the required service-level agreement. Comment: Published in SIGIR 202
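
The contrast between prefix search and index-based matching described in this abstract can be illustrated with a small sketch. This is not the eBay implementation and uses no succinct data structures; the function names (`build_index`, `prefix_search`, `conjunctive_search`) and the rule "every complete term must occur, the last token matches as a term prefix" are assumptions made for illustration. Completions are assumed to be listed in decreasing score order, so a smaller id means a better suggestion.

```python
from collections import defaultdict


def build_index(completions):
    """Map each term to the set of ids of completions that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(completions):
        for term in text.split():
            index[term].add(doc_id)
    return index


def prefix_search(completions, query, k=3):
    """Classic prefix search: only completions that start with the query."""
    return [c for c in completions if c.startswith(query)][:k]


def conjunctive_search(completions, index, query, k=3):
    """Return completions containing every complete query term, where the
    last (possibly unfinished) token matches any term as a prefix."""
    tokens = query.split()
    if not tokens:
        return []
    *terms, last = tokens
    # Union of the posting sets of all terms that start with the last token.
    candidates = set()
    for term, ids in index.items():
        if term.startswith(last):
            candidates |= ids
    # Intersect with the posting set of each complete term.
    for term in terms:
        candidates &= index.get(term, set())
    # Smaller id = higher score under the ordering assumption above.
    return [completions[i] for i in sorted(candidates)[:k]]


completions = ["nike shoes", "running shoes nike", "nike air max", "shoes for men"]
index = build_index(completions)
print(prefix_search(completions, "shoes"))
print(conjunctive_search(completions, index, "shoes ni"))
```

For the query `shoes ni`, prefix search finds nothing, while the index-based search still surfaces completions that contain both `shoes` and a term starting with `ni`, which is the kind of extra discovery power the abstract refers to.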

    Incremental construction of minimal acyclic finite-state automata

    In this paper, we describe a new method for constructing minimal, deterministic, acyclic finite-state automata from a set of strings. Traditional methods consist of two phases: the first constructs a trie, and the second minimizes it. Our approach is to construct a minimal automaton in a single phase by adding new strings one by one and minimizing the resulting automaton on the fly. We present a general algorithm as well as a specialization that relies upon the lexicographical ordering of the input strings. Comment: 14 pages, 7 figure
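
A minimal sketch of the single-phase idea for lexicographically sorted input follows. It is written from the description in the abstract, not taken from the paper: the names `State`, `replace_or_register`, `build_minimal_automaton`, `accepts`, and `count_states` are hypothetical, and equivalent states are detected with a register keyed on finality plus outgoing transitions.

```python
class State:
    def __init__(self):
        self.final = False
        self.edges = {}  # character -> State

    def signature(self):
        # Two states are equivalent if they agree on finality and on
        # (label, target) pairs; targets are already-registered states.
        return (self.final,
                tuple(sorted((c, id(t)) for c, t in self.edges.items())))


def replace_or_register(state, register):
    """Minimize the subtree under the most recently added child of `state`."""
    ch = max(state.edges)  # sorted input: the largest label is the newest edge
    child = state.edges[ch]
    if child.edges:
        replace_or_register(child, register)
    sig = child.signature()
    if sig in register:
        state.edges[ch] = register[sig]  # reuse an equivalent state
    else:
        register[sig] = child


def build_minimal_automaton(words):
    """Build a minimal acyclic DFA from strictly increasing `words`."""
    register, root, previous = {}, State(), None
    for word in words:
        if previous is not None and word <= previous:
            raise ValueError("input must be sorted and free of duplicates")
        # Follow the prefix shared with the automaton built so far.
        state, i = root, 0
        while i < len(word) and word[i] in state.edges:
            state = state.edges[word[i]]
            i += 1
        # The old branch below `state` can no longer change: minimize it.
        if state.edges:
            replace_or_register(state, register)
        # Append the remaining suffix as a fresh chain of states.
        for ch in word[i:]:
            state.edges[ch] = State()
            state = state.edges[ch]
        state.final = True
        previous = word
    if root.edges:
        replace_or_register(root, register)
    return root


def accepts(root, word):
    state = root
    for ch in word:
        if ch not in state.edges:
            return False
        state = state.edges[ch]
    return state.final


def count_states(root):
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) not in seen:
            seen.add(id(s))
            stack.extend(s.edges.values())
    return len(seen)


dfa = build_minimal_automaton(["tap", "taps", "top", "tops"])
print(count_states(dfa))
```

For these four words a trie needs 8 nodes, while the automaton shares the common suffixes `p` and `ps` and ends up with 5 states, without ever materializing the full trie.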

    The Complexity of Order Type Isomorphism

    The order type of a point set in $R^d$ maps each $(d+1)$-tuple of points to its orientation (e.g., clockwise or counterclockwise in $R^2$). Two point sets $X$ and $Y$ have the same order type if there exists a mapping $f$ from $X$ to $Y$ for which every $(d+1)$-tuple $(a_1,a_2,\ldots,a_{d+1})$ of $X$ and the corresponding tuple $(f(a_1),f(a_2),\ldots,f(a_{d+1}))$ in $Y$ have the same orientation. In this paper we investigate the complexity of determining whether two point sets have the same order type. We provide an $O(n^d)$ algorithm for this task, thereby improving upon the $O(n^{\lfloor 3d/2 \rfloor})$ algorithm of Goodman and Pollack (1983). The algorithm uses only order type queries and also works for abstract order types (or acyclic oriented matroids). Our algorithm is optimal, both in the abstract setting and for realizable point sets if the algorithm only uses order type queries. Comment: Preliminary version of paper to appear at ACM-SIAM Symposium on Discrete Algorithms (SODA14
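
The definition above can be made concrete for the planar case $d = 2$. The sketch below is a brute-force check of the definition, not the $O(n^d)$ algorithm of the paper: it tries every bijection and is only usable for very small sets. The function names are hypothetical, and points are assumed to be pairs of integers so that the orientation test is exact.

```python
from itertools import combinations, permutations


def orientation(a, b, c):
    """Sign of the determinant: +1 counterclockwise, -1 clockwise, 0 collinear."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (det > 0) - (det < 0)


def preserves_order_type(xs, ys):
    """Check the fixed mapping xs[i] -> ys[i] on every triple of indices.

    Checking index triples i < j < k is enough, because reordering a triple
    changes the orientation sign in the same way on both sides.
    """
    if len(xs) != len(ys):
        return False
    return all(
        orientation(xs[i], xs[j], xs[k]) == orientation(ys[i], ys[j], ys[k])
        for i, j, k in combinations(range(len(xs)), 3)
    )


def same_order_type_bruteforce(xs, ys):
    """Try all n! bijections; exponential, for illustration only."""
    if len(xs) != len(ys):
        return False
    return any(preserves_order_type(xs, list(p)) for p in permutations(ys))


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
scaled = [(0, 0), (2, 0), (2, 2), (0, 2)]
triangle_with_center = [(0, 0), (4, 0), (0, 4), (1, 1)]
print(preserves_order_type(square, scaled))
print(same_order_type_bruteforce(square, triangle_with_center))
```

Four points in convex position and a triangle with an interior point have different order types, so no bijection between `square` and `triangle_with_center` preserves all orientations.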

    Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

    We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck.