21 research outputs found

    Static Score Bucketing in Inverted Indexes

    Maintaining strict static score order of inverted lists is a heuristic used by search engines to improve the quality of query results when the entire inverted lists cannot be processed. This heuristic, however, increases the cost of index generation and requires time-consuming index build algorithms. In this paper, we study a new index organization based on static score bucketing. We show that this new technique significantly improves index build performance while having minimal impact on the quality of search results. We also provide upper bounds on the quality degradation and experimentally verify the benefits of the proposed approach.
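    The abstract contrasts strict static-score order with bucketing. The sketch below shows one plausible reading of that idea and is not the paper's index format. Assumptions: every posting has a precomputed query-independent score, postings arrive in ascending doc-ID order, and the bucket thresholds (`boundaries`) are chosen by hand. The names `build_bucketed_list` and `top_k_scan` are hypothetical.

```python
from bisect import bisect_right


def build_bucketed_list(postings, static_score, boundaries):
    """Split one inverted list into static-score buckets, best bucket first.

    postings:     doc IDs in indexing order (assumed ascending doc ID).
    static_score: hypothetical map doc ID -> query-independent score.
    boundaries:   ascending score thresholds defining len(boundaries) + 1 buckets.
    """
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for doc in postings:
        # Appending preserves arrival (doc-ID) order inside each bucket,
        # so the list is never fully sorted by static score.
        buckets[bisect_right(boundaries, static_score[doc])].append(doc)
    # Reverse so the highest-score bucket is read first at query time.
    return buckets[::-1]


def top_k_scan(bucketed, k):
    """Read buckets best-first and stop after k postings (early termination)."""
    result = []
    for bucket in bucketed:
        for doc in bucket:
            result.append(doc)
            if len(result) == k:
                return result
    return result


scores = {1: 0.9, 2: 0.1, 3: 0.5, 4: 0.95, 5: 0.2}
lists = build_bucketed_list([1, 2, 3, 4, 5], scores, boundaries=[0.3, 0.8])
print(lists)                 # [[1, 4], [3], [2, 5]]
print(top_k_scan(lists, 3))  # [1, 4, 3]
```

    Doc 4 has a higher static score than doc 1, but it is read after doc 1 because order inside a bucket follows doc ID. In this sketch, that within-bucket reordering is where the approximation to strict static-score order appears. In exchange, building the list needs only one pass with no sort by score.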

    Context-Sensitive Keyword Search and Ranking for XML

    Traditionally, keyword-search-based information retrieval (IR) has focused on “flat” documents, which either do not have any inherent structure or have structure that is not exploited by the IR system. Thus, even if users wanted to search over only specifi

    TeXQuery: A Full-Text Search Extension to XQuery

    One of the key benefits of XML is its ability to represent a mix of structured and unstructured (text) data. Although current XML query languages such as XPath and XQuery can express rich queries over structured data, they can only express very rudimentary queries over text data. We thus propose TeXQuery, which is a powerful full-text search extension to XQuery. TeXQuery provides a rich set of fully composable full-text search primitives, such as Boolean connectives, phrase matching, proximity distance, stemming and thesauri. TeXQuery also enables users to seamlessly query over both structured and text data by embedding TeXQuery primitives in XQuery, and vice versa. Finally, TeXQuery supports a flexible scoring construct that can be used to score query results based on full-text predicates. TeXQuery is one of the proposals submitted to the W3C Full-Text Task Force, whose charter is to extend XQuery with full-text search capabilities.
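    To show what "fully composable" predicates plus a scoring step can look like, here is a small Python sketch. It does not use TeXQuery syntax and does not implement its scoring construct. Each predicate maps a node's token list to a Boolean, and the combinators build larger predicates from smaller ones. The names `has`, `all_of`, `any_of`, `negate` and `top_k` are hypothetical, and the count-of-satisfied-predicates score is an illustrative assumption only.

```python
import heapq
from typing import Callable, Dict, List

# A full-text predicate maps a node's token list to True/False.
Pred = Callable[[List[str]], bool]


def has(word: str) -> Pred:
    return lambda toks: word in toks


def all_of(*preds: Pred) -> Pred:
    return lambda toks: all(p(toks) for p in preds)


def any_of(*preds: Pred) -> Pred:
    return lambda toks: any(p(toks) for p in preds)


def negate(pred: Pred) -> Pred:
    return lambda toks: not pred(toks)


def top_k(docs: Dict[str, str], preds: List[Pred], k: int) -> List[str]:
    """Rank docs by a hypothetical score: how many predicates they satisfy."""
    scored = [
        (sum(p(text.lower().split()) for p in preds), doc_id)
        for doc_id, text in docs.items()
    ]
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


docs = {
    "d1": "XQuery full text search",
    "d2": "XML schema design",
    "d3": "Full text retrieval in SQL",
}
query = all_of(has("full"), any_of(has("xquery"), has("xpath")), negate(has("sql")))
print([d for d, t in docs.items() if query(t.lower().split())])  # ['d1']
print(top_k(docs, [has("full"), has("text"), has("xquery")], k=2))  # ['d1', 'd3']
```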

    On the Completeness of Full-Text Search Languages for XML

    We study formal properties of full-text search languages for XML. Our main contribution is the development of a formal model for full-text search based on the positions of tokens in XML nodes. Building on this model, we define a full-text calculus based on first-order logic, and a full-text algebra based on the relational algebra. We show that the full-text calculus and algebra are equivalent even in the presence of arbitrary position-based predicates, such as distance predicates and phrase matching. This suggests a notion of completeness for full-text languages. None of the full-text search languages that we are aware of is complete under this characterization. We propose a new full-text language that is complete and naturally generalizes existing full-text languages. Our formalization in terms of the relational model can also serve as the basis for (a) joint optimization of structured and full-text search queries, and (b) ranking full-text search query results by leveraging existing work on the probabilistic relational model.
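    The abstract's model treats each token occurrence as a position inside an XML node. The sketch below stores a node's text as a relation of `(node_id, position, token)` tuples. With that representation, phrase matching becomes a lookup of consecutive positions, much like a self-join, and a distance predicate becomes a comparison between positions. This is a sketch of the general idea, not the paper's calculus or algebra. The helper names are hypothetical. "Distance" here means the absolute difference between positions, which may not match the paper's exact definition. Query words are assumed to be lower-cased already.

```python
from itertools import product


def token_relation(node_id, text):
    """Hypothetical relation of (node_id, position, token) tuples for one XML node."""
    return [(node_id, pos, tok) for pos, tok in enumerate(text.lower().split())]


def positions(rel, node_id, token):
    return [p for (n, p, t) in rel if n == node_id and t == token]


def phrase_match(rel, node_id, words):
    """True if the (non-empty) words occur at consecutive positions p, p+1, ..."""
    tuples = set(rel)
    return any(
        all((node_id, p + i, w) in tuples for i, w in enumerate(words))
        for p in positions(rel, node_id, words[0])
    )


def within_distance(rel, node_id, a, b, d):
    """True if some occurrence of a and some occurrence of b are at most d positions apart."""
    return any(
        abs(p - q) <= d
        for p, q in product(positions(rel, node_id, a), positions(rel, node_id, b))
    )


rel = token_relation("n1", "Full text search over XML documents")
print(phrase_match(rel, "n1", ["text", "search"]))   # True
print(phrase_match(rel, "n1", ["search", "text"]))   # False
print(within_distance(rel, "n1", "full", "xml", 4))  # True
print(within_distance(rel, "n1", "full", "xml", 3))  # False
```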

    Data Structures

    Maintaining strict static score order of inverted lists is a heuristic used by search engines to improve the quality of query results when the entire inverted lists cannot be processed. This heuristic, however, increases the cost of index generation and requires complex index build algorithms. In this paper, we study a new index organization based on static score bucketing. We show that this new technique significantly improves index build performance while having minimal impact on the quality of search results.

    TeXQuery: A Full-Text Search Extension to XQuery (Part I: Language Specification)

    This report describes the TeXQuery language specification. TeXQuery is a full-text search extension to XQuery.

    TeXQuery: A Full-Text Search Extension to XQuery (Part III: Use Cases Solutions)

    This report describes solutions to the TeXQuery use cases. TeXQuery is a full-text search extension to XQuery.

    TeXQuery: A Full-Text Search Extension to XQuery (Part II: Formal Semantics)

    This report describes the formal semantics of TeXQuery. TeXQuery is a full-text search extension to XQuery.

    ABSTRACT

    We demonstrate an XML full-text search engine that implements the TeXQuery language. TeXQuery is a powerful full-text search extension to XQuery that provides a rich set of fully composable full-text primitives, such as phrase matching, proximity distance, stemming and thesauri. TeXQuery enables users to seamlessly query over both structured data and text, by embedding full-text primitives in XQuery and vice versa. TeXQuery also supports a flexible scoring construct that scores query results based on full-text predicates and permits top-k queries. TeXQuery is the precursor of the full-text language extension to XPath 2.0 and XQuery 1.0 currently being developed by the W3C.
