10,112 research outputs found

    Identifying the most interactive object in spatial databases

    Full text link
    This paper investigates a new query, called an MIO query, that retrieves the Most Interactive Object in a given spatial dataset. Consider that an object consists of many spatial points. Given a distance threshold, we say that two objects interact with each other if they have a pair of points whose distance is within the threshold. An MIO query outputs the object that interacts with other objects the most, and it is useful for analytical applications e.g., neuroscience and trajectory databases. The MIO query processing problem is challenging: a nested loop algorithm is computationally inefficient and a theoretical algorithm is computationally efficient but incurs a quadratic space cost. Our solution efficiently processes MIO queries with a novel index, BIGrid (a hybrid index of compressed Bitset, Inverted list, and spatial Grid structures), with a practical memory cost. Furthermore, our solution is designed so that previous query results and multi-core environments can be exploited to accelerate query processing efficiency. Our experiments on synthetic and real datasets demonstrate the efficiency of our solution.Amagata D., Hara T.. Identifying the most interactive object in spatial databases. Proceedings - International Conference on Data Engineering 2019-April, 1286 (2019); https://doi.org/10.1109/ICDE.2019.00117

    Efficient and Effective Query Auto-Completion

    Full text link
    Query Auto-Completion (QAC) is an ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial to make the system have a real-time responsiveness when operating in the million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact space. However, searching by prefix has little discovery power in that only completions that are prefixed by the query are returned. This may impact negatively the effectiveness of the QAC system, with a consequent monetary loss for real applications like Web Search Engines and eCommerce. In this work we describe the implementation that empowers a new QAC system at eBay, and discuss its efficiency/effectiveness in relation to other approaches at the state-of-the-art. The solution is based on the combination of an inverted index with succinct data structures, a much less explored direction in the literature. This system is replacing the previous implementation based on Apache SOLR that was not always able to meet the required service-level-agreement.Comment: Published in SIGIR 202

    Challenging Ubiquitous Inverted Files

    Get PDF
    Stand-alone ranking systems based on highly optimized inverted file structures are generally considered ā€˜theā€™ solution for building search engines. Observing various developments in software and hardware, we argue however that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on ā€˜the database approachā€™: mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach

    The Potential of Learned Index Structures for Index Compression

    Full text link
    Inverted indexes are vital in providing fast key-word-based search. For every term in the document collection, a list of identifiers of documents in which the term appears is stored, along with auxiliary information such as term frequency, and position offsets. While very effective, inverted indexes have large memory requirements for web-sized collections. Recently, the concept of learned index structures was introduced, where machine learned models replace common index structures such as B-tree-indexes, hash-indexes, and bloom-filters. These learned index structures require less memory, and can be computationally much faster than their traditional counterparts. In this paper, we consider whether such models may be applied to conjunctive Boolean querying. First, we investigate how a learned model can replace document postings of an inverted index, and then evaluate the compromises such an approach might have. Second, we evaluate the potential gains that can be achieved in terms of memory requirements. Our work shows that learned models have great potential in inverted indexing, and this direction seems to be a promising area for future research.Comment: Will appear in the proceedings of ADCS'1

    Toward Entity-Aware Search

    Get PDF
    As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning--entity as input and entity as output, we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results we obtained so far have shown clear promise of entity-aware search, in its usefulness, effectiveness, efficiency and scalability
    • ā€¦
    corecore