36 research outputs found

    Type Ahead Search in Database using SQL

    A type-ahead search system computes answers on the fly as a user types in a keyword query character by character. We study how to support type-ahead search on data in a relational DBMS, focusing on how to implement this kind of search using SQL. A key challenge is how to leverage existing database functionalities to meet the high-performance requirement of interactive speed. We extend our techniques to the case of fuzzy queries and propose various optimizations to improve query performance. We propose an incremental computation method to answer multi-keyword queries, and study how to support first-N queries and incremental updates. Our experimental results on large, real data sets show that the proposed techniques enable a DBMS to support search-as-you-type on large tables. DOI: 10.17762/ijritcc2321-8169.15024
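A minimal sketch of the search-as-you-type idea on top of a relational DBMS, using SQLite and a hypothetical `docs(title TEXT)` table; the abstract's actual techniques (fuzzy matching, incremental multi-keyword computation) go well beyond this prefix query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT)")
conn.executemany("INSERT INTO docs VALUES (?)",
                 [("database systems",), ("data mining",), ("networks",)])

def type_ahead(prefix, limit=10):
    # LIKE 'prefix%' can let the engine answer with an index range scan
    # on title, which is what makes per-keystroke queries interactive.
    cur = conn.execute(
        "SELECT title FROM docs WHERE title LIKE ? || '%' "
        "ORDER BY title LIMIT ?", (prefix, limit))
    return [row[0] for row in cur]

print(type_ahead("data"))  # the result set shrinks as the user keeps typing
```

Issuing this query on every keystroke is exactly the pattern the paper optimizes.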

    Efficient and Effective Query Auto-Completion

    Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial for real-time responsiveness when operating in a million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact space. However, searching by prefix has little discovery power, in that only completions that are prefixed by the query are returned. This may negatively impact the effectiveness of the QAC system, with a consequent monetary loss for real applications like Web search engines and eCommerce. In this work we describe the implementation that powers a new QAC system at eBay, and discuss its efficiency and effectiveness in relation to other state-of-the-art approaches. The solution is based on the combination of an inverted index with succinct data structures, a much less explored direction in the literature. This system is replacing the previous implementation, based on Apache Solr, that was not always able to meet the required service-level agreement. Comment: Published in SIGIR 202
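A toy sketch of the trie-based prefix search the abstract argues against (not eBay's inverted-index solution): fast, but it can only surface completions that literally start with the typed query, which is the "discovery power" limitation discussed.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self, words):
        self.root = TrieNode()
        for w in words:
            node = self.root
            for ch in w:
                node = node.children.setdefault(ch, TrieNode())
            node.is_word = True

    def complete(self, prefix):
        # Walk down to the prefix node, then enumerate its subtree.
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        out = []
        stack = [(node, prefix)]
        while stack:
            n, s = stack.pop()
            if n.is_word:
                out.append(s)
            for ch, child in n.children.items():
                stack.append((child, s + ch))
        return sorted(out)

trie = Trie(["iphone case", "iphone charger", "ipad"])
print(trie.complete("iph"))   # ['iphone case', 'iphone charger']
print(trie.complete("case"))  # [] -- no discovery beyond the typed prefix
```

An inverted index over completion terms, by contrast, can return "iphone case" for the query "case".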

    Efficient Methods for Knowledge Base Construction and Query

    Recently, knowledge bases have been widely used in search engines, question-answering systems, and many other applications. The abundant entity profiles and relational information in knowledge bases help the downstream applications learn more about the user queries. However, in automated knowledge base construction, ambiguity in data sources is one of the main challenges. Given a constructed knowledge base, it is hard to efficiently find entities of interest and extract their relatedness information from the knowledge base due to its large capacity. In this thesis, we adopt natural language processing tools, machine learning and graph/text query techniques to deal with such challenges. First, we introduce a machine-learning-based framework for efficient entity linking to deal with the ambiguity issue in documents. For entity linking, deep-learning-based methods have outperformed traditional machine-learning-based ones but demand a large amount of data and incur a high training cost. We propose a lightweight, customisable and time-efficient method based on traditional machine learning techniques. Our approach achieves performance comparable to the state-of-the-art deep-learning-based ones while being significantly faster to train. Second, we adopt deep learning to deal with the Entity Resolution (ER) problem, which aims to reduce data ambiguity in structured data sources. The existing BERT-based method has set new state-of-the-art performance on the ER task, but it suffers from high computational cost due to the large number of candidate pairs to match. We propose to use BERT in a Siamese network to encode the entities separately and adopt a blocking-matching scheme in a multi-task learning framework. The blocking module filters out candidate entity pairs that are unlikely to be matched, while the matching module uses an enhanced alignment network to decide if a pair is a match.
    Experiments show that our approach outperforms state-of-the-art models in both efficiency and effectiveness. Third, we propose a flexible query auto-completion (QAC) framework to support efficient error-tolerant QAC for entity queries in the knowledge base. Most existing works overlook the quality of the suggested completions, and their efficiency needs to be improved. Our framework is designed on the basis of a noisy channel model, which consists of a language model and an error model; thus, many QAC ranking methods and spelling correction methods can be easily plugged into the framework. To address the efficiency issue, we devise a neighbourhood generation method accompanied by a trie index to quickly find candidates for the error model. The experiments show that our method improves the state of the art of error-tolerant QAC. Last but not least, we design a visualisation system to facilitate efficient relatedness queries in a large-scale knowledge graph. Given a pair of entities, we aim to efficiently extract a succinct sub-graph to explain the relatedness of the pair of entities. Existing methods, either graph-based or list-based, all have limitations when dealing with large complex graphs. We propose to use bisimulation to summarise the sub-graph, where semantically similar entities are combined. Our method exhibits the most prominent patterns while keeping them in an integrated graph.
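A toy sketch of the "neighbourhood generation" idea for error-tolerant completion: enumerate edit-distance-1 variants of the typed prefix and probe each against the entity collection. The thesis pairs this with a trie index and a noisy-channel (language model times error model) ranking, both omitted here; the entity list is hypothetical.

```python
import string

def edit1_neighbourhood(prefix, alphabet=string.ascii_lowercase):
    variants = {prefix}
    for i in range(len(prefix)):
        variants.add(prefix[:i] + prefix[i + 1:])          # deletion
        for c in alphabet:
            variants.add(prefix[:i] + c + prefix[i + 1:])  # substitution
    for i in range(len(prefix) + 1):
        for c in alphabet:
            variants.add(prefix[:i] + c + prefix[i:])      # insertion
    return variants

def error_tolerant_complete(prefix, entities):
    # A trie would answer each prefix probe in O(|prefix|); here we scan.
    hits = set()
    for v in edit1_neighbourhood(prefix):
        hits.update(e for e in entities if e.startswith(v))
    return sorted(hits)

entities = ["barack obama", "bank of england", "baruch college"]
print(error_tolerant_complete("barx", entities))
```

The mistyped prefix "barx" still reaches the "bar..." entities through its deletion and substitution variants.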

    A Study To Support First-N Queries And Incremental Updates To Answer Multi Keyword Queries

    Most search engines and online search forms support auto-completion, which shows suggested queries or even answers on the fly as a user types in a keyword query character by character, and many such search systems store their information in a backend relational DBMS. Some databases, such as Oracle and SQL Server, already support prefix search, and we could use this feature to implement search-as-you-type; still, not all databases provide this feature. For this reason we study new methods that can be used in all databases. One approach is to develop a separate application layer on top of the database to construct indexes and execute algorithms for answering queries. While this approach has the benefit of achieving high performance, its main drawback is duplicating data and indexes, resulting in additional hardware costs. Another approach is to use database extenders, such as DB2 Extenders, Informix DataBlades, Microsoft SQL Server Common Language Runtime (CLR) integration and Oracle Cartridges, which permit developers to add new functionalities to a DBMS. This approach is not possible for databases that do not provide such an extender interface, such as MySQL. Because it needs to use proprietary interfaces provided by database vendors, a solution for one database may not be portable to others. In addition, an extender-based solution, particularly one implemented in C/C++, could cause severe reliability and security problems in database engines.
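A minimal sketch of the "separate application layer" approach described above: keep a sorted copy of the searchable column in the application and answer first-N prefix queries with binary search. This is fast, but it duplicates the data outside the DBMS, which is exactly the drawback the abstract notes. The titles are made up for illustration.

```python
import bisect

titles = sorted(["database", "data mining", "datalog", "networks"])

def first_n(prefix, n):
    # All strings with the given prefix form a contiguous range
    # in the sorted list; locate it with two binary searches.
    lo = bisect.bisect_left(titles, prefix)
    hi = bisect.bisect_right(titles, prefix + "\uffff")
    return titles[lo:hi][:n]

print(first_n("data", 2))
```

The first-N cutoff falls out of the slice, so answering a keystroke costs two binary searches plus N list accesses.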

    Top-k String Auto-Completion with Synonyms

    Auto-completion is one of the most prominent features of modern information systems. Existing auto-completion solutions provide suggestions based on the beginning of the currently input character sequence (i.e. the prefix). However, in many real applications, one entity often has synonyms or abbreviations. For example, "DBMS" is an abbreviation of "Database Management Systems". In this paper, we study a novel type of auto-completion that uses synonyms and abbreviations. We propose three trie-based algorithms to solve top-k auto-completion with synonyms, each with different space and time complexity trade-offs. Experiments on large-scale datasets show that it is possible to support effective and efficient synonym-based retrieval of completions over a million strings with thousands of synonym rules at about a microsecond per completion, with a small space overhead (i.e. 160-200 bytes per string). Peer reviewed
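A toy sketch of synonym-aware completion, inspired by (not reproducing) the paper's trie-based algorithms: rewrite the query with hypothetical synonym rules, then prefix-match every rewriting against the string collection.

```python
# Illustrative data and rules; the paper indexes millions of strings
# and thousands of rules inside tries instead of scanning lists.
strings = ["database management systems", "dbms internals", "data models"]
rules = {"dbms": "database management systems",
         "database management systems": "dbms"}

def synonym_complete(q, k=5):
    # Apply each rule whose left-hand side is a prefix of the query,
    # keeping the user's remaining keystrokes after the rewritten part.
    queries = {q} | {rules[s] + q[len(s):] for s in rules if q.startswith(s)}
    hits = {t for t in strings for v in queries if t.startswith(v)}
    return sorted(hits)[:k]

print(synonym_complete("dbms"))
```

Typing the abbreviation "dbms" now also retrieves the expanded form, which plain prefix search would miss.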

    Exploiting Query’s Temporal Patterns for Query Autocompletion

    Query autocompletion (QAC) is a common interactive feature of web search engines. It aims at assisting users to formulate queries and avoid spelling mistakes by presenting them with a list of query completions as soon as they start typing in the search box. Existing QAC models mostly rank the query completions by their past popularity collected in the query logs. For some queries, popularity exhibits relatively stable or periodic behavior, while others may experience a sudden rise in popularity. Current time-sensitive QAC models focus on either periodicity or recency and are unable to respond swiftly to such a sudden rise, resulting in less than optimal QAC performance. In this paper, we propose a hybrid QAC model that considers two temporal patterns of a query's popularity: periodicity and burst trend. In detail, we first employ the Discrete Fourier Transform (DFT) to identify the periodicity of a query's popularity, from which we forecast its future popularity. Then the burst trend of the query's popularity is detected and incorporated into the hybrid model together with its cyclic behavior. Extensive experiments on a large, real-world query log dataset show that modeling the temporal patterns of query popularity in the form of its periodicity and its burst trend can significantly improve the effectiveness of ranking query completions.
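A toy sketch of the DFT step: find the dominant frequency in a query's popularity series and report the corresponding period. A real QAC model would forecast future popularity from the dominant components rather than just report the period; the counts below are invented.

```python
import cmath

def dominant_period(series):
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]  # drop the DC component
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):  # positive frequencies only
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k  # period, in samples

# Four weeks of daily query counts with weekend spikes.
weekly = [5, 5, 5, 5, 5, 9, 9] * 4
print(dominant_period(weekly))  # 7.0 -- a weekly cycle
```

The strongest Fourier coefficient sits at the fundamental of the weekend spike pattern, so the detected period is seven days.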

    SUPPORTING ADVANCED INTERACTIVE SEARCH USING INVERTED INDEX

    Ph.D. thesis (Doctor of Philosophy)

    A pivotal prefix based filtering algorithm for string similarity search

    We study the string similarity search problem with edit-distance constraints which, given a set of data strings and a query string, finds the strings similar to the query. Existing algorithms use a signature-based framework: they first generate signatures for each string and then prune the dissimilar strings that have no common signatures with the query. However, existing methods involve large numbers of signatures, and many signatures are unnecessary. Reducing the number of signatures not only increases the pruning power but also decreases the filtering cost. To address this problem, we propose a novel pivotal prefix filter which significantly reduces the number of signatures. We prove that the pivotal filter achieves greater pruning power and lower filtering cost than state-of-the-art filters. We develop a dynamic programming method to select high-quality pivotal prefix signatures to prune dissimilar strings with non-consecutive errors with respect to the query. We propose an alignment filter that considers the alignments between signatures to prune large numbers of dissimilar pairs with consecutive errors with respect to the query. Experimental results on three real datasets show that our method achieves high performance and outperforms the state-of-the-art methods by an order of magnitude.
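A sketch of the generic signature-based (prefix-filter) framework the abstract builds on, not the pivotal-prefix filter itself: candidate strings whose q-grams share nothing with the query's signature grams can be pruned before the expensive edit-distance verification. The data strings are illustrative.

```python
def edit_distance(a, b):
    # Standard dynamic-programming edit distance, used for verification.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def qgrams(s, q=2):
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def similarity_search(query, data, tau=1, q=2):
    # tau edits can destroy at most q*tau of the query's q-grams, so any
    # string within tau edits must contain one of the q*tau + 1 smallest
    # grams of the query -- these form the signature "prefix".
    sig = set(sorted(qgrams(query, q))[:q * tau + 1])
    results = []
    for s in data:
        if sig & set(qgrams(s, q)):             # filter: shares a signature?
            if edit_distance(query, s) <= tau:  # verify the survivors
                results.append(s)
    return results

data = ["kitten", "mitten", "sitting", "banana"]
print(similarity_search("kitten", data, tau=1))
```

Here "banana" is pruned without computing any edit distance, while "sitting" passes the filter but fails verification; the pivotal prefix filter further shrinks the signature set.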

    The Same but Still Different: Forms in E-Government

    Forms are essential artifacts of government service delivery to transmit information between the customer and the government. However, customers perceive forms as too complex. Since the complexity of a system is influenced by the diversity of its components, this paper’s main contribution is the identification of characteristics of forms and their components that drive the diversity of different forms. For this purpose, we evaluate a set of 69 forms of 27 German municipalities according to various criteria. The results reveal that different partitions of forms in subparts, varying sets of presented and requested data, different element types and varying captions for equal elements drive the complexity of current government forms. On the contrary, orders of elements are similar across the forms at hand