59,831 research outputs found

    Efficient Spatial Keyword Search in Trajectory Databases

    Full text link
    An increasing amount of trajectory data is being annotated with text descriptions to better capture the semantics associated with locations. The fusion of spatial locations and text descriptions in trajectories engenders a new type of top-kk queries that take into account both aspects. Each trajectory in consideration consists of a sequence of geo-spatial locations associated with text descriptions. Given a user location λ\lambda and a keyword set ψ\psi, a top-kk query returns kk trajectories whose text descriptions cover the keywords ψ\psi and that have the shortest match distance. To the best of our knowledge, previous research on querying trajectory databases has focused on trajectory data without any text description, and no existing work has studied such kind of top-kk queries on trajectories. This paper proposes one novel method for efficiently computing top-kk trajectories. The method is developed based on a new hybrid index, cell-keyword conscious B+^+-tree, denoted by \cellbtree, which enables us to exploit both text relevance and location proximity to facilitate efficient and effective query processing. The results of our extensive empirical studies with an implementation of the proposed algorithms on BerkeleyDB demonstrate that our proposed methods are capable of achieving excellent performance and good scalability.Comment: 12 page

    A Formal Definition for Configuration

    Get PDF
    There exists a wide set of techniques to perform keyword-based search over relational databases but all of them match the keywords in the users' queries to elements of the databases to be queried as first step. The matching process is a time-consuming and complex task. So, improving the performance of this task is a key issue to improve the keyword based search on relational data sources.In this work, we show how to model the matching process on keyword-based search on relational databases by means of the symmetric group. Besides, how this approach reduces the search space is explained in detail

    Keyword search in graphs, relational databases and social networks

    Get PDF
    Keyword search, a well known mechanism for retrieving relevant information from a set of documents, has recently been studied for extracting information from structured data (e.g., relational databases and XML documents). It offers an alternative way to query languages (e.g., SQL) to explore databases, which is effective for lay users who may not be familiar with the database schema or the query language. This dissertation addresses some issues in keyword search in structured data. Namely, novel solutions to existing problems in keyword search in graphs or relational databases are proposed. In addition, a problem related to graph keyword search, team formation in social networks, is studied. The dissertation consists of four parts. The first part addresses keyword search over a graph which finds a substructure of the graph containing all or some of the query keywords. Current methods for keyword search over graphs may produce answers in which some content nodes (i.e., nodes that contain input keywords) are not very close to each other. In addition, current methods explore both content and non-content nodes while searching for the result and are thus both time and memory consuming for large graphs. To address the above problems, we propose algorithms for finding r-cliques in graphs. An r-clique is a group of content nodes that cover all the input keywords and the distance between each pair of nodes is less than or equal to r. Two approximation algorithms that produce r-cliques with a bounded approximation ratio in polynomial delay are proposed. In the second part, the problem of duplication-free and minimal keyword search in graphs is studied. Current methods for keyword search in graphs may produce duplicate answers that contain the same set of content nodes. In addition, an answer found by these methods may not be minimal in the sense that some of the nodes in the answer may contain query keywords that are all covered by other nodes in the answer. Removing these nodes does not change the coverage of the answer but can make the answer more compact. We define the problem of finding duplication-free and minimal answers, and propose algorithms for finding such answers efficiently. Meaningful keyword search in relational databases is the subject of the third part of this dissertation. Keyword search over relational databases returns a join tree spanning tuples containing the query keywords. As many answers of varying quality can be found, and the user is often only interested in seeing the·top-k answers, how to gauge the relevance of answers to rank them is of paramount importance. This becomes more pertinent for databases with large and complex schemas. We focus on the relevance of join trees as the fundamental means to rank the answers. We devise means to measure relevance of relations and foreign keys in the schema over the information content of the database. The problem of keyword search over graph data is similar to the problem of team formation in social networks. In this setting, keywords represent skills and the nodes in a graph represent the experts that possess skills. Given an expert network, in which a node represents an expert that has a cost for using the expert service and an edge represents the communication cost between the two corresponding experts, we tackle the problem of finding a team of experts that covers a set of required skills and also minimizes the communication cost as well as the personnel cost of the team. We propose two types of approximation algorithms to solve this bi-criteria problem in the fourth part of this dissertation

    XKMis: Effective and efficient keyword search in XML databases

    Full text link
    We present XKMis, a system for keyword search in xml documents. Unlike previous work, our method is not based on the lowest common ancestor (LCA) or its variant, rather we divide the nodes into meaningful and self-containing information segments, called minimal information segments (MISs), and return MIS-subtrees which consist of MISs that are logically connected by the keywords. The MIS-subtrees are closer to what the user wants. The MIS-subtrees enable us to use the region code of xml trees to develop an algorithm for the search which is more efficient especially for large xml trees. We report our experiment results, which verify the better effectiveness and efficiency of our system. Copyright ©2009 ACM

    Analysis of multiple update techniques on a RDF keyword search system

    Get PDF
    Keyword search is a technology that allows non-expert users to explore and retrieve information and it is traditionally used for unstructured data, such as in Web page searches. In the last decade, this search method has also become popular for exploring structured data, such as relational databases or graphs. Instead of using complex SQL or SPARQL queries and when the underlying schema is known, the user writes a series of words(keywords) to search for what he or she needs, getting as answers the ones more matching with the search. Keyword search systems are challenged by two fundamental parameters, efficiency and effectiveness. In fact, efficiency and effectiveness are two qualities of a SPARQL, or SQL, query that returns an answer quickly and always accurate even when operating on large amounts of data. The "virtual documents" method allows keyword search systems to work also on large databases by generating answers to keyword queries in a reasonable time. This paper aims to replicate the keyword search systems based on "virtual documents" TSA+BM25 and TSA+VDP for RDF graphs. In addition, two methods of update processing in a keyword search system, will be presented and analyzed: BruteForce and semiTSA. Although keyword search is a growing research matter, the topic of updates on structured data, such as RDF data, had not yet been addressed in the literature.Keyword search is a technology that allows non-expert users to explore and retrieve information and it is traditionally used for unstructured data, such as in Web page searches. In the last decade, this search method has also become popular for exploring structured data, such as relational databases or graphs. Instead of using complex SQL or SPARQL queries and when the underlying schema is known, the user writes a series of words(keywords) to search for what he or she needs, getting as answers the ones more matching with the search. Keyword search systems are challenged by two fundamental parameters, efficiency and effectiveness. In fact, efficiency and effectiveness are two qualities of a SPARQL, or SQL, query that returns an answer quickly and always accurate even when operating on large amounts of data. The "virtual documents" method allows keyword search systems to work also on large databases by generating answers to keyword queries in a reasonable time. This paper aims to replicate the keyword search systems based on "virtual documents" TSA+BM25 and TSA+VDP for RDF graphs. In addition, two methods of update processing in a keyword search system, will be presented and analyzed: BruteForce and semiTSA. Although keyword search is a growing research matter, the topic of updates on structured data, such as RDF data, had not yet been addressed in the literature

    Keyword-based object search and exploration in multidimensional text databases

    Get PDF
    We propose a novel system TEXplorer that integrates keyword-based object ranking with the aggregation and exploration power of OLAP in a text database with rich structured attributes available, e.g., a product review database. TEXplorer can be implemented within a multi-dimensional text database, where each row is associated with structural dimensions (attributes) and text data (e.g., a document). The system utilizes the text cube data model, where a cell aggregates a set of documents with matching values in a subset of dimensions. Cells in a text cube capture different levels of summarization of the documents, and can represent objects at different conceptual levels. Users query the system by submitting a set of keywords. Instead of returning a ranked list of all the cells, we propose a keyword-based interactive exploration framework that could offer flexible OLAP navigational guides and help users identify the levels and objects they are interested in. A novel significance measure of dimensions is proposed based on the distribution of IR relevance of cells. During each interaction stage, dimensions are ranked according to their significance scores to guide drilling down; and cells in the same cuboids are ranked according to their relevance to guide exploration. We propose efficient algorithms and materialization strategies for ranking top-k dimensions and cells. Finally, extensive experiments on real datasets demonstrate the efficiency and effectiveness of our approach

    Keyword Search in Large-Scale Databases with Topic Cluster Units

    Get PDF
    To solve the inefficiency of the existing keyword search methods in large databases, this paper proposes TCU-based query, an offline query method based on topic cluster units. First, topic cluster units (TCUs) are constructed through vertical grouping and horizontal grouping on tables and tuples. In contrast to traditional keyword query methods, this offline method cannot only reduce the query response time, but also return results comprising richer and more complete semantic information. In order to further improve the efficiency of data preprocessing, an optimized solution for table join ordering based on the genetic algorithm is presented. Second, we select index terms using the association rule, and then we build an index on every topic cluster; by doing so we can improve the query speed significantly. Finally, we conduct extensive experiments to demonstrate that our approach greatly improves the performance of keyword search
    corecore