27 research outputs found

    Keyword Search on RDF Graphs - A Query Graph Assembly Approach

    Keyword search provides ordinary users with an easy-to-use interface for querying RDF data. Given the input keywords, in this paper we study how to assemble a query graph that represents the user's query intention accurately and efficiently. Based on the input keywords, we first obtain the elementary query graph building blocks, such as entity/class vertices and predicate edges. Then, we formally define the query graph assembly (QGA) problem. Unfortunately, we prove theoretically that QGA is an NP-complete problem. To solve it, we design some heuristic lower bounds and propose a bipartite graph matching-based best-first search algorithm. The algorithm's time complexity is $O(k^{2l} \cdot l^{3l})$, where $l$ is the number of keywords and $k$ is a tunable parameter, i.e., the maximum number of candidate entity/class vertices and predicate edges allowed to match each keyword. Although QGA is intractable, both $l$ and $k$ are small in practice. Furthermore, the algorithm's time complexity does not depend on the RDF graph size, which guarantees the good scalability of our system on large RDF graphs. Experiments on DBpedia and Freebase confirm the superiority of our system in both effectiveness and efficiency.

    A Polynomial Delay Algorithm for Generating Connected Induced Subgraphs of a Given Cardinality

    We give a polynomial delay algorithm that, for any graph $G$ and positive integer $k$, enumerates all connected induced subgraphs of $G$ of order $k$. Our algorithm enumerates each subgraph in at most $O((k\min\{(n-k),k\Delta\})^2(\Delta+\log k))$ and uses linear space $O(n+m)$, where $n$ and $m$ are respectively the number of vertices and edges of $G$ and $\Delta$ is the maximum degree.
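
To make the enumeration task concrete, here is a minimal brute-force sketch. It is not the polynomial delay algorithm from the abstract: it simply tests every $k$-subset of vertices for connectivity, which is exponential in general. The adjacency-dict representation, the function names, and the example graph are all assumptions made for illustration.

```python
from itertools import combinations


def is_connected(adj, nodes):
    """Return True if `nodes` induce a connected subgraph of `adj`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            # Only follow edges that stay inside the candidate vertex set.
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def connected_induced_subgraphs(adj, k):
    """Yield every k-vertex set (k >= 1) inducing a connected subgraph.

    Brute-force baseline, NOT the paper's polynomial delay method.
    """
    for subset in combinations(sorted(adj), k):
        if is_connected(adj, subset):
            yield subset


# Hypothetical example graph: the path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
result = list(connected_induced_subgraphs(path, 3))
```

On the four-vertex path, only the two contiguous triples survive the connectivity check. A polynomial delay algorithm avoids generating the rejected subsets in the first place, which is what bounds the time between consecutive outputs.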

    Living Knowledge

    Diversity, especially as manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity as an asset rather than a problem. Within the project, foundational ideas emerged from the synergic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies, and flowed into concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser, providing users with better structured information while coping with Web-scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA), which operates from a social sciences perspective; Multimodal Genre Analysis (MGA), which operates from a semiotic perspective; and Facet Analysis (FA), which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and the way they interact. In particular, the conceptual architecture has been implemented with the Media Content Analyser application. The scientific and technological results obtained are described in the following.

    K-Nearest Keyword Search in RDF Graphs

    We formulate and tackle a flexible and useful query, namely the k-nearest keyword (k-NK) query, which can identify the relationship between vertices (or keywords) in an RDF graph, where users are only required to specify two query keywords $q$ and $w$ (without knowing the domain knowledge). In particular, a k-NK query returns the $k$ closest pairs of vertices $(u_i, v_i)$ in the RDF graph such that vertices $u_i$ and $v_i$ contain keywords $q$ and $w$, respectively, and $v_i$ has the smallest (shortest path) distance to $u_i$ (i.e., $v_i$ is the nearest neighbor of $u_i$). In order to efficiently answer k-NK queries, in this paper we propose three efficient query answering techniques that utilize effective pruning strategies and cost-model-based indexing mechanisms. We also confirm the effects of our proposed approaches on real and synthetic RDF data sets through extensive experiments.
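
The query semantics described above can be illustrated with a naive baseline. This is a sketch under stated assumptions, not one of the three techniques the abstract proposes: it treats the graph as undirected and unweighted, runs one breadth-first search per vertex containing `q`, and uses no pruning or indexing. The names `nearest_with_keyword` and `k_nk`, the `labels` mapping, and the example data are hypothetical.

```python
from collections import deque
import heapq


def nearest_with_keyword(adj, labels, source, keyword):
    """BFS from `source`; return (distance, vertex) for the closest vertex
    whose label set contains `keyword`, or None if none is reachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        u, d = queue.popleft()
        if keyword in labels[u]:
            return d, u
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append((v, d + 1))
    return None


def k_nk(adj, labels, q, w, k):
    """Return the k closest (distance, u, v) triples where u contains q
    and v is the nearest vertex to u that contains w."""
    pairs = []
    for u in adj:
        if q in labels[u]:
            hit = nearest_with_keyword(adj, labels, u, w)
            if hit is not None:
                d, v = hit
                pairs.append((d, u, v))
    return heapq.nsmallest(k, pairs)


# Hypothetical example: the path 1 - 2 - 3 - 4 - 5 with keyword labels.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
labels = {1: {"q"}, 2: set(), 3: {"w"}, 4: {"q"}, 5: {"w"}}
top1 = k_nk(adj, labels, "q", "w", 1)
top2 = k_nk(adj, labels, "q", "w", 2)
```

In this sketch a vertex that contains both keywords pairs with itself at distance 0; whether the original query definition permits that is not stated in the abstract. The per-vertex BFS is the cost that pruning strategies and indexes are meant to avoid.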

    Complex correspondences for query patterns rewriting

    This paper discusses the use of complex alignments in the task of automatic query pattern rewriting. We apply this approach in SWIP, a system that allows for querying RDF data from natural language-based queries, hiding the complexity of SPARQL. SWIP is based on the use of query patterns that characterise families of queries and that are instantiated with respect to the initial user query expressed in natural language. However, these patterns are specific to the vocabulary used to describe the data source to be queried. For rewriting query patterns, we experiment with ontology matching approaches in order to find complex correspondences between two ontologies describing data sources. From the alignments and initial query patterns, we rewrite these patterns in order to be able to query the data described using the target ontology. These experiments have been carried out on an ontology of the music domain and the DBpedia ontology.

    Using Patterns for Keyword Search in RDF Graphs

    An increasing number of RDF datasets are available on the Web. Querying RDF data requires the knowledge of a query language such as SPARQL; it also requires some information describing the content of these datasets. The goal of our work is to facilitate the querying of RDF datasets, and we present an approach for enabling users to search in RDF data using keywords. We introduce the notion of pattern to integrate external knowledge in the search process, which increases the quality of the results.

    Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

    We aim to provide table answers to keyword queries against knowledge bases. For queries referring to multiple entities, like "Washington cities population" and "Mel Gibson movies", it is better to represent each relevant answer as a table which aggregates a set of entities or entity-joins within the same table scheme or pattern. In this paper, we study how to find highly relevant patterns in a knowledge base for user-given keyword queries to compose table answers. A knowledge base can be modeled as a directed graph called a knowledge graph, where nodes represent entities in the knowledge base and edges represent the relationships among them. Each node/edge is labeled with type and text. A pattern is an aggregation of subtrees which contain all keywords in the texts and have the same structure and types on nodes/edges. We propose efficient algorithms to find patterns that are relevant to the query for a class of scoring functions. We show the hardness of the problem in theory, and propose path-based indexes that are affordable in memory. Two query-processing algorithms are proposed: one is fast in practice for small queries (with small patterns as answers) by utilizing the indexes; the other is better in theory, with running time linear in the sizes of indexes and answers, and can handle large queries better. We also conduct an extensive experimental study to compare our approaches with a naive adaptation of known techniques. Comment: VLDB 201

    Top-k Keyword Search Over Graphs Based On Backward Search

    Keyword search is one of the most friendly and intuitive information retrieval methods. Using keyword search to obtain a connected subgraph has many applications in graph-based cognitive computation, where it is a basic technology. This paper focuses on top-k keyword search over graphs. We implemented a keyword search algorithm which applies the backward search idea. The algorithm first locates the keyword vertices, and then applies backward search to find rooted trees that contain the query keywords. The experiment shows that query time is affected by the number of iterations of the algorithm.

    Reachability Analysis of Graph Modelled Collections

    This paper is concerned with potential recall in multimodal information retrieval in graph-based models. We provide a framework to leverage individuality and combination of features of different modalities through our formulation of faceted search. We employ a potential recall analysis on a test collection to gain insight into the corpus and further highlight the role of multiple facets, relations between the objects, and semantic links in recall improvement. We conduct the experiments on a multimodal dataset containing approximately 400,000 documents and images. We demonstrate that leveraging multiple facets increases recall, most notably for very hard topics, by up to 316%.