27 research outputs found
Keyword Search on RDF Graphs - A Query Graph Assembly Approach
Keyword search provides ordinary users an easy-to-use interface for querying
RDF data. Given the input keywords, in this paper, we study how to assemble a
query graph that is to represent user's query intention accurately and
efficiently. Based on the input keywords, we first obtain the elementary query
graph building blocks, such as entity/class vertices and predicate edges. Then,
we formally define the query graph assembly (QGA) problem. Unfortunately, we
prove theoretically that QGA is a NP-complete problem. In order to solve that,
we design some heuristic lower bounds and propose a bipartite graph
matching-based best-first search algorithm. The algorithm's time complexity is
, where is the number of the keywords and is a
tunable parameter, i.e., the maximum number of candidate entity/class vertices
and predicate edges allowed to match each keyword. Although QGA is intractable,
both and are small in practice. Furthermore, the algorithm's time
complexity does not depend on the RDF graph size, which guarantees the good
scalability of our system in large RDF graphs. Experiments on DBpedia and
Freebase confirm the superiority of our system on both effectiveness and
efficiency
A Polynomial Delay Algorithm for Generating Connected Induced Subgraphs of a Given Cardinality
We give a polynomial delay algorithm, that for any graph and positive
integer , enumerates all connected induced subgraphs of of order .
Our algorithm enumerates each subgraph in at most
and uses linear space ,
where and are respectively the number of vertices and edges of and
is the maximum degree
Living Knowledge
Diversity, especially manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity as an asset rather than a problem. With the project, foundational ideas emerged from the synergic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed in concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser providing users with better structured information while coping with Web scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA) which operates from a social sciences perspective; Multimodal Genre Analysis (MGA) which operates from a semiotic perspective and Facet Analysis (FA) which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and the way they interact. In particular, the conceptual architecture has been implemented with the Media Content Analyser application. The scientific and technological results obtained are described in the following
K-Nearest Keyword Search in RDF Graphs
We formulate and tackle a flexible and useful query, namely k-nearest keyword (k-NK) query, which can identify the relationship between vertices (or keywords) in an RDF graph, where users are only required to specify two query keywords q and w (without knowing the domain knowledge). In particular, a k-NK query returns k closest pairs of vertices (ui; vi) in the RDF graph such that vertices ui and vi contain keywords q and w, respectively, and vi has the smallest (shortest path) distance to ui (i.e., vi is the nearest neighbor of ui). In order to efficiently answer k-NK queries, in this paper, we propose three efficient query answering techniques that utilize effective pruning strategies and cost-model-based indexing mechanisms. We also confirm the effects of our proposed approaches on real and synthetic RDF data sets through extensive experiments
Complex correspondences for query patterns rewriting
International audienceThis paper discusses the use of complex alignments in the task of automatic query patterns rewriting. We apply this approach in SWIP, a system that allows for querying RDF data from natural language-based queries, hiding the complexity of SPARQL. SWIP is based on the use of query patterns that characterise families of queries and that are instantiated with respect to the initial user query expressed in natural language. However, these patterns are specific to the vocabulary used to describe the data source to be queried. For rewriting query patterns, we experiment ontology matching approaches in order to find complex correspondences between two ontologies describing data sources. From the alignments and initial query patterns, we rewrite these patterns in order to be able to query the data described using the target ontology. These experiments have been carried out on an ontology on the music domain and DBpedia ontology
Using Patterns for Keyword Search in RDF Graphs *
ABSTRACT An increasing number of RDF datasets are available on the Web. Querying RDF data requires the knowledge of a query language such as SPARQL; it also requires some information describing the content of these datasets. The goal of our work is to facilitate the querying of RDF datasets, and we present an approach for enabling users to search in RDF data using keywords. We introduce the notion of pattern to integrate external knowledge in the search process, which increases the quality of the results
Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
We aim to provide table answers to keyword queries against knowledge bases.
For queries referring to multiple entities, like "Washington cities population"
and "Mel Gibson movies", it is better to represent each relevant answer as a
table which aggregates a set of entities or entity-joins within the same table
scheme or pattern. In this paper, we study how to find highly relevant patterns
in a knowledge base for user-given keyword queries to compose table answers. A
knowledge base can be modeled as a directed graph called knowledge graph, where
nodes represent entities in the knowledge base and edges represent the
relationships among them. Each node/edge is labeled with type and text. A
pattern is an aggregation of subtrees which contain all keywords in the texts
and have the same structure and types on node/edges. We propose efficient
algorithms to find patterns that are relevant to the query for a class of
scoring functions. We show the hardness of the problem in theory, and propose
path-based indexes that are affordable in memory. Two query-processing
algorithms are proposed: one is fast in practice for small queries (with small
patterns as answers) by utilizing the indexes; and the other one is better in
theory, with running time linear in the sizes of indexes and answers, which can
handle large queries better. We also conduct extensive experimental study to
compare our approaches with a naive adaption of known techniques.Comment: VLDB 201
Top-k Keyword Search Over Graphs Based On Backward Search
Keyword search is one of the most friendly and intuitive information retrieval methods. Using the keyword search to get the connected subgraph has a lot of application in the graph-based cognitive computation, and it is a basic technology. This paper focuses on the top-k keyword searching over graphs. We implemented a keyword search algorithm which applies the backward search idea. The algorithm locates the keyword vertices firstly, and then applies backward search to find rooted trees that contain query keywords. The experiment shows that query time is affected by the iteration number of the algorithm
Reachability Analysis of Graph Modelled Collections
This paper is concerned with potential recall in multimodal
information retrieval in graph-based models. We provide a framework to
leverage individuality and combination of features of different modalities
through our formulation of faceted search. We employ a potential recall
analysis on a test collection to gain insight on the corpus and further
highlight the role of multiple facets, relations between the objects, and
semantic links in recall improvement. We conduct the experiments on
a multimodal dataset containing approximately 400,000 documents and
images. We demonstrate that leveraging multiple facets increases most
notably the recall for very hard topics by up to 316%