411 research outputs found
Term-Specific Eigenvector-Centrality in Multi-Relation Networks
Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index time
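The propagation of term weights described in this abstract can be illustrated with a small power iteration. This is only a sketch under stated assumptions, not the article's method: the function name term_pagerank, the damping value, the per-relation edge weights and the example graph are all hypothetical, and the article's actual extension of the PageRank matrix is not given in the abstract.

```python
def term_pagerank(links, term_weights, damping=0.85, iterations=50):
    """Propagate one term's weights over a graph by power iteration.

    links: dict node -> list of (neighbour, relation_weight); every
           neighbour must itself be a key of links (assumed layout).
    term_weights: dict node -> initial weight of the term in that node.
    """
    nodes = list(links)
    total = sum(term_weights.get(n, 0.0) for n in nodes)
    if total == 0:
        return {n: 0.0 for n in nodes}
    # The restart distribution is the normalised original term weighting.
    restart = {n: term_weights.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1 - damping) * restart[n] for n in nodes}
        for src in nodes:
            out_total = sum(w for _, w in links[src])
            if out_total == 0:
                # Dangling node: hand its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += damping * rank[src] * restart[n]
                continue
            for dst, w in links[src]:
                nxt[dst] += damping * rank[src] * w / out_total
        rank = nxt
    return rank


# Hypothetical three-item graph with two relation types of different weight.
links = {
    "doc": [("author", 1.0), ("venue", 0.5)],
    "author": [("doc", 1.0)],
    "venue": [("doc", 0.5)],
}
ranks = term_pagerank(links, {"doc": 1.0})
```

The term occurs only in "doc", yet "author" and "venue" end up with non-zero weight, with the more heavily weighted relation receiving more of it. Because everything here depends only on the graph and the initial term weights, it can be computed once when the index is built.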
Flexible query processing of SPARQL queries
SPARQL is the predominant language for querying RDF data, which is the standard
model for representing web data and more specifically Linked Open Data (a
collection of heterogeneous connected data). Datasets in RDF form can be hard to
query for a user who does not have full knowledge of the structure of the dataset.
Moreover, many Linked Data datasets are extracted from actual web page
content, which might lead to incomplete or inaccurate data.
We extend SPARQL 1.1 with two operators, APPROX and RELAX, previously
introduced in the context of regular path queries. Using these operators we are able
to support flexible querying over the property path queries of SPARQL 1.1. We call
this new language SPARQLAR.
Using SPARQLAR users are able to query RDF data without fully knowing the
structure of a dataset. APPROX and RELAX encapsulate different aspects of query flexibility: finding different answers and finding more answers, respectively. This
means that users can access complex and heterogeneous datasets without the need
to know precisely how the data is structured.
One of the open problems we address is how to combine the APPROX and
RELAX operators with a pragmatic language such as SPARQL. We also devise an
implementation of a system that evaluates SPARQLAR queries in order to study the
performance of the new language.
We begin by defining the semantics of SPARQLAR and the complexity of query
evaluation. We then present a query processing technique for evaluating SPARQLAR
queries based on a rewriting algorithm and prove its soundness and completeness.
During the evaluation of a SPARQLAR query we generate multiple SPARQL 1.1
queries that are evaluated against the dataset. Each such query will generate answers
with a cost that indicates their distance with respect to the exact form of the original
SPARQLAR query.
Our prototype implementation incorporates three optimisation techniques that
aim to enhance query execution performance: the first optimisation is a pre-computation
technique that caches the answers of parts of the queries generated by the rewriting
algorithm. These answers will then be reused to avoid the re-execution of those sub-queries. The second optimisation utilises a summary of the dataset to discard
queries that are known not to return any answers. The third optimisation technique
uses the query containment concept to discard queries whose answers would
be returned by another query at the same or lower cost.
We conclude by conducting a performance study of the system on three different
RDF datasets: LUBM (Lehigh University Benchmark), YAGO and DBpedia.
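The idea of generating several exact queries, each tagged with a cost measuring its distance from the original, can be sketched as follows. This is a simplified illustration and not the rewriting algorithm of the thesis: it treats a property path as a plain sequence of labels, applies unit-cost edit operations (an assumption), and uses hypothetical names (approx_rewritings, the example property labels).

```python
import heapq


def _one_edit(path, labels, costs):
    """Yield (operation_cost, new_path) for every single edit of path."""
    for i in range(len(path)):
        yield costs["delete"], path[:i] + path[i + 1:]
        for label in labels:
            if label != path[i]:
                yield costs["substitute"], path[:i] + (label,) + path[i + 1:]
    for i in range(len(path) + 1):
        for label in labels:
            yield costs["insert"], path[:i] + (label,) + path[i:]


def approx_rewritings(path, labels, max_cost, costs=None):
    """Yield (cost, rewritten_path) in non-decreasing order of cost.

    Each distinct path is yielded once, at its cheapest cost, so the
    cheapest (closest to exact) rewritings can be evaluated first.
    """
    costs = costs or {"delete": 1, "insert": 1, "substitute": 1}
    start = tuple(path)
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, current = heapq.heappop(heap)
        if cost > best[current]:
            continue  # stale heap entry, a cheaper route was found later
        yield cost, current
        for op_cost, candidate in _one_edit(current, labels, costs):
            new_cost = cost + op_cost
            if new_cost <= max_cost and new_cost < best.get(candidate, float("inf")):
                best[candidate] = new_cost
                heapq.heappush(heap, (new_cost, candidate))


# Hypothetical property labels; each rewritten path would become one exact query.
labels = ["actedIn", "directedBy", "producedBy"]
rewritings = list(approx_rewritings(("actedIn", "directedBy"), labels, max_cost=1))
```

Enumerating rewritings in cost order is what makes it possible to return answers ranked by distance, and it is also where pruning helps: a rewriting known to be empty, or contained in a cheaper one, need never be sent to the dataset.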
Expression and Efficient Processing of Fuzzy Queries in a Graph Database Context
Graph databases have attracted considerable interest in recent years thanks to their wide range of potential applications (e.g. social networks, biomedical networks, data stemming from the web). As has already been proposed for relational databases, defining a language that allows flexible querying of graph databases may greatly improve the usability of data. This paper focuses on the notion of a fuzzy graph database and describes a fuzzy query language that makes it possible to handle such databases, which may be fuzzy or not, in a flexible way. This language, called FUDGE, can be used to express preference queries on fuzzy graph databases. The preferences concern i) the content of the vertices of the graph and ii) the structure of the graph. The FUDGE language is implemented in a system, called SUGAR, that we present in this article. We also discuss implementation issues of the FUDGE language in SUGAR.
Applications of flexible querying to graph data
Graph data models provide flexibility and extensibility that make them well-suited to modelling data that may be irregular, complex, and evolving in structure and content. However, a consequence of this is that users may not be familiar with the full structure of the data, which itself may be changing over time, making it hard for users to formulate queries that precisely match the data graph and meet their information seeking requirements. There is a need therefore for flexible querying systems over graph data that can automatically make changes to the user's query so as to find additional or different answers, and so help the user to retrieve information of relevance to them. This chapter describes recent work in this area, looking at a variety of graph query languages, applications, flexible querying techniques and implementations.
EntiTables: Smart Assistance for Entity-Focused Tables
Tables are among the most powerful and practical tools for organizing and
working with data. Our motivation is to equip spreadsheet programs with smart
assistance capabilities. We concentrate on one particular family of tables,
namely, tables with an entity focus. We introduce and focus on two specific
tasks: populating rows with additional instances (entities) and populating
columns with new headings. We develop generative probabilistic models for both
tasks. For estimating the components of these models, we consider a knowledge
base as well as a large table corpus. Our experimental evaluation simulates the
various stages of the user entering content into an actual table. A detailed
analysis of the results shows that the models' components are complementary and
that our methods outperform existing approaches from the literature.
Comment: Proceedings of the 40th International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR '17), 201
Optimisation techniques for flexible SPARQL queries
RDF datasets can be queried using the SPARQL language but are often irregularly structured and incomplete, which may make precise query formulation hard for users. The SPARQLAR language extends SPARQL 1.1 with two operators - APPROX and RELAX - so as to allow flexible querying over property paths. These operators encapsulate different dimensions of query flexibility, namely approximation and generalisation, and they allow users to query complex, heterogeneous knowledge graphs without needing to know precisely how the data is structured. Earlier work has described the syntax, semantics and complexity of SPARQLAR and has demonstrated its practical feasibility, but has also highlighted the need for improving the speed of query evaluation. In the present paper, we focus on the design of two optimisation techniques targeted at speeding up the execution of SPARQLAR queries and on their empirical evaluation on three knowledge graphs: LUBM, DBpedia and YAGO. We show that applying these optimisations can result in substantial improvements in the execution times of longer-running queries (sometimes by one or more orders of magnitude) without incurring significant performance penalties for fast queries.
Machine Learning-based Query Augmentation for SPARQL Endpoints
Linked Data repositories have become a popular source of publicly-available data. Users accessing this data through SPARQL endpoints usually launch several restrictive yet similar consecutive queries, either to find the information they need through trial-and-error or to query related resources. However, instead of executing each individual query separately, query augmentation aims at modifying the incoming queries to retrieve more data that is potentially relevant to subsequent requests. In this paper, we propose a novel approach to query augmentation for SPARQL endpoints based on machine learning. Our approach separates the structure of the query from its contents and measures two types of similarity, which are then used to predict the structure and contents of the augmented query. We test the approach on the real-world query logs of the Spanish and English DBpedia and show that our approach yields high-accuracy prediction. We also show that, by caching the results of the predicted …
This work has been supported by the European Union's Horizon 2020 research and innovation program (grant H2020-MSCA-ITN-2014-642963), the Spanish Ministry of Science and Innovation (contract TIN2015-65316, project RTC-2016-4952-7 and contract TIN2016-78011-C4-4-R), the Spanish Ministry of Education, Culture and Sports (contract CAS18/00333) and the Generalitat de Catalunya (contract 2014-SGR-1051). The authors would also like to thank Toni Cortes for his feedback.
Peer Reviewed. Postprint (author's final draft)
A specification and discovery environment for the reuse of software components in distributed software development
Our work aims to develop an effective solution for the discovery and reuse of software components in existing and commonly used development environments. We propose an ontology for describing and discovering atomic software components. The description covers both the functional properties and the non-functional properties of software components, the latter expressed as QoS parameters. Our search process is based on a function that calculates the semantic distance between the signature of a component and the signature of a given query, thus achieving an appropriate comparison. We also use the notion of "subsumption" to compare the inputs and outputs of the query with those of the components.
After selecting the appropriate components, the non-functional properties are used as a distinguishing factor to refine the search result. We propose an approach for discovering composite components if no atomic component is found; this approach is based on the shared ontology. To integrate the resulting component into the project under development, we developed an integration ontology and two services, "input/output convertor" and "output Matching".
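As an illustration of matching a query signature against component signatures through an ontology, the following sketch measures distance as the number of hierarchy steps between concepts and checks subsumption by walking up the hierarchy. The abstract does not give the actual distance function, so the formula, the function names and the small concept hierarchy here are all assumptions made for the example.

```python
def ancestors(concept, parent):
    """Return {ancestor: steps_up} for concept, including itself at 0."""
    found, steps = {}, 0
    while concept is not None and concept not in found:
        found[concept] = steps
        concept = parent.get(concept)
        steps += 1
    return found


def subsumes(general, specific, parent):
    """True if general is specific itself or one of its ancestors."""
    return general in ancestors(specific, parent)


def concept_distance(a, b, parent):
    """Steps from a to b via their nearest common ancestor, or None."""
    up_a, up_b = ancestors(a, parent), ancestors(b, parent)
    common = [up_a[c] + up_b[c] for c in up_a if c in up_b]
    return min(common) if common else None


def signature_distance(query, component, parent):
    """Sum of positional concept distances over inputs and outputs.

    Returns None when the arities differ or some pair is unrelated.
    """
    total = 0
    for key in ("inputs", "outputs"):
        if len(query[key]) != len(component[key]):
            return None
        for q, c in zip(query[key], component[key]):
            d = concept_distance(q, c, parent)
            if d is None:
                return None
            total += d
    return total


# Hypothetical ontology (child -> parent) and signatures.
parent = {"Thing": None, "Document": "Thing", "Image": "Document",
          "Text": "Document", "PNG": "Image", "JPEG": "Image"}
query = {"inputs": ["Image"], "outputs": ["Text"]}
png_ocr = {"inputs": ["PNG"], "outputs": ["Text"]}
text_copy = {"inputs": ["Text"], "outputs": ["Text"]}
ranked = sorted([("png_ocr", png_ocr), ("text_copy", text_copy)],
                key=lambda item: signature_distance(query, item[1], parent))
```

Candidates with the smallest distance would be kept first; non-functional (QoS) properties could then be used to order components that tie on distance.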
Making Study Populations Visible through Knowledge Graphs
Treatment recommendations within Clinical Practice Guidelines (CPGs) are
largely based on findings from clinical trials and case studies, referred to
here as research studies, that are often based on highly selective clinical
populations, referred to here as study cohorts. When medical practitioners
apply CPG recommendations, they need to understand how well their patient
population matches the characteristics of those in the study cohort, and thus
are confronted with the challenges of locating the study cohort information and
making an analytic comparison. To address these challenges, we develop an
ontology-enabled prototype system, which exposes the population descriptions in
research studies in a declarative manner, with the ultimate goal of allowing
medical practitioners to better understand the applicability and
generalizability of treatment recommendations. We build a Study Cohort Ontology
(SCO) to encode the vocabulary of study population descriptions, which are often
reported in the first table of the published work and are thus often referred
to as Table 1. We leverage the well-used Semanticscience Integrated Ontology
(SIO) for defining property associations between classes. Further, we model the
key components of Table 1s, i.e., collections of study subjects, subject
characteristics, and statistical measures in RDF knowledge graphs. We design
scenarios for medical practitioners to perform population analysis, and
generate cohort similarity visualizations to determine the applicability of a
study population to the clinical population of interest. Our semantic approach
to making study populations visible, through standardized representations of Table 1s,
allows users to quickly derive clinically relevant inferences about study
populations.
Comment: 16 pages, 4 figures, 1 table, accepted to the ISWC 2019 Resources
Track (https://iswc2019.semanticweb.org/call-for-resources-track-papers/