1,106 research outputs found

    Computing CQ lower-bounds over OWL 2 through approximation to RSA

    Full text link
    Conjunctive query (CQ) answering over knowledge bases is an important reasoning task. However, with expressive ontology languages such as OWL, query answering is computationally very expensive. The PAGOdA system addresses this issue by using a tractable reasoner to compute lower and upper-bound approximations, falling back to a fully-fledged OWL reasoner only when these bounds don't coincide. The effectiveness of this approach critically depends on the quality of the approximations, and in this paper we explore a technique for computing closer approximations via RSA, an ontology language that subsumes all the OWL 2 profiles while still maintaining tractability. We present a novel approximation of OWL 2 ontologies into RSA, and an algorithm to compute a closer (than PAGOdA) lower bound approximation using the RSA combined approach. We have implemented these algorithms in a prototypical CQ answering system, and we present a preliminary evaluation of our system that shows significant performance improvements w.r.t. PAGOdA.Comment: 26 pages, 1 figur

    A Scalable Backward Chaining-Based Reasoner for a Semantic Web

    Get PDF
    In this paper we consider knowledge bases that organize information using ontologies. Specifically, we investigate reasoning over a semantic web where the underlying knowledge base covers linked data about science research that are being harvested from the Web and are supplemented and edited by community members. In the semantic web over which we want to reason, frequent changes occur in the underlying knowledge base, and less frequent changes occur in the underlying ontology or the rule set that governs the reasoning. Interposing a backward chaining reasoner between a knowledge base and a query manager yields an architecture that can support reasoning in the face of frequent changes. However, such an interposition of the reasoning introduces uncertainty regarding the size and effort measurements typically exploited during query optimization. We present an algorithm for dynamic query optimization in such an architecture. We also introduce new optimization techniques to the backward-chaining algorithm. We show that these techniques together with the query-optimization reported on earlier, will allow us to actually outperform forward-chaining reasoners in scenarios where the knowledge base is subject to frequent change. Finally, we analyze the impact of these techniques on a large knowledge base that requires external storage

    Experiencing OptiqueVQS: A Multi-paradigm and Ontology-based Visual Query System for End Users

    Get PDF
    This is author's post-print version, published version available on http://link.springer.com/article/10.1007%2Fs10209-015-0404-5Data access in an enterprise setting is a determining factor for value creation processes, such as sense-making, decision-making, and intelligence analysis. Particularly, in an enterprise setting, intuitive data access tools that directly engage domain experts with data could substantially increase competitiveness and profitability. In this respect, the use of ontologies as a natural communication medium between end users and computers has emerged as a prominent approach. To this end, this article introduces a novel ontology-based visual query system, named OptiqueVQS, for end users. OptiqueVQS is built on a powerful and scalable data access platform and has a user-centric design supported by a widget-based flexible and extensible architecture allowing multiple coordinated representation and interaction paradigms to be employed. The results of a usability experiment performed with non-expert users suggest that OptiqueVQS provides a decent level of expressivity and high usability and hence is quite promising

    Implementation of a knowledge discovery and enhancement module from structured information gained from unstructured sources of information

    Get PDF
    Tese de mestrado integrado. Engenharia Informåtica e Computação. Faculdade de Engenharia. Universidade do Porto. 201

    Enabling Complex Semantic Queries to Bioinformatics Databases through Intuitive Search Over Data

    Get PDF
    Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data already available publicly. However, the heterogene- ity of the existing data sources still poses significant challenges for achieving interoperability among biological databases. Furthermore, merely solving the technical challenges of data in- tegration, for example through the use of common data representation formats, leaves open the larger problem. Namely, the steep learning curve required for understanding the data models of each public source, as well as the technical language through which the sources can be queried and joined. As a consequence, most of the available biological data remain practically unexplored today. In this thesis, we address these problems jointly, by first introducing an ontology-based data integration solution in order to mitigate the data source heterogeneity problem. We illustrate through the concrete example of Bgee, a gene expression data source, how relational databases can be exposed as virtual Resource Description Framework (RDF) graphs, through relational-to-RDF mappings. This has the important advantage that the original data source can remain unmodified, while still becoming interoperable with external RDF sources. We complement our methods with applied case studies designed to guide domain experts in formulating expressive federated queries targeting the integrated data across the domains of evolutionary relationships and gene expression. More precisely, we introduce two com- parative analyses, first within the same domain (using orthology data from multiple, inter- operable, data sources) and second across domains, in order to study the relation between expression change and evolution rate following a duplication event. Finally, in order to bridge the semantic gap between users and data, we design and im- plement Bio-SODA, a question answering system over domain knowledge graphs, that does not require training data for translating user questions to SPARQL. Bio-SODA uses a novel ranking approach that combines syntactic and semantic similarity, while also incorporating node centrality metrics to rank candidate matches for a given user question. Our results in testing Bio-SODA across several real-world databases that span multiple domains (both within and outside bioinformatics) show that it can answer complex, multi-fact queries, be- yond the current state-of-the-art in the more well-studied open-domain question answering. -- L’intĂ©gration des donnĂ©es promet d’ĂȘtre l’un des principaux catalyseurs permettant d’extraire des nouveaux aperçus de la richesse des donnĂ©es biologiques dĂ©jĂ  disponibles publiquement. Cependant, l’hĂ©tĂ©rogĂ©nĂ©itĂ© des sources de donnĂ©es existantes pose encore des dĂ©fis importants pour parvenir Ă  l’interopĂ©rabilitĂ© des bases de donnĂ©es biologiques. De plus, en surmontant seulement les dĂ©fis techniques de l’intĂ©gration des donnĂ©es, par exemple grĂące Ă  l’utilisation de formats standard de reprĂ©sentation de donnĂ©es, on laisse ouvert un problĂšme encore plus grand. À savoir, la courbe d’apprentissage abrupte nĂ©cessaire pour comprendre la modĂ©li- sation des donnĂ©es choisie par chaque source publique, ainsi que le langage technique par lequel les sources peuvent ĂȘtre interrogĂ©s et jointes. Par consĂ©quent, la plupart des donnĂ©es biologiques publiquement disponibles restent pratiquement inexplorĂ©s aujourd’hui. Dans cette thĂšse, nous abordons l’ensemble des deux problĂšmes, en introduisant d’abord une solution d’intĂ©gration de donnĂ©es basĂ©e sur ontologies, afin d’attĂ©nuer le problĂšme d’hĂ©tĂ©- rogĂ©nĂ©itĂ© des sources de donnĂ©es. Nous montrons, Ă  travers l’exemple de Bgee, une base de donnĂ©es d’expression de gĂšnes, une approche permettant les bases de donnĂ©es relationnelles d’ĂȘtre publiĂ©s sous forme de graphes RDF (Resource Description Framework) virtuels, via des correspondances relationnel-vers-RDF (« relational-to-RDF mappings »). Cela prĂ©sente l’important avantage que la source de donnĂ©es d’origine peut rester inchangĂ©, tout en de- venant interopĂ©rable avec les sources RDF externes. Nous complĂ©tons nos mĂ©thodes avec des Ă©tudes de cas appliquĂ©es, conçues pour guider les experts du domaine dans la formulation de requĂȘtes fĂ©dĂ©rĂ©es expressives, ciblant les don- nĂ©es intĂ©grĂ©es dans les domaines des relations Ă©volutionnaires et de l’expression des gĂšnes. Plus prĂ©cisĂ©ment, nous introduisons deux analyses comparatives, d’abord dans le mĂȘme do- maine (en utilisant des donnĂ©es d’orthologie provenant de plusieurs sources de donnĂ©es in- teropĂ©rables) et ensuite Ă  travers des domaines interconnectĂ©s, afin d’étudier la relation entre le changement d’expression et le taux d’évolution suite Ă  une duplication de gĂšne. Enfin, afin de mitiger le dĂ©calage sĂ©mantique entre les utilisateurs et les donnĂ©es, nous concevons et implĂ©mentons Bio-SODA, un systĂšme de rĂ©ponse aux questions sur des graphes de connaissances domaine-spĂ©cifique, qui ne nĂ©cessite pas de donnĂ©es de formation pour traduire les questions des utilisateurs vers SPARQL. Bio-SODA utilise une nouvelle ap- proche de classement qui combine la similaritĂ© syntactique et sĂ©mantique, tout en incorporant des mĂ©triques de centralitĂ© des nƓuds, pour classer les possibles candidats en rĂ©ponse Ă  une question utilisateur donnĂ©e. Nos rĂ©sultats suite aux tests effectuĂ©s en utilisant Bio-SODA sur plusieurs bases de donnĂ©es Ă  travers plusieurs domaines (tantĂŽt liĂ©s Ă  la bioinformatique qu’extĂ©rieurs) montrent que Bio-SODA rĂ©ussit Ă  rĂ©pondre Ă  des questions complexes, en- gendrant multiples entitĂ©s, au-delĂ  de l’état actuel de la technique en matiĂšre de systĂšmes de rĂ©ponses aux questions sur les donnĂ©es structures, en particulier graphes de connaissances

    SUMA: A Partial Materialization-Based Scalable Query Answering in OWL 2 DL

    Get PDF
    AbstractOntology-mediated querying (OMQ) provides a paradigm for query answering according to which users not only query records at the database but also query implicit information inferred from ontology. A key challenge in OMQ is that the implicit information may be infinite, which cannot be stored at the database and queried by off -the -shelf query engine. The commonly adopted technique to deal with infinite entailments is query rewriting, which, however, comes at the cost of query rewriting at runtime. In this work, the partial materialization method is proposed to ensure that the extension is always finite. The partial materialization technology does not rewrite query but instead computes partial consequences entailed by ontology before the online query. Besides, a query analysis algorithm is designed to ensure the completeness of querying rooted and Boolean conjunctive queries over partial materialization. We also soundly and incompletely expand our method to support highly expressive ontology language, OWL 2 DL. Finally, we further optimize the materialization efficiency by role rewriting algorithm and implement our approach as a prototype system SUMA by integrating off-the-shelf efficient SPARQL query engine. The experiments show that SUMA is complete on each test ontology and each test query, which is the same as Pellet and outperforms PAGOdA. Besides, SUMA is highly scalable on large datasets
    • 

    corecore