23 research outputs found

    SPARQL Update for Materialised Triple Stores under DL-Lite RDFS Entailment

    Abstract. Updates in RDF stores have recently been standardised in the SPARQL 1.1 Update specification. However, computing answers entailed by ontologies in triple stores is usually treated orthogonally to updates. Even W3C's SPARQL 1.1 Update and SPARQL 1.1 Entailment Regimes specifications explicitly exclude a standard behaviour for entailment regimes other than simple entailment in the context of updates. In this paper, we take a first step towards closing this gap. We define a fragment of SPARQL basic graph patterns corresponding to (the RDFS fragment of) DL-Lite and the corresponding SPARQL update language, dealing with updates of both ABox and TBox statements. We discuss possible semantics along with potential strategies for implementing them. In particular, we treat materialised RDF stores, which store all entailed triples explicitly, and the preservation of materialisation upon ABox and TBox updates.
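    As a rough illustration of the materialisation-preservation idea, the sketch below (using the rdflib Python library; the ex: vocabulary and the insertion helper are invented for this example and are not the paper's semantics) re-materialises rdfs:subClassOf entailments when an ABox triple is inserted:

```python
# Toy sketch (not the paper's algorithm): keep a materialised store's
# rdfs:subClassOf type-entailments consistent across an ABox insertion.
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")            # hypothetical vocabulary
g = Graph()
g.add((EX.Student, RDFS.subClassOf, EX.Person))  # TBox axiom

def insert_abox(graph, inst, cls):
    """Insert an ABox triple and re-materialise its subclass closure."""
    graph.add((inst, RDF.type, cls))
    # propagate rdf:type along rdfs:subClassOf (transitive walk over supers)
    for sup in graph.transitive_objects(cls, RDFS.subClassOf):
        graph.add((inst, RDF.type, sup))

insert_abox(g, EX.alice, EX.Student)
print((EX.alice, RDF.type, EX.Person) in g)  # True: entailed triple stored explicitly
```

    Deletion is the harder direction: removing an entailed triple may require choosing which of its causes to delete, which is exactly the kind of semantic choice the paper discusses.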

    On the static analysis of SPARQL queries with modal logic

    Static analysis is a core task in query optimization and knowledge base verification. We study static analysis techniques for SPARQL, the standard language for querying Semantic Web data. Specifically, we investigate the query containment problem and query-update independence analysis, developing techniques based on reductions to the validity problem in logic.

    We first address SPARQL query containment with optional matching. Optionality is one of the most intricate constructs in SPARQL, and the one that makes the language more expressive than classical query languages such as SQL. We focus on the class of well-designed SPARQL queries, proposed in the literature as a fragment of the language with good properties regarding query evaluation. To date, query containment has been tested using several techniques: graph homomorphisms, canonical databases, automata-theoretic techniques, and reductions to the validity problem of a logic. We follow the last approach: since SPARQL is interpreted over graphs, we encode it into a graph logic, namely the modal logic K interpreted over labelled transition systems, and show that this logic is expressive enough to handle containment for the well-designed fragment. We show how to translate RDF graphs into transition systems and SPARQL queries into K-formulae, so that query containment in SPARQL reduces to unsatisfiability in K. The logical approach is extensible, settling the containment problem for several fragments of SPARQL, even in the presence of schemas, a flexibility not guaranteed by the other methods; it also opens the way to implementations built on satisfiability solvers for K. We present a benchmark of containment tests for SPARQL queries with OPTIONAL and report experiments comparing state-of-the-art containment solvers.

    We also report a preliminary overview of the SPARQL query-update independence problem. A query is independent of an update when executing the update does not affect the result of the query. Determining independence is especially useful for huge RDF repositories, where it avoids expensive yet useless re-evaluation of queries. While this problem has been intensively studied for fragments of the relational calculus, no prior work exists for the standard query language of the Semantic Web. We propose a definition of independence in the SPARQL setting and establish first static-analysis criteria for certain situations of containment between a query and an update.
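    For plain basic graph patterns (no OPTIONAL), containment can already be decided by the classical canonical-instance/homomorphism technique; the toy Python check below illustrates that baseline only, not the thesis's reduction to validity in K:

```python
# Toy containment test for plain BGPs (conjunctive queries): freeze q1 into a
# canonical instance, then search for a consistent match of q2 into it.
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def contained_in(q1, q2):
    """True iff every graph matching BGP q1 also matches BGP q2."""
    frozen = [tuple(f"_{t}" if is_var(t) else t for t in tp) for tp in q1]
    def match(patterns, sub):
        if not patterns:
            return True
        tp, rest = patterns[0], patterns[1:]
        for fact in frozen:
            s = dict(sub)  # fresh binding environment per candidate fact
            if all((is_var(a) and s.setdefault(a, b) == b)
                   or (not is_var(a) and a == b)
                   for a, b in zip(tp, fact)):
                if match(rest, s):
                    return True
        return False
    return match(list(q2), {})

q1 = [("?x", "type", "Student"), ("?x", "knows", "?y")]
q2 = [("?x", "type", "Student")]
print(contained_in(q1, q2))  # True: q1 is more specific, so its matches satisfy q2
print(contained_in(q2, q1))  # False
```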

    Accessing and using complex multimedia documents in a digital library

    In the context of three European projects, our research team has developed a data model and query language for digital libraries supporting identification, structuring, metadata, reuse, and discovery of digital resources. The model is inspired by the Web and is formalized as a first-order theory, certain models of which correspond to the notion of a digital library. In addition, a full translation of the model to RDF and of the query language to SPARQL has been proposed to demonstrate the feasibility of the model and its suitability for practical applications. RDF was chosen because it is a generally accepted representation language in the context of digital libraries and the Semantic Web. The aim of the thesis was twofold: to design and implement a simplified form of a digital library management system based on the theoretical model, and to contribute to enriching the model. To this end, we developed a prototype based on RDF and SPARQL, which uses an RDF store to facilitate internal management of metadata. The prototype allows users to manage and query metadata of digital or non-digital resources in the system, using URIs as resource identifiers, a set of predicates to describe resources, and simple conjunctive queries to discover knowledge in the system. The prototype is implemented with Java technologies and the Google Web Toolkit framework; its architecture consists of a storage layer, a business logic layer, a service layer, and a user interface. During the thesis work, the prototype was built, tested, and debugged locally, then deployed on Google App Engine. In the future, it can be extended into a full-fledged digital library management system.

    The thesis also presents our contribution to content generation by reuse. This is mostly theoretical work whose purpose is to enrich the model and query language with an important service: the ability to create new resources from those already stored in the system. The incorporation of this service into the implemented system is left to future work.
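    A minimal sketch of this usage pattern, assuming the rdflib Python library and an invented dl: predicate set (the prototype itself is built on Java/GWT, and its actual vocabulary is not reproduced here):

```python
# URIs identify resources, a fixed predicate set describes them, and a
# simple conjunctive SPARQL query discovers knowledge in the store.
from rdflib import Graph, Literal, Namespace

DL = Namespace("http://example.org/dl/")  # hypothetical predicate set
g = Graph()
g.add((DL.doc1, DL.creator, Literal("Ada Lovelace")))
g.add((DL.doc1, DL.partOf, DL.collection1))
g.add((DL.collection1, DL.title, Literal("Early computing")))

# Conjunctive query: documents together with the title of their collection.
rows = g.query("""
    PREFIX dl: <http://example.org/dl/>
    SELECT ?doc ?title WHERE { ?doc dl:partOf ?c . ?c dl:title ?title . }
""")
for doc, title in rows:
    print(doc, title)
```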

    Strategies for Managing Linked Enterprise Data

    Data, information, and knowledge have become key assets of our 21st-century economy. As a result, data and knowledge management have become key tasks with regard to sustainable development and business success. Often, knowledge is not explicitly represented: it resides in the minds of people or is scattered among a variety of data sources. Knowledge is inherently associated with semantics that convey its meaning to a human or machine agent. The Linked Data concept facilitates the semantic integration of heterogeneous data sources. However, we still lack an effective knowledge integration strategy applicable to enterprise scenarios, one that balances the large amounts of data stored in legacy information systems and data lakes with tailored, domain-specific ontologies that formally describe real-world concepts. In this thesis we investigate strategies for managing linked enterprise data, analyzing how actionable knowledge can be derived from enterprise data by leveraging knowledge graphs. Actionable knowledge provides valuable insights, supports decision makers with clear, interpretable arguments, and keeps its inference processes explainable. The benefits of employing actionable knowledge and a coherent strategy for managing it span from a holistic semantic representation layer for enterprise data, i.e., representing numerous data sources as one consistent, integrated knowledge source, to unified interaction mechanisms with other systems that can effectively and efficiently leverage such actionable knowledge. Several challenges have to be addressed on different conceptual levels in pursuit of this goal, i.e., means for representing knowledge, semantic integration of raw data sources and subsequent knowledge extraction, communication interfaces, and implementation. To tackle those challenges we present the concept of Enterprise Knowledge Graphs (EKGs) and describe their characteristics and advantages compared to existing approaches. We study each challenge with regard to using EKGs and demonstrate their efficiency. In particular, EKGs are able to reduce the semantic data integration effort when processing large-scale heterogeneous datasets. Then, having built a consistent logical integration layer that hides the underlying heterogeneity, EKGs unify query processing and enable effective communication interfaces for other enterprise systems. The achieved results allow us to conclude that strategies for managing linked enterprise data based on EKGs exhibit reasonable performance, comply with enterprise requirements, and ensure integrated data and knowledge management throughout the data life cycle.
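    As a loose illustration of the "many sources, one knowledge source" idea (all names and mappings below are invented; no specific EKG implementation is implied), two raw record sets are lifted into a single RDF graph that downstream systems can query uniformly:

```python
# Two departmental records with different raw schemas are mapped into one
# RDF graph, giving other systems a single integrated layer to query.
from rdflib import Graph, Literal, Namespace

EKG = Namespace("http://example.org/ekg/")  # hypothetical enterprise ontology
hr_rows  = [{"emp_id": "e1", "full_name": "Ada Lovelace"}]
crm_rows = [{"employee": "e1", "account": "ACME Corp"}]

g = Graph()
for r in hr_rows:   # mapping step: raw source record -> ontology terms
    g.add((EKG[r["emp_id"]], EKG.name, Literal(r["full_name"])))
for r in crm_rows:
    g.add((EKG[r["employee"]], EKG.manages, Literal(r["account"])))

# One uniform query now spans what used to be two disconnected silos.
q = """PREFIX ekg: <http://example.org/ekg/>
       SELECT ?name ?account WHERE { ?e ekg:name ?name ; ekg:manages ?account . }"""
for name, account in g.query(q):
    print(name, account)
```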

    Emergent relational schemas for RDF


    Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake

    Data provides the basis for emerging scientific and interdisciplinary data-centric applications with the potential to improve citizens' quality of life. Big Data plays an important role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Open data initiatives have encouraged the publication of Big Data by exploiting the decentralized nature of the Web, allowing for the availability of heterogeneous data generated and maintained by autonomous data providers. Consequently, the growing volume of data consumed by different applications raises the need for effective data integration approaches able to process large volumes of data represented in different formats, schemas, and models, which may also include sensitive data, e.g., financial transactions, medical procedures, or personal data. Data Lakes are composed of heterogeneous data sources kept in their original formats, which reduces the overhead of materialized data integration. Query processing over Data Lakes requires a semantic description of the data collected from the heterogeneous sources; a Data Lake with such semantic annotations is referred to as a Semantic Data Lake. Transforming Big Data into actionable knowledge demands novel and scalable techniques not only for Big Data ingestion and curation into the Semantic Data Lake, but also for efficient large-scale semantic data integration, exploration, and discovery. Federated query processing techniques utilize source descriptions to find relevant data sources and to find efficient execution plans that minimize the total execution time and maximize the completeness of answers. Existing federated query processing engines employ a coarse-grained description model in which the semantics encoded in data sources are ignored. Such descriptions may lead to the erroneous selection of data sources for a query and to unnecessary retrieval of data, thus degrading the performance of the query processing engine. In this thesis, we address the problem of federated query processing against heterogeneous data sources in a Semantic Data Lake. First, we tackle the challenge of knowledge representation and propose a novel source description model, RDF Molecule Templates, that describes the knowledge available in a Semantic Data Lake. RDF Molecule Templates (RDF-MTs) describe data sources in terms of an abstract description of entities belonging to the same semantic concept. Then, we propose a technique for data source selection and query decomposition, the MULDER approach, and query planning and optimization techniques, Ontario, that exploit the characteristics of heterogeneous data sources described using RDF-MTs and provide uniform access to heterogeneous data sources. We then address the challenge of enforcing the privacy and access control requirements imposed by data providers. We introduce a privacy-aware federated query technique, BOUNCER, able to enforce privacy and access control regulations during query processing over data sources in a Semantic Data Lake. In particular, BOUNCER exploits RDF-MT-based source descriptions to express privacy and access control policies, as well as to enforce them automatically during source selection, query decomposition, and planning. Furthermore, BOUNCER implements query decomposition and optimization techniques able to identify query plans over data sources that not only contain the relevant entities to answer a query, but are also regulated by policies that allow access to these relevant entities. Finally, we tackle the problem of interest-based update propagation and co-evolution of data sources. We present a novel approach for interest-based RDF update propagation that consistently maintains full or partial replications of large datasets and deals with co-evolution.
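    A toy sketch of description-driven source selection, loosely inspired by RDF-MT-style descriptions (the SourceDescription class and sample catalogue below are invented; MULDER, Ontario, and BOUNCER are far more elaborate):

```python
# Each source is described by the semantic concept of its entities and the
# predicates it exposes; patterns are routed only to sources that cover them.
from dataclasses import dataclass, field

@dataclass
class SourceDescription:      # hypothetical, simplified "molecule template"
    name: str
    concept: str              # semantic concept the entities belong to
    predicates: set = field(default_factory=set)

SOURCES = [
    SourceDescription("drugbank", "Drug",  {"name", "interactsWith"}),
    SourceDescription("clinical", "Trial", {"tests", "phase"}),
]

def select_sources(triple_patterns):
    """Pick, per triple pattern, the sources whose description covers its predicate."""
    return {(s, p, o): [src.name for src in SOURCES if p in src.predicates]
            for s, p, o in triple_patterns}

print(select_sources([("?d", "interactsWith", "?d2"), ("?t", "tests", "?d")]))
# {('?d','interactsWith','?d2'): ['drugbank'], ('?t','tests','?d'): ['clinical']}
```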

    Realizing pervasive computing vision: A context-aware mobile application approach.

    Ph.D. (Doctor of Philosophy)

    Scalable Ontology Systems

    Since the adoption of the Resource Description Framework (RDF) by the World Wide Web Consortium (W3C), ontologies have become commonplace as a way to represent both knowledge and data. RDF databases have flexible schemas, are easy to integrate, and allow a semantically rich query language. Unfortunately, these advantages come at the expense of increased query and application complexity. Existing RDF systems have attempted to address this problem by representing RDF data in relational format and translating queries and answers to and from SQL. As we will show, typical access patterns in RDF are substantially different from those in relational databases, to the extent that the performance of relational-backed systems degrades significantly for large datasets or complex queries. In this dissertation, we propose two solutions to the scalability issue in RDF databases. First, we introduce Annotated RDF, a representation language that extends the semantics of RDF by allowing triples to be annotated with partially ordered information such as temporal validity intervals, probabilities, provenance, and many others. In standard RDF, using such information creates a blowup in the size of the database and therefore greatly increases the data complexity of queries. We define a query language for Annotated RDF that extends the RDF query language SPARQL, and we provide query processing and view maintenance algorithms. Our experimental evaluation shows that Annotated RDF can answer queries 1.5 to 3.5 times faster than widely used systems such as Jena2, Sesame2, or Oracle 11g. Second, we introduce GRIN, to our knowledge the first index structure designed specifically for SPARQL queries. We describe query and update processing algorithms and a theoretical analysis of index optimization. GRIN is extended to Annotated RDF and evaluated thoroughly on real-world datasets of up to 26 million triples and benchmark synthetic datasets of up to 1 billion triples. Our results show that for SPARQL queries, GRIN outperforms all relational index structures at comparable resource expenditure. Moreover, we show that GRIN can be integrated not only with Annotated RDF but also with existing systems such as Jena2 or LucidDB.
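    To give the flavour of annotation-aware matching (this is an invented toy with temporal intervals as the annotation domain, not the dissertation's algebra or the GRIN index):

```python
# Each triple carries an element of a partially ordered annotation domain,
# here temporal validity intervals; matching combines the triple's annotation
# with the query's via a greatest-lower-bound (interval intersection).
annotated = [
    ("alice", "worksFor", "acme",    (2001, 2008)),
    ("alice", "worksFor", "initech", (2008, 2015)),
]

def intersect(a, b):
    """Greatest lower bound of two intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo < hi else None

def match(p, o, window):
    """Subjects matching (?, p, o) whose annotation meets the query window."""
    return [(s, intersect(ann, window))
            for s, pp, oo, ann in annotated
            if pp == p and oo == o and intersect(ann, window)]

print(match("worksFor", "acme", (2005, 2010)))  # [('alice', (2005, 2008))]
```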

    Binary RDF for Scalable Publishing, Exchanging and Consumption in the Web of Data

    The current data deluge is flooding the Web with large volumes of data represented in RDF, giving rise to the so-called 'Web of Data'. In this thesis we first propose an in-depth study aimed at a global understanding of the actual structure of RDF datasets. We then present HDT, which addresses the efficient representation of large volumes of RDF data through structures optimized for storage and network transmission. HDT efficiently represents an RDF dataset by splitting it into three components: the Header, the Dictionary, and the Triples structure. We then focus on providing efficient structures for these components, occupying compressed space while still allowing direct access to any piece of data.
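    The Dictionary/Triples split at the heart of HDT can be illustrated with a toy encoder (the real format additionally serialises the Header component and uses compact bit-sequence layouts, which this sketch omits):

```python
# Terms are mapped to integer IDs once (the Dictionary component), and the
# triples are then stored as small ID tuples (the Triples component).
triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:knows", "ex:carol"),
]

dictionary, ids = {}, []
for t in triples:
    ids.append(tuple(dictionary.setdefault(term, len(dictionary) + 1)
                     for term in t))

print(dictionary)  # term -> ID mapping
print(ids)         # ID-encoded triples

inverse = {i: term for term, i in dictionary.items()}
print([tuple(inverse[i] for i in t) for t in ids])  # decodes back losslessly
```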