64 research outputs found

    Toward relevant answers to queries on incomplete databases

    Get PDF
    Incomplete and uncertain information is ubiquitous in database management applications. However, the techniques specifically developed to handle incomplete data are not sufficient. Even the evaluation of SQL queries on databases containing NULL values remains a challenge after 40 years. There is no consensus on what an answer to a query on an incomplete database should be, and the existing notions often have limited applicability. One of the most prevalent techniques in the literature is based on finding answers that are certainly true, independently of how missing values are interpreted. However, this notion has yielded several conflicting formal definitions of certain answers. Based on the fact that incomplete data can be enriched with additional knowledge, we designed a notion able to unify and explain the different definitions of certain answers. Moreover, the knowledge-preserving certain answers notion provides the first well-founded definition of certain answers for the relational bag data model and value-inventing queries, addressing some key limitations of previous approaches. However, it does not provide any guarantee about the relevance of the answers it captures. To understand what would constitute relevant answers to queries on incomplete databases, we designed and conducted a survey on the everyday usage of NULL values among database users. One of the findings of this socio-technical study is that even when users agree on the possible interpretations of NULL values, they may not agree on what a satisfactory query answer is. Therefore, to be relevant, query evaluation on incomplete databases must account for users’ tasks and preferences. We model users’ preferences and tasks with the notion of regret. The regret function captures the task-dependent loss a user endures when they consider one database as ground truth instead of another. Thanks to this notion, we designed the first framework able to provide a score accounting for the risk associated with query answers. It allows us to define the risk-minimizing answers to queries on incomplete databases. We show that for some regret functions, regret-minimizing answers coincide with certain answers. Moreover, as the notion is more flexible, it can capture more nuanced answers and more interpretations of incompleteness. A different approach to improving the relevance of an answer is to explain its provenance. We propose to partition the incompleteness into sources and measure their respective contributions to the risk of an answer. As a first milestone, we study several models to predict the evolution of the risk when a source of incompleteness is cleaned. We implemented the framework, and it exhibits promising results on relational databases and queries with aggregate and grouping operations. Indeed, the model allows us to infer the risk reduction obtained by cleaning an attribute. Finally, using a game-theoretic approach, the model can explain answers based on the contribution of each attribute to the risk.
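
    To make the challenge concrete, here is a minimal SQL sketch (the emp table and its contents are hypothetical, not taken from the thesis) showing how SQL's three-valued logic diverges from the certain-answer semantics discussed above:

        -- Hypothetical table: Bob's department is unknown (NULL).
        CREATE TABLE emp (name TEXT, dept TEXT);
        INSERT INTO emp VALUES ('Alice', 'HR'), ('Bob', NULL);

        -- Under SQL's three-valued logic the predicate below evaluates to
        -- UNKNOWN for Bob, so only Alice is returned.
        SELECT name FROM emp
        WHERE dept = 'HR' OR dept <> 'HR';

        -- Yet in every possible completion of the NULL, Bob's department either
        -- equals 'HR' or it does not, so both Alice and Bob are certain answers:
        -- SQL evaluation and certain-answer semantics disagree.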

    BiTRDF: Extending RDF for BiTemporal Data

    Full text link
    The Internet is not only a platform for communication, transactions, and cloud storage, but it is also a large knowledge store where people as well as machines can create, manipulate, infer, and make use of data and knowledge. The Semantic Web was developed for this purpose. It aims to help machines understand the meaning of data and knowledge so that they can use the data and knowledge in decision making. The Resource Description Framework (RDF) forms the foundation of the Semantic Web, which is organized as the Semantic Web Layer Cake. RDF is limited in that it can only express binary relationships in the form of (subject, predicate, object) triples; expressing higher-order relationships requires reification, which is very cumbersome. Naturally, time-varying data is very common and cannot be represented by binary relationships alone. We first surveyed approaches that use reification or extend RDF for higher-order relationships. Then we proposed a new data model, BiTemporal RDF (BiTRDF), that incorporates both valid time and transaction time explicitly into standard RDF resources. We defined the BiTRDF model with its elements, vocabulary, semantics, and entailment, as well as the BiTemporal SPARQL (BiT-SPARQL) query language. We discussed the foundations for implementing BiTRDF and explored different approaches to implementing the model. We concluded the thesis with potential research directions. This thesis lays the foundation for a new approach to easily embed one or more additional dimensions, such as temporal data, spatial data, probabilistic data, confidence levels, etc.

    Yavaa: supporting data workflows from discovery to visualization

    Get PDF
    Recent years have witnessed an increasing number of data silos being opened up, both within organizations and to the general public: Scientists publish their raw data as supplements to articles or even as standalone artifacts to enable others to verify and extend their work. Governments pass laws to open up formerly protected data treasures to improve accountability and transparency, as well as to enable new business ideas based on this public good. Even companies share structured information about their products and services to advertise their use and thus increase revenue. Exploiting this wealth of information holds many challenges for users, though. Oftentimes data is provided as tables whose seemingly endless rows of daunting numbers are barely accessible. Information visualization (InfoVis) can mitigate this gap. However, the visualization options offered are generally very limited, and next to no support is given for applying any of them. The same holds true for data wrangling: only very few options exist to adjust the data to the current needs, and barely any safeguards are in place to prevent even the most obvious mistakes. When it comes to data from multiple providers, the situation gets even bleaker. Only recently have tools emerged that make it reasonably possible to search for datasets across institutional borders. Easy-to-use ways to combine these datasets are still missing, though. Finally, results generally lack proper documentation of their provenance, so even the most compelling visualizations can be called into question when it remains unclear how they came about. The foundations for a lively exchange and exploitation of open data are set, but the barrier to entry remains relatively high, especially for non-expert users. This thesis aims to lower that barrier by providing tools and assistance that reduce the amount of prior experience and skills required. It covers the whole workflow, ranging from identifying suitable datasets, through possible transformations, to exporting the result in the form of suitable visualizations.

    ETSI SmartM2M Technical Report 103715; Study for oneM2M; Discovery and Query solutions analysis & selection

    Get PDF
    The oneM2M system has implemented basic native discovery capabilities. In order to enhance the semantic capabilities of the oneM2M architecture by providing solid contributions to the oneM2M standards, four Technical Reports have been developed, each of them the outcome of one phase of the study: requirements, study, simulation, and standardization. The present document covers the second phase and provides the basis for the other documents. It identifies, defines, and analyses relevant approaches with respect to the use cases and requirements developed in ETSI TR 103 714; the most appropriate one will be selected.

    28th International Symposium on Temporal Representation and Reasoning (TIME 2021)

    Get PDF
    The 28th International Symposium on Temporal Representation and Reasoning (TIME 2021) was planned to take place in Klagenfurt, Austria, but had to move to an online conference due to the uncertainties and restrictions caused by the pandemic. Since its first edition in 1994, the TIME Symposium has been quite unique in the panorama of scientific conferences, as its main goal is to bring together researchers from distinct research areas involved in the management and representation of temporal data, as well as in reasoning about temporal aspects of information. Moreover, the TIME Symposium aims to bridge theoretical and applied research, as well as to serve as an interdisciplinary forum for exchange among researchers from the areas of artificial intelligence, database management, logic and verification, and beyond.

    Explainable methods for knowledge graph refinement and exploration via symbolic reasoning

    Get PDF
    Knowledge Graphs (KGs) have applications in many domains such as Finance, Manufacturing, and Healthcare. While recent efforts have created large KGs, their content is far from complete and sometimes includes invalid statements. Therefore, it is crucial to refine the constructed KGs to enhance their coverage and accuracy via KG completion and KG validation. It is also vital to provide human-comprehensible explanations for such refinements, so that humans have trust in the KG quality. Enabling KG exploration, by search and browsing, is also essential for users to understand the KG's value and limitations for downstream applications. However, the large size of KGs makes KG exploration very challenging. While the type taxonomy of KGs is a useful asset along these lines, it remains insufficient for deep exploration. In this dissertation we tackle the aforementioned challenges of KG refinement and KG exploration by combining logical reasoning over the KG with other techniques such as KG embedding models and text mining. Through such combination, we introduce methods that provide human-understandable output. Concretely, we introduce methods to tackle KG incompleteness by learning exception-aware rules over the existing KG; the learned rules are then used to accurately infer missing links in the KG. Furthermore, we propose a framework for constructing human-comprehensible explanations for candidate facts from both the KG and text; the extracted explanations are used to ensure the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations. In particular, the dissertation makes the following contributions:
    ‱ For KG completion, we present ExRuL, a method for revising Horn rules by adding exception conditions to the rule bodies. The revised rules can infer new facts and thereby close gaps in the KG. Experiments on large KGs show that this method substantially reduces errors in the inferred facts and yields user-friendly explanations.
    ‱ With RuLES, we present a rule learning method based on probabilistic representations of missing facts. The approach iteratively extends the rules induced from a KG by combining neural KG embeddings with information from text corpora, using new rule-quality metrics during rule generation. Experiments show that RuLES substantially improves the quality of the learned rules and of their predictions.
    ‱ To support KG validation, we present ExFaKT, a framework for constructing explanations for candidate facts. Using rules, the method rewrites a candidate into a set of statements that are easier to find and to validate or refute. The output of ExFaKT is a set of semantic pieces of evidence for the candidate fact, extracted from text corpora and from the KG. Experiments show that these rewritings clearly improve the yield and quality of the discovered explanations, which support both manual KG validation by curators and automatic validation.
    ‱ To support KG exploration, we present ExCut, a method that produces informative entity clusters with explanations, using KG embeddings and automatically induced rules. A cluster explanation consists of a combination of relations between the entities that identifies the cluster. ExCut simultaneously improves cluster quality and cluster explainability by iteratively interleaving the learning of embeddings and rules. Experiments show that ExCut computes clusters of high quality and that the cluster explanations are informative for users.
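
    As an illustrative sketch only (the triple-table layout and the example rule are assumptions, not taken from the dissertation), an exception-aware rule such as livesIn(X, Y) <- marriedTo(X, Z), livesIn(Z, Y), not researcher(X) can be evaluated over a KG stored relationally to propose missing links:

        -- Hypothetical storage of the KG as a single triple table:
        --   CREATE TABLE kg (s TEXT, p TEXT, o TEXT);

        -- Candidate livesIn facts implied by the rule body, excluding entities
        -- covered by the exception and facts already present in the KG.
        SELECT m.s AS person, l.o AS place
        FROM kg m
        JOIN kg l ON l.s = m.o AND l.p = 'livesIn'
        WHERE m.p = 'marriedTo'
          AND NOT EXISTS (SELECT 1 FROM kg t
                          WHERE t.s = m.s AND t.p = 'type' AND t.o = 'researcher')
          AND NOT EXISTS (SELECT 1 FROM kg k
                          WHERE k.s = m.s AND k.p = 'livesIn' AND k.o = l.o);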

    Analytical Queries on Vanilla RDF Graphs with a Guided Query Builder Approach

    Get PDF
    As more and more data are available as RDF graphs, the availability of tools for data analytics beyond semantic search becomes a key issue for the Semantic Web. Previous work requires the modelling of data cubes on top of RDF graphs. We propose an approach that directly answers analytical queries on unmodified (vanilla) RDF graphs by exploiting the computation features of SPARQL 1.1. We rely on the NF design pattern to design a query builder that completely hides SPARQL behind a verbalization in natural language and that gives intermediate results and suggestions at each step. Our evaluations show that our approach covers a large range of use cases, scales well on large datasets, and is easier to use than writing SPARQL queries.

    Graph Pattern Matching in GQL and SQL/PGQ

    Get PDF
    As graph databases become widespread, JTC1 -- the committee jointly in charge of information technology standards for the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) -- has approved a project to create GQL, a standard property graph query language. This complements a project to extend SQL with a new part, SQL/PGQ, which specifies how to define graph views over an SQL tabular schema and how to run read-only queries against them. Both projects have been assigned to the ISO/IEC JTC1 SC32 working group for Database Languages, WG3, which continues to maintain and enhance SQL as a whole. This common responsibility helps enforce the policy that the identical core of both PGQ and GQL is a graph pattern matching sub-language, here termed GPML. The WG3 design process is also analyzed by an academic working group, part of the Linked Data Benchmark Council (LDBC), whose task is to produce a formal semantics of these graph data languages that complements their standard specifications. This paper, written by members of WG3 and LDBC, presents the key elements of the GPML of SQL/PGQ and GQL in advance of the publication of these new standards.
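
    For orientation, here is a minimal SQL/PGQ-style sketch of the kind of query GPML enables (the graph name, labels, and property names are invented for illustration, and keyword details may differ in the published standards):

        -- Return friend-of-friend pairs from a property graph view as an
        -- ordinary table usable in the rest of the SQL query.
        SELECT gt.person_a, gt.person_b
        FROM GRAPH_TABLE (
          social_graph
          MATCH (a IS Person)-[IS KNOWS]->(b IS Person)-[IS KNOWS]->(c IS Person)
          WHERE a.name = 'Alice'
          COLUMNS (a.name AS person_a, c.name AS person_b)
        ) AS gt;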