9 research outputs found

    On the Codd Semantics of SQL Nulls

    SQL Nulls and Two-Valued Logic

    Coping with Incomplete Data: Recent Advances

    Handling incomplete data in a correct manner is a notoriously hard problem in databases. Theoretical approaches rely on the computationally hard notion of certain answers, while practical solutions rely on ad hoc query evaluation techniques based on three-valued logic. Can we find a middle ground, and produce correct answers efficiently? The paper surveys results of the last few years motivated by this question. We re-examine the notion of certainty itself, and show that it is much more varied than previously thought. We identify cases when certain answers can be computed efficiently and, short of that, provide deterministic and probabilistic approximation schemes for them. We look at the role of three-valued logic as used in SQL query evaluation, and discuss the correctness of the choice, as well as the necessity of such a logic for producing query answers.

    Handling SQL Nulls with Two-Valued Logic

    The design of SQL is based on a three-valued logic (3VL), rather than the familiar Boolean logic with truth values true and false, to accommodate the additional truth value unknown for handling nulls. It is viewed as indispensable for SQL expressiveness, but is at the same time much criticized for leading to unintuitive behavior of queries and thus being a source of programmer mistakes. We show that, contrary to the widely held view, SQL could have been designed based on the standard Boolean logic, without any loss of expressiveness and without giving up nulls. The approach itself follows SQL’s evaluation, which only retains tuples for which conditions in the WHERE clause evaluate to true. We show that conflating unknown, resulting from nulls, with false leads to an equally expressive version of SQL that does not use the third truth value. Queries written under the two-valued semantics can be efficiently translated into standard SQL and thus executed on any existing RDBMS. These results cover the core of the SQL 1999 Standard, including SELECT-FROM-WHERE-GROUP BY-HAVING queries extended with subqueries and IN/EXISTS/ANY/ALL conditions, and recursive queries. We provide two extensions of this result, showing that neither another way of converting 3VL into Boolean logic nor any other many-valued logic for treating nulls could have led to a more expressive language. These results not only present small modifications of SQL that eliminate the source of many programmer errors without the need to reimplement database internals, but they also strongly suggest that new query languages for various data models do not have to follow SQL’s much-criticized three-valued approach.
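
The contrast between the two readings of a WHERE condition can be seen on a three-row table. The sketch below is only an illustration, not the paper's translation: it uses Python's standard `sqlite3` module, a hypothetical table `t(id, x)`, and `COALESCE(x = 1, 0)` as one assumed way of forcing a comparison that involves a null to count as false.

```python
import sqlite3


def run_null_logic_demo():
    """Compare SQL's 3VL filtering with a two-valued reading (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: row 3 has a null in column x.
    conn.execute("CREATE TABLE t (id INTEGER, x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (2, 2), (3, None)])

    def ids(where):
        # WHERE keeps only rows whose condition evaluates to true.
        rows = conn.execute(f"SELECT id FROM t WHERE {where} ORDER BY id")
        return [r[0] for r in rows]

    results = {
        # Under 3VL, x = 1 is unknown for the null row, so the row is dropped.
        "3vl_eq": ids("x = 1"),
        # NOT unknown is still unknown, so the null row is dropped again.
        "3vl_not_eq": ids("NOT (x = 1)"),
        # Assumed two-valued reading: the comparison with a null counts as
        # false, so its negation is true and the null row is returned.
        "2vl_not_eq": ids("NOT COALESCE(x = 1, 0)"),
    }
    conn.close()
    return results


if __name__ == "__main__":
    for name, value in run_null_logic_demo().items():
        print(name, value)
```

Under 3VL the row with the null satisfies neither `x = 1` nor its negation. Under the assumed two-valued reading it satisfies exactly one of them.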

    Metamorphic testing framework for SQL queries with null values

    Master's thesis in Computer Engineering (Trabajo de Fin de Máster en Ingeniería Informática), Facultad de Informática UCM, Departamento de Sistemas Informáticos y Computación, academic year 2019/2020. The lack of information within relational databases expressed by null values poses important problems when trying to ensure the quality of the data and of the SQL queries evaluated on that data. This occurs because there are multiple interpretations of null values, and in many cases the queries either do not consider that null values can occur as a result of their evaluation, or they do not reflect the correct interpretation of these values in the code. Null values have been present in databases since practically the first implementations of relational database management systems, but the implementation of the SQL standard raises multiple problems. When querying a database that handles null values, the results may not be as expected, either due to the omission of results (false negatives) or incorrect results (false positives). For this reason, this work proposes a tool that analyzes different SQL queries and allows the developer to detect possible errors in those queries that have null values using a metamorphic testing framework. After studying the related literature on database testing, this appears to be the first proposal to apply metamorphic relations to SQL queries with null values.
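
A metamorphic relation for null-aware queries can be sketched as follows. The abstract does not state the relations used in the thesis, so the relation here is an assumption chosen for illustration: for any predicate `p`, the rows where `p` is true, false, or unknown should together account for every row. The table `t(x)` and the function names are hypothetical, and the code uses only Python's standard `sqlite3` module.

```python
import sqlite3


def partition_counts(conn, table, predicate):
    """Return (total, true, false, unknown) row counts for a predicate."""
    def count(where=None):
        sql = f"SELECT COUNT(*) FROM {table}"
        if where:
            sql += f" WHERE {where}"
        return conn.execute(sql).fetchone()[0]

    return (
        count(),
        count(predicate),                 # predicate evaluates to true
        count(f"NOT ({predicate})"),      # predicate evaluates to false
        count(f"({predicate}) IS NULL"),  # predicate evaluates to unknown
    )


def check_partition_relation(values, predicate):
    """Check the assumed relation total == true + false + unknown."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")  # hypothetical test table
    conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])
    total, n_true, n_false, n_unknown = partition_counts(conn, "t", predicate)
    conn.close()
    return total == n_true + n_false + n_unknown, (total, n_true, n_false, n_unknown)


if __name__ == "__main__":
    print(check_partition_relation([1, 2, None, 5, None], "x > 1"))
```

A query author who assumes that `p` and `NOT p` cover all rows would expect the first two partitions to sum to the total. On data with nulls they do not, which is the kind of omission (false negative) the abstract describes.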

    Toward relevant answers to queries on incomplete databases

    Incomplete and uncertain information is ubiquitous in database management applications. However, the techniques specifically developed to handle incomplete data are not sufficient. Even the evaluation of SQL queries on databases containing NULL values remains a challenge after 40 years. There is no consensus on what an answer to a query on an incomplete database should be, and the existing notions often have limited applicability. One of the most prevalent techniques in the literature is based on finding answers that are certainly true, independently of how missing values are interpreted. However, this notion has yielded several conflicting formal definitions for certain answers. Based on the fact that incomplete data can be enriched by some additional knowledge, we designed a notion able to unify and explain the different definitions for certain answers. Moreover, the knowledge-preserving certain answers notion is able to provide the first well-founded definition of certain answers for the relational bag data model and value-inventing queries, addressing some key limitations of previous approaches. However, it doesn’t provide any guarantee about the relevancy of the answers it captures. To understand what would be relevant answers to queries on incomplete databases, we designed and conducted a survey on the everyday usage of NULL values among database users. One of the findings from this socio-technical study is that even when users agree on the possible interpretation of NULL values, they may not agree on what a satisfactory query answer is. Therefore, to be relevant, query evaluation on incomplete databases must account for users’ tasks and preferences. We model users’ preferences and tasks with the notion of regret. The regret function captures the task-dependent loss a user endures when they take one database as ground truth instead of another. Thanks to this notion, we designed the first framework able to provide a score accounting for the risk associated with query answers. It allows us to define the risk-minimizing answers to queries on incomplete databases. We show that for some regret functions, regret-minimizing answers coincide with certain answers. Moreover, as the notion is more agile, it can capture more nuanced answers and more interpretations of incompleteness. A different approach to improve the relevancy of an answer is to explain its provenance. We propose to partition the incompleteness into sources and measure their respective contributions to the risk of an answer. As a first milestone, we study several models to predict the evolution of the risk when we clean a source of incompleteness. We implemented the framework, and it exhibits promising results on relational databases and queries with aggregate and grouping operations. Indeed, the model allows us to infer the risk reduction obtained by cleaning an attribute. Finally, by considering a game-theoretic approach, the model can provide an explanation for answers based on the contribution of each attribute to the risk.