788 research outputs found

    Static Analysis of Partial Referential Integrity for Better Quality SQL Data

    Referential integrity ensures the consistency of data between database relations. The SQL standard proposes different semantics for dealing with partial information under referential integrity. Simple semantics ignores tuples with nulls and enjoys built-in support in commercial database systems. Partial semantics does check tuples with nulls, but does not enjoy built-in support. We investigate this mismatch between the SQL standard and real database systems, and gain insight into the trade-off between the cleaner data obtained under partial semantics and the efficiency of checking simple semantics. The cost of referential integrity checking is evaluated for various dataset sizes, indexing structures and degrees of cleanliness. While the cost of partial semantics exceeds that of simple semantics, their performance trends follow similar patterns as database size grows. Applying multiple index structures and exploiting appropriate validation mechanisms increases the efficiency of checking partial semantics.
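
    The difference between the two semantics can be illustrated with a minimal sketch. It assumes the SQL standard's MATCH SIMPLE and MATCH PARTIAL rules for a multi-column foreign key, represents tuples as Python tuples with `None` standing for null, and assumes the referenced key itself contains no nulls. The function names are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple

# A row is a tuple of column values; None stands for SQL NULL.
Row = Tuple[Optional[object], ...]


def satisfies_simple(child: Row, parents: Sequence[Row]) -> bool:
    """Simple semantics: a child tuple with any null is not checked at all."""
    if any(value is None for value in child):
        return True
    return child in parents


def satisfies_partial(child: Row, parents: Sequence[Row]) -> bool:
    """Partial semantics: the non-null columns must agree with some parent tuple."""
    if all(value is None for value in child):
        return True
    return any(
        all(c is None or c == p for c, p in zip(child, parent))
        for parent in parents
    )


parents = [("NZ", "Auckland"), ("FR", "Paris")]
# ("DE", None) passes under simple semantics but fails under partial semantics,
# because no parent tuple has "DE" in the first column.
print(satisfies_simple(("DE", None), parents), satisfies_partial(("DE", None), parents))
```

    The partial check has to compare each child tuple against parent tuples column by column, which hints at why it is costlier to validate than the simple check.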

    On the Quality of Relational Database Schemas in Open-source Software

    The relational schemas of 512 open-source projects storing their data in MySQL or PostgreSQL databases are investigated by querying the standard information schema, looking for overall design issues. The set of SQL queries used in our research is released as the Salix free software. As it is fully relational and relies on standards, it may be installed in any compliant database to help improve schemas. Our research shows that the overall quality of the surveyed schemas is poor: a majority of projects have at least one table without any primary key or unique constraint to identify a tuple; data security features such as referential integrity or transactional back-ends are hardly used; projects that advertise support for both databases often have missing tables or attributes. PostgreSQL projects appear to be of higher quality than MySQL projects and have been updated more recently, suggesting more active maintenance. Quality is better still for projects with PostgreSQL-only support. However, the quality difference between the two database management systems is mostly due to MySQL-specific issues. An overall predictor of bad database quality is that a project chooses MySQL or PHP, while good design is found with PostgreSQL and Java. The few declared constraints make it possible to detect latent bugs that are worth fixing: more declarations would certainly help unveil more bugs. Our survey also suggests that some features of MySQL and PostgreSQL are particularly error-prone. This first survey of the quality of relational schemas in open-source software provides a unique insight into the data engineering practice of these projects.
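
    The kind of check described above (finding tables that have neither a primary key nor a unique constraint) can be sketched as follows. This is not the Salix query set: SQLite is used only so the example is self-contained, and because SQLite has no information schema, the sketch reads `sqlite_master` and the `table_info` and `index_list` pragmas as a stand-in. The function name is hypothetical.

```python
import sqlite3
from typing import List


def tables_without_key(conn: sqlite3.Connection) -> List[str]:
    """Return user tables that declare neither a primary key nor a unique index."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    unkeyed = []
    for name in names:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        has_pk = any(col[5] > 0 for col in conn.execute(f'PRAGMA table_info("{name}")'))
        # index_list rows: (seq, name, unique, origin, partial)
        has_unique = any(idx[2] == 1 for idx in conn.execute(f'PRAGMA index_list("{name}")'))
        if not (has_pk or has_unique):
            unkeyed.append(name)
    return unkeyed


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE emails (addr TEXT UNIQUE);
    CREATE TABLE logs (msg TEXT);
    """
)
print(tables_without_key(conn))
```

    On MySQL or PostgreSQL the same question would be asked of `information_schema.table_constraints` instead of the SQLite-specific pragmas used here.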

    A Field Analysis of Relational Database Schemas in Open-source Software (Extended)

    The relational schemas of 512 open-source projects storing their data in MySQL or PostgreSQL databases are investigated by querying the standard information schema, looking for various issues. These SQL queries are released as the Salix free software. As it is fully relational and relies on standards, it may be installed in any compliant database to help improve schemas. The overall quality of the surveyed schemas is poor: a majority of projects have at least one table without any primary key or unique constraint to identify a tuple; data security features such as referential integrity or transactional back-ends are hardly used; projects that advertise support for both databases often have missing tables or attributes. PostgreSQL projects are of better quality than MySQL projects, and quality is better still for projects with PostgreSQL-only support. However, the difference between the two databases is mostly due to MySQL-specific issues. An overall predictor of bad database quality is that a project chooses MySQL or PHP, while good design is found with PostgreSQL and Java. The few declared constraints make it possible to detect latent bugs that are worth fixing: more declarations would certainly help unveil more bugs. Our survey also suggests that some features of MySQL and PostgreSQL are particularly error-prone. This first survey of the quality of relational schemas in open-source software provides a unique insight into the data engineering practice of these projects.

    Output constraints in multimedia database systems

    Semantic errors arise wherever data are managed. Traditional database systems try to prevent these errors through integrity control, implemented with integrity constraints. However, integrity constraints can only detect semantic errors in simply structured data. Multimedia database systems manage complex media data, such as videos, in addition to elementary alphanumeric data. Ensuring the consistency of media data requires a much wider consistency concept than traditional database systems provide. In particular, the output of media data must be taken into account. In contrast to alphanumeric data, the semantics of media data can be falsified during output, either when a required data quality cannot be achieved or when synchronization conditions between media objects cannot be met. Output constraints are therefore needed: they define when the output of media data is semantically correct, so that the database system can check these conditions and ensure that the user receives semantically correct data. This thesis considers all aspects needed to integrate output constraints into a multimedia database system: modelling, database-internal representation, and checking. For modelling, we introduce a constraint language based on predicate logic that follows the same principles as traditional constraint languages. The language must support both temporal and spatial synchronization constraints, and should support them in almost the same manner, so we use Allen relations to define both kinds. Efficient checking requires a database-internal representation, because the Allen relations used in the constraint language cannot be checked efficiently. Difference constraints are a class of constraints that can be checked very efficiently, so we use them as the internal representation. To check the consistency of output constraints we use a graph-theoretic approach as well as an analytical approach; both require a constraint graph as their data structure. We have developed algorithms for efficient checking of output constraints and demonstrated their efficiency experimentally. Data output also needs an output order that conforms to the defined output constraints; this output schedule is generated from the constraints. The output constraints proposed in this thesis considerably reduce semantic errors in the management of media data. The contribution of this work is thus a qualitative improvement in how database systems manage media data.
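
    A graph-based consistency check for difference constraints can be sketched as follows. This is the standard textbook technique (Bellman-Ford negative-cycle detection on the constraint graph), not necessarily the algorithm developed in the thesis. Each constraint `(u, v, c)` is taken to mean `t[v] - t[u] <= c`; the function name and the media object names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# A difference constraint (u, v, c) means: t[v] - t[u] <= c.
# In the constraint graph it becomes an edge u -> v with weight c.
Constraint = Tuple[str, str, float]


def solve_difference_constraints(
    constraints: List[Constraint],
) -> Optional[Dict[str, float]]:
    """Return one satisfying assignment, or None if the constraints are inconsistent."""
    nodes = sorted({n for u, v, _ in constraints for n in (u, v)})
    # Starting every node at 0 is equivalent to adding a virtual source
    # connected to each node by a zero-weight edge.
    dist = {n: 0.0 for n in nodes}
    for _ in range(len(nodes) + 1):
        changed = False
        for u, v, c in constraints:
            if dist[u] + c < dist[v]:
                dist[v] = dist[u] + c
                changed = True
        if not changed:
            return dist  # shortest-path distances satisfy every constraint
    return None  # still relaxing after |V| + 1 rounds: negative cycle


# The subtitle must start between 2 and 5 time units after the video.
schedule = solve_difference_constraints(
    [("video", "subtitle", 5), ("subtitle", "video", -2)]
)
print(schedule)
```

    The constraints are consistent exactly when the constraint graph has no negative-weight cycle, and the resulting distances give one feasible set of start times from which an output schedule could be derived.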

    Design and implementation of a filter engine for semantic web documents

    This report describes our project, which addresses the challenge of handling changes in the semantic web. Some studies have already been done on the so-called adaptive semantic web, for example by applying inference rules. In this study, we apply Event Notification System (ENS) technology: treating changes as events, we developed a notification system for such events.

    Toward a Unified Timestamp with explicit precision

    Demographic and health surveillance (DS) systems monitor and document individual- and group-level processes in well-defined populations over long periods of time. The resulting data are complex and inherently temporal. Established methods of storing and manipulating temporal data are unable to adequately address the challenges posed by these data. Building on existing standards, a temporal framework and notation are presented that are able to faithfully record all of the time-related information (or partial lack thereof) produced by surveillance systems. The Unified Timestamp isolates all of the inherent complexity of temporal data into a single data type and provides the foundation on which a Unified Timestamp class can be built. The Unified Timestamp accommodates both point- and interval-based time measures with arbitrary precision, including temporal sets. Arbitrary granularities and calendars are supported, and the Unified Timestamp is hierarchically organized, allowing it to represent an unlimited array of temporal entities. Keywords: demographic surveillance, standardization, temporal databases, temporal integrity, timestamp, valid time