Static Analysis of Partial Referential Integrity for Better Quality SQL Data
Referential integrity ensures the consistency of data between database relations. The SQL standard proposes different semantics for dealing with partial information under referential integrity. Simple semantics neglects tuples with nulls and enjoys built-in support in commercial database systems; partial semantics does check tuples with nulls, but does not enjoy built-in support. We investigate this mismatch between the SQL standard and real database systems, gaining insight into the trade-off between cleaner data under partial semantics and the efficiency of checking simple semantics. The cost of referential integrity checking is evaluated for various dataset sizes, indexing structures, and degrees of cleanliness. While the cost of partial semantics exceeds that of simple semantics, their performance follows similar trends under growing database sizes. Applying multiple index structures and exploiting appropriate validation mechanisms increases the efficiency of checking partial semantics.
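The difference between the two semantics can be illustrated on a small, hypothetical composite foreign key. The schema and data below are invented for illustration, and SQLite is used as a convenient stand-in for the systems evaluated in the paper:

```python
import sqlite3

# Hypothetical schema: child(a, b) references parent(a, b).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parent (a INTEGER, b INTEGER);
    CREATE TABLE child  (a INTEGER, b INTEGER);
    INSERT INTO parent VALUES (1, 10);
    INSERT INTO child  VALUES (1, 10),   -- fully matching: fine under both semantics
                              (2, NULL), -- has a null: exempt under simple semantics
                              (1, NULL); -- non-null part matches: fine under partial semantics
""")

# Simple semantics: only fully non-null child tuples must match a parent tuple.
simple_violations = con.execute("""
    SELECT c.a, c.b FROM child c
    WHERE c.a IS NOT NULL AND c.b IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM parent p WHERE p.a = c.a AND p.b = c.b)
""").fetchall()

# Partial semantics: the non-null components must match some parent tuple.
partial_violations = con.execute("""
    SELECT c.a, c.b FROM child c
    WHERE (c.a IS NOT NULL OR c.b IS NOT NULL)
      AND NOT EXISTS (SELECT 1 FROM parent p
                      WHERE (c.a IS NULL OR p.a = c.a)
                        AND (c.b IS NULL OR p.b = c.b))
""").fetchall()

print(simple_violations)   # [] -- (2, NULL) is ignored
print(partial_violations)  # [(2, None)] -- no parent tuple with a = 2
```

The extra disjunctions in the partial-semantics query hint at why it lacks efficient built-in support: a plain equality join on an index no longer suffices.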
On the Quality of Relational Database Schemas in Open-source Software
The relational schemas of 512 open-source projects storing their data in MySQL or PostgreSQL databases are investigated by querying the standard information schema, looking for overall design issues. The set of SQL queries used in our research is released as the Salix free software. As it is fully relational and relies on standards, it may be installed in any compliant database to help improve schemas. Our research shows that the overall quality of the surveyed schemas is poor: a majority of projects have at least one table without any primary key or unique constraint to identify a tuple; data security features such as referential integrity or transactional back-ends are hardly used; projects that advertise supporting both databases often have missing tables or attributes. PostgreSQL projects appear to be of higher quality than MySQL projects, and have been updated more recently, suggesting more active maintenance. This is even better for projects with PostgreSQL-only support. However, the quality difference between the two database management systems is mostly due to MySQL-specific issues. An overall predictor of bad database quality is that a project chooses MySQL or PHP, while good design is found with PostgreSQL and Java. The few declared constraints make it possible to detect latent bugs that are worth fixing: more declarations would certainly help unveil more bugs. Our survey also suggests that some features of MySQL and PostgreSQL are particularly error-prone. This first survey on the quality of relational schemas in open-source software provides a unique insight into the data engineering practice of these projects.
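The kind of check Salix performs against the information schema can be sketched in miniature. SQLite has no information_schema, so this hypothetical stand-in inspects sqlite_master and PRAGMA table_info instead; it is not the paper's actual query set, and the schema names are invented:

```python
import sqlite3

# A toy database standing in for a surveyed project schema: one table with a
# primary key, one without.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE log_entries (message TEXT, created TEXT);
""")

def tables_without_primary_key(con):
    """Return user tables in which no column belongs to the primary key."""
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    missing = []
    for t in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt, pk);
        # pk is non-zero for primary-key columns.
        cols = con.execute(f"PRAGMA table_info({t})").fetchall()
        if not any(col[5] for col in cols):
            missing.append(t)
    return missing

print(tables_without_primary_key(con))  # ['log_entries']
```

On MySQL or PostgreSQL the same question is answered relationally, by joining information_schema.tables against information_schema.table_constraints.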
A Field Analysis of Relational Database Schemas in Open-source Software (Extended)
The relational schemas of 512 open-source projects storing their data in MySQL or PostgreSQL databases are investigated by querying the standard information schema, looking for various issues. These SQL queries are released as the Salix free software. As it is fully relational and relies on standards, it may be installed in any compliant database to help improve schemas. The overall quality of the surveyed schemas is poor: a majority of projects have at least one table without any primary key or unique constraint to identify a tuple; data security features such as referential integrity or transactional back-ends are hardly used; projects that advertise supporting both databases often have missing tables or attributes. PostgreSQL projects are of higher quality than MySQL projects, and the quality is even better for projects with PostgreSQL-only support. However, the difference between the two databases is mostly due to MySQL-specific issues. An overall predictor of bad database quality is that a project chooses MySQL or PHP, while good design is found with PostgreSQL and Java. The few declared constraints make it possible to detect latent bugs that are worth fixing: more declarations would certainly help unveil more bugs. Our survey also suggests that some features of MySQL and PostgreSQL are particularly error-prone. This first survey on the quality of relational schemas in open-source software provides a unique insight into the data engineering practice of these projects.
Output constraints in multimedia database systems
Abstract
Semantic errors occur in every kind of data management. Conventional database systems use integrity control to avoid semantic errors; integrity constraints are used to guarantee the integrity of the data. These constraints, however, can only check the consistency of simply structured data.
Multimedia database systems manage not only simple alphanumeric data but also complex media data such as videos. Securing the consistency of these data requires a substantial extension of the existing integrity concept, with particular attention to consistent data output. In contrast to alphanumeric data, media data can be falsified during output. This can happen when a required data quality cannot be achieved during output, or when synchronization conditions between media objects cannot be met. It is therefore necessary to introduce output constraints, which define when the output of media data is semantically correct. The database system can check these conditions and thus guarantee that the user receives semantically sound data.
This thesis considers all aspects necessary to integrate output constraints into a multimedia database system: the modelling of the constraints, their database-internal representation, and constraint checking.
For constraint modelling, a constraint language based on predicate logic is introduced. To enable the definition of temporal and spatial synchronization, we use Allen relations. For efficient checking, the output constraints must be translated from the specification language into a database-internal representation; difference constraints are used for this purpose, as they allow very efficient constraint checking. We have developed algorithms for the efficient checking of output constraints and demonstrated this in experiments. Besides checking the constraints, media data must be synchronized in accordance with the output constraints; for this we have developed the concept of the output schedule, which is generated from the defined output constraints.
The output constraints introduced in this thesis considerably reduce semantic errors in the management of media data. The thesis thus contributes to a qualitative improvement in the management of media data.
Semantic errors exist as long as data are managed. Traditional database systems try to prevent these errors by proposing integrity
concepts for stored data. Integrity constraints are used to implement these integrity concepts. However, integrity constraints can only detect semantic errors in elementary data.
Multimedia database systems manage elementary data as well as complex media data, such as videos. For these media data we need a much broader consistency concept than traditional database systems provide. In particular, the output of media data must be taken into account: in contrast to alphanumeric data, the semantics of media data can be falsified during data output if the data quality or the
synchronization of the data is not suitable. Thus, we need a concept of output constraints that makes it possible to prevent semantic errors during data output. To integrate output constraints into a multimedia database system, we have to consider the modelling, representation, and checking of output constraints.
For modelling output constraints we introduce a constraint language that follows the same principles as traditional constraint languages. Our specification language must support temporal as well as spatial synchronization constraints, and it is desirable to support both kinds in almost the same manner. We therefore use Allen relations for defining both temporal and spatial
synchronization constraints.
We need a database-internal representation of output constraints that makes efficient constraint checking possible. The Allen relations used in the constraint language cannot be checked efficiently; difference constraints, however, are a class of constraints that allows very efficient checking. We therefore use difference constraints as the database-internal representation of output constraints.
For checking the consistency of output constraints we use both a graph-theoretic and an analytical approach; both require a constraint graph as their data structure. For data output we need an output order that conforms to the defined output constraints; this output schedule can be generated from the output constraints.
With the output constraints proposed in this thesis, the semantic correctness of media data during data output can be supported. Thus, the contribution of this work is a qualitative improvement in the management of media data by database systems.
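The satisfiability test behind the graph-based approach can be sketched as follows: a system of difference constraints of the form x_j - x_i <= c is satisfiable exactly when the constraint graph (edge i -> j with weight c) contains no negative-weight cycle, which Bellman-Ford relaxation detects. The variable numbering and durations below are invented for illustration and are not taken from the thesis:

```python
def satisfiable(num_vars, constraints):
    """constraints: list of (i, j, c) meaning x_j - x_i <= c.

    Relax all edges up to num_vars rounds, as with a virtual source
    connected to every variable with weight 0 (all distances start at 0).
    """
    dist = [0.0] * num_vars
    for _ in range(num_vars):
        changed = False
        for i, j, c in constraints:
            if dist[i] + c < dist[j]:
                dist[j] = dist[i] + c
                changed = True
        if not changed:
            return True   # converged: the dist vector itself is a solution
    return False          # still relaxing after num_vars rounds: negative cycle

# Variables: 0 = start(A), 1 = end(A), 2 = start(B), 3 = end(B).
ok = satisfiable(4, [
    (0, 1, 10),   # end(A) - start(A) <= 10   (A lasts at most 10s)
    (1, 0, -5),   # start(A) - end(A) <= -5   (A lasts at least 5s)
    (2, 1, 0),    # end(A) - start(B) <= 0    (Allen "before": A ends before B starts)
])

# Contradictory: A ends before B starts, yet B starts at least 1s before A ends.
bad = satisfiable(4, [
    (2, 1, 0),    # end(A) - start(B) <= 0
    (1, 2, -1),   # start(B) - end(A) <= -1
])

print(ok, bad)  # True False
```

When the system is satisfiable, the converged distance vector directly yields feasible output times, which is one way an output schedule can be derived from the constraints.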
Design and implementation of a filter engine for semantic web documents
This report describes our project addressing the challenge of change in the semantic web. Some studies have already been done on the so-called adaptive semantic web, such as the application of inference rules. In this study, we apply the technology of Event Notification Systems (ENS): treating changes as events, we
developed a notification system for such events.
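The basic mechanism of such an event notification system can be sketched as a filtered publish/subscribe loop. The event fields and filter form below are assumptions for illustration, not the project's actual design:

```python
# Subscribers register a filter predicate over document-change events and are
# notified only when an event matches.

class FilterEngine:
    def __init__(self):
        self.subscriptions = []  # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        self.subscriptions.append((predicate, callback))

    def publish(self, event):
        # Deliver the event to every subscriber whose filter matches it.
        for predicate, callback in self.subscriptions:
            if predicate(event):
                callback(event)

received = []
engine = FilterEngine()
# Hypothetical filter: only changes to documents in a watched namespace.
engine.subscribe(
    lambda e: e["doc"].startswith("http://example.org/") and e["kind"] == "changed",
    received.append,
)
engine.publish({"doc": "http://example.org/a.rdf", "kind": "changed"})
engine.publish({"doc": "http://other.org/b.rdf", "kind": "changed"})
print(len(received))  # 1
```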
SQL database design static analysis
Static analysis of database design and implementation is not a new idea. Many researchers have covered the topic in detail and defined a number of metrics that are well known within the research community. Unfortunately, unlike the use of metrics in code development, these metrics have not been widely adopted within the development community: a disjunction exists between research into database design metrics and the actual use of databases in industry. This paper describes new metrics that can be used in industry to ensure that a database's current implementation supports long-term scalability, to support easily developed and maintainable code, and to guide developers towards functions or design elements that can be modified to improve the scalability of their data systems. In addition, this paper describes the production of a tool designed to extract these metrics from SQL Server, and includes feedback from professionals regarding the usefulness of the tool and the measures contained within its output.
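A metric extractor of this kind can be sketched in a few lines. The metric (columns per table) and the width threshold below are illustrative assumptions, not the metrics defined in the paper, and SQLite stands in for SQL Server:

```python
import sqlite3

# A toy schema; 'audit' stands in for a wide table that may hinder maintainability.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE TABLE audit (c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT, c5 TEXT, c6 TEXT);
""")

def column_counts(con):
    """Metric: number of columns per user table."""
    return {
        name: len(con.execute(f"PRAGMA table_info({name})").fetchall())
        for (name,) in con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")
    }

counts = column_counts(con)
wide = sorted(t for t, n in counts.items() if n > 5)  # flag unusually wide tables
print(wide)  # ['audit']
```

On SQL Server the same counts would come from the system catalog views rather than PRAGMA statements; the point is that the metric is a plain query over schema metadata.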
Toward a Unified Timestamp with explicit precision
Demographic and health surveillance (DS) systems monitor and document individual- and group-level processes in well-defined populations over long periods of time. The resulting data are complex and inherently temporal. Established methods of storing and manipulating temporal data are unable to adequately address the challenges posed by these data. Building on existing standards, a temporal framework and notation are presented that can faithfully record all of the time-related information (or partial lack thereof) produced by surveillance systems. The Unified Timestamp isolates all of the inherent complexity of temporal data into a single data type and provides the foundation on which a Unified Timestamp class can be built. The Unified Timestamp accommodates both point- and interval-based time measures with arbitrary precision, including temporal sets. Arbitrary granularities and calendars are supported, and the Unified Timestamp is hierarchically organized, allowing it to represent an unlimited array of temporal entities.
Keywords: demographic surveillance, standardization, temporal databases, temporal integrity, timestamp, valid time
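The core idea of a timestamp with explicit precision can be sketched as an interval type: a value known only to some granularity is stored as the interval it could fall in, and order comparisons succeed only when the intervals cannot overlap. The class and granularities below are assumptions for illustration, not the paper's actual design:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class UnifiedTimestamp:
    earliest: date   # lower bound of the interval the true instant lies in
    latest: date     # upper bound (inclusive)

    @classmethod
    def from_year(cls, year):
        """An event known only to year precision spans the whole year."""
        return cls(date(year, 1, 1), date(year, 12, 31))

    @classmethod
    def from_day(cls, year, month, day):
        """Day precision collapses the interval to a single day."""
        d = date(year, month, day)
        return cls(d, d)

    def definitely_before(self, other):
        """True only when the two intervals cannot overlap."""
        return self.latest < other.earliest

birth = UnifiedTimestamp.from_year(1987)        # year precision
visit = UnifiedTimestamp.from_day(1990, 3, 14)  # day precision
print(birth.definitely_before(visit))  # True
print(visit.definitely_before(birth))  # False
```

Storing precision explicitly, rather than padding coarse dates to a fake exact instant, is what lets such a type answer "unknown" honestly when intervals overlap.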
- …