69 research outputs found

    A New Formal Context for Symmetric Dependencies

    In this paper we present a new formal context for symmetric dependencies. We study its properties and compare it with previous approaches. We also discuss how this new context may open the door to solving some open problems for symmetric dependencies.

    Extending and inferring functional dependencies in schema transformation


    On the Discovery of Semantically Meaningful SQL Constraints from Armstrong Samples: Foundations, Implementation, and Evaluation

    A database is said to be C-Armstrong for a finite set Σ of data dependencies in a class C if the database satisfies all data dependencies in Σ and violates all data dependencies in C that are not implied by Σ. Armstrong databases are therefore concise, user-friendly representations of abstract data dependencies that can be used to judge, justify, convey, and test the understanding of database design choices. Indeed, an Armstrong database satisfies exactly those data dependencies that are considered meaningful by the current design choice Σ. Structural and computational properties of Armstrong databases have been investigated in depth in Codd’s Turing Award winning relational model of data. Armstrong databases have been incorporated into approaches to relational database design, and have also proved useful for the elicitation of requirements, the semantic sampling of existing databases, and the specification of schema mappings.

    This research establishes a toolbox of Armstrong databases for SQL data. This is challenging because SQL data can contain null marker occurrences in columns declared NULL and may contain duplicate rows, so the existing theory of Armstrong databases applies only to idealized instances of SQL data, that is, instances without null marker occurrences and without duplicate rows. For the thesis, two popular interpretations of null markers are considered: the no information interpretation used in SQL, and the exists but unknown interpretation by Codd. Furthermore, the study is limited to the popular class C of functional dependencies. The presence of duplicate rows, however, means that the class of uniqueness constraints is no longer subsumed by the class of functional dependencies, in contrast to the relational model of data.

    As a first contribution, a provably correct algorithm is developed that computes Armstrong databases for an arbitrarily given finite set of uniqueness constraints and functional dependencies. This contribution is based on axiomatic, algorithmic, and logical characterizations of the associated implication problem, which are also established in this thesis. While the problem of deciding whether a given database is Armstrong for a given set of such constraints is precisely exponential, our algorithm computes an Armstrong database whose number of rows is at most quadratic in the number of rows of a minimum-sized Armstrong database.

    As a second contribution, the algorithms are implemented in the form of a design tool. Users of the tool can therefore inspect Armstrong databases to analyze their current design choice Σ. Intuitively, Armstrong databases are useful for the acquisition of semantically meaningful constraints if users can recognize the actual meaningfulness of constraints that they incorrectly perceived as meaningless before inspecting an Armstrong database. As a final contribution, measures are introduced that formalize the term “useful”, and detailed experiments show that Armstrong tables, as computed by the tool, are indeed useful.

    In summary, this research establishes a toolbox of Armstrong databases that database designers can apply to concisely visualize constraints on SQL data. Such support can lead to database designs that guarantee efficient data management in practice.
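    To make the implication problem above concrete, here is a minimal Python sketch of the classical attribute-closure test for functional dependency implication. It covers only the idealized relational case; the thesis’s algorithms for SQL data, with null markers, duplicate rows, and uniqueness constraints, are substantially more involved, and all names below are illustrative.

        # Classical attribute-closure test for FD implication (relational case only).
        def closure(attrs, fds):
            """Closure of the attribute set `attrs` under the FDs (lhs, rhs) in `fds`."""
            result = set(attrs)
            changed = True
            while changed:
                changed = False
                for lhs, rhs in fds:
                    if lhs <= result and not rhs <= result:
                        result |= rhs
                        changed = True
            return result

        def implies(fds, fd):
            """Decide whether the FD set `fds` implies the single FD `fd`."""
            lhs, rhs = fd
            return rhs <= closure(lhs, fds)

        # Sigma = {A -> B, B -> C} implies A -> C but not C -> A; an Armstrong
        # database for Sigma must therefore satisfy A -> C and violate C -> A.
        sigma = [({"A"}, {"B"}), ({"B"}, {"C"})]
        print(implies(sigma, ({"A"}, {"C"})))  # True
        print(implies(sigma, ({"C"}, {"A"})))  # False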

    On Resolving Semantic Heterogeneities and Deriving Constraints in Schema Integration

    Doctor of Philosophy (Ph.D.) thesis.

    XML documents schema design

    The eXtensible Markup Language (XML) is fast emerging as the dominant standard for storing, describing, and interchanging data among various systems and databases on the internet. It offers schema languages such as the Document Type Definition (DTD) and XML Schema Definition (XSD) for defining the syntax and structure of XML documents. To enable the efficient use of XML documents in any large-scale electronic environment, it is necessary to avoid data redundancies and update anomalies. Redundancy and anomalies in XML documents can lead not only to higher data storage costs but also to increased costs for data transfer and data manipulation.

    To overcome this problem, this thesis establishes a formal framework for XML document schema design. To achieve this aim, we propose a method that improves and simplifies XML schema design by combining a conceptual model of the DTD with the theory of database normalization. A conceptual diagram, the Graph-Document Type Definition (G-DTD), is proposed to describe the structure of XML documents at the schema level. For the G-DTD itself, we define a structure that incorporates attributes, simple elements, complex elements, and the relationship types among them. Furthermore, semantic constraints are precisely defined in order to capture the semantic meaning of the defined XML objects.

    In addition, to provide a guideline towards well-designed XML document schemas, we propose a set of normal forms for G-DTDs on the basis of rules proposed by Arenas and Libkin and by Lv et al. The corresponding normalization rules for transforming a G-DTD into a normal form schema are also discussed, and a case study illustrates the applicability of the concept. We found that the new normal forms are more concise and practical, in particular because they allow the user to find an 'optimal' structure of XML elements and attributes at the schema level. To show that our approach is usable by database designers, we develop a prototype for XML document schema design using the Z formal specification language. Finally, using the same case study, this formal specification is tested for correctness and consistency. This gives confidence that our prototype can be implemented successfully to automatically generate XML schema designs.
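    To illustrate the kind of redundancy such normal forms are designed to rule out, the following hedged Python sketch checks a simple XML functional dependency, that a course number determines its title, using only the standard library; the document, element names, and helper function are invented for this example and are not taken from the thesis.

        # Detect violations of the (hypothetical) XML functional dependency
        # "course/@no determines course/title" in a toy document.
        import xml.etree.ElementTree as ET

        DOC = """
        <university>
          <course no="c1"><title>Databases</title></course>
          <course no="c1"><title>Data Bases</title></course>
          <course no="c2"><title>Logic</title></course>
        </university>
        """

        def xfd_violations(root, path, key_attr, dep_child):
            """Report keys that map to more than one dependent value."""
            seen, violations = {}, []
            for node in root.findall(path):
                key, val = node.get(key_attr), node.findtext(dep_child)
                if key in seen and seen[key] != val:
                    violations.append((key, seen[key], val))
                seen.setdefault(key, val)
            return violations

        root = ET.fromstring(DOC)
        # Course c1 stores two different titles: a redundancy/update anomaly.
        print(xfd_violations(root, "course", "no", "title"))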

    Graphical and Spreadsheet Reasoning for Sets of Functional Dependencies

    Reasoning about constraint sets is a difficult task. Classical database design is based on a step-wise extension of the constraint set and on the consideration of constraint sets generated by tools. Since the database developer must master semantics acquisition, tools and approaches that support reasoning about sets of constraints are still being sought. We propose novel approaches for presenting sets of functional dependencies based on specific graphs and spreadsheets. These approaches may be used to elicit full knowledge of the validity of functional dependencies in relational schemata.
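    One plausible spreadsheet-style presentation, sketched below in Python, is a closure matrix in which the row for attribute A marks every attribute contained in the closure of {A}; this merely illustrates the general idea, and the graph and spreadsheet notations actually proposed in the paper may differ.

        # Print a closure matrix ("spreadsheet") for a set of FDs.
        def closure(attrs, fds):
            result, changed = set(attrs), True
            while changed:
                changed = False
                for lhs, rhs in fds:
                    if lhs <= result and not rhs <= result:
                        result |= rhs
                        changed = True
            return result

        def closure_sheet(attributes, fds):
            """Cell (A, B) is 'x' iff B lies in the closure of {A}."""
            print("  | " + " ".join(attributes))
            for a in attributes:
                cl = closure({a}, fds)
                print(a + " | " + " ".join("x" if b in cl else "." for b in attributes))

        # For {A -> B, B -> C}, the first row shows closure({A}) = {A, B, C}.
        closure_sheet(["A", "B", "C", "D"], [({"A"}, {"B"}), ({"B"}, {"C"})])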

    Ontological approach for database integration

    Database integration is a research area that has gained much attention from researchers. Its goal is to represent the data from different database sources in one unified form. Two obstacles stand in the way of database integration: the first is the distribution of data, and the second is heterogeneity. The Web addresses the distribution problem, and for heterogeneity there are several approaches to database integration, such as data warehouses and federated databases. The problem with these approaches is their lack of semantics. Our approach therefore exploits Semantic Web methodology: the hybrid ontology method can be employed to solve the database integration problem. In this method two elements are available, the source (database) and the domain ontology; the local ontology, however, is missing. To ensure the success of the method, the local ontologies must be produced. Our approach obtains the semantics from the logical model of the database to generate a local ontology; validation and enhancement are then acquired from the semantics obtained from the conceptual model of the database. The approach thus comprises a generation phase and a validation-enrichment phase.

    In the generation phase, we utilise reverse engineering techniques to capture the semantics hidden in the SQL definitions and reproduce the logical model of the database. Our transformation system is then applied to generate an ontology, producing all the concepts of classes, relationships, and axioms. First, class creation involves several rules that work together to produce classes. Our rules solve problems such as fragmentation and hierarchy, eliminate the superfluous classes arising from multi-valued attribute relations, and handle neglected cases such as relationships with additional attributes; a final class creation rule covers generic relations. The rules for relationships between concepts eliminate relationships between integrated concepts. Finally, several rules translate relationship and attribute constraints into axioms in the ontological model. The formal rules of our approach are domain independent, and they produce a generic ontology that is not restricted to a specific ontology language. The rules take into account the gap between the database model and the ontological model, so some database constructs have no equivalent in the ontological model.

    The second phase consists of validation and enrichment. The best way to validate the transformation result is to use the semantics obtained from the conceptual model of the database. In the validation phase, the domain expert captures missing or superfluous concepts (classes or relationships). In the enrichment phase, the generalisation method can be applied to classes that share common attributes, and complex or composite attributes can be represented as classes.

    We implement the transformation system in a tool called SQL2OWL in order to show the correctness and functionality of our approach. The evaluation of our system, which proceeds through several techniques, showed the success of the proposed approach. First, a comparative study is held between the results produced by our approach and those of similar approaches. The second evaluation technique is a weighting score system that specifies the criteria affecting the transformation system. The final evaluation technique is a score scheme: we assess the quality of the transformation system by applying a compliance measure in order to show the strength of our approach compared to existing approaches. Finally, the measures of success considered by our approach are system scalability and completeness.
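    For a rough flavour of the table-to-class and foreign-key-to-object-property mapping style described above, the following self-contained Python sketch emits Turtle for a toy schema. The schema, prefixes, and rule coverage are invented for illustration and cover only a small fraction of the SQL2OWL rules (axioms, fragmentation, multi-valued attributes, generic relations) summarised in the abstract.

        # Toy relational schema: table name -> columns (with XSD types) and
        # foreign keys (property name -> referenced table).
        TABLES = {
            "Employee": {"columns": {"id": "xsd:integer", "name": "xsd:string"},
                         "fks": {"worksIn": "Department"}},
            "Department": {"columns": {"id": "xsd:integer", "title": "xsd:string"},
                           "fks": {}},
        }

        def to_turtle(tables, prefix="ex"):
            lines = [
                f"@prefix {prefix}: <http://example.org/> .",
                "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
                "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
                "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
                "",
            ]
            for table, spec in tables.items():
                # Rule 1: each table becomes a class.
                lines.append(f"{prefix}:{table} a owl:Class .")
                # Rule 2: each column becomes a datatype property.
                for col, dtype in spec["columns"].items():
                    lines.append(f"{prefix}:{table}_{col} a owl:DatatypeProperty ; "
                                 f"rdfs:domain {prefix}:{table} ; rdfs:range {dtype} .")
                # Rule 3: each foreign key becomes an object property.
                for fk, target in spec["fks"].items():
                    lines.append(f"{prefix}:{fk} a owl:ObjectProperty ; "
                                 f"rdfs:domain {prefix}:{table} ; rdfs:range {prefix}:{target} .")
            return "\n".join(lines)

        print(to_turtle(TABLES))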