
    Schema Normalization for Improving Schema Matching

    Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method to perform schema label normalization, which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.

    Schema Label Normalization for Improving Schema Matching

    Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the performance of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns, abbreviations, and acronyms. We address this problem by proposing a method to perform schema label normalization, which increases the number of comparable labels. The method semi-automatically expands abbreviations/acronyms and annotates compound nouns with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching results.

    Automatic Normalization and Annotation for Discovering Semantic Mappings

    Schema matching is the problem of finding relationships among concepts across heterogeneous data sources (heterogeneous in format and in structure). Starting from the “hidden meaning” associated with schema labels (i.e. class/attribute names), it is possible to discover relationships among the elements of different schemata. Lexical annotation (i.e. annotation w.r.t. a thesaurus/lexical resource) helps in associating a “meaning” with schema labels. However, the accuracy of semi-automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and word abbreviations. In this work, we address this problem by proposing a method to perform schema label normalization, which increases the number of comparable labels. Unlike other solutions, the method semi-automatically expands abbreviations and annotates compound terms with minimal manual effort. We empirically prove that our normalization method helps in the identification of similarities among schema elements of different data sources, thus improving schema matching accuracy.

    XML document design via GN-DTD

    Designing a well-structured XML document is important for the sake of readability and maintainability. More importantly, it avoids data redundancies and update anomalies when maintaining a large quantity of XML-based documents. In this paper, we propose a method to improve XML structural design by adopting graphical notations for Document Type Definitions (GN-DTD), which is used to describe the structure of an XML document at the schema level. Multiple levels of normal forms for GN-DTD are proposed on the basis of conceptual model approaches and theories of normalization. The normalization rules are applied to transform a poorly designed XML document into a well-designed one based on normalized GN-DTD, which is illustrated through examples.

    A comparative analysis of data redundancy and execution time between relational and object oriented schema table

    Database design is one of the important parts of building software, because the database is the system's data store. Several techniques allow the programmer to improve the design of a database. One of the most popular is the relational technique, which comprises entity-relationship diagrams and normalization. The relational technique is easy to use and useful for reducing data redundancy, because normalization removes redundancy by applying normal forms to the schema tables. The second technique is the object-oriented technique, which comprises class diagrams and generated schema tables. An advantage of the object-oriented technique is its closeness to programming languages such as C++ or C#. This project starts by applying the relational and object-oriented techniques to determine which one produces less data redundancy during database design. In the experimental results, total data redundancy in the HMS case study was 336 for the relational technique and 364 for the object-oriented technique; in the course database case study it was 186 for the relational technique and 204 for the object-oriented technique. The project also examines query execution time in relational and object-oriented databases, using a user-friendly window. Query execution time in the HMS case study was 107.25 milliseconds for the RDBMS and 80.5 milliseconds for the OODBMS; in the course database case study it was 46.75 milliseconds for the RDBMS and 31.75 milliseconds for the OODBMS. The comparative analysis in this project explains the results of comparing the relational and object-oriented techniques specifically with respect to data redundancy and query execution time.

    Inductive-data-type Systems

    In a previous work ("Abstract Data Type Systems", TCS 173(2), 1997), the last two authors presented a combined language made of a (strongly normalizing) algebraic rewrite system and a typed lambda-calculus enriched by pattern-matching definitions following a certain format, called the "General Schema", which generalizes the usual recursor definitions for natural numbers and similar "basic inductive types". This combined language was shown to be strongly normalizing. The purpose of this paper is to reformulate and extend the General Schema in order to make it easily extensible, to capture a more general class of inductive types, called "strictly positive", and to ease the strong normalization proof of the resulting system. This result provides a computation model for the combination of an algebraic specification language based on abstract data types and of a strongly typed functional language with strictly positive inductive types. Comment: Theoretical Computer Science (2002)

    Relational Database schema normalization

    This work describes the fundamental concepts of the relational data model. The normal forms 1NF, 2NF, 3NF, BCNF, 4NF, 5NF, and 6NF are described. The most attention is given to 3NF, which is of practical importance, and the 3NF algorithm is covered. This algorithm is important because it preserves dependencies and information within a relational schema (R, F) that is transformed into a relational database schema in 3NF. The reasons motivating data normalization are given, as are the reasons for abandoning normalization. The steps of the 3NF synthesis algorithm are described, together with an application of the algorithm to a carefully chosen practical example and its SQL implementation.