    Data integration by means of object identification in information systems

    Abstract: Data integration is an important topic in the information age. Although structural aspects have been widely investigated, there is a lack of research on semantic discrepancies between data sources. Data integration should be able to handle input errors such as erroneous data and misspellings. Problems such as domain and data type mismatches, missing values, and duplicated records also need investigation. Object identification is essential for the task of integration, especially if keys are absent or incorrect. The approach uses properties that can be derived from the data sources for identification: the derivable attributes. Given two sources, the values of the derivable attributes of pairs of records are compared and classified. A random sample of pairs is used for detecting similarities, rules, or classification criteria. Various statistical or data mining techniques can then be applied to classify pairs of records from the two sources and decide whether to link them.

    Keywords: database integration, record linkage, identification, derivable attributes, semantic conflicts
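
The pairwise compare-and-classify step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the paper: the records, the attribute names (`name`, `city`, `year`), the use of Python's `difflib.SequenceMatcher` as the string similarity, and the fixed threshold of 0.8. In the approach described above, the classification criterion would instead be derived from a random sample of record pairs.

```python
from difflib import SequenceMatcher

# Hypothetical records from two sources; no shared key is assumed.
source_a = [
    {"name": "Jon Smith", "city": "Berlin", "year": 1970},
    {"name": "Maria Gonzales", "city": "Madrid", "year": 1985},
]
source_b = [
    {"name": "John Smith", "city": "Berlin", "year": 1970},
    {"name": "Peter Miller", "city": "Vienna", "year": 1962},
]


def string_similarity(a, b):
    """Case-insensitive similarity in [0, 1]; tolerates misspellings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def comparison_vector(rec_a, rec_b):
    """Compare the attributes of one record pair, one score per attribute."""
    return (
        string_similarity(rec_a["name"], rec_b["name"]),
        string_similarity(rec_a["city"], rec_b["city"]),
        1.0 if rec_a["year"] == rec_b["year"] else 0.0,
    )


def classify(vector, threshold=0.8):
    """Label a pair by its mean attribute score.

    The threshold is an illustrative assumption; it stands in for a
    criterion learned from a sample of pairs.
    """
    score = sum(vector) / len(vector)
    return "link" if score >= threshold else "non-link"


# Compare every record of source A with every record of source B.
results = {
    (i, j): classify(comparison_vector(a, b))
    for i, a in enumerate(source_a)
    for j, b in enumerate(source_b)
}

for pair, label in sorted(results.items()):
    print(pair, label)
```

Here the misspelled pair "Jon Smith" / "John Smith" is still linked because the name score stays high and the other attributes agree, while any pair whose year differs cannot reach the threshold. A statistical or data mining classifier could replace `classify` without changing the comparison step.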