6 research outputs found

    Applying Cognitive Principles of Similarity to Data Integration – The Case of SIAM

    Increasingly, modern system design is concerned with the integration of legacy systems and data. Consequently, data integration is an important step in many system design projects and a prerequisite to data warehousing, data mining, and analytics. The central step in data integration is the identification of similar elements in multiple data sources. In this paper, we describe an application of principles of similarity grounded in cognitive psychology, specifically the theory of Similarity as Interactive Activation and Mapping (SIAM), to the problem of database schema matching. In a field that has been dominated by a multitude of ad-hoc algorithms, cognitive principles can establish an appropriate theoretical basis. The results of this paper show initial success in matching applications and point towards future research.
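    The "identification of similar elements" that this abstract names as the central step can be illustrated with a minimal sketch. This toy matcher scores token overlap between element names of two schemas; it is not the SIAM model itself, and all schema names below are hypothetical:

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity between lower-cased word tokens of two element names."""
    ta = set(a.lower().replace("_", " ").split())
    tb = set(b.lower().replace("_", " ").split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def match_schemas(src, tgt, threshold=0.5):
    """Pair each source element with its best-scoring target element above threshold."""
    pairs = []
    for s in src:
        best = max(tgt, key=lambda t: token_overlap(s, t))
        score = token_overlap(s, best)
        if score >= threshold:
            pairs.append((s, best, round(score, 2)))
    return pairs

# Hypothetical source and target schemas:
print(match_schemas(["customer_name", "order_date"], ["client name", "date of order"]))
# → [('order_date', 'date of order', 0.67)]
```

    Real matchers combine many such signals (structure, data types, instance values); pure name overlap misses synonym pairs such as "customer"/"client" above, which is exactly the gap cognitively grounded similarity models aim to close.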

    A Survey of Cognitive Theories to Support Data Integration

    Business intelligence applications are being increasingly used to facilitate managerial insight and maintain competitiveness. These applications rely on the availability of integrated data from multiple data sources, making database integration an increasingly important task. A central step in the process of data integration is schema matching, the identification of similar elements in the two databases. While a number of approaches have been proposed, the majority of schema matching techniques are based on ad-hoc heuristics instead of an established theoretical foundation. The absence of a theoretical foundation makes it difficult to explain and improve the schema matching process. This research surveys current cognitive theories of similarity and demonstrates their application to the problem of schema matching. Better integration techniques will benefit business intelligence applications and can thereby contribute to business value.

    Survey: Models and Prototypes of Schema Matching

    Schema matching is a critical problem in many applications that integrate data and information, achieve interoperability, or otherwise deal with schematic heterogeneity. Schema matching has evolved from manual work in specific domains toward new models and methods that are semi-automatic and more general, so that the user can be guided effectively in generating a mapping between the elements of two schemas or ontologies. This paper summarizes the literature on models and prototypes of schema matching from the last 25 years, describing the progress made as well as research challenges and opportunities for new models, methods, and prototypes.

    Smarter Cities: Cleaning Electricity, Gas and Water Metered Consumption Data for Social and Urban Research

    It is claimed that ‘Big Data’ could help cities become ‘smart’, utilise resources more efficiently, and improve inhabitants’ quality of life. Metered consumption data for electricity, gas and water are collected and stored for each household in Australia and provide a valuable source of information for researchers hoping to understand patterns of consumption and improve resource utilisation. This work tests the assumption that these datasets are sufficiently ‘clean’ to allow interrogation, and details the common errors encountered. An inspection of 1-7 years of meter readings of electricity, gas and water for ~60,000 households in Canberra, Australia, as well as all datasets of the local government’s energy and water efficiency programs, found this not to be the case. Common errors found include: missing data, biases in erroneous data, errors generated by the data custodians, duplicate entries, and the effects of differing custodian objectives and of poor data constraints (free-flowing text). This work was supported by the ACT Government Environment and Planning Directorate.
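    Several of the error categories the abstract lists (missing data, duplicate entries, implausible values) can be flagged with a simple audit pass. The record layout and field values below are hypothetical, not taken from the Canberra datasets:

```python
def audit_readings(readings):
    """Flag common errors in (household_id, date, value) meter readings:
    missing values, duplicate entries, and physically implausible negatives."""
    issues = {"missing": [], "duplicate": [], "negative": []}
    seen = set()
    for hid, date, value in readings:
        key = (hid, date)
        if key in seen:
            issues["duplicate"].append(key)
        seen.add(key)
        if value is None:
            issues["missing"].append(key)
        elif value < 0:
            issues["negative"].append(key)
    return issues

# Hypothetical meter readings:
rows = [("H1", "2015-01-01", 12.3),
        ("H1", "2015-01-01", 12.3),   # duplicate entry
        ("H2", "2015-01-01", None),   # missing value
        ("H3", "2015-01-01", -4.0)]   # implausible negative
print(audit_readings(rows))
```

    Custodian-generated errors and free-flowing text fields need domain-specific rules beyond such generic checks, which is part of what makes the cleaning effort the paper describes non-trivial.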

    Attribute lattice: a graph-based conceptual modeling grammar for heterogeneous data

    One key characteristic of big data is variety. With massive and growing amounts of data residing in independent and heterogeneous (structured and unstructured) sources, assigning consistent and interoperable data semantics, which is essential for meaningful use of data, is an increasingly important challenge. I argue that conceptual models, in contrast to their traditional role in information systems development, can be used to represent data semantics as perceived by the user of the data. In this thesis, I use principles from philosophical ontology, human cognition (i.e., classification theory), and graph theory to offer a theory-based conceptual modeling grammar for this purpose. This grammar reflects data from the data user's perspective, independently of the data source schema. I formally define the concept of an attribute lattice as a graph-based, schema-free conceptual modeling grammar that represents attributes of instances in the domain of interest and the precedence relations among them. Each node in an attribute lattice represents an attribute: a true statement (predicate) about some instances in the domain. Each directed arc represents a precedence relation indicating that possessing one attribute implies possessing another attribute. In this thesis, based on the premise that inherent classification is a barrier that hinders the semantic interoperation of heterogeneous data sources, a conceptual modeling grammar based on human cognition is introduced as an effective way to resolve semantic heterogeneity. This grammar represents the precedence relationships among attributes as perceived by a human user and provides a mechanism to infer classes based on the pattern of precedences. Hence, a key contribution of the attribute lattice is semantic relativism: classification in this grammar relies on the pattern of precedence relationships among attributes rather than on fixed classes.
    This modeling grammar uses the immediate and semantic neighbourhoods of an attribute to designate an attribute as a category, a class, or a property, and to specify the expansion of an attribute: the set of attributes that are semantically equal to the given attribute. The introduced conceptual modeling grammar is implemented as an artifact to store and manage attribute lattices, to represent them graphically, and to integrate lattices from various heterogeneous sources. With the ever-increasing amount of unstructured data (mostly text) from sources such as social media, integrating text data with other data sources has gained considerable attention. This massive amount of data, however, makes finding the data relevant to a topic of interest a new challenge. I argue that the attribute lattice provides a robust semantic foundation for addressing this information retrieval challenge in unstructured data sources. Hence, a topic modeling approach based on the attribute lattice is proposed for Twitter. This topic model conceptualizes the topic structure of tweets related to the domain of interest and enhances information retrieval by improving the semantic interpretability of hashtags.
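    The precedence structure this abstract describes can be sketched as a directed graph in which an edge a → b means "possessing attribute a implies possessing attribute b"; the attributes an instance implicitly possesses are then the reachability closure of its stated attributes. This is a minimal illustration of that idea only, not the thesis artifact, and the attribute names are invented:

```python
from collections import defaultdict

class AttributeLattice:
    """Directed graph of attributes; edge (a, b) = possessing a implies possessing b."""
    def __init__(self):
        self.precedes = defaultdict(set)

    def add_precedence(self, attr, implied):
        self.precedes[attr].add(implied)

    def closure(self, attrs):
        """All attributes implied by a starting set, via graph reachability."""
        result, stack = set(attrs), list(attrs)
        while stack:
            for nxt in self.precedes[stack.pop()]:
                if nxt not in result:
                    result.add(nxt)
                    stack.append(nxt)
        return result

# Illustrative precedences: "barks" implies "is_dog", which implies "is_animal".
lat = AttributeLattice()
lat.add_precedence("barks", "is_dog")
lat.add_precedence("is_dog", "is_animal")
print(sorted(lat.closure({"barks"})))  # → ['barks', 'is_animal', 'is_dog']
```

    Classification by closure over precedence patterns, rather than by membership in fixed classes, is the "semantic relativism" the abstract highlights: two sources need not agree on class names, only on attribute precedences.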