6 research outputs found
Applying Cognitive Principles of Similarity to Data Integration – The Case of SIAM
Increasingly, modern system design is concerned with the integration of legacy systems and data. Consequently, data integration is an important step in many system design projects and also a prerequisite to data warehousing, data mining, and analytics. The central step in data integration is the identification of similar elements in multiple data sources. In this paper, we describe an application of principles of similarity grounded in cognitive psychology, specifically the theory of Similarity as Interactive Activation and Mapping (SIAM), to the problem of database schema matching. In a field that has been dominated by a multitude of ad-hoc algorithms, cognitive principles can establish an appropriate theoretical basis. The results of this paper show initial success in matching applications and point towards future research.
A Survey of Cognitive Theories to Support Data Integration
Business intelligence applications are being increasingly used to facilitate managerial insight and maintain competitiveness. These applications rely on the availability of integrated data from multiple data sources, making database integration an increasingly important task. A central step in the process of data integration is schema matching, the identification of similar elements in the two databases. While a number of approaches have been proposed, the majority of schema matching techniques are based on ad-hoc heuristics instead of an established theoretical foundation. The absence of a theoretical foundation makes it difficult to explain and improve the schema matching process. This research surveys current cognitive theories of similarity and demonstrates their application to the problem of schema matching. Better integration techniques will benefit business intelligence applications and can thereby contribute to business value.
Survey: Models and Prototypes of Schema Matching
Schema matching is a critical problem in many applications that integrate data and information, achieve interoperability, or address other cases of schematic heterogeneity. Schema matching has evolved from manual approaches in specific domains towards new models and methods that are semi-automatic and more general, so that the user can be effectively guided in generating a mapping between the elements of two schemas or ontologies. This paper summarizes the literature on models and prototypes of schema matching over the last 25 years, describing the progress made as well as the research challenges and opportunities for new models, methods, and prototypes.
Smarter Cities: Cleaning Electricity, Gas and Water Metered Consumption Data for Social and Urban Research
It is claimed that "Big Data" could help cities become "smart", utilise resources more efficiently, and improve inhabitants' quality of life. Metered consumption data of electricity, gas and water are collected and stored for each household in Australia and provide a valuable source of information for researchers hoping to understand the patterns of consumption and improve resource utilisation. This work tests the assumption that these datasets are sufficiently "clean" to allow interrogation and details the common errors encountered. An inspection of 1-7 years of meter readings of electricity, gas and water for ~60,000 households in Canberra, Australia, as well as all datasets of the local government's energy and water efficiency programs, found this not to be the case. Common errors found include: missing data, biases in erroneous data, errors generated by the data custodians, duplicate entries, the effect of different custodian objectives, and poor data constraints (free-flowing text). This work was supported by the ACT Government Environment and Planning Directorate.
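The error categories listed in the abstract (missing data, duplicate entries, free-flowing text where numeric readings are expected) can be screened for mechanically. A minimal sketch in Python, where the field names (`meter_id`, `date`, `reading`) are illustrative assumptions rather than the actual schema of the dataset:

```python
# Minimal data-quality screen for metered consumption records.
# Field names are illustrative assumptions, not the dataset's real schema.

def screen_readings(rows):
    """Flag common errors: missing values, duplicate entries, non-numeric text."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        # Missing data: any empty or absent field
        if any(v in (None, "") for v in row.values()):
            issues.append((i, "missing value"))
        # Duplicate entries: same meter and date reported twice
        key = (row.get("meter_id"), row.get("date"))
        if key in seen:
            issues.append((i, "duplicate entry"))
        seen.add(key)
        # Free-flowing text where a numeric reading is expected
        try:
            float(row.get("reading") or "")
        except ValueError:
            issues.append((i, "non-numeric reading"))
    return issues

rows = [
    {"meter_id": "M1", "date": "2014-01-01", "reading": "532.1"},
    {"meter_id": "M1", "date": "2014-01-01", "reading": "532.1"},  # duplicate
    {"meter_id": "M2", "date": "2014-01-01", "reading": "n/a"},    # free text
    {"meter_id": "M3", "date": "", "reading": "410"},              # missing field
]
print(screen_readings(rows))
# -> [(1, 'duplicate entry'), (2, 'non-numeric reading'), (3, 'missing value')]
```

Real screening would also need the bias and custodian-error checks the paper describes, which require domain knowledge rather than simple rules.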
Attribute lattice: a graph-based conceptual modeling grammar for heterogeneous data
One key characteristic of big data is variety. With massive and growing amounts of data
existing in independent and heterogeneous (structured and unstructured) sources, assigning consistent
and interoperable data semantics, which is essential for meaningful use of data, is an increasingly
important challenge. I argue that conceptual models, in contrast to their traditional roles in
Information Systems development, can be used to represent data semantics as perceived by the
user of the data. In this thesis, I use principles from philosophical ontology, human cognition (i.e.,
classification theory), and graph theory to offer a theory-based conceptual modeling grammar for
this purpose. This grammar reflects data from the data user's perspective and is independent of the
data source schema. I formally define the concept of an attribute lattice as a graph-based, schema-free
conceptual modeling grammar that represents attributes of instances in the domain of interest and
precedence relations among them. Each node in an attribute lattice represents an attribute - a true
statement (predicate) about some instances in the domain. Each directed arc represents a precedence
relation indicating that possessing one attribute implies possessing another attribute.
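The structure just defined can be sketched as a directed graph in which reachability yields every attribute an instance must also possess. This is a minimal illustration, not the thesis's implementation, and the attribute names are invented for the example:

```python
# Sketch of an attribute lattice: nodes are attributes (predicates about
# instances) and an arc a -> b is a precedence relation meaning "possessing
# a implies possessing b". Attribute names are illustrative assumptions.

from collections import defaultdict

class AttributeLattice:
    def __init__(self):
        self.succ = defaultdict(set)  # precedence arcs: attribute -> implied attributes

    def add_precedence(self, a, b):
        """Record that possessing attribute a implies possessing attribute b."""
        self.succ[a].add(b)

    def implied(self, a):
        """All attributes implied by possessing a (transitive closure of arcs)."""
        out, stack = set(), [a]
        while stack:
            for b in self.succ[stack.pop()]:
                if b not in out:
                    out.add(b)
                    stack.append(b)
        return out

lattice = AttributeLattice()
lattice.add_precedence("has_salary", "is_employed")
lattice.add_precedence("is_employed", "is_person")
print(sorted(lattice.implied("has_salary")))  # -> ['is_employed', 'is_person']
```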
In this thesis, based on the premise that inherent classification is a barrier that hinders semantic
interoperation of heterogeneous data sources, a human-cognition-based conceptual modeling
grammar is introduced as an effective way to resolve semantic heterogeneity. This grammar
represents the precedence relationships among attributes as perceived by a human user and provides
a mechanism to infer classes based on the pattern of precedences. Hence, a key contribution of
the attribute lattice is semantic relativism: the classification in this grammar relies on the pattern of precedence relationships among attributes rather than on fixed classes. This modeling grammar
uses the immediate and semantic neighbourhoods of an attribute to designate it as
either a category, a class, or a property, and to specify the expansion of an attribute: the attributes
that are semantically equal to the given attribute. The introduced conceptual modeling grammar
is implemented as an artifact that stores and manages attribute lattices, represents them graphically,
and integrates lattices from various heterogeneous sources.
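One way to read "expansion" above is that two attributes are semantically equal when each precedes the other. A hedged sketch of that inference under this assumed reading (the interpretation, function names, and attribute names are all illustrative, not the thesis's definitions):

```python
# Sketch: computing the "expansion" of an attribute as the set of attributes
# in mutual precedence with it, i.e. each implies the other.
# This reading and the attribute names are assumptions for illustration.

def reach(precedences, start):
    """Attributes transitively implied by possessing `start`."""
    out, stack = set(), [start]
    while stack:
        for nxt in precedences.get(stack.pop(), ()):
            if nxt not in out:
                out.add(nxt)
                stack.append(nxt)
    return out

def attributes(precedences):
    """All attribute nodes mentioned in the precedence map."""
    nodes = set(precedences)
    for targets in precedences.values():
        nodes |= set(targets)
    return nodes

def expansion(precedences, a):
    """Attributes semantically equal to a: mutual precedence with a."""
    return {b for b in attributes(precedences)
            if b != a and b in reach(precedences, a) and a in reach(precedences, b)}

# "salary" and "wage" precede each other; "income" is implied but not equal.
prec = {"salary": {"wage"}, "wage": {"salary", "income"}, "income": set()}
print(expansion(prec, "salary"))  # -> {'wage'}
```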
With the ever-increasing amount of unstructured data (mostly text data) from various data
sources such as social media, integrating text data with other data sources has gained considerable
attention. This massive amount of data, however, makes finding the data relevant to a topic of
interest a new challenge. I argue that the attribute lattice provides a robust semantic foundation to
address this information retrieval challenge from unstructured data sources. Hence, a topic modeling
approach based on the attribute lattice is proposed for Twitter. This topic model conceptualizes
the topic structure of tweets related to the domain of interest and enhances information retrieval
by improving the semantic interpretability of hashtags.