Complex adaptive systems based data integration: theory and applications
Data Definition Languages (DDLs) have been created and used to represent data in programming languages and in database dictionaries. This representation includes descriptions in the form of data fields and relations in the form of a hierarchy, with the common exception of relational databases where relations are flat. Network computing created an environment that enables relatively easy and inexpensive exchange of data. What followed was the creation of new DDLs claiming better support for automatic data integration. It is uncertain from the literature if any real progress has been made toward achieving an ideal state or limit condition of automatic data integration. This research asserts that difficulties in accomplishing integration are indicative of socio-cultural systems in general and are caused by some measurable attributes common in DDLs. This research’s main contributions are: (1) a theory of data integration requirements to fully support automatic data integration from autonomous heterogeneous data sources; (2) the identification of measurable related abstract attributes (Variety, Tension, and Entropy); (3) the development of tools to measure them. The research uses a multi-theoretic lens to define and articulate these attributes and their measurements. The proposed theory is founded on the Law of Requisite Variety, Information Theory, Complex Adaptive Systems (CAS) theory, Sowa’s Meaning Preservation framework and Zipf distributions of words and meanings. Using the theory, the attributes, and their measures, this research proposes a framework for objectively evaluating the suitability of any data definition language with respect to degrees of automatic data integration.
This research uses thirteen data structures constructed with various DDLs from the 1960's to date. No DDL examined (and therefore no DDL similar to those examined) is designed to satisfy the Law of Requisite Variety. No DDL examined is designed to support CAS evolutionary processes that could result in fully automated integration of heterogeneous data sources. There is no significant difference in measures of Variety, Tension, and Entropy among the DDLs investigated in this research. A direction for overcoming the common limitations discovered in this research is suggested and tested by proposing GlossoMote, a theoretical, mathematically sound description language that satisfies the data integration theory requirements. GlossoMote is not merely a new syntax; it is a drastic departure from existing DDL constructs. The feasibility of the approach is demonstrated with a small-scale experiment and evaluated using the proposed assessment framework and other means. The promising results warrant additional research to evaluate the commercial potential of GlossoMote's approach.
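The abstract names Entropy, grounded in Information Theory, as one of the measurable attributes of a DDL. As a minimal sketch of what such a measure could look like (the actual measurement tools are not described here, and the field names below are hypothetical), one could compute the Shannon entropy of the empirical distribution of identifiers appearing in a schema definition:

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical identifier occurrences extracted from a schema definition;
# real DDL vocabularies tend toward Zipf-like frequency profiles.
tokens = ["id", "id", "id", "name", "name", "address"]
print(round(shannon_entropy(tokens), 3))
```

A uniform identifier distribution would maximise this value, while a heavily skewed (Zipf-like) one lowers it; comparing such scores across schemas is one plausible way to contrast DDLs objectively.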
CAFF CBMP Report No. 19 - Circumpolar Biodiversity Marine Monitoring Plan - background paper
From Data to Knowledge in Secondary Health Care Databases
The advent of big data in health care is a topic receiving increasing attention worldwide. In the UK, over the last decade, the National Health Service (NHS) programme for Information Technology has boosted big data by introducing electronic infrastructures in hospitals and GP practices across the country. This ever-growing amount of data promises to expand our understanding of services, processes and research. Potential benefits include reducing costs, optimisation of services, knowledge discovery, and patient-centred predictive modelling. This thesis explores the above by studying over ten years' worth of electronic data and systems in a hospital treating over 750 thousand patients a year.
The hospital's information systems store routinely collected data, used primarily by health practitioners to support and improve patient care. This raw data is recorded on several different systems but rarely linked or analysed. This thesis explores the secondary uses of such data by undertaking two case studies, one on prostate cancer and another on stroke. The journey from data to knowledge is made in each of the studies by traversing critical steps: data retrieval, linkage, integration, preparation, mining and analysis. Throughout, novel methods and computational techniques are introduced and the value of routinely collected data is assessed. In particular, this thesis discusses in detail the methodological aspects of developing clinical data warehouses from routine heterogeneous data, and it introduces methods to model, visualise and analyse the journeys that patients take through care. This work has provided lessons in hospital IT provision and in the integration, visualisation and analytics of complex electronic patient records and databases, and it has enabled the use of raw routine data for management decision making and clinical research in both case studies.
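The abstract's "data retrieval, linkage, integration, preparation, mining and analysis" pipeline can be illustrated at its linkage step. This is only a sketch under assumptions not stated in the abstract: deterministic linkage on a shared patient identifier, with hypothetical field names (`patient_id`, `ward`, `psa_level`) standing in for real hospital-system fields:

```python
def link_records(system_a, system_b, key="patient_id"):
    """Deterministically link two lists of record dicts on a shared key."""
    index = {rec[key]: rec for rec in system_b}  # build a lookup on the key
    linked = []
    for rec in system_a:
        match = index.get(rec[key])
        if match is not None:
            # Merge the two records; system_b fields win on name collisions.
            linked.append({**rec, **match})
    return linked

admissions = [{"patient_id": 1, "ward": "oncology"}]
labs = [{"patient_id": 1, "psa_level": 4.2}, {"patient_id": 2, "psa_level": 1.0}]
print(link_records(admissions, labs))
# → [{'patient_id': 1, 'ward': 'oncology', 'psa_level': 4.2}]
```

Real clinical linkage is usually probabilistic (tolerating typos and missing identifiers); the deterministic join above merely shows where linkage sits between retrieval and integration in the pipeline.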
Interim research assessment 2003-2005 - Computer Science
This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. The report also provides information for others interested in our research activities.