Exemplifying Business Opportunities for Improving Data Quality From Corporate Household Research
Corporate household (CHH) refers to organizational information about the structure within a corporation and
its variety of inter-organizational relationships. Knowledge derived from this data is becoming increasingly important
for improving data quality in applications, such as Customer Relationship Management (CRM), Enterprise
Resource Planning (ERP), Supply Chain Management (SCM), risk management, and sales and market promotion.
Extending the concepts from our previous CHH research, we exemplify in this paper the importance of improved
corporate household knowledge and processing in various business application areas. Additionally, we provide
examples of CHH business rules that are often implicit and fragmented - understood and practiced by different
domain experts across functional areas of the firm. This paper is intended to form a foundation for further research
to systematically investigate, capture, and build a body of corporate householding knowledge across diverse
business applications.
Data Quality Management in Corporate Practice
The 21st century is characterized by a rising quantity and importance of data and information. Companies utilize these in order to gain and maintain competitive advantages; the data and information are therefore required in both high quantity and high quality. But while the amount of data collected is steadily increasing, the same is not necessarily true for data quality. To assure high data quality, the concept of Data Quality Management (DQM) has been established, incorporating such elements as the assessment of data quality as well as its improvement. In order to discuss the issue of Data Quality Management, this paper pursues the following goals:
(1) Systematic literature search for publications regarding Data Quality Management (scientific contributions, practice reports, etc.)
(2) Provision of a structured overview of the identified references and the research material
(3) Analysis and evaluation of the scientific contributions with regard to methodology and theoretical foundation
(4) Current expression of DQM in practice, differentiated by organization type and industry (based upon the entire research material), as well as an assessment of the situation (how well the design recommendations are grounded in research results)
(5) Summary of unresolved issues and challenges, based upon the research material
Addressing the Challenges of Aggregational and Temporal Ontological Heterogeneity
In this paper, we first identify semantic heterogeneities that, when not resolved, often cause serious data quality problems. We discuss the especially challenging problems of temporal and aggregational ontological heterogeneity, which concern how complex entities and their relationships are aggregated and reinterpreted over time. Then we illustrate how the COntext INterchange (COIN) technology can be used to capture data semantics and reconcile semantic heterogeneities in a scalable manner, thereby improving data quality. Singapore-MIT Alliance (SMA)
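The kind of reconciliation COIN performs can be illustrated with a minimal sketch. The contexts, scale factors, and exchange rate below are hypothetical illustrations, not part of COIN itself: each source reports values in its own context, and a mediator converts them into the receiver's context before they are compared.

```python
# Minimal sketch of context mediation: each source reports numbers in its own
# context (currency, scale factor); a mediator converts them into the
# receiver's context so values from different sources become comparable.

# Hypothetical contexts: (currency, scale factor applied to stored numbers)
CONTEXTS = {
    "src_us": {"currency": "USD", "scale": 1_000},      # thousands of USD
    "src_jp": {"currency": "JPY", "scale": 1_000_000},  # millions of JPY
    "receiver": {"currency": "USD", "scale": 1},
}

# Illustrative exchange rates into USD (assumed, for this sketch only)
TO_USD = {"USD": 1.0, "JPY": 0.0090}

def mediate(value, source, receiver="receiver"):
    """Convert a raw value from the source context into the receiver context."""
    src, dst = CONTEXTS[source], CONTEXTS[receiver]
    in_usd = value * src["scale"] * TO_USD[src["currency"]]
    return in_usd / (dst["scale"] * TO_USD[dst["currency"]])

# A "revenue" of 500 from src_us (thousands of USD) and 60 from src_jp
# (millions of JPY) become directly comparable in the receiver's context:
print(mediate(500, "src_us"))  # 500000.0 USD
print(mediate(60, "src_jp"))   # about 540000.0 USD
```

In the actual COIN framework these conversions are not hard-coded tables: they are derived automatically from ontologies and declarative context definitions by the mediation engine.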
Reasoning about Temporal Context using Ontology and Abductive Constraint Logic Programming
The underlying assumptions for interpreting the meaning of data often change over time, which further complicates the problem of semantic heterogeneities among autonomous data sources. As an extension to the COntext INterchange (COIN) framework, this paper introduces the notion of temporal context as a formalization of the problem. We represent temporal context as a multi-valued method in F-Logic; however, only one value is valid at any point in time, the determination of which is constrained by temporal relations. This representation is then mapped to an abductive constraint logic programming framework with temporal relations being treated as constraints. A mediation engine that implements the framework automatically detects and reconciles semantic differences at different times. We articulate that this extended COIN framework is suitable for reasoning on the Semantic Web. Singapore-MIT Alliance (SMA)
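The idea that a temporal context has several values but only one valid at any point in time can be sketched in a few lines. The source and dates below are hypothetical illustrations; the actual framework expresses this declaratively in F-Logic, with temporal relations handled as constraints by an abductive constraint logic programming engine rather than a lookup loop.

```python
from datetime import date

# Temporal context: the currency in which a hypothetical source reports
# prices changed over time. Each entry is (valid_from, valid_to, currency);
# the intervals are disjoint, so exactly one value is valid at any date.
PRICE_CONTEXT = [
    (date(1990, 1, 1), date(1998, 12, 31), "FRF"),  # French francs
    (date(1999, 1, 1), date.max,           "EUR"),  # euros thereafter
]

def currency_at(when):
    """Return the single context value valid at the given date."""
    for start, end, currency in PRICE_CONTEXT:
        if start <= when <= end:
            return currency
    raise ValueError(f"no context defined for {when}")

# Official fixed conversion rate adopted in 1999: 1 EUR = 6.55957 FRF
FRF_PER_EUR = 6.55957

def price_in_eur(amount, when):
    """Reconcile a historical price into the receiver's EUR context."""
    return amount / FRF_PER_EUR if currency_at(when) == "FRF" else amount

print(currency_at(date(1995, 6, 1)))          # FRF
print(price_in_eur(100.0, date(2005, 6, 1)))  # 100.0
```

The mediation engine described in the paper performs this determination automatically for every source and receiver pair, detecting when their temporal contexts diverge and applying the appropriate conversion.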
Improving National and Homeland Security through a proposed Laboratory for Information Globalization and Harmonization Technologies (LIGHT)
A recent National Research Council study found that: "Although there are many private and public databases that
contain information potentially relevant to counter terrorism programs, they lack the necessary context definitions
(i.e., metadata) and access tools to enable interoperation with other databases and the extraction of meaningful and
timely information" [NRC02, p.304, emphasis added]. That sentence succinctly describes the objectives of this
project. Improved access and use of information are essential to better identify and anticipate threats, protect
against and respond to threats, and enhance national and homeland security (NHS), as well as other national
priority areas, such as Economic Prosperity and a Vibrant Civil Society (ECS) and Advances in Science and
Engineering (ASE). This project focuses on the creation and contributions of a Laboratory for Information
Globalization and Harmonization Technologies (LIGHT) with two interrelated goals:
(1) Theory and Technologies: To research, design, develop, test, and implement theory and technologies for
improving the reliability, quality, and responsiveness of automated mechanisms for reasoning and resolving semantic
differences that hinder the rapid and effective integration (int) of systems and data (dmc) across multiple
autonomous sources, and the use of that information by public and private agencies involved in national and
homeland security and the other national priority areas involving complex and interdependent social systems (soc).
This work builds on our research on the COntext INterchange (COIN) project, which focused on the integration
of diverse distributed heterogeneous information sources using ontologies, databases, context mediation algorithms,
and wrapper technologies to overcome information representational conflicts. The COIN approach makes it
substantially easier and more transparent for individual receivers (e.g., applications, users) to access and exploit
distributed sources. Receivers specify their desired context to reduce ambiguities in the interpretation of information
coming from heterogeneous sources. This approach significantly reduces the overhead involved in the integration of
multiple sources, improves data quality, increases the speed of integration, and simplifies maintenance in an
environment of changing source and receiver context - which will lead to an effective and novel distributed
information grid infrastructure. This research also builds on our Global System for Sustainable Development
(GSSD), an Internet platform for information generation, provision, and integration of multiple domains, regions,
languages, and epistemologies relevant to international relations and national security.
(2) National Priority Studies: To experiment with and test the developed theory and technologies on practical
problems of data integration in national priority areas. Particular focus will be on national and homeland security,
including data sources about conflict and war, modes of instability and threat, international and regional
demographic, economic, and military statistics, money flows, and contextualizing terrorism defense and response.
Although LIGHT will leverage the results of our successful prior research projects, this will be the first research
effort to simultaneously and effectively address ontological and temporal information conflicts as well as
dramatically enhance information quality. Addressing problems of national priorities in such rapidly changing
complex environments requires extraction of observations from disparate sources, using different interpretations, at
different points in time, for different purposes, with different biases, and for a wide range of different uses and
users. This research will focus on integrating information both over individual domains and across multiple domains.
Another innovation is the concept and implementation of Collaborative Domain Spaces (CDS), within which
applications in a common domain can share, analyze, modify, and develop information. Applications also can span
multiple domains via Linked CDSs. The PIs have considerable experience with these research areas and the
organization and management of such large scale international and diverse research projects.
The PIs come from three different Schools at MIT: Management, Engineering, and Humanities, Arts & Social
Sciences. The faculty and graduate students come from about a dozen nationalities and diverse ethnic, racial, and
religious backgrounds. The currently identified external collaborators come from over 20 different organizations
and many different countries, industrial as well as developing. Specific efforts are proposed to engage even more
women, underrepresented minorities, and persons with disabilities.
The anticipated results apply to any complex domain that relies on heterogeneous distributed data to address and
resolve compelling problems. This initiative is supported by international collaborators from (a) scientific and
research institutions, (b) business and industry, and (c) national and international agencies. Research products
include: a System for Harmonized Information Processing (SHIP), a software platform, and diverse applications in
research and education which are anticipated to significantly impact the way complex organizations, and society in
general, understand and manage critical challenges in NHS, ECS, and ASE.
A Framework for Classification of the Data and Information Quality Literature and Preliminary Results (1996-2007)
The value of management decisions, the security of our nation, and the very foundations of our business integrity are all dependent on the quality of data and information. However, the quality of the data and information is dependent on how that data or information will be used. This paper proposes a theory of data quality based on the five principles defined by J. M. Juran for product and service quality and extends Wang et al.'s 1995 framework for data quality research. It then examines the data and information quality literature from journals within the context of this framework.
Understanding data quality issues in dynamic organisational environments – a literature review
Technology has been the catalyst that has facilitated an explosion of organisational data in terms of its velocity, variety, and volume, resulting in a greater depth and breadth of potentially valuable information, previously unutilised. The variety of data accessible to organisations extends beyond traditional structured data to now encompass previously unobtainable and difficult to analyse unstructured data. In addition to exploiting data, organisations are now facing an even greater challenge of assessing data quality and identifying the impacts of lack of quality. The aim of this research is to contribute to data quality literature, focusing on improving a current understanding of business-related Data Quality (DQ) issues facing organisations. This review builds on existing Information Systems literature, and proposes further research in this area. Our findings confirm that the current literature lags in recognising new types of data and imminent DQ impacts facing organisations in today’s dynamic environment of the so-called “Big Data”. Insights clearly identify the need for further research on DQ, in particular in relation to unstructured data. It also raises questions regarding new DQ impacts and implications for organisations, in their quest to leverage the variety of available data types to provide richer insights.
Assessing Quality of Unstructured Data – Insights From a Global Imaging Company
The main objective of this research is to understand whether previous Data Quality frameworks are still applicable in today’s organisational environment, characterised by a wide variety of data types, including unstructured data. The paper describes a pilot study conducted in a global imaging company with the researchers adopting and re-examining a previously developed data quality framework, used in a number of different research studies for more than a decade. The study focuses on two research questions: Are the existing data quality frameworks, developed for highly structured data, still applicable to today’s organisational environment? Do users’ perceptions of data quality change depending on data type? The paper reports on the main findings and offers some suggestions for future research.