
    Considering Currency in Decision Trees in the Context of Big Data

    In the current age of big data, decision trees are one of the most commonly applied data mining methods. However, for reliable results they require up-to-date input data, which is not always available in practice. We present a two-phase approach based on probability theory for considering the currency of stored data in decision trees. Our approach is efficient and thus suitable for big data applications. Moreover, it is independent of the particular decision tree classifier. Finally, it is context-specific, since the decision tree structure and supplemental data are taken into account. We demonstrate the benefits of the novel approach by applying it to three datasets. The results show a substantial increase in the classification success rate compared to not considering currency. Applying our approach thus helps prevent wrong classifications and, consequently, wrong decisions.
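
    The abstract does not detail the two phases, so the following Python sketch only illustrates the general idea of discounting a decision tree's prediction by the probability that the stored input values are still current. The exponential decay model, the blending rule and all names below are assumptions made for illustration, not the authors' method.

    # Illustrative sketch only: blend a decision tree's prediction with the class prior,
    # weighted by an assumed probability that the stored record is still up to date.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def currency_probability(age_in_years: float, decay_rate: float = 0.2) -> float:
        """Assumed exponential decline of the probability that a stored value is still current."""
        return float(np.exp(-decay_rate * age_in_years))

    def currency_aware_predict_proba(tree, x, age_in_years, class_prior):
        """Mix the tree's prediction on the stored record with the class prior,
        weighted by the probability that the record is still current."""
        p_current = currency_probability(age_in_years)
        p_tree = tree.predict_proba(x.reshape(1, -1))[0]
        return p_current * p_tree + (1.0 - p_current) * class_prior

    # Toy usage example (hypothetical data)
    X = np.array([[25, 1], [40, 0], [35, 1], [50, 0]])
    y = np.array([0, 1, 1, 1])
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    prior = np.bincount(y) / len(y)
    print(currency_aware_predict_proba(tree, np.array([30, 1]), age_in_years=3.0, class_prior=prior))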

    Assessing Data Quality - A Probability-based Metric for Semantic Consistency

    We present a probability-based metric for semantic consistency using a set of uncertain rules. In contrast to existing metrics for semantic consistency, our metric can consider rules that are expected to be fulfilled with specific probabilities. The resulting metric values represent the probability that the assessed dataset is free of internal contradictions with regard to the uncertain rules and thus have a clear interpretation. The theoretical basis for determining the metric values consists of statistical tests and the concept of the p-value, which allows the metric value to be interpreted as a probability. We demonstrate the practical applicability and effectiveness of the metric in a real-world setting by analyzing a customer dataset of an insurance company. Here, the metric was applied to identify semantic consistency problems in the data and to support decision-making, for instance, when offering individual products to customers.
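
    As a rough illustration of the underlying idea, the Python sketch below evaluates a single hypothetical uncertain rule with a one-sided binomial test and reports the resulting p-value. The example rule, the expected probability and the way a full metric would aggregate several rules are assumptions for illustration, not the authors' formulation.

    # Illustrative sketch only: p-value of a one-sided binomial test for one uncertain rule.
    from math import comb

    def binomial_p_value(k_satisfied: int, n: int, expected_prob: float) -> float:
        """Probability of observing at most k_satisfied rule satisfactions out of n records
        if each record satisfies the rule with probability expected_prob."""
        return sum(comb(n, i) * expected_prob**i * (1 - expected_prob)**(n - i)
                   for i in range(k_satisfied + 1))

    # Hypothetical uncertain rule: "customers younger than 18 hold no life insurance",
    # expected to hold for roughly 99% of records.
    records = [{"age": 16, "life_insurance": False},
               {"age": 17, "life_insurance": True},   # possible contradiction
               {"age": 45, "life_insurance": True},
               {"age": 16, "life_insurance": False}]
    relevant = [r for r in records if r["age"] < 18]
    satisfied = sum(not r["life_insurance"] for r in relevant)
    metric_value = binomial_p_value(satisfied, len(relevant), expected_prob=0.99)
    print(round(metric_value, 4))  # small values hint at semantic consistency problems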

    Data Quality Management: Trade-offs in Data Characteristics to Maintain Data Quality

    We are living in an age of information in which organizations are struggling under the pressure of exponentially growing data. Higher data quality supports better decision making, thereby enabling companies to stay competitive in the market. To improve data quality, it is imperative to identify all the characteristics that describe data. However, strengthening one characteristic can compromise another, creating a trade-off. There are many well-established and interesting theories regarding data quality and data characteristics. However, we found a lack of research and literature on how trade-offs are handled between the different types of data stored by an organization. To understand how organizations deal with trade-offs, we chose a framework formulated by Eppler in which various trade-offs between data characteristics are discussed. After a pre-study with experts in this field, we narrowed it down to three main data characteristic trade-offs, which were further analyzed through interviews. Based on the interviews conducted and the literature review, we could prioritize data types under different data characteristics. This research gives insight into how trade-offs between data characteristics should be handled in organizations.

    Requirements for Data Quality Metrics

    Data quality and especially the assessment of data quality have been intensively discussed in research and practice alike. To support an economically oriented management of data quality and decision-making under uncertainty, it is essential to assess the data quality level by means of well-founded metrics. However, if not adequately defined, these metrics can lead to wrong decisions and economic losses. Therefore, based on a decision-oriented framework, we present a set of five requirements for data quality metrics. These requirements are relevant for any metric that aims to support an economically oriented management of data quality and decision-making under uncertainty. We further demonstrate the applicability and efficacy of these requirements by evaluating five data quality metrics for different data quality dimensions. Moreover, we discuss practical implications of applying the presented requirements.

    Concepts and Methods from Artificial Intelligence in Modern Information Systems – Contributions to Data-driven Decision-making and Business Processes

    Today, organizations are facing a variety of challenging, technology-driven developments, three of the most notable being the surge in uncertain data, the emergence of unstructured data, and a complex, dynamically changing environment. These developments require organizations to transform in order to stay competitive. Artificial Intelligence, with its fields of decision-making under uncertainty, natural language processing, and planning, offers valuable concepts and methods to address these developments. The dissertation at hand utilizes and furthers these contributions in three focal points to address research gaps in the existing literature and to provide concrete concepts and methods that support organizations in transforming and improving data-driven decision-making, business processes, and business process management. In particular, the focal points are the assessment of data quality, the analysis of textual data, and the automated planning of process models. With regard to data quality assessment, probability-based approaches for measuring consistency and identifying duplicates as well as requirements for data quality metrics are suggested. With respect to the analysis of textual data, the dissertation proposes a topic modeling procedure to gain knowledge from CVs as well as a model based on sentiment analysis to explain ratings from customer reviews. Regarding the automated planning of process models, concepts and algorithms for the automated construction of parallelizations in process models, the automated adaptation of process models, and the automated construction of multi-actor process models are provided.
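
    As a rough, generic illustration of the topic-modeling part mentioned above, the Python sketch below fits a standard LDA model to a few toy CV snippets. The dissertation's actual procedure, preprocessing and corpus are not described in this abstract, so everything in the example is an assumption.

    # Illustrative sketch only: generic LDA topic modeling on hypothetical CV snippets.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    cvs = [
        "python machine learning data analysis statistics",
        "project management stakeholder communication budgeting",
        "deep learning python neural networks data pipelines",
        "agile project management scrum team leadership",
    ]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(cvs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    terms = vectorizer.get_feature_names_out()
    for topic_idx, weights in enumerate(lda.components_):
        top_terms = [terms[i] for i in weights.argsort()[::-1][:4]]
        print(f"topic {topic_idx}: {', '.join(top_terms)}")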

    Information governance in service-oriented business networking


    Metadata quality issues in learning repositories

    Metadata lies at the heart of every digital repository project in the sense that it defines and drives the description of the digital content stored in the repositories. Metadata allows content not only to be successfully stored, managed and retrieved, but also to be preserved in the long term. Despite the widely recognized importance of metadata in digital repositories, studies indicate that metadata quality is relatively low in most digital repositories. Metadata quality is loosely defined as "fitness for purpose", meaning that low-quality metadata cannot fulfill its purpose, which is to allow for the successful storage, management and retrieval of resources. In practice, low metadata quality leads to ineffective searches for content that recall the wrong resources or, even worse, no resources at all, making them invisible to the intended user, that is, the "client" of each digital repository. The present dissertation approaches this problem by proposing a comprehensive metadata quality assurance method, namely the Metadata Quality Assurance Certification Process (MQACP). The basic idea of this dissertation is to propose a set of methods that can be deployed throughout the lifecycle of a repository to ensure that the metadata generated by content providers are of high quality. These methods have to be straightforward and simple to apply, with measurable results. They also have to be adaptable with minimum effort so that they can be used easily in different contexts. This set of methods is described analytically, taking into account the actors needed to apply them, describing the tools needed and defining the anticipated outcomes. In order to test our proposal, we applied it to a Learning Federation of repositories, from the first day of its existence until it reached maturity and regular operation. We supported the metadata creation process throughout the different phases of the repositories involved by setting up specific experiments using the methods and tools of the MQACP. Throughout each phase, we measured the resulting metadata quality to certify that the anticipated improvement in metadata quality actually took place. Lastly, across these phases, the cost of applying the MQACP was measured to provide a comparison basis for future applications. Based on the success of this first application, we validated the MQACP approach by applying it to two further cases, a Cultural and a Research Federation of repositories. This allowed us to examine the transferability of the approach to other cases that present some similarities with the initial one but also significant differences. The results showed that the MQACP was successfully adapted to the new contexts with minimal adaptations, producing similar results at comparable costs. In addition, by looking closely at the common experiments carried out in each phase of each use case, we were able to identify interesting patterns in the behavior of content providers that can be researched further. The dissertation concludes with a set of future research directions that emerged from the cases examined. These directions can be explored to support the next version of the MQACP in terms of the methods deployed, the tools used to assess metadata quality, and the cost analysis of the MQACP methods.
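
    The abstract does not specify which quality metrics the MQACP experiments use; as a minimal, hypothetical illustration of measuring one aspect of metadata quality, the Python sketch below computes a simple field-completeness score per record. The required fields and sample records are assumptions, not the dissertation's actual metric.

    # Illustrative sketch only: field completeness of repository metadata records.
    REQUIRED_FIELDS = ["title", "description", "keywords", "language", "rights"]

    def completeness(record: dict) -> float:
        """Share of required metadata fields that are present and non-empty."""
        filled = sum(bool(str(record.get(field, "")).strip()) for field in REQUIRED_FIELDS)
        return filled / len(REQUIRED_FIELDS)

    records = [
        {"title": "Photosynthesis basics", "description": "Intro lesson",
         "keywords": "biology", "language": "en", "rights": "CC-BY"},
        {"title": "Untitled", "description": "", "keywords": "", "language": "en"},
    ]
    scores = [completeness(r) for r in records]
    print(scores, "mean:", sum(scores) / len(scores))  # e.g. track the mean per repository phase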
