14,389 research outputs found

    A framework for data cleaning in data warehouses

    Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial task in meeting it. A set of methods and tools has been developed for this purpose, yet at least two questions still need to be answered: how can the efficiency of data cleaning be improved, and how can its degree of automation be increased? This paper addresses these two questions by presenting a novel framework that provides an approach to managing data cleaning in data warehouses, focusing on the use of data quality dimensions and decoupling the cleaning process into several sub-processes. An initial test run of the processes in the framework demonstrates that the presented approach is efficient and scalable for data cleaning in data warehouses.
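
    A minimal sketch of the idea the abstract names, decoupling a cleaning run into independent sub-processes keyed by data quality dimensions, could look like the following Python. The abstract does not give the framework's internals, so every name and rule here is illustrative, not taken from the paper.

        # Hypothetical sketch, not the paper's framework: each sub-process
        # targets one data quality dimension and can be run or tuned on its own.
        from dataclasses import dataclass
        from enum import Enum, auto
        from typing import Callable

        class QualityDimension(Enum):
            COMPLETENESS = auto()   # missing values
            ACCURACY = auto()       # values outside valid ranges

        Record = dict

        @dataclass
        class SubProcess:
            dimension: QualityDimension
            clean: Callable[[list[Record]], list[Record]]

        def fill_missing(records):
            # Completeness: substitute a sentinel for absent country codes.
            return [{**r, "country": r.get("country") or "UNKNOWN"} for r in records]

        def drop_invalid_ages(records):
            # Accuracy: discard records with a missing or out-of-range age.
            return [r for r in records if 0 <= r.get("age", -1) <= 120]

        PIPELINE = [
            SubProcess(QualityDimension.COMPLETENESS, fill_missing),
            SubProcess(QualityDimension.ACCURACY, drop_invalid_ages),
        ]

        def run(records):
            for step in PIPELINE:
                records = step.clean(records)   # sub-processes compose sequentially
            return records

    Decoupling the steps this way is what would let individual sub-processes be automated or parallelised independently, which is the efficiency and automation angle the abstract raises.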

    Declarative Data Cleaning: Language, Model, and Algorithms

    Projet CARAVEL. The problem of data cleaning, which consists of removing inconsistencies and errors from original data sets, is well known in the area of decision support systems and data warehouses. However, for non-conventional applications, such as the migration of largely unstructured data into structured data, or the integration of heterogeneous scientific data sets in interdisciplinary fields (e.g., in environmental science), existing ETL (Extraction-Transformation-Loading) and data cleaning tools for writing data cleaning programs are insufficient. The main challenge is the design of a data flow graph that effectively generates clean data and can perform efficiently on large sets of input data. The difficulty comes from (i) a lack of clear separation between the logical specification of data transformations and their physical implementation, and (ii) the lack of explanation of cleaning results and of user-interaction facilities for tuning a data cleaning program. This paper addresses these two problems and presents a language, an execution model and algorithms that enable users to express data cleaning specifications declaratively and perform the cleaning efficiently. We use as an example a set of bibliographic references used to construct the Citeseer Web site. The underlying data integration problem is to derive structured and clean textual records so that meaningful queries can be performed. Experimental results report on the assessment of the proposed framework for data cleaning.
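
    The paper defines its own declarative language, so the sketch below only illustrates the separation it argues for: a logical specification (rules plus a matching criterion for bibliographic records) kept apart from one possible physical execution. All rules and fields are invented for illustration.

        # Hypothetical sketch: logical cleaning spec vs. physical execution.
        import re

        # Logical specification: what "clean" means, with no execution
        # strategy implied.
        RULES = [
            lambda r: {**r, "title": re.sub(r"\s+", " ", r["title"]).strip().lower()},
            lambda r: {**r, "year": re.sub(r"\D", "", r.get("year", ""))},
        ]

        def match_key(r):
            # Matching criterion: same normalized title and year means
            # the same publication.
            return (r["title"], r["year"])

        # Physical implementation: one of many possible execution plans
        # for the same specification.
        def clean(records):
            for rule in RULES:
                records = [rule(r) for r in records]
            seen, out = set(), []
            for r in records:
                k = match_key(r)
                if k not in seen:
                    seen.add(k)
                    out.append(r)
            return out

        refs = [
            {"title": "Declarative  Data Cleaning", "year": "2001"},
            {"title": "declarative data cleaning ", "year": "2001."},
        ]
        print(clean(refs))   # the two variant references collapse into one

    Because the rules and the matching criterion carry no execution order, an engine could reorder, batch, or index them for large inputs without changing the declared meaning, which is the separation the paper targets.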

    Extracting, Transforming and Archiving Scientific Data

    It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, and (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research.
    Comment: 8 pages, Fourth Workshop on Very Large Digital Libraries, 201
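
    The ETA model itself is architectural, so the following is only a guess at the shape of one Extract-Transform-Archive pass, with fixity metadata recorded for long-term storage; the function names and archive layout are assumptions, not the paper's design.

        # Hypothetical Extract-Transform-Archive pass (not the paper's API).
        import hashlib, json, pathlib

        def extract(path: pathlib.Path) -> bytes:
            return path.read_bytes()          # pull raw bytes from a legacy source

        def transform(raw: bytes, source: str) -> dict:
            # Derive minimal preservation metadata so the item stays findable.
            return {
                "source": source,
                "size": len(raw),
                "sha256": hashlib.sha256(raw).hexdigest(),   # fixity check
            }

        def archive(raw: bytes, meta: dict, repo: pathlib.Path) -> None:
            item = repo / meta["sha256"]      # content-addressed layout
            item.mkdir(parents=True, exist_ok=True)
            (item / "data.bin").write_bytes(raw)
            (item / "metadata.json").write_text(json.dumps(meta, indent=2))

        def run_eta(path: pathlib.Path, repo: pathlib.Path) -> None:
            raw = extract(path)
            meta = transform(raw, source=str(path))
            archive(raw, meta, repo)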

    The Role of Maintenance and Facility Management in Logistics: A Literature Review

    Purpose - The purpose of this paper is to provide a literature review of the different ways of carrying out Facility Management and related topics, in order to show that there is limited research regarding the impact of Facility Management on the logistics and operational performance of warehouses.
    Design/methodology/approach - Four focus areas were identified, and for each one the different methodologies and streams of research were studied.
    Findings - The study underlines the importance of Facility Management for logistics operations; it therefore supports the notion that investments aimed at preserving the condition of the building and service components of warehouses are crucial.
    Originality/value - This paper suggests to Facility Management managers that they can contribute to enhancing business performance by designing effective Facility Management strategies.

    Assessment of irregularities in organic imports from Ukraine to the EU in 2016, notified in OFIS

    The underlying study of this report set out to improve the understanding of the situation concerning residues found in organic food products exported from Ukraine, and to formulate guidelines for identifying and reducing the risk of contamination with non-permitted substances, based on an in-depth analysis of the residue cases notified in the European Commission’s Organic Farming Information System (OFIS) in 2016. Not surprisingly, the combination of several factors, namely (i) the additional sampling required by the new EU import guidelines, (ii) the growing number of organic lots exported from Ukraine, and (iii) improved analysis technology, led to an increase in the total number of irregularities notified in OFIS compared to previous years. Nevertheless, the number of irregularities notified in OFIS for Ukraine in 2016 is moderate (affecting an estimated <1% of all consignments exported from Ukraine). Of the lots affected, two thirds were ultimately released as “organic” after additional investigations had been carried out by the respective export CB.

    Yet if the risk assessment also includes the results of samples taken by the CBs prior to export, i.e. from crops during the growing season and from lots before they are released for export, Ukraine and its neighbouring countries do need to be considered relatively high-risk countries in terms of contamination and irregularities. It is also notable that the likelihood of residue findings varies considerably among CBs. Why some CBs have a high share of residue findings while others find proportionally far fewer residues is unclear and should be the subject of further assessment; one assumption is that some CBs took risk-oriented samples whereas others did not. Sampling during the production process (field/leaves and dust) effectively supports organic integrity, yet most CBs focus on residue-free final products. The way a CB responds to detected irregularities, i.e. investigates a case and derives lessons learnt, is very important.

    A majority of OFIS cases from Ukrainian exports appears to be linked to insufficient management of handling procedures during storage and transport. However, drift in the field and the intentional use of unauthorised substances are also potential sources of irregularities related to exports from Ukraine. Apart from the cases for which likely root causes were identified, no clear explanation could be found for nearly one third of the Ukrainian OFIS cases in which lab results differed between the export and import countries. Further investigations should be carried out to help identify the reasons for the relatively large differences between the lab results of samples taken from the same trade lots. It is important to better understand these discrepancies, because they can lead to significant negative economic impacts for everyone involved in the value chain, even when no rules have been broken. Another recommendation resulting from this study is to focus more on detecting potential contamination in the field during crop cultivation. Special attention should be given to testing leaf samples of crops in which contamination has been detected in the past: rapeseed, sunflower seeds and high-quality milling wheat. CBs should have guidelines on how and when leaf samples are best taken.

    Ukrainian organic operators often complain that all Ukrainian operators are put in the same basket and treated as high-risk suppliers. In response to the stricter regulations imposed on them, operators and experts participating in the International Conference “Improving Integrity of Organic Supply Chains” in Odesa in 2017 called for an amendment of the inspection policy: instead of labelling entire countries as high-risk, the focus should be placed on risky value chains, and supply chains considered high-risk should be relieved of extra measures once they have demonstrated consistent compliance.