R quality checks for DCF data submission: Exploratory Data Analysis for Fishing Fleet economic data call.

Abstract

Under its Administrative Arrangement with DG MARE, the JRC-IPSC is responsible, among many other activities, for calling for data from the Member States (MS), supporting the MS in the continuous improvement of data quality, making these data available to the Scientific, Technical and Economic Committee for Fisheries (STECF), and curating the data to ensure their long-term usability. Among the calls launched by the JRC-IPSC, the call for fishing fleet economic data has been launched every year since 2005. A data quality policy has been in place since then. More recently, however, the emergence of more data-intensive processes and the progressive implementation of open data and data reusability policies have prompted additional effort to streamline the process of assessing and improving data quality. To this end, a new tool supporting data quality assessment has been developed since 2013, based on the generation of dynamic reports with knitr/Sweave (R packages). This report presents that tool, the Data Quality Report. Developed in R and LaTeX, it fetches data on the fly from the database to which the MS upload them, cleans and reprocesses the data, produces the outputs that support the data quality analysis and, finally, generates a PDF report in which the code, outputs and analysis are brought together. The tool has proved highly efficient: it is less time-consuming, error-free, reproducible at any time, and grounded in a policy of transparency (code and outputs are made available together). The same methodology will therefore be used to support the data policy of the JRC-IPSC in the future. To that end, further enhancements may be sought, such as converting the output from a PDF document to an interactive web application.

JRC.G.3-Maritime affairs
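The fetch, clean, check and summarise steps described in the abstract can be illustrated with a minimal R sketch. It is not the code of the Data Quality Report itself: the function names (`fetch_data`, `clean_data`, `flag_negative`, `summarise_checks`), the column names, the sample values and the two quality rules (drop missing values, flag negative values) are all assumptions made for illustration, and an in-memory data frame stands in for the database query.

```r
# Hypothetical stand-in for the records a Member State uploads; in the real
# tool the data would be fetched from a database instead.
fetch_data <- function() {
  data.frame(
    country = c("PRT", "PRT", "ESP", "ESP", "ESP"),
    year = c(2012, 2013, 2012, 2013, 2013),
    value = c(100, NA, 250, -5, 300),
    stringsAsFactors = FALSE
  )
}

# Cleaning step (assumed rule): drop records whose value is missing.
clean_data <- function(df) {
  df[!is.na(df$value), , drop = FALSE]
}

# Quality check (assumed rule): flag records with a negative value.
flag_negative <- function(df) {
  df$negative <- df$value < 0
  df
}

# Output step: count records and flagged records per country.
summarise_checks <- function(df) {
  counts <- data.frame(
    country = df$country,
    records = 1L,
    flagged = as.integer(df$negative),
    stringsAsFactors = FALSE
  )
  aggregate(cbind(records, flagged) ~ country, data = counts, FUN = sum)
}

# Run the pipeline end to end and print the summary table.
report <- summarise_checks(flag_negative(clean_data(fetch_data())))
print(report)
```

In a dynamic report, steps like these would sit in code chunks of a knitr or Sweave source document, so that each compilation reruns them and places the code next to its output in the resulting PDF.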
