
Screening tools for data quality and outlier detection applied to the Airbase ambient air pollution database

Abstract

In order to provide scientifically sound information for regulatory purposes and environmental impact assessment, long-term meso- to large-scale datasets of ambient air quality are an indispensable means of model calibration, evaluation and validation. However, collecting high-quality datasets with spatial coverage suitable for air pollution management and decision support poses many challenges. It is therefore critical to establish expedient tools for efficiently assessing and controlling the quality of air pollution measurements in large-scale national and international monitoring networks. The European Environment Agency collects measurements of ambient air pollution from more than 6000 monitoring stations in over 30 countries in its Air Quality Database, AirBase. The quality of these data depends on the measurement methods and QA/QC procedures applied by each country. We present a methodology to automatically screen the AirBase records for internal consistency and to detect spatio-temporal outliers nested in the data. We implemented a spatial-set outlier detection method, which considers both attribute values and spatial relationships. Specifically, we adapted the “Smooth Spatial Attribute method”, which was developed to identify outliers among traffic sensors. The method relies on defining a neighbourhood for each air pollutant measurement at location x: a spatio-temporal domain limited in time (±1 day) and distance (±1 degree). Within such a domain, the attribute values of neighbours are assumed to be related through the emission, transport and reaction of air pollutants, so outliers can be detected as extreme attribute values compared with those of their neighbours. The implemented method may be of interest as a data quality screening system when countries report their measurements to the European Environment Agency.
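
The abstract does not spell out the scoring rule, so the following is only a minimal sketch of the neighbourhood idea under stated assumptions: daily records, a square box of ±1 degree in both latitude and longitude, ±1 day in time, and a common neighbourhood-difference statistic (a value minus the mean of its neighbours, standardized over all records) standing in for the Smooth Spatial Attribute method. The `Record` type, the function names and the default threshold of 2.0 are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass(frozen=True)
class Record:
    """One daily measurement at a station (hypothetical record layout)."""
    station: str
    lat: float    # degrees
    lon: float    # degrees
    day: int      # day index, e.g. days since some epoch
    value: float  # pollutant concentration


def neighbours(records, i, max_days=1, max_deg=1.0):
    """Indices of records within +/- max_days and a +/- max_deg lat/lon box of record i.

    Assumption: "+/- 1 degree" is read as a square box, not a great-circle radius.
    """
    r = records[i]
    return [
        j for j, q in enumerate(records)
        if j != i
        and abs(q.day - r.day) <= max_days
        and abs(q.lat - r.lat) <= max_deg
        and abs(q.lon - r.lon) <= max_deg
    ]


def spatial_outliers(records, threshold=2.0, max_days=1, max_deg=1.0):
    """Return indices of records that deviate strongly from their neighbourhood.

    S_i = value_i - mean(neighbour values), then Z_i = (S_i - mean(S)) / std(S);
    a record is flagged when |Z_i| > threshold. Records without neighbours are skipped.
    """
    diffs = {}
    for i, r in enumerate(records):
        nb = neighbours(records, i, max_days, max_deg)
        if nb:
            diffs[i] = r.value - fmean(records[j].value for j in nb)
    values = list(diffs.values())
    if len(values) < 2:
        return []
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, s in diffs.items() if abs(s - mu) / sigma > threshold]


if __name__ == "__main__":
    recs = [
        Record("A", 45.0, 9.0, 0, 20.0),
        Record("B", 45.2, 9.1, 0, 21.0),
        Record("C", 45.4, 9.3, 0, 19.0),
        Record("D", 45.1, 9.4, 0, 22.0),
        Record("E", 45.3, 9.2, 0, 20.0),
        Record("F", 45.5, 9.0, 0, 80.0),   # suspiciously high value
        Record("G", 50.0, 15.0, 0, 500.0),  # isolated: no neighbours, not scored
    ]
    print(spatial_outliers(recs))  # expected: [5]
```

A z-score over the neighbourhood differences is one simple choice. With few records it has limited power (the largest attainable |Z| grows only with the square root of the sample size), so a robust scale such as the median absolute deviation may be preferable in practice.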
Beyond this, the method could also offer a simple way to investigate the accuracy of station classification in AirBase.
JRC.H.2-Air and Climate
