292 research outputs found

    Mind the large gap : novel algorithm using seasonal decomposition and elastic net regression to impute large intervals of missing data in air quality data

    Air quality data sets are widely used in numerous analyses. Missing values are ubiquitous in air quality data sets because the data are collected through sensors. Recovering missing data is a challenging task in the data preprocessing stage, and it becomes more challenging in time series data, where time is an implicit variable that cannot be ignored. Existing methods for handling missing data in time series perform well when the percentage of missing values is relatively low and the gap size is small, but their performance degrades on large gaps. This paper presents a novel algorithm based on seasonal decomposition and elastic net regression to impute large gaps in time series data when correlated variables exist. In imputing large gaps, this method outperforms several existing univariate approaches, namely Kalman smoothing on ARIMA models, Kalman smoothing on structural time series models, linear interpolation, and mean imputation. However, it is applicable only when one or more variables are correlated with the time series containing the large gaps.
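To make the small-gap versus large-gap contrast concrete, here is a minimal sketch of two of the univariate baselines named in the abstract (linear interpolation and mean imputation). It does not reproduce the paper's seasonal-decomposition and elastic-net algorithm. The synthetic 24-step cycle, the gap positions, and all function names are assumptions made for illustration.

```python
import math


def mean_impute(series):
    """Replace each None with the mean of the observed values."""
    observed = [x for x in series if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in series]


def linear_interpolate(series):
    """Fill interior runs of None with a straight line between the
    nearest observed neighbours on either side of the gap."""
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i - 1              # last observed index before the gap
            end = i
            while end < n and filled[end] is None:
                end += 1               # first observed index after the gap
            if start < 0 or end >= n:
                raise ValueError("gap touches the edge of the series")
            step = (filled[end] - filled[start]) / (end - start)
            for j in range(i, end):
                filled[j] = filled[start] + step * (j - start)
            i = end
        else:
            i += 1
    return filled


def gap_mae(truth, estimate, gap):
    """Mean absolute error restricted to the imputed positions."""
    return sum(abs(truth[i] - estimate[i]) for i in gap) / len(gap)


def with_gap(series, gap):
    """Copy of the series with the given positions set to None."""
    return [None if t in gap else v for t, v in enumerate(series)]


# Hypothetical signal with a 24-step seasonal cycle (assumed data).
truth = [10 + 5 * math.sin(2 * math.pi * t / 24) for t in range(96)]
small_gap = range(30, 33)   # 3 missing points
large_gap = range(30, 54)   # one full seasonal period missing

small_err = gap_mae(truth, linear_interpolate(with_gap(truth, small_gap)), small_gap)
large_err = gap_mae(truth, linear_interpolate(with_gap(truth, large_gap)), large_gap)
print(f"linear interpolation MAE: small gap {small_err:.2f}, large gap {large_err:.2f}")
```

On this synthetic series the straight line drawn across a full period misses the whole seasonal swing, which is the failure mode that motivates bringing in seasonal structure and correlated variables.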

    Spatial-temporal prediction of air quality based on recurrent neural networks

    To predict air quality (PM2.5 concentrations, etc.), many parametric regression models have been developed, while deep learning algorithms are used less often. Few of them take air pollution emissions or spatial information into consideration or make predictions at an hourly scale. In this paper, we propose a spatial-temporal GRU-based prediction framework incorporating ground pollution monitoring (GPM), factory emissions (FE), and surface meteorology monitoring (SMM) variables to predict hourly PM2.5 concentrations. The dataset for empirical experiments was built from air quality monitoring in Shenyang, China. Experimental results indicate that our method enables more accurate predictions than all baseline models, and that applying convolutional processing to the GPM and FE variables yields a notable improvement in prediction accuracy.

    A Comparative Analysis for Air Quality Estimation from Traffic and Meteorological Data

    Air pollution in urban regions remains a crucial subject of study, given its implications for health and the environment, and much effort is often put into monitoring pollutants and producing accurate trend estimates over time using expensive tools and sensors. In this work, we study the problem of air quality estimation in the urban area of Milan (IT), proposing different machine learning approaches that combine meteorological and transit-related features to produce affordable estimates without introducing sensor measurements into the computation. We investigated different configurations employing machine and deep learning models, namely a linear regressor, an Artificial Neural Network using Bayesian regularization, a Random Forest regressor and a Long Short Term Memory network. Our experiments show that affordable estimates of the pollutants can be achieved even with simpler linear models, suggesting that reasonably accurate Air Quality Index (AQI) measurements can be obtained without the need for expensive equipment.

    Spatiotemporal and temporal forecasting of ambient air pollution levels through data-intensive hybrid artificial neural network models

    Outdoor air pollution (AP) is a serious public threat which has been linked to severe respiratory and cardiovascular illnesses, and premature deaths especially among those residing in highly urbanised cities. As such, there is a need to develop early-warning and risk management tools to alleviate its effects. The main objective of this research is to develop AP forecasting models based on Artificial Neural Networks (ANNs) according to an identified model-building protocol from existing related works. Plain, hybrid and ensemble ANN model architectures were developed to estimate the temporal and spatiotemporal variability of hourly NO2 levels in several locations in the Greater London area. Wavelet decomposition was integrated with Multilayer Perceptron (MLP) and Long Short-term Memory (LSTM) models to address the issue of high variability of AP data and improve the estimation of peak AP levels. Block-splitting and crossvalidation procedures have been adapted to validate the models based on Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Willmott’s index of agreement (IA). The results of the proposed models present better performance than those from the benchmark models. For instance, the proposed wavelet-based hybrid approach provided 39.15% and 28.58% reductions in RMSE and MAE indices, respectively, on the performance of the benchmark MLP model results for the temporal forecasting of NO2 levels. The same approach reduced the RMSE and MAE indices of the benchmark LSTM model results by 12.45% and 20.08%, respectively, for the spatiotemporal estimation of NO2 levels in one site at Central London. The proposed hybrid deep learning approach offers great potential to be operational in providing air pollution forecasts in areas without a reliable database. 
The model-building protocol adapted in this thesis can also be applied to studies using measurements from other sites
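The three validation measures named in this abstract can be written out directly. The sketch below assumes the original Willmott (1981) form of the index of agreement, IA = 1 - Σ(O - P)² / Σ(|P - Ō| + |O - Ō|)²; the abstract does not say which variant the thesis uses. The helper `percent_reduction` is a hypothetical name showing how a figure such as a 39.15% RMSE reduction would be derived from a benchmark score and a proposed-model score.

```python
import math


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def index_of_agreement(obs, pred):
    """Willmott's index of agreement, original 1981 form (assumed variant).
    Ranges from 0 to 1, with 1 meaning perfect agreement."""
    mean_obs = sum(obs) / len(obs)
    numerator = sum((o - p) ** 2 for o, p in zip(obs, pred))
    denominator = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - numerator / denominator


def percent_reduction(benchmark_error, proposed_error):
    """Relative error reduction of a proposed model over a benchmark, in percent."""
    return 100.0 * (benchmark_error - proposed_error) / benchmark_error


# Toy values, not taken from the thesis.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, predicted), mae(observed, predicted),
      index_of_agreement(observed, predicted))
```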

    Implementation of Feature Engineering in Prediction of AQI in India using Machine Learning

    Prediction of the Air Quality Index (AQI) is a present-day necessity, but the different preprocessing techniques that can be applied before prediction need to be analysed. In this study, we first explored various feature engineering techniques, such as Data Imputation, Scaling, Extraction, Selection, and Data Split, that can be used before applying a machine learning algorithm for better results. Second, we used MLR and SVR (Linear, Gaussian) to build the prediction models. Finally, we used root mean square error (RMSE), R2, Mean Squared Error (MSE) and Mean Absolute Error (MAE) to evaluate the performance of the regression models in combination with the feature engineering techniques. The results show that Linear SVR performs best when coupled with imputation and a robust scaler (R2=0.7557834846394744), while Gaussian SVR performs best when coupled with imputation only. For MLR, the results (R2=0.7769187383819041) are almost the same in all 4 cases, and performance degraded when PCA was applied.
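As a rough illustration of two ingredients mentioned above, the sketch below implements robust scaling in its usual sense (centre on the median, divide by the interquartile range) and the R2 score. The abstract does not state the exact scaler configuration or tooling used in the study, so the quartile method and the function names here are assumptions.

```python
import statistics


def robust_scale(values):
    """Centre on the median and divide by the interquartile range (IQR).
    Outliers influence the median and IQR far less than the mean and
    standard deviation, which is why this scaling is called robust."""
    median = statistics.median(values)
    # 'inclusive' quartiles use linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    return [(v - median) / iqr for v in values]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy feature with one extreme reading (assumed data).
scaled = robust_scale([1.0, 2.0, 3.0, 4.0, 100.0])
print(scaled)
```

The extreme reading stays extreme after scaling, but it does not compress the remaining values toward zero the way mean and standard-deviation scaling would.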

    Filling the G_ap_s: Multivariate Time Series Imputation by Graph Neural Networks

    Dealing with missing values and incomplete time series is a labor-intensive, tedious, inevitable task when handling data coming from real-world applications. Effective spatio-temporal representations would allow imputation methods to reconstruct missing temporal data by exploiting information coming from sensors at different locations. However, standard methods fall short in capturing the nonlinear time and space dependencies existing within networks of interconnected sensors and do not take full advantage of the available - and often strong - relational information. Notably, most state-of-the-art imputation methods based on deep learning do not explicitly model relational aspects and, in any case, do not exploit processing frameworks able to adequately represent structured spatio-temporal data. Conversely, graph neural networks have recently surged in popularity as both expressive and scalable tools for processing sequential data with relational inductive biases. In this work, we present the first assessment of graph neural networks in the context of multivariate time series imputation. In particular, we introduce a novel graph neural network architecture, named GRIN, which aims at reconstructing missing data in the different channels of a multivariate time series by learning spatio-temporal representations through message passing. Empirical results show that our model outperforms state-of-the-art methods in the imputation task on relevant real-world benchmarks with mean absolute error improvements often higher than 20%. Comment: Accepted at ICLR 202

    Ensemble model-based method for time series sensors’ data validation and imputation applied to a real waste water treatment plant

    Intelligent Decision Support Systems (IDSSs) integrate different Artificial Intelligence (AI) techniques with the aim of taking or supporting human-like decisions. To this end, these techniques are based on the available data from the target process. This implies that invalid or missing data could trigger incorrect decisions and, therefore, undesirable situations in the supervised process. This is even more important in environmental systems, whose malfunction could jeopardise related ecosystems. In data-driven applications such as IDSS, data quality is a basic problem that should be addressed for the sake of the overall systems’ performance. In this paper, a data validation and imputation methodology for time series is presented. This methodology is integrated in an IDSS software tool which generates suitable control set-points to control the process. The data validation and imputation approach presented here is focused on the imputation step, and it is based on an ensemble of different prediction models obtained for the sensors involved in the process. A Case-Based Reasoning (CBR) approach is used for data imputation, i.e., past situations similar to the current one can propose new values for the missing ones. The CBR model is complemented with other prediction models such as Auto Regressive (AR) models or Artificial Neural Network (ANN) models. Then, the different predictions obtained are combined in an ensemble to obtain better prediction performance than that obtained by each individual prediction model separately. Furthermore, the use of a meta-prediction model, trained using the predictions of all individual models as inputs, is proposed and compared with other ensemble methods to validate its performance.
Finally, this approach is illustrated in a real Waste Water Treatment Plant (WWTP) case study using one of the most relevant measures for the correct operation of the WWTP IDSS, i.e., the ammonia sensor, and considering real faults, showing promising results with improved performance when using the ensemble approach presented here compared against the prediction obtained by each individual model separately. The authors acknowledge the partial support of this work by the Industrial Doctorate Programme (2017DI-006) and the Research Consolidated Groups/Centres Grant (2017 SGR 574) from the Catalan Agency of University and Research Grants Management (AGAUR), from the Catalan Government. Peer Reviewed. Postprint (published version)

    Graph Neural Network for spatiotemporal data: methods and applications

    In the era of big data, there has been a surge in the availability of data containing rich spatial and temporal information, offering valuable insights into dynamic systems and processes for applications such as weather forecasting, natural disaster management, intelligent transport systems, and precision agriculture. Graph neural networks (GNNs) have emerged as a powerful tool for modeling and understanding data with mutual dependencies, such as spatial and temporal dependencies. There is a large amount of existing work that focuses on addressing the complex spatial and temporal dependencies in spatiotemporal data using GNNs. However, the strong interdisciplinary nature of spatiotemporal data has created numerous GNN variants specifically designed for distinct application domains. Although the techniques are generally applicable across various domains, cross-referencing these methods remains essential yet challenging due to the absence of a comprehensive literature review on GNNs for spatiotemporal data. This article aims to provide a systematic and comprehensive overview of the technologies and applications of GNNs in the spatiotemporal domain. First, the ways of constructing graphs from spatiotemporal data are summarized to help domain experts understand how to generate graphs from various types of spatiotemporal data. Then, a systematic categorization and summary of existing spatiotemporal GNNs are presented to enable domain experts to identify suitable techniques and to support model developers in advancing their research. Moreover, a comprehensive overview of significant applications in the spatiotemporal domain is offered to introduce a broader range of applications to model developers and domain experts, assisting them in exploring potential research topics and enhancing the impact of their work. Finally, open challenges and future directions are discussed.

    Urban PM2.5 concentration prediction via attention-based CNN–LSTM.

    Urban particulate matter forecasting is regarded as an essential issue for early warning and control management of air pollution, especially fine particulate matter (PM2.5). However, existing methods for PM2.5 concentration prediction neglect the effects of featured states at different times in the past on future PM2.5 concentration, and most fail to effectively simulate the temporal and spatial dependencies of PM2.5 concentration at the same time. With this consideration, we propose a deep learning-based method, AC-LSTM, which comprises a one-dimensional convolutional neural network (CNN), long short-term memory (LSTM) network, and attention-based network, for urban PM2.5 concentration prediction. Instead of only using air pollutant concentrations, we also add meteorological data and the PM2.5 concentrations of adjacent air quality monitoring stations as the input to our AC-LSTM. Hence, the spatiotemporal correlation and interdependence of multivariate air quality-related time-series data are learned by the CNN-LSTM network in AC-LSTM. The attention mechanism is applied to capture the importance degrees of the effects of featured states at different times in the past on future PM2.5 concentration. The attention-based layer can automatically weigh the past feature states to improve prediction accuracy. In addition, we predict the PM2.5 concentrations over the next 24 h by using air quality data in Taiyuan city, China, and compare it with six baseline methods. To compare the overall performance of each method, the mean absolute error (MAE), root-mean-square error (RMSE), and coefficient of determination (R2) are applied to the experiments in this paper. The experimental results indicate that our method is capable of dealing with PM2.5 concentration prediction with the highest performance
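The idea of weighing past feature states can be sketched generically. The example below is not the AC-LSTM attention layer, whose scoring function the abstract does not describe. It assumes plain dot-product scores against a hypothetical query vector, normalises them with a softmax, and returns the weighted sum of the past states.

```python
import math


def softmax(scores):
    """Numerically stable softmax: weights are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Weigh past states by their dot-product similarity to a query.

    states: list of T feature vectors, one per past time step
    query:  vector of the same length (hypothetical; in a trained model
            it would be learned or derived from the latest state)
    Returns (weights, context), where context is the weighted sum of states.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return weights, context


# Two past time steps with toy 2-dimensional features (assumed data).
past_states = [[1.0, 0.0], [0.0, 1.0]]
uniform_weights, uniform_context = attention_pool(past_states, [0.0, 0.0])
focused_weights, _ = attention_pool(past_states, [10.0, 0.0])
print(uniform_weights, focused_weights)
```

A zero query scores every time step equally, so the context is a plain average. A query aligned with the first state shifts nearly all of the weight onto that time step.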

    A review of artificial neural network models for ambient air pollution prediction

    Research activity in the field of air pollution forecasting using artificial neural networks (ANNs) has increased dramatically in recent years. However, the development of ANN models entails levels of uncertainty given the black-box nature of ANNs. In this paper, a protocol by Maier et al. (2010) for ANN model development is presented and applied to assess journal papers dealing with air pollution forecasting using ANN models. The majority of the reviewed works are aimed at the long-term forecasting of outdoor PM10, PM2.5, oxides of nitrogen, and ozone. The vast majority of the identified works utilised meteorological and source emissions predictors almost exclusively. Furthermore, ad-hoc approaches are found to be predominantly used for determining optimal model predictors, appropriate data subsets and the optimal model structure. Multilayer perceptron and ensemble-type models are predominantly implemented. Overall, the findings highlight the need for developing systematic protocols for developing powerful ANN models.