38,470 research outputs found

    Air Quality Prediction in Smart Cities Using Machine Learning Technologies Based on Sensor Data: A Review

    The influence of machine learning technologies is rapidly increasing and penetrating almost every field, and air pollution prediction is no exception. This paper reviews studies on air pollution prediction using machine learning algorithms based on sensor data in the context of smart cities. Using the most popular databases and executing the corresponding filtration, the most relevant papers were selected. After thoroughly reviewing those papers, the main features were extracted, which served as a base to link and compare them to each other. As a result, we can conclude that: (1) instead of using simple machine learning techniques, authors currently apply advanced and sophisticated techniques, (2) China was the leading country in terms of case studies, (3) particulate matter with a diameter of 2.5 micrometers or less (PM2.5) was the main prediction target, (4) in 41% of the publications the authors carried out the prediction for the next day, (5) 66% of the studies used data with an hourly rate, (6) 49% of the papers used open data, a share that has tended to increase since 2016, and (7) for efficient air quality prediction it is important to consider external factors such as weather conditions, spatial characteristics, and temporal features.

    Comparison of multiple machine learning algorithms for urban air quality forecasting

    Environmental air pollution has become one of the major threats to human life in both developed and developing countries. Because of its importance, various air pollution forecasting models exist; machine learning models, however, have proved to be among the most efficient methods for prediction. In this paper, we assessed the ability of machine learning techniques to forecast NO2, SO2, and PM10 in Amman, Jordan. We compared multiple machine learning methods: artificial neural networks, support vector regression, decision tree regression, and extreme gradient boosting. We also investigated the effect of the distance between the pollution station and the meteorological station on the prediction result, and explored the most relevant seasonal variables and the minimal set of features required for prediction in order to reduce prediction time. The experiments showed promising results for predicting air pollution in Amman, with the artificial neural network outperforming the other algorithms and scoring an RMSE of 0.949 ppb, 0.451 ppb, and 5.570 µg/m3 for NO2, SO2, and PM10, respectively. Our results indicated that predictions were better when the meteorological variables were obtained from the same pollution station. We were also able to reduce the set of variables required for prediction from 11 down to 3, which cut the time by about 80% for NO2, 92% for SO2, and 90% for PM10. The most important variables for predicting NO2 were the previous-day values of NO2, humidity, and wind direction. For SO2 they were the previous-day values of SO2, temperature, and the previous-day values of wind direction. Finally, for PM10 they were the previous-day values of PM10, humidity, and day of the year.
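
The reduced feature set described in this abstract (previous-day values used to predict the next day) can be illustrated with a minimal sketch. The helper name `make_lagged_rows`, the dictionary layout, and the sample readings are all hypothetical and not taken from the paper, and the sketch assumes every predictor is lagged by one day, which the abstract does not state for each variable.

```python
from typing import Dict, List, Tuple


def make_lagged_rows(
    series: Dict[str, List[float]],
    target: str,
    predictors: List[str],
) -> Tuple[List[List[float]], List[float]]:
    """Pair each day's target value with the previous day's predictor values."""
    n = len(series[target])
    features: List[List[float]] = []
    labels: List[float] = []
    # Day 0 has no previous day, so the first usable row is day 1.
    for day in range(1, n):
        features.append([series[name][day - 1] for name in predictors])
        labels.append(series[target][day])
    return features, labels


# Hypothetical daily readings (made-up numbers for illustration only).
daily = {
    "no2": [21.0, 19.5, 23.1, 20.4],
    "humidity": [40.0, 55.0, 48.0, 60.0],
    "wind_dir": [180.0, 200.0, 90.0, 270.0],
}

X, y = make_lagged_rows(daily, target="no2", predictors=["no2", "humidity", "wind_dir"])
```

Each row of `X` holds three values from day t-1 and the matching entry of `y` is the NO2 reading on day t, so any regression model can be fitted on the pair.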

    The Temporal and Frequent Pattern Mining Analysis and Machine Learning Forecasting on Mobile Sourced Urban Air Pollutants

    Ground-level ozone and atmospheric fine particles (PM2.5) have been recognized as critical air pollutants that act as important contributors to the toxicity of anthropogenic air pollution in urban areas. To limit the adverse impacts of ground-level ozone and PM2.5 on public health and ecosystems, it is imperative to identify a practical and effective way to predict upcoming pollution concentration levels accurately. To meet this need, various studies have forecast ground-level ozone and PM2.5, mainly using time-series and neural network analysis. Machine learning has also been adopted for analysis and forecasting in existing research, but it is associated with some limitations that are not easily overcome. (1) The majority of existing forecasting models are highly dependent on time-series inputs and do not consider the influencing factors of the air pollutants. While a relatively accurate prediction may be provided, the influencing factors of the air pollution level caused by real-world complexity are neglected. (2) The existing forecasting models mainly focus on short-term estimation, and some of them need to use the previous prediction as part of the input, which increases system complexity and decreases computational efficiency and accuracy. (3) Accurate annual hourly air pollution level forecasting is seldom achieved. The objective of this research is to propose a systematic methodology to forecast long-term hourly air pollution concentration levels from historical data while considering the factors that influence concentration. To achieve this research goal, a series of methodologies to analyze historical air pollution concentrations by temporal characteristics and frequent pattern data mining algorithms are introduced. The association rules between air pollution concentration levels and the influencing factors are revealed.
A systematic air pollution level forecasting approach based on supervised machine learning algorithms with the ability to predict the annual hourly value is proposed and evaluated. To quantify and validate the results, a case study was conducted in the Houston region with the collection and analysis of ten years of historical environmental, meteorological, and transportation-related data. From the results of this research, (1) the complex correlations between the influencing factors and air pollution concentration levels are quantified and presented; (2) the association rules between the dependent and independent parameters are calculated; (3) a pool of supervised machine learning algorithms is created and evaluated; and (4) an accurate long-term hourly air pollution level machine learning forecasting procedure is proposed. Compared with existing models, the innovative methodology of this research is advanced in terms of computational complexity and achieves high accuracy, and it could easily be applied to similar regions for forecasting the concentration levels of various types of air pollution.

    Assessing uncertainty and heterogeneity in machine learning-based spatiotemporal ozone prediction in Beijing-Tianjin-Hebei region in China

    Accurate prediction of spatiotemporal ozone concentration is of great significance for establishing advanced early warning systems and regulating air pollution control. However, a comprehensive assessment of uncertainty and heterogeneity in spatiotemporal ozone prediction is still lacking. Here, we systematically analyze the hourly and daily spatiotemporal predictive performances of convolutional long short-term memory (ConvLSTM) and deep convolutional generative adversarial network (DCGAN) models over the Beijing-Tianjin-Hebei region in China from 2013 to 2018. In extensive scenarios, our results show that the machine learning-based (ML-based) models achieve better spatiotemporal ozone concentration prediction performance with multiple meteorological conditions. In a further comparison with the Nested Air Quality Prediction Modelling System (NAQPMS) air pollution model and monitoring observations, the ConvLSTM model demonstrates the practical feasibility of identifying high ozone concentration distribution and capturing spatiotemporal ozone variation patterns at a high spatial resolution (here 15 km × 15 km).

    Traffic-Related Air Pollutant (TRAP) Prediction using Big Data and Machine Learning

    The negative impact of increasing air pollution on the global economy, human quality of life, and the health of animals and plants has been enormous. Several works of literature, reports, and news around the world have highlighted the risk posed by ever-increasing air pollution and the threat to the lives of vulnerable groups such as children, the elderly, and people with respiratory and cardiovascular problems. The closest to home among all the air pollutants are the Traffic-Related Air Pollutants (TRAP), and they contribute the most to the risk posed to global health. This emphasises the urgent need for a highly accurate air pollution prediction model. Researchers have been able to achieve significant performance gains in predicting many of the pollutants, except for TRAPs such as CO and NO, which showed the worst prediction performance in many studies. CO and NO have been among the major pollutants of concern globally as they are linked to critical health hazards. Given the established urgency of improving the accuracy of pollution prediction models, we collected recent data over six months at high granularity in terms of time and location. The data is pre-processed and used to develop a Machine Learning (ML)-based air pollution prediction model with high granularity and accuracy, focusing on the traffic-related air pollutants CO and NO. On the benchmark R2 and RMSE scores, our ML models outperformed those of the studies reported in the literature for the prediction of TRAPs. This is due in part to the high data granularity we considered in terms of time and location.

    Developing a GMDH-type neural network model for spatial prediction of NOx: A case study of Çerkezköy, Tekirdağ

    Air pollution-induced issues involve public health, environmental, agricultural and socio-economic aspects. Therefore, decision-makers need low-cost, efficient tools with high spatiotemporal representation for monitoring air pollutants around urban areas and sensitive regions. Air pollution forecasting models with different time steps and forecast lengths are used as an alternative and support to traditional air quality monitoring stations (AQMS). In recent decades, given their ability to reconcile the relationship between parameters of complex systems, artificial neural networks have acquired the utmost importance in the field of air pollution forecasting. In this study, different machine learning regression methods are used to establish a mathematical relationship between air pollutants and meteorological factors from four AQMS (A-D) located between Çerkezköy and Süleymanpaşa, Tekirdağ. The model input variables included air pollutants and meteorological parameters. All developed models were used with the intent to provide instantaneous prediction of the air pollutant parameter NOx within the AQMS and across different stations. In the GMDH (group method of data handling)-type neural network method (namely the self-organizing deep learning approach), a five-hidden-layer structure consisting of a maximum of five neurons was preferred, and the layers and neurons were chosen so as to minimize the error. In all models developed, the data were divided into a training set (80%) and a testing set (20%). Based on the R2, RMSE, and MAE values of all developed models, GMDH provided superior results for NOx prediction within AQMS (reaching 0.94, 10.95, and 6.65, respectively, for station A) and between different AQMS. The GMDH model predicted NOx at station B from station A input variables (without using NOx data as model input) with R2, RMSE, and MAE values of 0.80, 10.88, and 7.31, respectively.
The GMDH model is found suitable for filling gaps in air pollution records within and across AQMS.
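
The 80%/20% split and the three scores quoted in this abstract (R2, RMSE, MAE) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the split is a simple ordered cut (the abstract does not say whether the data were shuffled), and R2 is computed as the coefficient of determination, which may differ from the definition the authors used.

```python
import math
from typing import List, Sequence, Tuple


def split_80_20(rows: Sequence, train_fraction: float = 0.8) -> Tuple[list, list]:
    """Ordered split: the first 80% of rows for training, the rest for testing."""
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


def rmse(observed: List[float], predicted: List[float]) -> float:
    """Root mean squared error."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed: List[float], predicted: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r2(observed: List[float], predicted: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up observations and predictions, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 5.0]
train_rows, test_rows = split_80_20(list(range(10)))
```

RMSE and MAE carry the units of the pollutant concentration, while R2 is unitless, which is why the abstract can report the three values side by side without further qualification.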

    Development of a regional feature selection-based machine learning system (RFSML v1.0) for air pollution forecasting over China

    With the explosive growth of atmospheric data, machine learning models have achieved great success in air pollution forecasting because of their higher computational efficiency than the traditional chemical transport models. However, in previous studies, new prediction algorithms have only been tested at stations or in a small region; a large-scale air quality forecasting model remains lacking to date. Huge dimensionality also means that redundant input data may lead to increased complexity and therefore the over-fitting of machine learning models. Feature selection is a key topic in machine learning development, but it has not yet been explored in atmosphere-related applications. In this work, a regional feature selection-based machine learning (RFSML) system was developed, which is capable of predicting air quality in the short term with high accuracy at the national scale. Ensemble-Shapley additive global importance analysis is combined with the RFSML system to extract significant regional features and eliminate redundant variables at an affordable computational expense. The significance of the regional features is also explained physically. Compared with a standard machine learning system fed with relative features, the RFSML system driven by the selected key features results in superior interpretability, less training time, and more accurate predictions. This study also provides insights into the difference in interpretability among machine learning models (i.e., random forest, gradient boosting, and multi-layer perceptron models).

    Regression Models to Predict Air Pollution from Affordable Data Collections

    Air quality monitoring is key in assuring public health. However, the necessary equipment to accurately measure the criteria pollutants is expensive. Since the countries with more serious problems of air pollution are the less wealthy, this study proposes an affordable method based on machine learning to estimate the concentration of PM2.5. The capital city of Ecuador is used as case study. Several regression models are built from features of different levels of affordability. The first result shows that cheap data collection based on web traffic monitoring enables us to create a model that fairly correlates traffic density with air pollution. Building multiple models according to the hourly occurrence of the pollution peaks seems to increase the accuracy of the estimation, especially in the morning hours. The second result shows that adding meteorological factors allows for a significant improvement of the prediction of PM2.5 concentrations. Nevertheless, the last finding demonstrates that the best predictive model should be based on a hybrid source of data that includes trace gases. Since the sensors to monitor such gases are costly, the last part of the chapter gives some recommendations to get an accurate prediction from models that consider no more than two trace gases

    Modelling atmospheric ozone concentration using machine learning algorithms

    Air quality monitoring is one of several important tasks carried out in the area of environmental science and engineering. Accordingly, the development of air quality predictive models can be very useful, as such models can provide early warnings of pollution levels increasing to unsatisfactory levels. The literature review conducted within the research context of this thesis revealed that only a limited number of widely used machine learning algorithms have been employed for modelling the concentrations of atmospheric gases such as ozone and nitrogen oxides. Despite this observation, the research and technology area of machine learning has recently advanced significantly with the introduction of ensemble learning techniques and convolutional and deep neural networks. Given these observations, the research presented in this thesis aims to investigate the effective use of ensemble learning algorithms, with optimised algorithmic settings and an appropriate choice of base layer algorithms, to create effective and efficient models for the prediction and forecasting of ground level ozone (O3) specifically. Three main research contributions have been made by this thesis in the application area of modelling O3 concentrations. As the first contribution, the performance of several ensemble learning (Homogeneous and Heterogeneous) algorithms was investigated and compared with all popular and widely used single base learning algorithms. The results showed an impressive improvement in prediction performance obtainable by using meta learning (Bagging, Stacking, and Voting) algorithms. The performances of the three investigated meta learning algorithms were similar, giving an average correlation coefficient of 0.91 in prediction accuracy.
As a second contribution, feature selection and parameter-based optimisation were carried out in conjunction with the application of Multilayer Perceptron, Support Vector Machines, Random Forest and Bagging based learning techniques, providing significant improvements in prediction accuracy. The third contribution of the research presented in this thesis is the univariate and multivariate forecasting of ozone concentrations based on optimised Ensemble Learning algorithms. The results reported surpass the accuracy levels reported in forecasting Ozone concentration variations based on widely used, single base learning algorithms. In summary, the research conducted within this thesis bridges an existing research gap in big data analytics related to environment pollution modelling, prediction and forecasting, where present research is largely limited to using standard learning algorithms such as Artificial Neural Networks and Support Vector Machines, often available within popular commercial software packages.