2,325 research outputs found

    Air Quality Prediction in Smart Cities Using Machine Learning Technologies Based on Sensor Data: A Review

    The influence of machine learning technologies is rapidly increasing and reaching almost every field, and air pollution prediction is no exception. This paper reviews studies on air pollution prediction using machine learning algorithms based on sensor data in the context of smart cities. The most relevant papers were selected by searching the most popular databases and filtering the results. After a thorough review of those papers, their main features were extracted and served as a basis for linking and comparing them. As a result, we can conclude that: (1) authors now apply advanced and sophisticated techniques instead of simple machine learning techniques, (2) China was the leading country in terms of case studies, (3) particulate matter with a diameter equal to 2.5 micrometers was the main prediction target, (4) in 41% of the publications the authors carried out the prediction for the next day, (5) 66% of the studies used data with an hourly rate, (6) 49% of the papers used open data, a share that has tended to increase since 2016, and (7) for efficient air quality prediction it is important to consider external factors such as weather conditions, spatial characteristics, and temporal features.

    A comparison of statistical and machine learning methods for creating national daily maps of ambient PM2.5 concentration

    A typical problem in air pollution epidemiology is exposure assessment for individuals for whom health data are available. Due to the sparsity of monitoring sites and the limited temporal frequency with which measurements of air pollutant concentrations are collected (for most pollutants, once every 3 or 6 days), epidemiologists have been moving away from characterizing ambient air pollution exposure solely using measurements. In the last few years, substantial research efforts have been placed in developing statistical methods or machine learning techniques to generate estimates of air pollution at finer spatial and temporal scales (daily, usually) with complete coverage. Some of these methods include: geostatistical techniques, such as kriging; spatial statistical models that use the information contained in air quality model outputs (statistical downscaling models); linear regression modeling approaches that leverage the information in GIS covariates (land use regression); or machine learning methods that mine the information contained in relevant variables (neural network and deep learning approaches). Although some of these exposure modeling approaches have been used in several air pollution epidemiological studies, it is not clear how much the predicted exposures generated by these methods differ, and which method generates more reliable estimates. In this paper, we aim to address this gap by evaluating a variety of exposure modeling approaches, comparing their predictive performance and computational difficulty. Using PM2.5 in year 2011 over the continental U.S. as a case study, we examine the methods' performances across seasons, rural vs urban settings, and levels of PM2.5 concentrations (low, medium, high).

    Modelling atmospheric ozone concentration using machine learning algorithms

    Air quality monitoring is one of several important tasks carried out in the area of environmental science and engineering. Accordingly, the development of air quality predictive models can be very useful, as such models can provide early warnings of pollution levels increasing to unsatisfactory levels. The literature review conducted within the research context of this thesis revealed that only a limited number of widely used machine learning algorithms have been employed for modelling the concentrations of atmospheric gases such as ozone and nitrogen oxides. Despite this observation, the field of machine learning has recently advanced significantly with the introduction of ensemble learning techniques, convolutional and deep neural networks, and related methods. Given these observations, the research presented in this thesis aims to investigate the effective use of ensemble learning algorithms, with optimised algorithmic settings and an appropriate choice of base layer algorithms, to create effective and efficient models for the prediction and forecasting of ground level ozone (O3) specifically. Three main research contributions have been made by this thesis in the application area of modelling O3 concentrations. As the first contribution, the performance of several ensemble learning (homogeneous and heterogeneous) algorithms was investigated and compared with all popular and widely used single base learning algorithms. The results showed an impressive improvement in prediction performance obtainable by using meta learning (Bagging, Stacking, and Voting) algorithms. The performances of the three investigated meta learning algorithms were similar, giving an average correlation coefficient of 0.91 in prediction accuracy. As a second contribution, feature selection and parameter based optimisation were carried out in conjunction with the application of Multilayer Perceptron, Support Vector Machines, Random Forest and Bagging based learning techniques, providing significant improvements in prediction accuracy. The third contribution of the research presented in this thesis includes the univariate and multivariate forecasting of ozone concentrations based on optimised ensemble learning algorithms. The results reported supersede the accuracy levels reported in forecasting ozone concentration variations based on widely used, single base learning algorithms. In summary, the research conducted within this thesis bridges an existing research gap in big data analytics related to environmental pollution modelling, prediction and forecasting, where present research is largely limited to using standard learning algorithms such as Artificial Neural Networks and Support Vector Machines, often available within popular commercial software packages.

    Modelling and Forecasting Temporal PM2.5 Concentration Using Ensemble Machine Learning Methods

    Exposure of humans to high concentrations of PM2.5 has adverse effects on their health. Researchers estimate that exposure to particulate matter from fossil fuel emissions accounted for 18% of deaths in 2018, a challenge policymakers argue is being exacerbated by the increase in the number of extreme weather events and rapid urbanization as they tinker with strategies for reducing air pollutants. Drawing on a number of ensemble machine learning methods that have emerged as a result of advancements in data science, this study examines the effectiveness of using ensemble models for forecasting the concentrations of air pollutants, using PM2.5 as a representative case. A comprehensive evaluation of the ensemble methods was carried out by comparing their predictive performance with that of other standalone algorithms. The findings suggest that hybrid models provide useful tools for PM2.5 concentration forecasting. The developed models show that machine learning models are efficient in predicting air particulate concentrations, and can be used for air pollution forecasting. This study also provides insights into how climatic factors influence the concentrations of pollutants found in the air.

    Nonindigenous Aquatic Species

    Online resource center, maintained by U.S.G.S., provides information, data, links about exotic plants, invertebrates, vertebrates, diseases and parasites. Central repository contains accurate and spatially referenced biogeographic accounts of alien aquatic species. Search for species by state, drainage area, citation in texts; find fact sheets, maps showing occurrence in the U.S. Or, for each taxon, review list of exotic species, find scientific, common name, photo, status; link to facts and distribution map. Educational levels: General public, High school

    Ensemble Methods in Environmental Data Mining

    Environmental data mining is the nontrivial process of identifying valid, novel, and potentially useful patterns in data from environmental sciences. This chapter proposes ensemble methods in environmental data mining that combine the outputs from multiple classification models to obtain better results than could be obtained by an individual model. The study presented in this chapter focuses on several ensemble strategies in addition to the standard single classifiers popularly used in the literature, such as decision tree, naive Bayes, support vector machine, and k-nearest neighbor (KNN). This is the first study that compares four ensemble strategies for environmental data mining: (i) bagging, (ii) bagging combined with random feature subset selection (the random forest algorithm), (iii) boosting (the AdaBoost algorithm), and (iv) voting of different algorithms. In the experimental studies, ensemble methods are tested on different real-world environmental datasets in various subjects such as air, ecology, rainfall, and soil.
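
    As a rough illustration of strategy (iv), voting of different algorithms, the sketch below combines the class labels predicted by several models through a simple majority vote. The function name, the label values, and the three sets of predictions are hypothetical and are not taken from the chapter; ties are resolved in favour of the label that appears first among the models, which is only one of several possible tie-breaking choices.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label list by majority vote.

    predictions: list of lists, one inner list of labels per model,
    all of the same length. On a tie, the label encountered first
    (in model order) wins.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*predictions) yields, for each sample, the labels from every model
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of three classifiers on four samples
tree_pred = ["good", "poor", "good", "poor"]
bayes_pred = ["good", "good", "good", "poor"]
knn_pred = ["poor", "poor", "good", "good"]

ensemble_pred = majority_vote([tree_pred, bayes_pred, knn_pred])
print(ensemble_pred)
```

    Bagging and boosting differ from this mainly in how the individual models are trained (on resampled or reweighted data); the final combination step is often a vote of this kind.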

    A comparative study of calibration methods for low-cost ozone sensors in IoT platforms

    © 2019 IEEE. This paper shows the result of the calibration process of an Internet of Things platform for the measurement of tropospheric ozone (O3). This platform, formed by 60 nodes, deployed in Italy, Spain, and Austria, consisted of 140 metal-oxide O3 sensors, 25 electro-chemical O3 sensors, 25 electro-chemical NO2 sensors, and 60 temperature and relative humidity sensors. As ozone is a seasonal pollutant, which appears in summer in Europe, the biggest challenge is to calibrate the sensors in a short period of time. In this paper, we compare four calibration methods in the presence of a large dataset for model training and we also study the impact of a limited training dataset on the long-range predictions. We show that the difficulty in calibrating these sensor technologies in a real deployment is mainly due to the bias produced by the different environmental conditions found in the prediction with respect to those found in the data training phase. Peer reviewed. Postprint (author's final draft).

    Features Exploration from Datasets Vision in Air Quality Prediction Domain

    Air pollution and its consequences are negatively impacting the world population and the environment, which makes air quality monitoring and forecasting techniques essential tools to combat this problem. To predict air quality with maximum accuracy, it is crucial to consider the dataset types along with the implemented models and the quantity of data. This study selected a set of research works in the field of air quality prediction and concentrates on exploring the datasets utilised in them. The most significant findings of this research work are: (1) meteorological datasets were used in 94.6% of the papers, far more than any other dataset type, and were complemented by others such as temporal data and spatial data; (2) combinations of various datasets have been used since 2009; and (3) open data has been used since 2012, 32.3% of the studies used open data, and 63.4% of the studies did not provide the data.

    Comparative analysis of multiple classification models to improve PM10 prediction performance

    With the increasing requirement of high accuracy for particulate matter prediction, various attempts have been made to improve prediction accuracy by applying machine learning algorithms. However, the characteristics of particulate matter and the uneven occurrence rate across concentration levels make it difficult to train prediction models, resulting in poor prediction. To solve this problem, in this paper we propose multiple classification models that predict particulate matter concentrations by dividing them into AQI-based classes. We designed the classification models using logistic regression, decision tree, SVM and ensemble methods among the various machine learning algorithms. Comparing the performance of the four classification models through error matrices confirmed an F-score of 0.82 or higher for all the models other than the logistic regression model.
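
    To make the link between an error (confusion) matrix and an F-score concrete, the sketch below computes the per-class F1 score and its unweighted (macro) average from a square confusion matrix. The matrix values and function names are hypothetical, and the abstract does not say which averaging the authors used, so the macro average here is an assumption for illustration only.

```python
def f1_per_class(matrix):
    """Per-class F1 from a square confusion matrix.

    matrix[i][j] counts samples whose true class is i and whose
    predicted class is j (rows = true, columns = predicted).
    """
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]                               # correctly predicted as k
        fn = sum(matrix[k]) - tp                        # true k, predicted otherwise
        fp = sum(matrix[r][k] for r in range(n)) - tp   # predicted k, true otherwise
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(matrix):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(matrix)
    return sum(scores) / len(scores)


# Hypothetical confusion matrix for three concentration classes
cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]

per_class = f1_per_class(cm)
print([round(s, 3) for s in per_class], round(macro_f1(cm), 3))
```

    With imbalanced classes, such as rare high-concentration episodes, the per-class scores are usually more informative than overall accuracy.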

    Comparative Analysis of Machine Learning Techniques for Predicting Air Pollution

    The modern, motorized way of life has fostered air pollution. Air pollution has become the biggest rival of healthy living. This situation is becoming more lethal in developing countries, including Pakistan. Hence, this study was carried out to propose an architecture design that could make real-time predictions of air pollution, and also to survey which algorithms were most frequently adopted in past investigations. In addition, it was intended to describe the toxic effects of air pollution on human health. The research was carried out on a large dataset from Seoul, as an adequate dataset for Pakistan was not attainable. The dataset covered three years (2017-2019) and comprised 647,512 instances and 11 attributes. Four algorithms were employed: Random Forest, Linear Regression, Decision Tree and XGBoost (XGB). It was inferred that XGB is the most promising and feasible for predicting the concentration levels of NO2, O3, SO2, PM10, PM2.5 and CO, with the lowest RMSE values of 0.0111, 0.0262, 0.0168, 49.64, 41.68 and 0.1856 and MAE values of 0.0067, 0.0096, 0.0017, 12.28, 7.63 and 0.0982, respectively. Furthermore, it was found that Random Forest was the most frequently preferred algorithm in previous studies on air pollution prediction, while many studies supported that air pollution is very detrimental to human health; in particular, long-term exposure causes lung cancer and respiratory and cardiovascular diseases.
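
    For readers unfamiliar with the two error metrics reported above, the following minimal sketch computes RMSE and MAE for a pair of observed and predicted series. The function names and the toy values are hypothetical and unrelated to the Seoul dataset; both metrics are expressed in the units of the pollutant being predicted.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more than MAE does."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Hypothetical observed and predicted concentrations
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]

print(f"MAE  = {mae(observed, predicted):.4f}")
print(f"RMSE = {rmse(observed, predicted):.4f}")
```

    RMSE is never smaller than MAE on the same data, and a large gap between the two points to a few large errors dominating the result.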