2,710 research outputs found

    Use of neural networks for tropospheric ozone time series approximation and forecasting: a review

    The use of artificial neural networks in atmospheric science is expanding steadily. In recent years, many papers have been published on air pollution modeling, and a number of them deal with time series approximation and forecasting of tropospheric ozone concentration. Neural networks have been found to outperform other statistical techniques such as multiple regression. This paper reviews and discusses some practical aspects of the proposed neural network models applied to ozone concentration approximation and forecasting.

    Data Assimilation by Artificial Neural Networks for an Atmospheric General Circulation Model: Conventional Observation

    This paper presents an approach for employing artificial neural networks (NN) to emulate an ensemble Kalman filter (EnKF) as a method of data assimilation. The assimilation methods are tested in the Simplified Parameterizations PrimitivE-Equation Dynamics (SPEEDY) model, an atmospheric general circulation model (AGCM), using synthetic observational data simulating the localization of balloon soundings. For the data assimilation scheme, a supervised NN, the multilayer perceptron (MLP-NN), is applied. The MLP-NNs are able to emulate the analysis from the local ensemble transform Kalman filter (LETKF). After the training process, the method using the MLP-NN is seen as a function of data assimilation. The NNs were trained with data from the first three months of 1982, 1983, and 1984. A hindcasting experiment for the 1985 data assimilation cycle using the MLP-NN was performed with synthetic observations for January 1985. The numerical results demonstrate the effectiveness of the NN technique for atmospheric data assimilation. The results of the NN analyses are very close to those of the LETKF analyses: the differences in the monthly average of the absolute temperature analyses are of order 0.02. The simulations show that the major advantage of using the MLP-NN is better computational performance, since the analyses have similar quality. For the numerical experiment, cycle assimilation with the MLP-NN is 90 times faster in CPU time than cycle assimilation with LETKF. Comment: 17 pages, 16 figures, Monthly Weather Review.

    Modelling atmospheric ozone concentration using machine learning algorithms

    Air quality monitoring is one of several important tasks carried out in the area of environmental science and engineering. Accordingly, the development of air quality predictive models can be very useful, as such models can provide early warnings of pollution rising to unsatisfactory levels. The literature review conducted within the research context of this thesis revealed that only a limited number of widely used machine learning algorithms have been employed for modelling the concentrations of atmospheric gases such as ozone and nitrogen oxides. Despite this, machine learning has recently advanced significantly with the introduction of ensemble learning techniques and convolutional and deep neural networks. Given these observations, the research presented in this thesis investigates the effective use of ensemble learning algorithms, with optimised algorithmic settings and an appropriate choice of base layer algorithms, to create effective and efficient models for the prediction and forecasting of ground level ozone (O3) specifically. Three main research contributions have been made by this thesis in the application area of modelling O3 concentrations. As the first contribution, the performance of several ensemble learning algorithms (homogeneous and heterogeneous) was investigated and compared with all popular and widely used single base learning algorithms. The results showed an impressive improvement in prediction performance obtainable by using meta learning algorithms (Bagging, Stacking, and Voting). The three investigated meta learning algorithms performed similarly, giving an average correlation coefficient of 0.91 in prediction accuracy.
As a second contribution, feature selection and parameter based optimisation were carried out in conjunction with the application of Multilayer Perceptron, Support Vector Machine, Random Forest and Bagging based learning techniques, providing significant improvements in prediction accuracy. The third contribution of the research presented in this thesis is the univariate and multivariate forecasting of ozone concentrations based on optimised ensemble learning algorithms. The reported results exceed the accuracy levels reported for forecasting ozone concentration variations with widely used single base learning algorithms. In summary, the research conducted within this thesis bridges an existing research gap in big data analytics related to environmental pollution modelling, prediction and forecasting, where present research is largely limited to standard learning algorithms such as Artificial Neural Networks and Support Vector Machines, which are often available within popular commercial software packages.
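To make the bagging idea in this abstract concrete, the sketch below fits a simple base learner on bootstrap resamples and averages the predictions, then scores the result with a Pearson correlation coefficient. It is only an illustration under stated assumptions: the straight-line base learner, the function names (`fit_line`, `bagged_predict`, `pearson_r`) and the parameters are hypothetical and are not taken from the thesis, which uses other base learners.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to the mean
        return 0.0, my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    """Average the predictions of base learners fitted on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(slope * x_new + intercept)
    return sum(preds) / len(preds)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5
```

Stacking and voting differ from this in how the base learners are combined; the averaging step here is the bagging case only.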

    A Novel Approach For Identifying Cloud Clusters Developing Into Tropical Cyclones

    Providing advance notice of rare events, such as a cloud cluster (CC) developing into a tropical cyclone (TC), is of great importance. Advance warning of such rare events can help avoid or reduce the risk of damage and allow emergency responders and the affected community enough time to respond appropriately. Considering this, forecasters need better data mining and data driven techniques to identify developing CCs. Prior studies have attempted to predict the formation of TCs using numerical weather prediction models as well as satellite and radar data. However, refined observational data and forecasting techniques are not always available or accurate in areas such as the North Atlantic Ocean, where data are sparse. Consequently, this research provides the predictive features that contribute to a CC developing into a TC using only readily available global gridded satellite data. This was accomplished by identifying and tracking CCs objectively, so that no expert knowledge is required to investigate the predictive features of developing CCs. We have applied the proposed oversampling technique, named the Selective Clustering based Oversampling Technique (SCOT), to reduce the bias towards non-developing CCs when using standard classifiers. Our approach identifies twelve predictive features for developing CCs and demonstrates predictive skill for 0 to 48 hours prior to development. The results confirm that the proposed technique can satisfactorily identify developing CCs for each of the nine forecasts using standard classifiers such as Classification and Regression Trees (CART), neural networks, and support vector machines (SVM) with ten-fold cross validation. These results are based on the geometric mean values and are further verified using seven case studies such as Hurricane Katrina (2005).
These results demonstrate that our proposed approach could potentially improve weather prediction and provide advance notice of a developing CC by using solely gridded satellite data.
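The abstract reports results as geometric mean values. A common definition in imbalanced classification, assumed here because the abstract does not spell it out, is the square root of sensitivity times specificity. The short sketch below (hypothetical function names) shows why such a metric is preferred over plain accuracy when developing CCs are rare.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the rare class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the common class
    return math.sqrt(sensitivity * specificity)


def accuracy(tp, fn, tn, fp):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)
```

With 10 developing and 990 non-developing cases, a classifier that always predicts "non-developing" has `accuracy(0, 10, 990, 0)` of 0.99 but `g_mean(0, 10, 990, 0)` of 0.0, which is the bias that oversampling is meant to counter.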

    Integrated data-driven techniques for environmental pollution monitoring

    The adverse health effects of tropospheric ozone around urban zones indicate a substantial risk for many segments of the population. This necessitates short-term forecasts so that evasive action can be taken on days conducive to ozone formation. It is therefore important to study the ozone formation mechanisms and predict the ozone levels in a geographic region. Multivariate statistical techniques provide a very effective framework for the classification and monitoring of systems with multiple variables. Cluster analysis, sequence analysis and hidden Markov models (HMMs) are statistical methods which have been used in a wide range of studies to model the data structure. In this dissertation, we propose to formulate, implement and apply a data-driven computational framework for air quality monitoring and forecasting with application to ozone formation. The proposed framework integrates, in a unique way, advanced statistical data processing and analysis tools to investigate ozone formation mechanisms and predict the ozone levels in a geographic region. This dissertation focuses on cluster analysis for the identification and classification of the underlying mechanisms of a system and on HMMs for predicting the occurrence of an extreme event in a system. The usefulness of the proposed methodology in air quality monitoring is demonstrated by applying it to study the ozone problem in the Houston, Texas and Baton Rouge, Louisiana regions. Hierarchical clustering is used to visualize air flow patterns at two time scales relevant for ozone buildup. First, clustering is performed at the hourly time scale to identify surface flow patterns. Then, sequencing is performed at the daily time scale to identify groups of days sharing similar diurnal cycles for the surface flow. Selecting appropriate numbers of air flow patterns allowed inference of regional transport and dispersion patterns for understanding population exposure to ozone.
This dissertation proposes to build HMMs for ozone prediction using air quality and meteorological measurements obtained from a network of surface monitors. The case study of the Houston, Texas region for the 2004 and 2005 ozone seasons indicates the capability of HMMs as a simpler forecasting tool.
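As a rough illustration of how an HMM can be used to anticipate an extreme event, the sketch below runs the standard forward (filtering) recursion over a sequence of discretized observations and then propagates the state belief one step ahead. Everything specific is a hypothetical assumption made for illustration: the two hidden regimes, the two observation symbols, and all probabilities. None of it comes from the dissertation.

```python
def forward_filter(obs, start, trans, emit):
    """Return P(state at last step | all observations) by the forward recursion."""
    n_states = len(start)
    # Initialise with the prior weighted by the first observation's likelihood.
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for o in obs[1:]:
        # Predict: push the belief through the transition matrix.
        pred = [sum(alpha[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)]
        # Update: weight by the likelihood of the new observation, then normalise.
        alpha = [pred[j] * emit[j][o] for j in range(n_states)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha


def next_state_probs(alpha, trans):
    """One-step-ahead state probabilities given the current filtered belief."""
    return [sum(alpha[i] * trans[i][j] for i in range(len(alpha)))
            for j in range(len(trans[0]))]


# Hypothetical model: state 0 = low-ozone regime, state 1 = high-ozone regime;
# observation 0 = ventilated day, observation 1 = stagnant day.
START = [0.8, 0.2]
TRANS = [[0.9, 0.1], [0.3, 0.7]]
EMIT = [[0.7, 0.3], [0.2, 0.8]]
```

Calling `forward_filter([1, 1, 1], START, TRANS, EMIT)` gives the belief over regimes after three stagnant days, and `next_state_probs` turns that into a forecast for the following day. In practice the parameters would be estimated from monitoring data.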

    Mutual Information Input Selector and Probabilistic Machine Learning Utilisation for Air Pollution Proxies

    An air pollutant proxy is a mathematical model that estimates an unobserved air pollutant using other measured variables. The proxy is advantageous to fill missing data in a research campaign or to substitute a real measurement for minimising the cost as well as the operators involved (i.e., virtual sensor). In this paper, we present a generic concept of pollutant proxy development based on an optimised data-driven approach. We propose a mutual information concept to determine the interdependence of different variables and thus select the most correlated inputs. The most relevant variables are selected to be the best proxy inputs, where several metrics and data loss are also involved for guidance. The input selection method determines the used data for training pollutant proxies based on a probabilistic machine learning method. In particular, we use a Bayesian neural network that naturally prevents overfitting and provides confidence intervals around its output prediction. In this way, the prediction uncertainty could be assessed and evaluated. In order to demonstrate the effectiveness of our approach, we test it on an extensive air pollution database to estimate ozone concentration. Peer reviewed.
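The mutual information input selection described above can be sketched as follows: each candidate variable and the target are discretized, mutual information is estimated from the joint histogram, and the candidates are ranked by score. This is a minimal sketch under stated assumptions, not the paper's method. The equal-width binning, the function names and the variable names in the usage note are all hypothetical, and the paper's additional metrics and data-loss guidance are not modelled.

```python
import math
from collections import Counter


def discretize(values, bins=4):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats from two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def rank_inputs(candidates, target, bins=4):
    """Rank candidate input variables by mutual information with the target."""
    t = discretize(target, bins)
    scores = {name: mutual_information(discretize(v, bins), t)
              for name, v in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `rank_inputs({"temperature": temps, "noise": noise}, ozone)` returns the candidates ordered from most to least informative; the top-ranked ones would then feed the proxy model.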

    Identification of significant factors for air pollution levels using a neural network based knowledge discovery system

    The artificial neural network (ANN) is a commonly used approach to estimate or forecast air pollution levels, which are usually assessed by the concentrations of air contaminants such as nitrogen dioxide, sulfur dioxide, carbon monoxide, ozone, and suspended particulate matter (PM) in the atmosphere of the areas concerned. Even though ANNs can accurately estimate air pollution levels, they are numerical enigmas and unable to provide explicit knowledge of how air pollution levels depend on air pollution factors (e.g. traffic and meteorological factors). This paper proposes a neural network based knowledge discovery system aimed at overcoming this limitation of ANNs. The system consists of two units: a) an ANN unit, which is used to estimate the air pollution levels based on relevant air pollution factors; b) a knowledge discovery unit, which is used to extract explicit knowledge from the ANN unit. To demonstrate the practicability of this neural network based knowledge discovery system, numerical data on mass concentrations of PM2.5 and PM1.0, together with meteorological and traffic data measured near a busy traffic road in Hangzhou city, were used to investigate the air pollution levels and the potential air pollution factors that may affect the concentrations of these PMs. Results suggest that the proposed neural network based knowledge discovery system can accurately estimate air pollution levels and identify significant factors that affect air pollution levels.