Using Machine Learning for Model Physics: an Overview
In the overview, a generic mathematical object (mapping) is introduced, and
its relation to model physics parameterization is explained. Machine learning
(ML) tools that can be used to emulate and/or approximate mappings are
introduced. Applications of ML to emulate existing parameterizations, to
develop new parameterizations, to enforce physical constraints, and to control the
accuracy of developed applications are described. Some ML approaches that allow
developers to go beyond the standard parameterization paradigm are discussed. Comment: 50 pages, 3 figures, 1 table
APPLICATION OF NEURAL NETWORKS TO EMULATION OF RADIATION PARAMETERIZATIONS IN GENERAL CIRCULATION MODELS
A novel approach that uses neural network (NN) techniques to approximate physical components of complex environmental systems has been applied and further developed in this dissertation. A new type of numerical model, a complex hybrid environmental model based on a combination of deterministic and statistical learning model components, has been explored. Conceptual and practical aspects of developing hybrid models have been formalized as a methodology for applications to climate modeling and numerical weather prediction. The approach uses NNs as a machine or statistical learning technique to develop highly accurate and fast emulations of model physics components/parameterizations. The NN emulations of the most time-consuming model physics components, the long- and shortwave radiation (LWR and SWR) parameterizations, have been combined with the remaining deterministic components of a general circulation model (GCM) to constitute a hybrid GCM (HGCM). Parallel GCM and HGCM simulations produce very similar results, but the HGCM is significantly faster. The high accuracy, which is of paramount importance for the approach, and the speed-up of model calculations when using NN emulations open the opportunity for model improvement. Such improvement includes using extended NN ensembles and/or more frequent calculations of full model radiation, resulting in improved radiation-cloud interaction and better consistency with model dynamics and other model physics components.
First, the approach was successfully applied to a moderate-resolution (T42L26) uncoupled NCAR Community Atmospheric Model driven by climatological SST in decadal climate simulation mode. It was then further developed and implemented in a coupled GCM, the NCEP Climate Forecast System, with significantly higher resolution (T126L64) and time-dependent CO2, and tested for decadal climate simulations, seasonal prediction, and short- to medium-term forecasts.
The developed highly accurate NN emulations of radiation parameterizations are on average one to two orders of magnitude faster than the original radiation parameterizations. The NN approach was extended by introduction of NN ensembles and a compound parameterization with quality control of larger errors.
The applicability of other statistical learning techniques, such as approximate nearest neighbor approximation and random trees, to emulation of model physics has also been explored.
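The idea of a compound parameterization with quality control, mentioned in the abstract above, can be illustrated with a minimal sketch. It assumes that a fast emulation is used whenever an estimate of its error is small and that the original scheme is called otherwise; the abstract does not spell out the mechanism, so this switching rule is an assumption. All function names, the toy "scheme" (a sine function), and the tolerance are hypothetical stand-ins, not the dissertation's NN emulations.

```python
import math


def original_scheme(x):
    """Stand-in for an expensive parameterization (hypothetical)."""
    return math.sin(x)


def emulator(x):
    """Stand-in for a fast emulation: cubic Taylor approximation of sin(x)."""
    return x - x**3 / 6.0


def estimated_error(x):
    """Hypothetical error estimate: the next Taylor term, which bounds the
    truncation error of the cubic approximation for small |x|."""
    return abs(x) ** 5 / 120.0


def compound(x, tol=1e-3):
    """Use the emulator when its estimated error is within tol;
    otherwise fall back to the original scheme (quality control)."""
    if estimated_error(x) <= tol:
        return emulator(x), "emulator"
    return original_scheme(x), "original"


value_small, source_small = compound(0.1)  # small input: emulator is trusted
value_large, source_large = compound(2.0)  # large input: fallback is used
```

The design choice shown here is that the fallback is only paid for on the inputs where the emulation is suspected to be inaccurate, so most of the speed-up is kept.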
Ensemble Learning in the Presence of Noise
Learning in the presence of noise is an important issue in machine learning. The design
and implementation of effective strategies for automatic induction from noisy data is
particularly important in real-world problems, where noise from defective collecting
processes, data contamination or intrinsic fluctuations is ubiquitous. There are two
general strategies to address this problem. One is to design a robust learning method.
Another one is to identify noisy instances and eliminate or correct them.
In this thesis we propose to use ensembles to mitigate the negative impact of mislabelled
data in the learning process. In ensemble learning the predictions of individual learners
are combined to obtain a final decision. Effective combinations take advantage of the
complementarity of these base learners. In this manner the errors incurred by a learner
can be compensated by the predictions of other learners in the combination.
A first contribution of this work is the use of subsampling to build bootstrap ensembles,
such as bagging and random forest, that are resilient to class label noise. By using lower
sampling rates, the detrimental effect of mislabelled examples on the final ensemble
decisions can be tempered. The reason is that each labelled instance is present in a
smaller fraction of the training sets used to build individual learners. Ensembles can
also be used as a noise detection procedure to improve the quality of the data used for
training. In this strategy, one attempts to identify noisy instances and either correct (by
switching their class label) or discard them. A particular example is identified as noise
if a specified percentage (greater than 50%) of the learners disagree with the given label
for this example. Using an extensive empirical evaluation we demonstrate the use of
subsampling as an effective tool to detect and handle noise in classification problems.
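The noise detection rule described above (flag an example when more than a specified fraction of the learners disagree with its label) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the base learner is a hypothetical 1-nearest-neighbour rule on one-dimensional features, subsamples are drawn without replacement, only learners that did not train on an example vote on it, and the names `detect_noise`, `rate`, and `threshold` are invented for this sketch.

```python
import random


def nearest_label(train, x):
    """Hypothetical base learner: 1-nearest-neighbour on a 1-D feature."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def detect_noise(data, n_learners=200, rate=0.2, threshold=0.5, seed=0):
    """Return indices of examples whose label is contradicted by more than
    `threshold` of the learners that did not see them during training."""
    rng = random.Random(seed)
    n = len(data)
    size = max(1, int(rate * n))  # low sampling rate: small training sets
    disagreements = [0] * n
    votes = [0] * n
    for _ in range(n_learners):
        chosen = set(rng.sample(range(n), size))  # subsample without replacement
        train = [data[i] for i in chosen]
        for i, (x, y) in enumerate(data):
            if i in chosen:
                continue  # assumption: only out-of-sample learners vote
            votes[i] += 1
            if nearest_label(train, x) != y:
                disagreements[i] += 1
    return [i for i in range(n)
            if votes[i] > 0 and disagreements[i] / votes[i] > threshold]


# Toy data: two well-separated classes, with the label at index 4 flipped.
data = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(100, 110)]
data[4] = (4.0, 1)
suspects = detect_noise(data)
```

Flagged examples could then be discarded or have their label switched, as the abstract describes. With a low sampling rate, the mislabelled point appears in few training sets, so it rarely misleads the learners that vote on its neighbours.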
Structure combination of forecasting models with application in the energy sector
This dissertation proposes and implements the inclusion of model structure in combining forecasts. Empirical investigations are conducted with an emphasis on neural networks and seasonal exponential smoothing models, using synthetic data and real time series from the electricity sector. It starts with a literature review on combining forecasts and ensembles of neural networks, and highlights their use in forecasting within the energy sector. Research gaps are identified and the questions to be addressed in this research are set, thus leading to three empirical studies.
The first study provides a detailed sensitivity analysis of the goodness-of-fit and forecasting performance of feed-forward neural networks on time series with different characteristics. It expands existing literature by increasing the number and variety of time series and by using graphical and statistical diagnostics to objectively judge the influence of model specification on forecasting performance. Having identified conditions for achieving stable model performance, this study facilitated the identification of suitable models for different time series characteristics, which are then useful in developing combinations (ensembles) of feed-forward neural networks.
The second study proposes structural combination methods based on clustering (CB) and genetic algorithms (GA) for forecasting time series. Clustering of neural networks using their parameter space is performed to identify a pool of forecasts to be combined. Three synthetic time series and two real time series (electricity demand and wind power production) were used to assess the performance of the two proposals against several benchmarks in univariate and multivariate forecasting problems. Structural combinations with GA were more competitive than those with CB for non-seasonal time series and the multivariate wind power forecasting application, whereas for the seasonal series, the CB tended to be more competitive.
The third study focused on forecasting univariate time series with seasonality, by structurally combining, in separate applications, multiplicative Holt-Winters and multiplicative Holt-Winters-Taylor models. Noise addition and block swapping were applied to the original time series in order to generate structurally diverse individual models. Applications were conducted using a seasonal daily peak electricity demand time series, an hourly double-seasonal electricity demand series and a half-hourly double-seasonal electricity demand series. Structural combinations worked better for the peak electricity demand and half-hourly demand time series when model variation was induced via noise addition. For the double-seasonal hourly electricity demand, block swapping, as a means for diversity in models, resulted in better forecasts.
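The two perturbation schemes named above, noise addition and block swapping, can be sketched as follows. This is a minimal illustration under stated assumptions: the forecaster is a simple seasonal-mean rule standing in for the Holt-Winters and Holt-Winters-Taylor models used in the dissertation, the combination is a plain average, and all function names and parameter values are hypothetical.

```python
import random


def add_noise(series, sigma, rng):
    """Noise addition: perturb every observation with Gaussian noise."""
    return [y + rng.gauss(0.0, sigma) for y in series]


def block_swap(series, block_len, rng):
    """Block swapping: exchange two distinct, aligned blocks of the series."""
    n_blocks = len(series) // block_len
    i, j = sorted(rng.sample(range(n_blocks), 2))
    a, b = i * block_len, j * block_len
    out = list(series)
    out[a:a + block_len] = series[b:b + block_len]
    out[b:b + block_len] = series[a:a + block_len]
    return out


def seasonal_mean_forecast(series, period, horizon):
    """Stand-in forecaster (not Holt-Winters): mean of each seasonal position."""
    means = []
    for pos in range(period):
        values = series[pos::period]
        means.append(sum(values) / len(values))
    n = len(series)
    return [means[(n + h) % period] for h in range(horizon)]


def combined_forecast(series, period, horizon, n_members=10, sigma=0.5, seed=0):
    """Fit one stand-in model per noise-perturbed series and average them."""
    rng = random.Random(seed)
    forecasts = [
        seasonal_mean_forecast(add_noise(series, sigma, rng), period, horizon)
        for _ in range(n_members)
    ]
    return [sum(f[h] for f in forecasts) / n_members for h in range(horizon)]


series = [float(v) for v in range(24)]
swapped = block_swap(series, 6, random.Random(1))
combined = combined_forecast(series, period=6, horizon=3)
```

Each perturbed copy of the series yields a slightly different fitted model, which is the source of diversity that the combination exploits.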
Finally, in the last chapter of this dissertation, conclusions are drawn from this research. The contribution to the literature is assessed and a future research agenda is proposed.