Modeling local predictive ability using power-transformed Gaussian processes
A Gaussian process is proposed as a model for the posterior distribution of
the local predictive ability of a model or expert, conditional on a vector of
covariates, from historical predictions in the form of log predictive scores.
Assuming Gaussian expert predictions and a Gaussian data generating process, a
linear transformation of the predictive score follows a noncentral chi-squared
distribution with one degree of freedom. Motivated by this, we develop a
noncentral chi-squared Gaussian process regression to flexibly model local
predictive ability, with the posterior distribution of the latent GP function
and kernel hyperparameters sampled by Hamiltonian Monte Carlo. We show that a
cube-root transformation of the log scores is approximately Gaussian with
homoscedastic variance, which makes it possible to estimate the model much
faster by marginalizing the latent GP function analytically. Linear pools based
on learned local predictive ability are applied to predict daily bike usage in
Washington DC.
Comment: 27 pages, 15 figures. This paper was included in the first author's
PhD thesis: Oelrich, O. (2022) 'Learning Local Predictive Accuracy for Expert
Evaluation and Forecast Combination' which can be found at
https://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-21091
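The noncentral chi-squared claim can be checked numerically. The sketch below is a minimal illustration, not the authors' code: it assumes an expert predictive density N(mu, sigma^2) and data drawn from N(m, s^2) with made-up parameter values, and the helper `transformed_log_score` is hypothetical. Since the log score is log p(y) = -0.5 log(2 pi sigma^2) - (y - mu)^2 / (2 sigma^2), the affine map Z = -(2 sigma^2 / s^2) (log p(y) + 0.5 log(2 pi sigma^2)) recovers Z = (y - mu)^2 / s^2, which is noncentral chi-squared with one degree of freedom and noncentrality lambda = (m - mu)^2 / s^2, hence mean 1 + lambda and variance 2(1 + 2 lambda).

```python
import numpy as np

def transformed_log_score(y, mu, sigma, s):
    """Hypothetical helper: map the log score of an N(mu, sigma^2) prediction at y
    to Z = (y - mu)^2 / s^2, where s is the assumed data standard deviation."""
    log_score = -0.5 * np.log(2 * np.pi * sigma**2) - (y - mu) ** 2 / (2 * sigma**2)
    # Undo the affine relation between the log score and (y - mu)^2
    return -(2 * sigma**2 / s**2) * (log_score + 0.5 * np.log(2 * np.pi * sigma**2))

rng = np.random.default_rng(0)
m, s = 1.0, 2.0        # assumed data-generating process N(m, s^2)
mu, sigma = 0.0, 1.5   # assumed expert prediction N(mu, sigma^2)
y = rng.normal(m, s, size=200_000)
z = transformed_log_score(y, mu, sigma, s)

lam = (m - mu) ** 2 / s**2            # noncentrality parameter
print(z.mean(), 1 + lam)              # sample vs theoretical mean
print(z.var(), 2 * (1 + 2 * lam))     # sample vs theoretical variance
```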
Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space
We introduce an unsupervised clustering algorithm to improve training
efficiency and accuracy in predicting energies using molecular-orbital-based
machine learning (MOB-ML). This work determines clusters via the Gaussian
mixture model (GMM) in an entirely automatic manner and simplifies an earlier
supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by
eliminating both the necessity for user-specified parameters and the training
of an additional classifier. Unsupervised clustering with GMM has the
advantages of accurately reproducing chemically intuitive groupings of frontier
molecular orbitals and of improving in performance as the number of
training examples increases. The clusters obtained from supervised or unsupervised
clustering are then combined with scalable Gaussian process regression (GPR)
or linear regression (LR) to learn molecular energies accurately by building
a local regression model in each cluster. Among all four combinations of
regressors and clustering methods, GMM combined with scalable exact Gaussian
process regression (GMM/GPR) is the most efficient training protocol for
MOB-ML. Numerical tests of molecular energy learning on thermalized
datasets of drug-like molecules demonstrate the improved accuracy,
transferability, and learning efficiency of GMM/GPR over the other
training protocols for MOB-ML, i.e., supervised regression-clustering combined
with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best
molecular energy predictions among those reported in the literature for the same
benchmark datasets. Owing to its lower scaling, GMM/GPR achieves a 10.4-fold speedup in
wall-clock training time over scalable exact GPR for a training set
of 6500 QM7b-T molecules.
Comment: 28 pages, 7 figures
Network Risk Modelling via Warped Gaussian Processes
Many problems in Operational Research domains involve the study of risk management on a constrained topology of some form of network, either physical networks, such as logistics networks, or intangible networks, such as banking financial networks and distributed ledger networks in modern DeFi movements. In this paper we explore an emerging class of machine learning models that can flexibly model and assess risk on a graphical or network-based topology. We will focus primarily on undirected graphical structures and will develop a stochastic model representation that is able to model dynamic multivariate processes on a graphical topology conditional on exogenous observable covariates. This will be achieved by extending classical Gaussian process regression models in machine learning to non-stationary, warped Gaussian process regression models. Since we are focused on risk management aspects of processes restricted to graphs, we note that extreme events often display heterogeneity (i.e., non-stationarity), varying continuously with a number of covariates. In the framework we develop, we will be able to study and explain such extreme joint variations through the regression structures developed and the covariance operators characterizing the process. Having proposed a class of such warped Gaussian process network regression models, we will then study Bregman super-quantiles on such networks, which allow us to develop a class of network-based coherent risk measures with the added advantage of being sub-additive, allowing one to aggregate over local graphical cliques to understand local risk behaviours.
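As a rough illustration of the ingredients named above, the sketch below draws a warped GP on a small network and checks that an empirical superquantile aggregated over a clique is sub-additive. Everything here is an assumption for illustration: the regularised-Laplacian covariance (L + eps*I)^{-1} stands in for the paper's covariance operators, `np.exp` is used as the warping function, the node covariate and its coefficient are made up, and the risk measure is the classical superquantile (conditional value-at-risk) that Bregman superquantiles generalise, not the Bregman variant itself. Function names are hypothetical.

```python
import numpy as np

def graph_covariance(adjacency, eps=1.0):
    """Regularised-Laplacian kernel (L + eps*I)^{-1}; positive definite for eps > 0."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return np.linalg.inv(laplacian + eps * np.eye(len(adjacency)))

def sample_warped_gp(mean, cov, n_samples, warp=np.exp, seed=0):
    """Draw latent GP values f ~ N(mean, cov) on the nodes and return warp(f)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(cov)
    f = mean + rng.standard_normal((n_samples, len(cov))) @ chol.T
    return warp(f)

def superquantile(losses, alpha):
    """Empirical superquantile (CVaR) at level alpha: mean of the worst (1 - alpha) tail."""
    worst_first = np.sort(np.asarray(losses))[::-1]
    k = max(1, int(np.ceil((1 - alpha) * len(worst_first))))
    return worst_first[:k].mean()

# Hypothetical 4-node network: nodes 0, 1, 2 form a clique, node 3 hangs off node 2
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
covariate = np.array([0.2, -0.1, 0.4, 0.0])   # hypothetical exogenous covariate per node
losses = sample_warped_gp(0.5 * covariate, graph_covariance(A), n_samples=100_000)

alpha = 0.95
clique = [0, 1, 2]
joint = superquantile(losses[:, clique].sum(axis=1), alpha)
separate = sum(superquantile(losses[:, j], alpha) for j in clique)
print(joint, separate)   # sub-additivity guarantees joint <= separate
```

Sub-additivity holds exactly for this estimator: the k worst outcomes of the summed losses cannot exceed, in total, the sum of each node's own k worst outcomes.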
Data-driven modelling of turbine wake interactions and flow resistance in large wind farms
Turbine wake and local blockage effects are known to alter wind farm power production in two different ways: (1) by changing the wind speed locally in front of each turbine and (2) by changing the overall flow resistance in the farm and thus the so-called farm blockage effect. To better predict these effects with low computational costs, we develop data-driven emulators of the 'local' or 'internal' turbine thrust coefficient C_{*}^{T} as a function of turbine layout. We train the model using multi-fidelity Gaussian process (GP) regression with a combination of low-fidelity (engineering wake model) and high-fidelity (large eddy simulation) simulations of farms with different layouts and wind directions. A large set of low-fidelity data speeds up the learning process, and the high-fidelity data ensure high accuracy. The trained multi-fidelity GP model is shown to give more accurate predictions of C_{*}^{T} than a standard (single-fidelity) GP regression applied only to a limited set of high-fidelity data. We also combine the multi-fidelity GP model of C_{*}^{T} with the two-scale momentum theory (Nishino & Dunstan 2020, J. Fluid Mech. 894, A2) to demonstrate that the model can give fast and accurate predictions of large wind farm performance under various mesoscale atmospheric conditions. This new approach could help improve annual energy production (AEP) calculations and farm optimization in the future.
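The abstract does not spell out the multi-fidelity formulation, so the sketch below assumes the common autoregressive form f_hi(x) = rho * f_lo(x) + c + delta(x), in the spirit of Kennedy and O'Hagan, fitted in two steps with fixed, hand-picked kernel lengthscales rather than optimised hyperparameters. The toy functions `f_lo` and `f_hi` are hypothetical stand-ins for the engineering wake model and the LES data, and a scalar input x stands in for the farm layout; `GP1D` and `fit_multifidelity` are likewise illustrative names.

```python
import numpy as np

def rbf(a, b, ell):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

class GP1D:
    """Zero-mean GP regression with a fixed RBF kernel (hyperparameters not optimised)."""
    def __init__(self, x, y, ell, noise=1e-6):
        self.x, self.ell = x, ell
        K = rbf(x, x, ell) + noise * np.eye(len(x))
        self.alpha = np.linalg.solve(K, y)

    def predict(self, xs):
        return rbf(xs, self.x, self.ell) @ self.alpha

def fit_multifidelity(x_lo, y_lo, x_hi, y_hi, ell_lo=0.1, ell_delta=0.3):
    # Step 1: GP on the plentiful low-fidelity data
    gp_lo = GP1D(x_lo, y_lo, ell_lo)
    # Step 2: scale rho and offset c by least squares, y_hi ~ rho * mu_lo(x_hi) + c
    A = np.column_stack([gp_lo.predict(x_hi), np.ones_like(x_hi)])
    (rho, c), *_ = np.linalg.lstsq(A, y_hi, rcond=None)
    # Step 3: GP on the remaining discrepancy delta(x) at the high-fidelity points
    gp_delta = GP1D(x_hi, y_hi - A @ np.array([rho, c]), ell_delta)

    def predict(xs):
        return rho * gp_lo.predict(xs) + c + gp_delta.predict(xs)
    return predict

def f_lo(x):   # cheap, biased model (hypothetical)
    return np.sin(8 * x)

def f_hi(x):   # expensive, accurate model (hypothetical)
    return 2 * np.sin(8 * x) + 0.5

x_lo = np.linspace(0, 1, 21)            # many low-fidelity runs
x_hi = np.array([0.0, 0.35, 0.65, 1.0]) # few high-fidelity runs
mf_predict = fit_multifidelity(x_lo, f_lo(x_lo), x_hi, f_hi(x_hi))
sf = GP1D(x_hi, f_hi(x_hi), ell=0.3)    # single-fidelity baseline

x_test = np.linspace(0, 1, 201)
rmse_mf = np.sqrt(np.mean((mf_predict(x_test) - f_hi(x_test)) ** 2))
rmse_sf = np.sqrt(np.mean((sf.predict(x_test) - f_hi(x_test)) ** 2))
print(rmse_mf, rmse_sf)
```

Because most of the structure of the high-fidelity function is carried by the low-fidelity GP, the four high-fidelity points only have to pin down rho, c and a small discrepancy, which is why the combined model should track f_hi far better than a GP trained on those four points alone.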