Healing Products of Gaussian Processes
Gaussian processes (GPs) are nonparametric Bayesian models that have been
applied to regression and classification problems. One of the approaches to
alleviate their cubic training cost is the use of local GP experts trained on
subsets of the data. In particular, product-of-expert models combine the
predictive distributions of local experts through a tractable product
operation. While these expert models allow for massively distributed
computation, their predictions typically suffer from erratic behaviour of the
mean or uncalibrated uncertainty quantification. By calibrating predictions via
a tempered softmax weighting, we provide a solution to these problems for
multiple product-of-expert models, including the generalised product of experts
and the robust Bayesian committee machine. Furthermore, we leverage the optimal
transport literature and propose a new product-of-expert model that combines
predictions of local experts by computing their Wasserstein barycenter, which
can be applied to both regression and classification.
Comment: ICML 202
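The combination rules named in this abstract can be illustrated for a single test point with one-dimensional Gaussian expert predictions. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the per-expert `scores` passed to the softmax are an assumption, since the abstract does not say which quantity is weighted. The generalised product-of-experts rule is a weighted precision sum, and the Wasserstein-2 barycenter of 1D Gaussians averages the means and the standard deviations.

```python
import numpy as np

def tempered_softmax(scores, temperature=1.0):
    """Softmax weights over per-expert scores (the scores are an assumed input).

    A lower temperature concentrates weight on the highest-scoring experts.
    """
    scores = np.asarray(scores, dtype=float)
    z = (scores - scores.max()) / temperature  # subtract max for stability
    w = np.exp(z)
    return w / w.sum()

def gpoe_combine(means, variances, weights):
    """Generalised product of Gaussian experts: weighted sum of precisions."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    precision = np.sum(weights / variances)
    var = 1.0 / precision
    mean = var * np.sum(weights * means / variances)
    return mean, var

def wasserstein_barycenter_1d(means, variances, weights):
    """Wasserstein-2 barycenter of 1D Gaussians (weights must sum to one)."""
    means = np.asarray(means, dtype=float)
    stds = np.sqrt(np.asarray(variances, dtype=float))
    weights = np.asarray(weights, dtype=float)
    mean = np.sum(weights * means)
    std = np.sum(weights * stds)
    return mean, std ** 2

if __name__ == "__main__":
    w = tempered_softmax([0.0, 0.0], temperature=0.5)
    print(gpoe_combine([0.0, 2.0], [1.0, 4.0], w))
    print(wasserstein_barycenter_1d([0.0, 2.0], [1.0, 4.0], w))
```

With softmax weights that sum to one, the product rule favours confident experts through their precisions, while the barycenter interpolates between expert means and spreads.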
Imputation of Missing Streamflow Data at Multiple Gauging Stations in Benin Republic
Streamflow observation data are vital for flood monitoring and for agricultural
and settlement planning. However, such streamflow data are commonly plagued with
missing observations due to various causes such as harsh environmental
conditions and constrained operational resources. This problem is often more
pervasive in under-resourced areas such as Sub-Saharan Africa. In this work, we
reconstruct streamflow time series data through bias correction of the GEOGloWS
ECMWF streamflow service (GESS) forecasts at ten river gauging stations in
Benin Republic. We perform bias correction by fitting Quantile Mapping,
Gaussian Process, and Elastic Net regression in a constrained training period.
We show by simulating missingness in a testing period that GESS forecasts have
a significant bias that results in low predictive skill over the ten Beninese
stations. Our findings suggest that overall bias correction by Elastic Net and
Gaussian Process regression achieves superior skill relative to traditional
imputation by Random Forest, k-Nearest Neighbour, and GESS lookup. The findings
of this work provide a basis for integrating global GESS streamflow data into
operational early-warning decision-making systems (e.g., flood alert) in
countries vulnerable to drought and flooding due to extreme weather events.
Comment: AAAI 2022 Fall Symposium: The Role of AI in Responding to Climate
Challenges, Nov 17-19, 202
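Of the bias-correction methods listed in this abstract, quantile mapping is the simplest to illustrate. The sketch below is an empirical variant written under assumptions: the function names, the number of quantiles, and the synthetic data are all hypothetical, and the abstract does not state which variant of quantile mapping the authors fitted. It maps each forecast value to the observed value at the same empirical quantile of the training period.

```python
import numpy as np

def fit_quantile_mapping(forecast_train, observed_train, n_quantiles=101):
    """Estimate matching empirical quantiles of forecasts and observations."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    forecast_q = np.quantile(forecast_train, probs)
    observed_q = np.quantile(observed_train, probs)
    return forecast_q, observed_q

def apply_quantile_mapping(forecast, forecast_q, observed_q):
    """Map forecasts onto the observed distribution by linear interpolation.

    Values outside the training range are clamped to the extreme quantiles,
    which is how np.interp behaves by default.
    """
    return np.interp(forecast, forecast_q, observed_q)

if __name__ == "__main__":
    # Synthetic stand-in data: observations are a biased version of forecasts.
    forecast_train = np.arange(0.0, 101.0)
    observed_train = 2.0 * forecast_train + 1.0
    fq, oq = fit_quantile_mapping(forecast_train, observed_train)
    print(apply_quantile_mapping(np.array([10.0, 50.5]), fq, oq))
```

To fill a gap in the record, the fitted mapping would be applied to the forecast values covering the missing period.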
Open problems in causal structure learning: A case study of COVID-19 in the UK
Causal machine learning (ML) algorithms recover graphical structures that
tell us something about cause-and-effect relationships. The causal
representation provided by these algorithms enables transparency and
explainability, which are necessary for decision making in critical real-world
problems. Yet, causal ML has had limited impact in practice compared to
associational ML. This paper investigates the challenges of causal ML with
application to COVID-19 UK pandemic data. We collate data from various public
sources and investigate what the various structure learning algorithms learn
from these data. We explore the impact of different data formats on algorithms
spanning different classes of learning, and assess the results produced by each
algorithm, and groups of algorithms, in terms of graphical structure, model
dimensionality, sensitivity analysis, confounding variables, predictive and
interventional inference. We use these results to highlight open problems in
causal structure learning and directions for future research. To facilitate
future work, we make all graphs, models, data sets, and source code publicly
available online.
MphayaNER: Named Entity Recognition for Tshivenda
Named Entity Recognition (NER) plays a vital role in various Natural Language
Processing tasks such as information retrieval, text classification, and
question answering. However, NER can be challenging, especially in low-resource
languages with limited annotated datasets and tools. This paper adds to the
effort of addressing these challenges by introducing MphayaNER, the first
Tshivenda NER corpus in the news domain. We establish NER baselines by
fine-tuning state-of-the-art models on MphayaNER. The study also
explores zero-shot transfer between Tshivenda and other related Bantu
languages, with chiShona and Kiswahili showing the best results. Augmenting
MphayaNER with chiShona data was also found to improve model performance
significantly. Both MphayaNER and the baseline models are made publicly
available.
Comment: Accepted at AfricaNLP Workshop at ICLR 202
Bayesian Neural Networks for Short-Term Wind Power Forecasting (Bayesianska neuronnät för korttidsprognoser för vindkraft)
In recent years, wind and other variable renewable energy sources have gained a rapidly increasing share of the global energy mix. In this context, the greatest concern facing renewable energy sources like wind is the uncertainty in production volumes, as their generation ability is inherently dependent on weather conditions. When providing forecasts for newly commissioned wind farms, there is a limited amount of historical power production data, while the number of potential features from different weather forecast providers is vast. Bayesian regularization is therefore seen as a possible technique for reducing the model overfitting problems that may arise. This thesis investigates Bayesian Neural Networks in one-hour and day-ahead forecasting of wind power generation. Initial results show that Bayesian Neural Networks display equivalent predictive performance to Neural Networks trained by Maximum Likelihood in both one-hour and day-ahead forecasting. Models selected using maximum evidence were found to have statistically significantly lower test error than those selected based on minimum test error. Further results show that the Bayesian framework is able to identify irrelevant features through Automatic Relevance Determination, though this does not result in a statistically significant error reduction in predictive performance in one-hour-ahead forecasting. In day-ahead forecasting, removing irrelevant features based on Automatic Relevance Determination is found to yield statistically significant improvements in test error.
Parameter inference using probabilistic techniques
Complex non-linear prediction systems have become ubiquitous in numerous decision-making and other socio-technical systems. In recent years, the increased adoption and use of these complex non-linear systems has been dominated by universal approximators such as neural networks and Gaussian Processes. These systems' applications span a large number of critical domains, including transportation, drug design, law enforcement, financial services, energy planning, and pandemic forecasting. The critical nature of these application domains necessitates the study of inference methods for training or calibrating these systems' parameters. Furthermore, inference methods coupled with estimators of the uncertainty around the system's predictions and measures of the relative influence of its inputs aid in managing the very high societal risks associated with incorrect predictions. This thesis investigates probabilistic parameter inference methods that provide both the required uncertainty and relevance measures. We first introduce Metropolis-Hastings (MH) and Hybrid Monte Carlo (HMC) methods for parameter inference in Bayesian Neural Networks (BNNs) with applications in credit risk modelling and South African wind energy resource planning. We further utilise a Separable Shadow Hamiltonian Hybrid Monte Carlo (S2HMC) method for the first time in the inference of BNN parameters. S2HMC addresses traditional MCMC methods' discretisation constraints by using a perturbed Hamiltonian, which is conserved at a higher order by the numerical integration scheme. Experimental results on wind energy and credit datasets find that S2HMC yields higher effective sample sizes than HMC. The predictive performance of S2HMC-based and HMC-based BNNs is found to be similar.
Third, we perform hierarchical inference for BNN parameters by combining the S2HMC sampler with Gibbs sampling of hyperparameters for Automatic Relevance Determination (ARD). A generalisable ARD committee framework is introduced to synthesise various samplers' ARD outputs into robust feature selections. Experimental results show that this ARD committee approach selects features of high predictive information value. Further, the results show that dimensionality reduction performed through this approach improves the sampling performance of samplers that suffer from random-walk behaviour, such as MH. The thesis also addresses predictive distribution calibration pathologies of the existing product of Gaussian Process expert models. We introduce a solution to the predictive dominance of uninformed experts through expert combination via the Wasserstein barycenter and sparsity control through tempered softmax weightings. These proposals are empirically shown to outperform other product-of-experts (PoE) methods. The proposed PoE models are also found to outperform BNNs on wind speed forecasting regression tasks. Finally, the thesis provides a Bayesian inference approach to change point determination in the spreading rates of the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in South Africa. This approach is the first probabilistically principled method in the literature for quantifying the relative efficacy of the various South African government interventions to slow the spread of SARS-CoV-2.
Ph.D. (Electrical and Electronic Engineering)
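The random-walk behaviour of Metropolis-Hastings mentioned in this abstract can be seen in a small sampler. The following is an illustrative sketch only, assuming a symmetric Gaussian proposal and a toy target density. It is not the thesis's BNN sampler, and the function name, step size, and target are hypothetical choices.

```python
import numpy as np

def metropolis_hastings(log_prob, x0, n_samples, step_size=0.5, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.

    log_prob: callable returning the unnormalised log density at a point.
    Returns the chain of samples and the fraction of accepted proposals.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    logp = log_prob(x)
    samples = np.empty((n_samples, x.size))
    accepted = 0
    for i in range(n_samples):
        proposal = x + step_size * rng.standard_normal(x.size)
        logp_prop = log_prob(proposal)
        # Symmetric proposal: accept with probability min(1, p(x') / p(x)).
        if np.log(rng.uniform()) < logp_prop - logp:
            x, logp = proposal, logp_prop
            accepted += 1
        samples[i] = x  # a rejected proposal repeats the current state
    return samples, accepted / n_samples

if __name__ == "__main__":
    # Toy target: a standard normal, up to an additive constant.
    def log_std_normal(x):
        return -0.5 * float(np.sum(x ** 2))

    chain, rate = metropolis_hastings(log_std_normal, [3.0], 20000, step_size=1.0)
    print(chain.mean(), chain.std(), rate)
```

Because each proposal is a small local perturbation, consecutive samples are strongly correlated. Gradient-based samplers such as HMC are designed to reduce this correlation.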