14 research outputs found

    Healing Products of Gaussian Processes

    Gaussian processes (GPs) are nonparametric Bayesian models that have been applied to regression and classification problems. One of the approaches to alleviate their cubic training cost is the use of local GP experts trained on subsets of the data. In particular, product-of-expert models combine the predictive distributions of local experts through a tractable product operation. While these expert models allow for massively distributed computation, their predictions typically suffer from erratic behaviour of the mean or uncalibrated uncertainty quantification. By calibrating predictions via a tempered softmax weighting, we provide a solution to these problems for multiple product-of-expert models, including the generalised product of experts and the robust Bayesian committee machine. Furthermore, we leverage the optimal transport literature and propose a new product-of-expert model that combines predictions of local experts by computing their Wasserstein barycenter, which can be applied to both regression and classification. Comment: ICML 202
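The combination rules named in this abstract can be illustrated for a single test point with univariate Gaussian expert predictions. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the scores fed to the softmax are an assumption (any per-expert informativeness measure, for example negative predictive variance, could be used). The barycenter uses the standard closed form for the 2-Wasserstein barycenter of one-dimensional Gaussians, where both the mean and the standard deviation are weighted averages.

```python
import numpy as np

def tempered_softmax(scores, temperature=1.0):
    """Softmax weights; a lower temperature concentrates weight on the top experts."""
    z = np.asarray(scores, dtype=float) / temperature
    z = z - z.max()  # subtract the maximum for numerical stability
    w = np.exp(z)
    return w / w.sum()

def gpoe(means, variances, weights):
    """Generalised product of experts: a weighted product of Gaussian densities."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    precision = np.sum(weights / variances)            # weighted precisions add
    variance = 1.0 / precision
    mean = variance * np.sum(weights * means / variances)
    return mean, variance

def wasserstein_barycenter_1d(means, variances, weights):
    """2-Wasserstein barycenter of univariate Gaussians (closed form)."""
    means = np.asarray(means, dtype=float)
    stds = np.sqrt(np.asarray(variances, dtype=float))
    weights = np.asarray(weights, dtype=float)
    mean = np.sum(weights * means)
    std = np.sum(weights * stds)                       # standard deviations average
    return mean, std ** 2

# Hypothetical predictions of three local experts at one test input.
expert_means = [0.9, 1.1, 0.0]
expert_vars = [0.1, 0.2, 5.0]                          # third expert is uninformed
scores = -np.asarray(expert_vars)                      # assumed informativeness score
weights = tempered_softmax(scores, temperature=0.5)
print(gpoe(expert_means, expert_vars, weights))
print(wasserstein_barycenter_1d(expert_means, expert_vars, weights))
```

With a small temperature the uninformed third expert receives almost no weight, which is the sparsity effect the abstract attributes to tempering.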

    Imputation of Missing Streamflow Data at Multiple Gauging Stations in Benin Republic

    Streamflow observation data are vital for flood monitoring and for agricultural and settlement planning. However, such streamflow data are commonly plagued with missing observations due to various causes such as harsh environmental conditions and constrained operational resources. This problem is often more pervasive in under-resourced areas such as Sub-Saharan Africa. In this work, we reconstruct streamflow time series data through bias correction of the GEOGloWS ECMWF streamflow service (GESS) forecasts at ten river gauging stations in Benin Republic. We perform bias correction by fitting Quantile Mapping, Gaussian Process, and Elastic Net regression in a constrained training period. We show by simulating missingness in a testing period that GESS forecasts have a significant bias that results in low predictive skill over the ten Beninese stations. Our findings suggest that overall bias correction by Elastic Net and Gaussian Process regression achieves superior skill relative to traditional imputation by Random Forest, k-Nearest Neighbour, and GESS lookup. The findings of this work provide a basis for integrating global GESS streamflow data into operational early-warning decision-making systems (e.g., flood alert) in countries vulnerable to drought and flooding due to extreme weather events. Comment: AAAI 2022 Fall Symposium: The Role of AI in Responding to Climate Challenges, Nov 17-19, 202
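Of the bias-correction methods listed, quantile mapping is the simplest to sketch. The example below is an assumed empirical variant, not the paper's code: it matches quantiles of the forecasts to quantiles of the observations over a training period and interpolates between them. The function name, the number of quantiles, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_quantile_mapping(forecast_train, observed_train, n_quantiles=101):
    """Return a function that maps forecast values onto the observed distribution."""
    levels = np.linspace(0.0, 1.0, n_quantiles)
    forecast_q = np.quantile(forecast_train, levels)   # forecast quantiles
    observed_q = np.quantile(observed_train, levels)   # observed quantiles

    def correct(forecast_new):
        # Locate each new forecast among the forecast quantiles and read off
        # the observed value at the same quantile level. Values outside the
        # training range are clipped to the extreme observed quantiles.
        return np.interp(forecast_new, forecast_q, observed_q)

    return correct

# Synthetic illustration: forecasts overestimate flow by a scale and an offset.
rng = np.random.default_rng(0)
observed = rng.gamma(shape=2.0, scale=50.0, size=1000)   # hypothetical gauge data
forecast = 2.0 * observed + 10.0                         # hypothetical biased forecast
correct = fit_quantile_mapping(forecast, observed)
corrected = correct(forecast)
print(np.mean(forecast - observed), np.mean(corrected - observed))
```

Because the mapping is fitted on marginal distributions only, it removes systematic bias but cannot repair timing errors in the forecast, which is one reason regression-based correction may achieve higher skill.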

    Open problems in causal structure learning: A case study of COVID-19 in the UK

    Causal machine learning (ML) algorithms recover graphical structures that tell us something about cause-and-effect relationships. The causal representation provided by these algorithms enables transparency and explainability, which is necessary for decision making in critical real-world problems. Yet, causal ML has had limited impact in practice compared to associational ML. This paper investigates the challenges of causal ML with application to COVID-19 UK pandemic data. We collate data from various public sources and investigate what the various structure learning algorithms learn from these data. We explore the impact of different data formats on algorithms spanning different classes of learning, and assess the results produced by each algorithm, and groups of algorithms, in terms of graphical structure, model dimensionality, sensitivity analysis, confounding variables, predictive and interventional inference. We use these results to highlight open problems in causal structure learning and directions for future research. To facilitate future work, we make all graphs, models, data sets, and source code publicly available online.

    MphayaNER: Named Entity Recognition for Tshivenda

    Named Entity Recognition (NER) plays a vital role in various Natural Language Processing tasks such as information retrieval, text classification, and question answering. However, NER can be challenging, especially in low-resource languages with limited annotated datasets and tools. This paper adds to the effort of addressing these challenges by introducing MphayaNER, the first Tshivenda NER corpus in the news domain. We establish NER baselines by fine-tuning state-of-the-art models on MphayaNER. The study also explores zero-shot transfer between Tshivenda and other related Bantu languages, with chiShona and Kiswahili showing the best results. Augmenting MphayaNER with chiShona data was also found to improve model performance significantly. Both MphayaNER and the baseline models are made publicly available. Comment: Accepted at AfricaNLP Workshop at ICLR 202

    Bayesian Neural Networks for Short-Term Wind Power Forecasting (Bayesianska neuronnät för korttidsprognoser för vindkraft)

    In recent years, wind and other variable renewable energy sources have gained a rapidly increasing share of the global energy mix. In this context, the greatest concern facing renewable energy sources like wind is the uncertainty in production volumes, as their generation ability is inherently dependent on weather conditions. When providing forecasts for newly commissioned wind farms, there is a limited amount of historical power production data, while the number of potential features from different weather forecast providers is vast. Bayesian regularization is therefore seen as a possible technique for reducing the model overfitting problems that may arise. This thesis investigates Bayesian Neural Networks in one-hour and day-ahead forecasting of wind power generation. Initial results show that Bayesian Neural Networks display equivalent predictive performance to Neural Networks trained by Maximum Likelihood in both one-hour and day-ahead forecasting. Models selected using maximum evidence were found to have statistically significantly lower test error than those selected based on minimum test error. Further results show that the Bayesian framework is able to identify irrelevant features through Automatic Relevance Determination, though this does not result in a statistically significant error reduction in one-hour-ahead forecasting. In day-ahead forecasting, removing irrelevant features based on Automatic Relevance Determination is found to yield statistically significant improvements in test error.

    Parameter inference using probabilistic techniques

    Complex non-linear prediction systems have become ubiquitous in numerous decision-making and other socio-technical systems. In recent years, the increased adoption and use of these complex non-linear systems has been dominated by universal approximators such as neural networks and Gaussian Processes. These systems’ applications span a large number of critical domains, including transportation, drug design, law enforcement, financial services, energy planning, and pandemic forecasting. The critical nature of these application domains necessitates the study of inference methods for training or calibrating these systems’ parameters. Further to this, inference methods coupled with estimators of the uncertainty around the system’s predictions and measures of the relative influence of its inputs aid in managing the very high societal risks associated with incorrect predictions. This thesis investigates probabilistic parameter inference methods that provide both the required uncertainty and relevance measures. We first introduce Metropolis-Hastings (MH) and Hybrid Monte Carlo (HMC) methods for parameter inference in Bayesian Neural Networks (BNNs) with applications in credit risk modelling and South African wind energy resource planning. We further utilise a Separable Shadow Hamiltonian Hybrid Monte Carlo (S2HMC) method for the first time in the inference of BNN parameters. S2HMC addresses traditional MCMC methods’ discretisation constraints by using a perturbed Hamiltonian, which is conserved to a higher order by the numerical integration scheme. Experimental results on wind energy and credit datasets find that S2HMC yields higher effective sample sizes than the competing HMC. The predictive performance of S2HMC and HMC based BNNs is found to be similar.
    Third, we perform hierarchical inference for BNN parameters by combining the S2HMC sampler with Gibbs sampling of hyperparameters for Automatic Relevance Determination (ARD). A generalisable ARD committee framework is introduced to synthesise various samplers’ ARD outputs into robust feature selections. Experimental results show that this ARD committee approach selects features of high predictive information value. Further, the results show that dimensionality reduction performed through this approach improves the sampling performance of samplers which suffer from random walk behaviour, such as MH. The thesis also addresses predictive distribution calibration pathologies of the existing product of Gaussian Process expert models. We introduce a solution to the predictive dominance of uninformed experts through expert combination via the Wasserstein barycenter and sparsity control through tempered softmax weightings. These proposals are empirically shown to outperform other product of experts (PoE) methods. The proposed PoE models are also found to outperform BNNs on wind speed forecasting regression tasks. Finally, the thesis provides a Bayesian inference approach to change point determination in the spreading rates of the novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) in South Africa. This approach is the first probabilistically principled method in the literature for quantifying the relative efficacy of the various South African government interventions to slow the spread of SARS-CoV-2. Ph.D. (Electrical and Electronic Engineering)
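The discretisation issue mentioned in this abstract can be made concrete with the plain leapfrog integrator used by standard HMC. The sketch below is not S2HMC and not the thesis code. It is a minimal illustration, under an assumed standard normal target, of how leapfrog conserves the Hamiltonian only approximately, with an energy error that depends on the step size. All names and starting values are hypothetical.

```python
import numpy as np

def leapfrog(q, p, grad_u, step_size, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme."""
    q = np.array(q, dtype=float)   # copies, so the inputs are not modified
    p = np.array(p, dtype=float)
    p -= 0.5 * step_size * grad_u(q)           # initial half step for momentum
    for i in range(n_steps):
        q += step_size * p                     # full step for position
        if i < n_steps - 1:
            p -= step_size * grad_u(q)         # full step for momentum
    p -= 0.5 * step_size * grad_u(q)           # final half step for momentum
    return q, p

def hamiltonian(q, p, u):
    """Potential energy plus kinetic energy with an identity mass matrix."""
    return u(q) + 0.5 * np.dot(p, p)

# Assumed target: standard normal, so U(q) = 0.5 * q.q and grad U(q) = q.
u = lambda q: 0.5 * np.dot(q, q)
grad_u = lambda q: q

def energy_error(step_size, total_time=1.0):
    """Absolute change in the Hamiltonian along one simulated trajectory."""
    q0, p0 = np.array([1.0, -0.5]), np.array([0.3, 0.8])
    n_steps = int(round(total_time / step_size))
    q1, p1 = leapfrog(q0, p0, grad_u, step_size, n_steps)
    return abs(hamiltonian(q1, p1, u) - hamiltonian(q0, p0, u))

for eps in (0.4, 0.2, 0.1):
    print(eps, energy_error(eps))
```

In HMC this energy error determines the Metropolis acceptance probability, so larger steps lower the acceptance rate. Shadow Hamiltonian methods instead target a perturbed Hamiltonian that the integrator conserves more accurately.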