
    Combining predictions from linear models when training and test inputs differ

    Methods for combining predictions from different models in a supervised learning setting must somehow estimate/predict the quality of a model's predictions at unknown future inputs. Many of these methods (often implicitly) assume that the test inputs are identical to the training inputs, which is seldom reasonable. By failing to take into account that prediction will generally be harder for test inputs that did not occur in the training set, these methods tend to select overly complex models. Based on a novel, unbiased expression for KL divergence, we propose XAIC and its special case FAIC as versions of AIC intended for prediction that use different degrees of knowledge of the test inputs. Both methods substantially differ from and may outperform all the known versions of AIC even when the training and test inputs are iid, and are especially useful for deterministic inputs and under covariate shift. Our experiments on linear models suggest that if the test and training inputs differ substantially, then XAIC and FAIC predictively outperform AIC, BIC and several other methods, including Bayesian model averaging. Comment: 12 pages, 2 figures. To appear in Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI2014). This version includes the supplementary material (regularity assumptions, proofs).
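
    The XAIC and FAIC criteria themselves depend on expressions derived in the paper and are not reproduced here. As a point of reference, the sketch below shows the standard AIC/BIC model selection that such criteria are compared against, applied to nested polynomial regression models; the candidate models and synthetic data are illustrative assumptions, not the paper's experimental setup.

```python
# Sketch: standard AIC/BIC selection over nested polynomial regression models.
# The data-generating process and candidate degrees are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)    # true model is linear (degree 1)

def information_criteria(X, y):
    """Fit OLS and return (AIC, BIC) under a Gaussian likelihood with unknown variance."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n                        # ML estimate of the noise variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    p = k + 1                                         # regression coefficients + noise variance
    return 2 * p - 2 * loglik, p * np.log(n) - 2 * loglik

for degree in range(6):
    X = np.vander(x, degree + 1, increasing=True)     # columns 1, x, ..., x^degree
    aic, bic = information_criteria(X, y)
    print(f"degree {degree}: AIC = {aic:7.2f}, BIC = {bic:7.2f}")
```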

    Distributed Gaussian Processes

    To scale Gaussian processes (GPs) to large data sets, we introduce the robust Bayesian Committee Machine (rBCM), a practical and scalable product-of-experts model for large-scale distributed GP regression. Unlike state-of-the-art sparse GP approximations, the rBCM is conceptually simple and does not rely on inducing or variational parameters. The key idea is to recursively distribute computations to independent computational units and, subsequently, recombine them to form an overall result. Efficient closed-form inference allows for straightforward parallelisation and distributed computations with a small memory footprint. The rBCM is independent of the computational graph and can be used on heterogeneous computing infrastructures, ranging from laptops to clusters. With sufficient computing resources, our distributed GP model can handle arbitrarily large data sets. Comment: 10 pages, 5 figures. Appears in Proceedings of ICML 2015.
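
    As a rough illustration of the product-of-experts idea, the sketch below splits a regression data set across several GP "experts" and recombines their predictions with an rBCM-style precision-weighted rule. The RBF kernel, the fixed hyperparameters, and the entropy-based expert weights are assumptions made for the example; the paper treats hyperparameter learning and the exact combination in full.

```python
# Sketch: distributed GP regression with an rBCM-style combination of expert
# predictions. Kernel, hyperparameters and expert weights are fixed by assumption.
import numpy as np

def rbf(a, b, lengthscale=0.5, variance=1.0):
    return variance * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_expert_predict(x_tr, y_tr, x_te, noise=0.05):
    """Exact GP posterior mean and variance for one expert on its data shard."""
    K = rbf(x_tr, x_tr) + noise * np.eye(x_tr.size)
    Ks = rbf(x_tr, x_te)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_tr))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = np.diag(rbf(x_te, x_te)) - np.sum(v**2, axis=0) + noise
    return mean, var

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 600)
y = np.sin(x) + 0.05 * rng.normal(size=x.size)
x_test = np.linspace(-3, 3, 5)

prior_var = 1.0 + 0.05                                # k(x*, x*) + noise: prior predictive variance
shards = np.array_split(rng.permutation(x.size), 6)   # one shard per computational unit

precision, weighted_mean, beta_sum = 0.0, 0.0, 0.0
for idx in shards:
    mu_k, var_k = gp_expert_predict(x[idx], y[idx], x_test)
    beta_k = 0.5 * (np.log(prior_var) - np.log(var_k))    # entropy-based expert weight (assumed form)
    precision += beta_k / var_k
    weighted_mean += beta_k * mu_k / var_k
    beta_sum += beta_k

precision += (1.0 - beta_sum) / prior_var             # correction so predictions fall back to the prior away from data
combined_mean = weighted_mean / precision
print(np.round(combined_mean, 2), np.round(np.sin(x_test), 2))
```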

    Prediction of infectious disease epidemics via weighted density ensembles

    Accurate and reliable predictions of infectious disease dynamics can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task, using different model structures, covariates, and targets for prediction. Experience has shown that the performance of these models varies; some tend to do better or worse in different seasons or at different points within a season. Ensemble methods combine multiple models to obtain a single prediction that leverages the strengths of each model. We considered a range of ensemble methods that each form a predictive density for a target of interest as a weighted sum of the predictive densities from component models. In the simplest case, equal weight is assigned to each component model; in the most complex case, the weights vary with the region, prediction target, week of the season when the predictions are made, a measure of component model uncertainty, and recent observations of disease incidence. We applied these methods to predict measures of influenza season timing and severity in the United States, both at the national and regional levels, using three component models. We trained the models on retrospective predictions from 14 seasons (1997/1998 - 2010/2011) and evaluated each model's prospective, out-of-sample performance in the five subsequent influenza seasons. In this test phase, the ensemble methods showed overall performance that was similar to the best of the component models, but offered more consistent performance across seasons than the component models. Ensemble methods offer the potential to deliver more reliable predictions to public health decision makers. Comment: 20 pages, 6 figures.
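
    The sketch below illustrates the basic construction: an ensemble predictive density formed as a weighted sum of component predictive densities, with constant weights fitted by an EM-style update that maximizes the log score. The Gaussian component forecasts are synthetic stand-ins; the paper's richer weighting schemes (by region, target, season week, and model uncertainty) are not shown.

```python
# Sketch: a weighted density ensemble with constant weights estimated by an
# EM-style update that maximizes the log score. Component forecasts are synthetic.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n_obs, n_models = 200, 3
y = rng.normal(loc=2.0, scale=1.0, size=n_obs)            # observed targets

# densities[i, m] = p_m(y_i): each component model's predictive density at the outcome
means = np.stack([y + rng.normal(scale=s, size=n_obs) for s in (0.5, 1.0, 2.0)], axis=1)
densities = norm.pdf(y[:, None], loc=means, scale=[0.8, 1.2, 2.0])

w = np.full(n_models, 1.0 / n_models)                     # simplest case: equal weights
for _ in range(100):                                      # EM-style update of the mixture weights
    resp = w * densities
    resp /= resp.sum(axis=1, keepdims=True)               # per-observation model responsibilities
    w = resp.mean(axis=0)

print("weights:", np.round(w, 3),
      "mean log score:", round(float(np.log(densities @ w).mean()), 3))
```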

    Artificial intelligence in steam cracking modeling: a deep learning algorithm for detailed effluent prediction

    Chemical processes can benefit tremendously from fast and accurate effluent composition prediction for plant design, control, and optimization. The Industry 4.0 revolution claims that by introducing machine learning into these fields, substantial economic and environmental gains can be achieved. The bottleneck for high-frequency optimization and process control is often the time necessary to perform the required detailed analyses of, for example, feed and product. To resolve these issues, a framework of four deep learning artificial neural networks (DL ANNs) has been developed for the largest chemical production process, steam cracking. The proposed methodology allows both a detailed characterization of a naphtha feedstock and a detailed composition of the steam cracker effluent to be determined, based on a limited number of commercial naphtha indices and rapidly accessible process characteristics. The detailed characterization of a naphtha is predicted from three points on the boiling curve and a paraffins, iso-paraffins, olefins, naphthenes, and aromatics (PIONA) characterization. If unavailable, the boiling points are also estimated. Even with estimated boiling points, the developed DL ANN outperforms several established methods such as maximization of Shannon entropy and traditional ANNs. For feedstock reconstruction, a mean absolute error (MAE) of 0.3 wt% is achieved on the test set, while the MAE of the effluent prediction is 0.1 wt%. When all networks are combined, using the output of the previous one as input to the next, the effluent MAE increases to 0.19 wt%. In addition to the high accuracy of the networks, a major benefit is the negligible computational cost required to obtain the predictions. On a standard Intel i7 processor, predictions are made on the order of milliseconds. Commercial software such as COILSIM1D performs slightly better in terms of accuracy, but the required central processing unit time per reaction is on the order of seconds. This tremendous speed-up and minimal accuracy loss make the presented framework highly suitable for the continuous monitoring of difficult-to-access process parameters and for the envisioned high-frequency real-time optimization (RTO) strategy or process control. Nevertheless, the lack of a fundamental basis implies that fundamental understanding is almost completely lost, which is not always well accepted by the engineering community. In addition, the performance of the developed networks drops significantly for naphthas that are highly dissimilar to those in the training set. (C) 2019 THE AUTHORS. Published by Elsevier LTD on behalf of Chinese Academy of Engineering and Higher Education Press Limited Company.
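
    As a hedged illustration of one building block of such a framework, the sketch below trains a small feed-forward network that maps a handful of commercial indices to a detailed composition vector using an MAE objective. The layer sizes, the 8-input/60-component dimensions, the softmax output, and the synthetic data are assumptions for the example, not the architecture reported in the paper.

```python
# Sketch: a feed-forward network mapping a few commercial naphtha indices to a
# detailed composition vector, trained with an MAE objective. Dimensions,
# layer sizes, the softmax output and the data are assumptions for illustration.
import torch
import torch.nn as nn

n_indices, n_components = 8, 60                           # assumed numbers of indices / components

model = nn.Sequential(
    nn.Linear(n_indices, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, n_components), nn.Softmax(dim=-1),     # predicted weight fractions sum to 1
)

x = torch.rand(512, n_indices)                            # synthetic "commercial indices"
y = torch.softmax(torch.rand(512, n_components), dim=-1)  # synthetic detailed compositions

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.L1Loss()                                     # MAE, matching the reported error metric

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print("MAE on the synthetic training data:", float(loss))
```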

    Lightweight Probabilistic Deep Networks

    Even though probabilistic treatments of neural networks have a long history, they have not found widespread use in practice. Sampling approaches are often too slow even for simple networks. The size of the inputs and the depth of typical CNN architectures in computer vision only compound this problem. Uncertainty in neural networks has thus been largely ignored in practice, despite the fact that it may provide important information about the reliability of predictions and the inner workings of the network. In this paper, we introduce two lightweight approaches to making supervised learning with probabilistic deep networks practical: First, we suggest probabilistic output layers for classification and regression that require only minimal changes to existing networks. Second, we employ assumed density filtering and show that activation uncertainties can be propagated in a practical fashion through the entire network, again with minor changes. Both probabilistic networks retain the predictive power of their deterministic counterparts, but yield uncertainties that correlate well with the empirical error induced by their predictions. Moreover, the robustness to adversarial examples is significantly increased. Comment: To appear at CVPR 2018.
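
    The sketch below illustrates the first of the two ingredients, a probabilistic regression output layer: the head predicts a mean and a log-variance and is trained with the Gaussian negative log-likelihood. The small backbone and random data are placeholders, and assumed density filtering through the hidden layers is not shown.

```python
# Sketch: a probabilistic regression output layer trained with the Gaussian
# negative log-likelihood. Backbone, data and layer sizes are placeholders.
import torch
import torch.nn as nn

class ProbabilisticRegressionHead(nn.Module):
    def __init__(self, in_features, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, 1)
        self.log_var = nn.Linear(hidden, 1)          # predict log-variance for numerical stability

    def forward(self, x):
        h = self.backbone(x)
        return self.mean(h), self.log_var(h)

def gaussian_nll(mean, log_var, target):
    # 0.5 * (log sigma^2 + (y - mu)^2 / sigma^2), up to an additive constant
    return 0.5 * (log_var + (target - mean) ** 2 * torch.exp(-log_var)).mean()

x = torch.randn(256, 10)
y = 2.0 * x[:, :1] + 0.3 * torch.randn(256, 1)       # synthetic regression targets

net = ProbabilisticRegressionHead(in_features=10)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(500):
    mu, log_var = net(x)
    loss = gaussian_nll(mu, log_var, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

mu, log_var = net(x)
print("mean absolute error:", float((mu - y).abs().mean()),
      "| mean predicted sigma:", float(torch.exp(0.5 * log_var).mean()))
```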