Combining predictions from linear models when training and test inputs differ
Methods for combining predictions from different models in a supervised
learning setting must somehow estimate/predict the quality of a model's
predictions at unknown future inputs. Many of these methods (often implicitly)
make the assumption that the test inputs are identical to the training inputs,
which is seldom reasonable. Because such methods fail to take into account that
prediction will generally be harder for test inputs that did not occur in the
training set, they tend to select overly complex models. Based on a novel,
unbiased expression for KL divergence, we propose XAIC and its special case
FAIC as versions of AIC intended for prediction that use different degrees of
knowledge of the test inputs. Both methods substantially differ from and may
outperform all the known versions of AIC even when the training and test inputs
are iid, and are especially useful for deterministic inputs and under covariate
shift. Our experiments on linear models suggest that if the test and training
inputs differ substantially, then XAIC and FAIC predictively outperform AIC,
BIC and several other methods including Bayesian model averaging.
Comment: 12 pages, 2 figures. To appear in Proceedings of the 30th Conference
on Uncertainty in Artificial Intelligence (UAI2014). This version includes
the supplementary material (regularity assumptions, proofs
Distributed Gaussian Processes
To scale Gaussian processes (GPs) to large data sets we introduce the robust
Bayesian Committee Machine (rBCM), a practical and scalable product-of-experts
model for large-scale distributed GP regression. Unlike state-of-the-art sparse
GP approximations, the rBCM is conceptually simple and does not rely on
inducing or variational parameters. The key idea is to recursively distribute
computations to independent computational units and, subsequently, recombine
them to form an overall result. Efficient closed-form inference allows for
straightforward parallelisation and distributed computations with a small
memory footprint. The rBCM is independent of the computational graph and can be
used on heterogeneous computing infrastructures, ranging from laptops to
clusters. With sufficient computing resources our distributed GP model can
handle arbitrarily large data sets.
Comment: 10 pages, 5 figures. Appears in Proceedings of ICML 201
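As a rough illustration of the recombination step described above, the following sketch merges Gaussian predictions from independent GP experts at a single test input. It assumes the commonly cited rBCM rule, in which each expert is weighted by half the log-ratio of prior to posterior variance and the combined precision includes a prior correction term. The function name `rbcm_combine` and the toy numbers are hypothetical and do not come from the abstract.

```python
import math

def rbcm_combine(means, variances, prior_var):
    """Combine independent Gaussian expert predictions at one test input.

    means, variances: per-expert predictive mean and variance.
    prior_var: GP prior variance at the test input (assumed known).
    """
    # Weight of each expert: half the log-ratio of prior to expert variance.
    # An expert that learned nothing (variance == prior_var) gets weight 0.
    betas = [0.5 * (math.log(prior_var) - math.log(v)) for v in variances]
    # Combined precision: weighted expert precisions plus a prior correction.
    precision = sum(b / v for b, v in zip(betas, variances))
    precision += (1.0 - sum(betas)) / prior_var
    var = 1.0 / precision
    mean = var * sum(b * m / v for b, m, v in zip(betas, means, variances))
    return mean, var

# Two hypothetical experts that agree on the mean and are equally confident.
mean, var = rbcm_combine([1.0, 1.0], [0.5, 0.5], prior_var=1.0)
```

Because each expert contributes only a mean and a variance, the combination itself is cheap and does not depend on how the experts were distributed across machines.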
Prediction of infectious disease epidemics via weighted density ensembles
Accurate and reliable predictions of infectious disease dynamics can be
valuable to public health organizations that plan interventions to decrease or
prevent disease transmission. A great variety of models have been developed for
this task, using different model structures, covariates, and targets for
prediction. Experience has shown that the performance of these models varies;
some tend to do better or worse in different seasons or at different points
within a season. Ensemble methods combine multiple models to obtain a single
prediction that leverages the strengths of each model. We considered a range of
ensemble methods that each form a predictive density for a target of interest
as a weighted sum of the predictive densities from component models. In the
simplest case, equal weight is assigned to each component model; in the most
complex case, the weights vary with the region, prediction target, week of the
season when the predictions are made, a measure of component model uncertainty,
and recent observations of disease incidence. We applied these methods to
predict measures of influenza season timing and severity in the United States,
both at the national and regional levels, using three component models. We
trained the models on retrospective predictions from 14 seasons (1997/1998 -
2010/2011) and evaluated each model's prospective, out-of-sample performance in
the five subsequent influenza seasons. In this test phase, the ensemble methods
showed overall performance that was similar to the best of the component
models, but offered more consistent performance across seasons than the
component models. Ensemble methods offer the potential to deliver more reliable
predictions to public health decision makers.
Comment: 20 pages, 6 figures
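The weighted-sum construction described above can be sketched as follows for the simplest, equal-weight case. The Gaussian component densities, their parameters, and the names `normal_pdf` and `ensemble_density` are assumptions made purely for illustration; the abstract does not specify the form of the component models.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def ensemble_density(x, components, weights):
    """Ensemble predictive density: weighted sum of component densities at x."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * f(x) for w, f in zip(weights, components))

# Three hypothetical component models predicting the same target.
components = [
    lambda x: normal_pdf(x, 4.0, 1.0),
    lambda x: normal_pdf(x, 5.0, 2.0),
    lambda x: normal_pdf(x, 6.0, 1.5),
]
equal_weights = [1.0 / 3.0] * 3

# Log score of the ensemble at a hypothetical observed value.
log_score = math.log(ensemble_density(5.2, components, equal_weights))
```

In the more complex schemes the abstract mentions, the weights would be functions of region, target, week, and other covariates rather than fixed constants, but the density is still formed by the same weighted sum.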
Artificial intelligence in steam cracking modeling: a deep learning algorithm for detailed effluent prediction
Chemical processes can benefit tremendously from fast and accurate effluent composition prediction for plant design, control, and optimization. The Industry 4.0 revolution claims that by introducing machine learning into these fields, substantial economic and environmental gains can be achieved. The bottleneck for high-frequency optimization and process control is often the time necessary to perform the required detailed analyses of, for example, feed and product. To resolve these issues, a framework of four deep learning artificial neural networks (DL ANNs) has been developed for the largest chemicals production process, steam cracking. The proposed methodology allows both a detailed characterization of a naphtha feedstock and a detailed composition of the steam cracker effluent to be determined, based on a limited number of commercial naphtha indices and rapidly accessible process characteristics. The detailed characterization of a naphtha is predicted from three points on the boiling curve and paraffins, iso-paraffins, olefins, naphthenes, and aromatics (PIONA) characterization. If unavailable, the boiling points are also estimated. Even with estimated boiling points, the developed DL ANN outperforms several established methods such as maximization of Shannon entropy and traditional ANNs. For feedstock reconstruction, a mean absolute error (MAE) of 0.3 wt% is achieved on the test set, while the MAE of the effluent prediction is 0.1 wt%. When combining all networks, using the output of the previous as input to the next, the effluent MAE increases to 0.19 wt%. In addition to the high accuracy of the networks, a major benefit is the negligible computational cost required to obtain the predictions. On a standard Intel i7 processor, predictions are made in the order of milliseconds. Commercial software such as COILSIM1D performs slightly better in terms of accuracy, but the required central processing unit time per reaction is in the order of seconds.
This tremendous speed-up and minimal accuracy loss make the presented framework highly suitable for the continuous monitoring of difficult-to-access process parameters and for the envisioned, high-frequency real-time optimization (RTO) strategy or process control. Nevertheless, the lack of a fundamental basis implies that fundamental understanding is almost completely lost, which is not always well-accepted by the engineering community. In addition, the performance of the developed networks drops significantly for naphthas that are highly dissimilar to those in the training set. (C) 2019 THE AUTHORS. Published by Elsevier LTD on behalf of Chinese Academy of Engineering and Higher Education Press Limited Company
Lightweight Probabilistic Deep Networks
Even though probabilistic treatments of neural networks have a long history,
they have not found widespread use in practice. Sampling approaches are often
too slow already for simple networks. The size of the inputs and the depth of
typical CNN architectures in computer vision only compound this problem.
Uncertainty in neural networks has thus been largely ignored in practice,
despite the fact that it may provide important information about the
reliability of predictions and the inner workings of the network. In this
paper, we introduce two lightweight approaches to making supervised learning
with probabilistic deep networks practical: First, we suggest probabilistic
output layers for classification and regression that require only minimal
changes to existing networks. Second, we employ assumed density filtering and
show that activation uncertainties can be propagated in a practical fashion
through the entire network, again with minor changes. Both probabilistic
networks retain the predictive power of the deterministic counterpart, but
yield uncertainties that correlate well with the empirical error induced by
their predictions. Moreover, the robustness to adversarial examples is
significantly increased.
Comment: To appear at CVPR 201
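To make the idea of propagating activation uncertainties concrete, here is a minimal sketch of one moment-matching step through a ReLU. It assumes each activation is treated as an independent Gaussian and uses the standard closed-form moments of a rectified Gaussian. The abstract does not give the update equations, so this is an illustration of the general technique rather than the authors' implementation, and the name `relu_moments` is hypothetical.

```python
import math

def relu_moments(mu, var):
    """Mean and variance of max(0, x) for x ~ N(mu, var), with var > 0."""
    sigma = math.sqrt(var)
    alpha = mu / sigma
    # Standard normal density and cumulative distribution at alpha.
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(alpha / math.sqrt(2.0)))
    # First and second moments of the rectified Gaussian.
    mean = mu * cdf + sigma * pdf
    second = (mu * mu + var) * cdf + mu * sigma * pdf
    # Clamp to guard against tiny negative values from rounding.
    return mean, max(second - mean * mean, 0.0)

# A zero-mean, unit-variance input activation.
out_mean, out_var = relu_moments(0.0, 1.0)
```

Applying such a step after every layer lets a network carry a mean and a variance forward in a single pass, which is what makes this kind of approach cheaper than sampling.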