16,019 research outputs found
Fast Exact Bayesian Inference for Sparse Signals in the Normal Sequence Model
We consider exact algorithms for Bayesian inference with model selection
priors (including spike-and-slab priors) in the sparse normal sequence model.
Because the best existing exact algorithm becomes numerically unstable for
sample sizes over n=500, much attention has been paid to alternative
approaches such as approximate algorithms (Gibbs sampling, variational Bayes,
etc.), shrinkage priors (e.g. the Horseshoe prior and the Spike-and-Slab LASSO)
or empirical Bayesian methods. However, by introducing algorithmic ideas from
online sequential prediction, we show that exact calculations are feasible for
much larger sample sizes: for general model selection priors we reach n=25000,
and for certain spike-and-slab priors we can easily reach n=100000. We further
prove a de Finetti-like result for finite sample sizes that characterizes
exactly which model selection priors can be expressed as spike-and-slab priors.
The computational speed and numerical accuracy of the proposed methods are
demonstrated in experiments on simulated data and on a differential gene
expression data set, and in a comparison of the effect of multiple
hyper-parameter settings in the beta-binomial prior. In our experimental evaluation we compute
guaranteed bounds on the numerical accuracy of all new algorithms, which shows
that the proposed methods are numerically reliable whereas an alternative based
on long division is not.
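As a rough illustration of what an exact calculation under a beta-binomial model selection prior involves, the sketch below computes the posterior over the number of nonzero means by a naive quadratic-time dynamic program in log space. It is not the algorithm of the paper: the Gaussian slab N(0, tau2), the unit noise variance, and the names `sparsity_posterior`, `log_beta` and `logaddexp` are all assumptions made for this example.

```python
import math


def logaddexp(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def sparsity_posterior(xs, a=1.0, b=1.0, tau2=25.0):
    """Posterior over the number of nonzero means s = 0..n.

    Assumed model (illustrative only): x_i ~ N(theta_i, 1), a
    beta-binomial(a, b) prior on the support, and a N(0, tau2) slab
    on the nonzero means.
    """
    n = len(xs)
    # Log ratio of the slab marginal N(0, 1 + tau2) to the spike N(0, 1).
    log_r = [0.5 * x * x * tau2 / (1.0 + tau2) - 0.5 * math.log1p(tau2)
             for x in xs]
    # log_e[s] = log of the s-th elementary symmetric polynomial of the ratios.
    log_e = [0.0] + [-math.inf] * n
    for i, lr in enumerate(log_r):
        for s in range(i + 1, 0, -1):  # descending, so each ratio is used once
            log_e[s] = logaddexp(log_e[s], log_e[s - 1] + lr)
    # Prior mass of one fixed support of size s: B(a + s, b + n - s) / B(a, b).
    log_w = [log_beta(a + s, b + n - s) - log_beta(a, b) + log_e[s]
             for s in range(n + 1)]
    top = max(log_w)
    weights = [math.exp(v - top) for v in log_w]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical data: two clear signals among four near-zero observations.
xs = [0.1, -0.4, 6.0, 0.3, -5.5, 0.2]
print([round(p, 3) for p in sparsity_posterior(xs)])
```

Every term in the recursion is positive, so working with logarithms keeps this particular quantity stable. Posterior inclusion probabilities for individual coordinates are the harder part and are not attempted here.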
Discretized conformal prediction for efficient distribution-free inference
In regression problems where there is no known true underlying model,
conformal prediction methods enable prediction intervals to be constructed
without any assumptions on the distribution of the underlying data, except that
the training and test data are assumed to be exchangeable. However, these
methods bear a heavy computational cost: to be carried out exactly, the
regression algorithm would need to be fitted infinitely many times. In
practice, the conformal prediction method is run by considering only a
finite grid of finely spaced values for the response variable. This paper
develops discretized conformal prediction algorithms that are guaranteed to
cover the target value with the desired probability, and that offer a tradeoff
between computational cost and prediction accuracy.
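The common practice mentioned above, refitting the regression only for a finite grid of trial response values, can be sketched as follows. This is the naive grid heuristic, not the discretized algorithm with guaranteed coverage that the paper develops. The least-squares line as the regression algorithm, the absolute residual as the conformity score, and the names `fit_line` and `grid_conformal_set` are assumptions made for this example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def grid_conformal_set(xs, ys, x_new, grid, alpha=0.1):
    """Return the grid values that full conformal prediction would keep."""
    kept = []
    for y_trial in grid:
        # Refit on the training data augmented with the trial point.
        ax = xs + [x_new]
        ay = ys + [y_trial]
        a, b = fit_line(ax, ay)
        scores = [abs(y - (a + b * x)) for x, y in zip(ax, ay)]
        # Conformal p-value: share of points at least as badly fitted
        # as the trial point (the trial point itself is counted).
        p_value = sum(s >= scores[-1] for s in scores) / len(scores)
        if p_value > alpha:
            kept.append(y_trial)
    return kept


# Hypothetical data: a noisy line with slope 2.
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + random.gauss(0, 1) for x in xs]
grid = [-5 + 0.1 * k for k in range(201)]
kept = grid_conformal_set(xs, ys, 2.5, grid, alpha=0.1)
print(min(kept), max(kept))
```

The cost is one model fit per grid point, so a finer grid gives a more precise set at a proportionally higher cost, which is the tradeoff the abstract refers to.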
A Process to Implement an Artificial Neural Network and Association Rules Techniques to Improve Asset Performance and Energy Efficiency
In this paper, we address the problem of asset performance monitoring, with the intention
of both detecting any potential reliability problem and predicting any loss of energy consumption
efficiency. This is an important concern for many industries and utilities with very intensive
capitalization in very long-lasting assets. To address this problem, we propose an
approach to combine an Artificial Neural Network (ANN) with Data Mining (DM) tools, specifically
with Association Rule (AR) Mining. The combination of these two techniques can now be done
using software which can handle large volumes of data (big data), but the process still needs to
ensure that the required amount of data will be available during the assets’ life cycle and that its
quality is acceptable. The combination of these two techniques in the proposed sequence differs
from previous works found in the literature, giving researchers new options to face the problem.
Practical implementation of the proposed approach may lead to novel predictive maintenance models
(emerging predictive analytics) that may detect with unprecedented precision any asset’s lack of
performance and help manage assets’ O&M accordingly. The approach is illustrated using specific
examples where asset performance monitoring is rather complex under normal operational conditions.
Ministerio de Economía y Competitividad DPI2015-70842-
Automated data pre-processing via meta-learning
The final publication is available at link.springer.com
A data mining algorithm may perform differently on datasets with different characteristics, e.g., it might perform better on a dataset with continuous attributes than on one with categorical attributes, or the other way around.
In practice, a dataset usually needs to be pre-processed. Taking into account all the possible pre-processing operators, there is a staggeringly large number of alternatives, and inexperienced users become overwhelmed.
We show that this problem can be addressed by an automated approach, leveraging ideas from meta-learning.
Specifically, we consider a wide range of data pre-processing techniques and a set of data mining algorithms. For each data mining algorithm and selected dataset, we are able to predict the transformations that improve the result of the algorithm on the respective dataset. Our approach will help non-expert users to more effectively identify the transformations appropriate to their applications, and hence to achieve improved results.
Peer Reviewed. Postprint (published version)