Two Procedures for Robust Monitoring of Probability Distributions of Economic Data Streams induced by Depth Functions
Data streams (streaming data) consist of transiently observed, multidimensional
data sequences that evolve in time and challenge our computational and/or
inferential capabilities. In this paper we propose user-friendly approaches for
robust monitoring of selected properties of the unconditional and conditional
distributions of the stream, based on depth functions. Our proposals are robust
to a small fraction of outliers and/or inliers but at the same time sensitive
to a regime change of the stream. Their implementations are available in our
free R package DepthProc. Comment: Operations Research and Decisions, vol. 25, No. 1, 201
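As a rough illustration of depth-based monitoring, the sketch below scores each new observation by its univariate Tukey (halfspace) depth relative to a sliding window and flags low-depth points. It is not the DepthProc implementation and not the authors' procedure: the function names, the window length, and the threshold are hypothetical, and the paper treats multidimensional streams, whereas this example is one-dimensional.

```python
from collections import deque


def halfspace_depth_1d(x, sample):
    """Univariate Tukey (halfspace) depth of x with respect to sample."""
    n = len(sample)
    below = sum(1 for s in sample if s <= x)
    above = sum(1 for s in sample if s >= x)
    return min(below, above) / n


def monitor(stream, window=5, threshold=0.2):
    """Score each observation against the previous `window` observations.

    Returns a list of (value, depth, flagged) tuples, starting with the
    first observation that arrives after the window is full.
    `window` and `threshold` are illustrative choices, not values from the paper.
    """
    reference = deque(maxlen=window)
    results = []
    for value in stream:
        if len(reference) == window:
            depth = halfspace_depth_1d(value, reference)
            results.append((value, depth, depth < threshold))
        reference.append(value)  # the oldest observation drops out automatically
    return results


if __name__ == "__main__":
    for value, depth, flagged in monitor([1, 2, 3, 4, 5, 3, 100]):
        print(value, depth, "FLAG" if flagged else "ok")
```

A central observation receives a depth near 0.5 or above, while a point outside the range of the window receives depth 0, so a low threshold isolates candidate outliers without reacting to ordinary fluctuation.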
Selective machine learning of doubly robust functionals
While model selection is a well-studied topic in parametric and nonparametric
regression or density estimation, selection of possibly high-dimensional
nuisance parameters in semiparametric problems is far less developed. In this
paper, we propose a selective machine learning framework for making inferences
about a finite-dimensional functional defined on a semiparametric model, when
the latter admits a doubly robust estimating function and several candidate
machine learning algorithms are available for estimating the nuisance
parameters. We introduce two new selection criteria for bias reduction in
estimating the functional of interest, each based on a novel definition of
pseudo-risk for the functional that embodies the double robustness property and
thus is used to select the pair of learners that is nearest to fulfilling this
property. We establish an oracle property for a multi-fold cross-validation
version of the new selection criteria which states that our empirical criteria
perform nearly as well as an oracle with a priori knowledge of the pseudo-risk
for each pair of candidate learners. We also describe a smooth approximation to
the selection criteria which allows for valid post-selection inference.
Finally, we apply the approach to model selection of a semiparametric estimator
of average treatment effect given an ensemble of candidate machine learners to
account for confounding in an observational study.
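To make the notion of a doubly robust estimating function concrete, the sketch below evaluates the standard augmented inverse probability weighted (AIPW) estimator of the average treatment effect. It assumes the nuisance estimates (propensity scores and outcome regressions) have already been produced by some learner; the function and argument names are hypothetical, and the selection criteria proposed in the paper are not implemented here.

```python
def aipw_ate(y, a, propensity, mu1, mu0):
    """AIPW estimate of the average treatment effect.

    y          : observed outcomes
    a          : treatment indicators (0 or 1)
    propensity : estimated P(A = 1 | X) for each unit, strictly inside (0, 1)
    mu1, mu0   : estimated E[Y | A = 1, X] and E[Y | A = 0, X] for each unit
    """
    n = len(y)
    total = 0.0
    for yi, ai, ei, m1, m0 in zip(y, a, propensity, mu1, mu0):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        total += (m1 - m0
                  + ai * (yi - m1) / ei
                  - (1 - ai) * (yi - m0) / (1 - ei))
    return total / n


if __name__ == "__main__":
    y = [3, 1, 4, 2]
    a = [1, 0, 1, 0]
    e = [0.5, 0.5, 0.5, 0.5]
    print(aipw_ate(y, a, e, mu1=[3, 2, 4, 3], mu0=[2, 1, 3, 2]))
```

The estimator remains consistent if either the propensity model or the outcome model is correct, which is the double robustness property the abstract refers to. When the outcome residuals vanish, only the outcome-model contrast contributes; when the outcome model is identically zero, the expression reduces to inverse probability weighting.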
Robust variable screening for regression using factor profiling
Sure Independence Screening is a fast procedure for variable selection in
ultra-high dimensional regression analysis. Unfortunately, its performance
greatly deteriorates with increasing dependence among the predictors. To solve
this issue, Factor Profiled Sure Independence Screening (FPSIS) models the
correlation structure of the predictor variables, assuming that it can be
represented by a few latent factors. The correlations can then be profiled out
by projecting the data onto the orthogonal complement of the subspace spanned
by these factors. However, neither of these methods can handle the presence of
outliers in the data. Therefore, we propose a robust screening method which
uses a least trimmed squares method to estimate the latent factors and the
factor profiled variables. Variable screening is then performed on factor
profiled variables by using regression MM-estimators. Different types of
outliers in this model and their roles in variable screening are studied. Both
simulation studies and a real data analysis show that the proposed robust
procedure has good performance on clean data and outperforms the two nonrobust
methods on contaminated data.
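For orientation, the sketch below shows only the baseline, nonrobust Sure Independence Screening step: predictors are ranked by the absolute value of their marginal Pearson correlation with the response, and the top few are kept. The factor profiling step and the robust estimators (least trimmed squares, MM-estimators) described in the abstract are deliberately omitted, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sure_independence_screening(columns, y, keep):
    """Return indices of the `keep` predictors most correlated with y.

    columns : list of predictor columns, each a list of observations
    """
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]


if __name__ == "__main__":
    predictors = [[1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [5, 1, 4, 2, 3]]
    response = [2, 4, 6, 8, 10]
    print(sure_independence_screening(predictors, response, keep=2))
```

Because each score uses one predictor at a time, strongly correlated predictors can distort the ranking, and a single gross outlier can dominate a Pearson correlation; these are the two weaknesses that the abstract says motivate factor profiling and robust estimation.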
Improved model identification for nonlinear systems using a random subsampling and multifold modelling (RSMM) approach
In nonlinear system identification, the available observed data are conventionally partitioned into two parts: the training data that are used for model identification and the test data that are used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning method is convenient and the associated model identification procedure is in general easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may occasionally lack robustness and generalisation to represent future unseen data, because the performance of the identified model may be highly dependent on how the data partition is made. To overcome the drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or preferably unbiased models. The basic idea and the associated procedure are as follows. Firstly, generate K training datasets (and also K validation datasets), using a K-fold random subsampling method. Secondly, detect significant model terms and identify a common model structure that fits all the K datasets using a newly proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.
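The first step of the procedure, generating K training and K validation datasets by random subsampling, might be sketched as follows. The abstract does not specify the subsampling scheme, so this example assumes that each of the K repetitions independently shuffles the sample indices and splits them at a fixed fraction; the function name, the 70% training fraction, and the seed are hypothetical. The multiple orthogonal search algorithm and the multifold parameter estimation are not shown.

```python
import random


def random_subsamples(n_samples, k, train_fraction=0.7, seed=0):
    """Return k (train_indices, validation_indices) pairs.

    Each pair is an independent random partition of range(n_samples).
    train_fraction and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)  # local generator keeps the splits reproducible
    n_train = int(round(train_fraction * n_samples))
    splits = []
    for _ in range(k):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        train = sorted(indices[:n_train])
        validation = sorted(indices[n_train:])
        splits.append((train, validation))
    return splits


if __name__ == "__main__":
    for train, validation in random_subsamples(n_samples=10, k=3):
        print("train:", train, "validation:", validation)
```

A common model structure would then be sought across all K training sets rather than fitted to a single partition, which is what reduces the dependence on any one split.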
Distributed state estimation in sensor networks with randomly occurring nonlinearities subject to time delays
This is the post-print version of the article. Copyright © 2012 ACM. This article is concerned with a new distributed state estimation problem for a class of dynamical systems in sensor networks. The target plant is described by a set of differential equations disturbed by a Brownian motion and randomly occurring nonlinearities (RONs) subject to time delays. The RONs are investigated here to reflect network-induced randomly occurring regulation of the delayed states on the current ones. Through available measurement output transmitted from the sensors, a distributed state estimator is designed to estimate the states of the target system, where each sensor can communicate with the neighboring sensors according to a given topology described by a directed graph. The state estimation is carried out in a distributed way and is therefore suitable for online application. By resorting to the Lyapunov functional combined with stochastic analysis techniques, several delay-dependent criteria are established that not only ensure that the estimation error is globally asymptotically stable in the mean square, but also guarantee the existence of the desired estimator gains, which can then be explicitly expressed when certain matrix inequalities are solved. A numerical example is given to verify the designed distributed state estimators. This work was supported in part by the National Natural Science Foundation of China under Grants 61028008, 60804028 and 61174136, the Qing Lan Project of Jiangsu Province of China, the Project sponsored by SRF for ROCS of SEM of China, the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grant GR/S27658/01, the Royal Society of the UK, and the Alexander von Humboldt Foundation of Germany.
Improved model identification for non-linear systems using a random subsampling and multifold modelling (RSMM) approach
In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data that are used for model identification and the test data that are used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning method is convenient and the associated model identification procedure is in general easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may occasionally lack robustness and generalisation to represent future unseen data, because the performance of the identified model may be highly dependent on how the data partition is made. To overcome the drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or preferably unbiased models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets), using a K-fold random subsampling method. Second, detect significant model terms and identify a common model structure that fits all the K datasets using a newly proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.