Search CORE

33,240 research outputs found

Two Procedures for Robust Monitoring of Probability Distributions of Economic Data Streams induced by Depth Functions

Author: Kosiorowski Daniel
Publication venue
Publication date: 17/01/2015
Field of study

Data streams (streaming data) consist of transiently observed, evolving in time, multidimensional data sequences that challenge our computational and/or inferential capabilities. In this paper we propose user friendly approaches for robust monitoring of selected properties of unconditional and conditional distribution of the stream basing on depth functions. Our proposals are robust to a small fraction of outliers and/or inliers but sensitive to a regime change of the stream at the same time. Their implementations are available in our free R package DepthProc.Comment: Operations Research and Decisions, vol. 25, No. 1, 201

arXiv.org e-Print Archive

Directory of Open Access Journals

Selective machine learning of doubly robust functionals

Author: Cui Yifan
Tchetgen Eric Tchetgen
Publication venue
Publication date: 12/04/2021
Field of study

While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce two new selection criteria for bias reduction in estimating the functional of interest, each based on a novel definition of pseudo-risk for the functional that embodies the double robustness property and thus is used to select the pair of learners that is nearest to fulfilling this property. We establish an oracle property for a multi-fold cross-validation version of the new selection criteria which states that our empirical criteria perform nearly as well as an oracle with a priori knowledge of the pseudo-risk for each pair of candidate learners. We also describe a smooth approximation to the selection criteria which allows for valid post-selection inference. Finally, we apply the approach to model selection of a semiparametric estimator of average treatment effect given an ensemble of candidate machine learners to account for confounding in an observational study

arXiv.org e-Print Archive

Robust variable screening for regression using factor profiling

Author: Van Aelst Stefan
Wang Yixin
Publication venue
Publication date: 14/11/2018
Field of study

Sure Independence Screening is a fast procedure for variable selection in ultra-high dimensional regression analysis. Unfortunately, its performance greatly deteriorates with increasing dependence among the predictors. To solve this issue, Factor Profiled Sure Independence Screening (FPSIS) models the correlation structure of the predictor variables, assuming that it can be represented by a few latent factors. The correlations can then be profiled out by projecting the data onto the orthogonal complement of the subspace spanned by these factors. However, neither of these methods can handle the presence of outliers in the data. Therefore, we propose a robust screening method which uses a least trimmed squares method to estimate the latent factors and the factor profiled variables. Variable screening is then performed on factor profiled variables by using regression MM-estimators. Different types of outliers in this model and their roles in variable screening are studied. Both simulation studies and a real data analysis show that the proposed robust procedure has good performance on clean data and outperforms the two nonrobust methods on contaminated data

arXiv.org e-Print Archive

Lirias

Improved model identification for nonlinear systems using a random subsampling and multifold modelling (RSMM) approach

Author: Billings S.A.
Wei H.L.
Publication venue: Automatic Control and Systems Engineering, University of Sheffield
Publication date: 01/09/2007
Field of study

In nonlinear system identification, the available observed data are conventionally partitioned into two parts: the training data that are used for model identification and the test data that are used for model performance testing. This sort of ‘hold-out’ or ‘split-sample’ data partitioning method is convenient and the associated model identification procedure is in general easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may occasionally lack robustness and generalisation to represent future unseen data, because the performance of the identified model may be highly dependent on how the data partition is made. To overcome the drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or preferably unbiased models. The basic idea and the associated procedure are as follows. Firstly, generate K training datasets (and also K validation datasets), using a K-fold random subsampling method. Secondly, detect significant model terms and identify a common model structure that fits all the K datasets using a new proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance

White Rose Research Online

Distributed state estimation in sensor networks with randomly occurring nonlinearities subject to time delays

Author: Cattivelli F. S.
Cattivelli F. S.
Cattivelli F. S.
Kim J. H.
Mao X.
Naghshtabrizi P.
Olfati-Saber R.
Schuss Z.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2012
Field of study

This is the post-print version of the Article. The official published version can be accessed from the links below - Copyright @ 2012 ACM.This article is concerned with a new distributed state estimation problem for a class of dynamical systems in sensor networks. The target plant is described by a set of differential equations disturbed by a Brownian motion and randomly occurring nonlinearities (RONs) subject to time delays. The RONs are investigated here to reflect network-induced randomly occurring regulation of the delayed states on the current ones. Through available measurement output transmitted from the sensors, a distributed state estimator is designed to estimate the states of the target system, where each sensor can communicate with the neighboring sensors according to the given topology by means of a directed graph. The state estimation is carried out in a distributed way and is therefore applicable to online application. By resorting to the Lyapunov functional combined with stochastic analysis techniques, several delay-dependent criteria are established that not only ensure the estimation error to be globally asymptotically stable in the mean square, but also guarantee the existence of the desired estimator gains that can then be explicitly expressed when certain matrix inequalities are solved. A numerical example is given to verify the designed distributed state estimators.This work was supported in part by the National Natural Science Foundation of China under Grants 61028008, 60804028 and 61174136, the Qing Lan Project of Jiangsu Province of China, the Project sponsored by SRF for ROCS of SEM of China, the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grant GR/S27658/01, the Royal Society of the UK, and the Alexander von Humboldt Foundation of Germany

Crossref

Brunel University Research Archive

Improved model identification for non-linear systems using a random subsampling and multifold modelling (RSMM) approach

Author: Aguirre LA
Billings SA
Brown M
Chen S
Cherkassky V
Devijver PA
H.L. Wei
Hansen LK
Ljung L
Ljung L
Montgomery DC
Murray-Smith R
Pearson RK
S.A. Billings
Shao J
Shao J
Stone M
Tsang KM
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2009
Field of study

In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data that are used for model identification and the test data that are used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning method is convenient and the associated model identification procedure is in general easy to implement. The resultant model obtained from such a once-partitioned single training dataset, however, may occasionally lack robustness and generalisation to represent future unseen data, because the performance of the identified model may be highly dependent on how the data partition is made. To overcome the drawback of the hold-out data partitioning method, this study presents a new random subsampling and multifold modelling (RSMM) approach to produce less biased or preferably unbiased models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets), using a K-fold random subsampling method. Secondly, detect significant model terms and identify a common model structure that fits all the K datasets using a new proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters for the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance

Crossref

White Rose Research Online