
    Two Procedures for Robust Monitoring of Probability Distributions of Economic Data Streams induced by Depth Functions

    Data streams (streaming data) consist of transiently observed, time-evolving, multidimensional data sequences that challenge our computational and/or inferential capabilities. In this paper we propose user-friendly approaches, based on depth functions, for robust monitoring of selected properties of the unconditional and conditional distributions of the stream. Our proposals are robust to a small fraction of outliers and/or inliers, yet remain sensitive to a regime change in the stream. Their implementations are available in our free R package DepthProc. Comment: Operations Research and Decisions, vol. 25, No. 1, 201
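
    As an illustration of depth-based monitoring, the Python sketch below uses spatial (L1) depth, one common depth function chosen here only for simplicity, to compare successive windows of a bivariate stream with a reference window. It is not the authors' procedure and does not use the DepthProc API; the functions `spatial_depth` and `monitor`, the window length and the alarm threshold are hypothetical. Taking the median of the depths keeps a few gross outliers from raising an alarm, whereas a shift of the whole distribution pushes most new points to the periphery of the reference sample and lowers the median depth.

```python
import math
import random
from statistics import median


def spatial_depth(x, sample):
    """Spatial (L1) depth of point x relative to a sample:
    D(x) = 1 - || (1/n) * sum_i (x - X_i) / ||x - X_i|| ||.
    Central points score close to 1, outlying points close to 0."""
    acc = [0.0] * len(x)
    for p in sample:
        diff = [xi - pi for xi, pi in zip(x, p)]
        norm = math.sqrt(sum(d * d for d in diff))
        if norm > 0.0:  # the unit vector of a zero difference is taken as zero
            acc = [a + d / norm for a, d in zip(acc, diff)]
    n = len(sample)
    return 1.0 - math.sqrt(sum((a / n) ** 2 for a in acc))


def monitor(stream, window=100, threshold=0.15):
    """Hypothetical monitor: flag each later window whose median depth with
    respect to the first (reference) window falls below `threshold`."""
    reference = stream[:window]
    alarms = []
    for start in range(window, len(stream) - window + 1, window):
        current = stream[start:start + window]
        if median(spatial_depth(x, reference) for x in current) < threshold:
            alarms.append(start)
    return alarms


# Toy stream: 300 in-control points, then a mean shift of 4 in the first coordinate.
random.seed(0)
stream = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(300)]
stream += [(random.gauss(4, 1), random.gauss(0, 1)) for _ in range(200)]
stream[150:153] = [(50.0, 50.0)] * 3  # a few gross outliers, no regime change
print(monitor(stream))
```

    In this toy stream only the windows that start after the simulated change at index 300 should trigger an alarm; the three planted outliers at indices 150 to 152 should not.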

    Selective machine learning of doubly robust functionals

    While model selection is a well-studied topic in parametric and nonparametric regression or density estimation, selection of possibly high-dimensional nuisance parameters in semiparametric problems is far less developed. In this paper, we propose a selective machine learning framework for making inferences about a finite-dimensional functional defined on a semiparametric model, when the latter admits a doubly robust estimating function and several candidate machine learning algorithms are available for estimating the nuisance parameters. We introduce two new selection criteria for bias reduction in estimating the functional of interest, each based on a novel definition of pseudo-risk for the functional that embodies the double robustness property and is thus used to select the pair of learners that is nearest to fulfilling this property. We establish an oracle property for a multi-fold cross-validation version of the new selection criteria, which states that our empirical criteria perform nearly as well as an oracle with a priori knowledge of the pseudo-risk for each pair of candidate learners. We also describe a smooth approximation to the selection criteria which allows for valid post-selection inference. Finally, we apply the approach to model selection of a semiparametric estimator of the average treatment effect, given an ensemble of candidate machine learners to account for confounding in an observational study.
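
    The textbook example of a doubly robust estimating function for the average treatment effect is the augmented inverse-probability-weighted (AIPW) estimator, which combines a propensity-score model with two outcome models. The sketch below shows only that estimator, evaluated on predictions from whichever pair of learners has been chosen; the paper's pseudo-risk selection criteria are not reproduced, and the function name `aipw_ate` and the toy numbers are hypothetical.

```python
def aipw_ate(y, a, e_hat, m1_hat, m0_hat):
    """AIPW (doubly robust) estimate of the average treatment effect.

    y: outcomes; a: treatment indicators (0/1); e_hat: estimated propensity
    scores P(A=1 | X); m1_hat, m0_hat: predicted outcomes under treatment
    and under control. The estimate is consistent if either the propensity
    model or the outcome models are correctly specified."""
    terms = []
    for yi, ai, ei, m1, m0 in zip(y, a, e_hat, m1_hat, m0_hat):
        treated = m1 + ai * (yi - m1) / ei               # outcome model + IPW residual
        control = m0 + (1 - ai) * (yi - m0) / (1 - ei)
        terms.append(treated - control)
    return sum(terms) / len(terms)


# The outcome models are exact here while the propensity scores are arbitrary,
# so every residual correction vanishes and the estimate equals mean(m1 - m0).
y = [3.0, 1.0, 4.0, 2.0]
a = [1, 0, 1, 0]
m1_hat = [3.0, 2.0, 4.0, 3.0]
m0_hat = [2.0, 1.0, 3.0, 2.0]
e_hat = [0.9, 0.2, 0.3, 0.6]
print(aipw_ate(y, a, e_hat, m1_hat, m0_hat))  # 1.0
```

    A learner pair that fails to satisfy double robustness would show up here as a residual correction that does not offset the error of the other nuisance model, which is the kind of behaviour the paper's pseudo-risk is designed to detect.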

    Robust variable screening for regression using factor profiling

    Sure Independence Screening is a fast procedure for variable selection in ultra-high dimensional regression analysis. Unfortunately, its performance greatly deteriorates with increasing dependence among the predictors. To solve this issue, Factor Profiled Sure Independence Screening (FPSIS) models the correlation structure of the predictor variables, assuming that it can be represented by a few latent factors. The correlations can then be profiled out by projecting the data onto the orthogonal complement of the subspace spanned by these factors. However, neither of these methods can handle the presence of outliers in the data. Therefore, we propose a robust screening method which uses a least trimmed squares method to estimate the latent factors and the factor profiled variables. Variable screening is then performed on the factor profiled variables by using regression MM-estimators. Different types of outliers in this model and their roles in variable screening are studied. Both simulation studies and a real data analysis show that the proposed robust procedure has good performance on clean data and outperforms the two nonrobust methods on contaminated data.
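
    The profiling step can be sketched for the simplest case of a single latent factor whose scores are assumed to be already available (for example from a principal component). Each predictor and the response are projected onto the orthogonal complement of the factor, and predictors are then ranked by absolute marginal Pearson correlation, as in plain SIS. This is the nonrobust FPSIS idea only; the paper's least trimmed squares factor estimation and MM-estimator screening are not shown, and the helper names `profile_out`, `corr` and `screen` are hypothetical.

```python
import math
import random


def profile_out(v, f):
    """Project v onto the orthogonal complement of span{f}: v - f (f.v)/(f.f)."""
    coef = sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)
    return [vi - coef * fi for vi, fi in zip(v, f)]


def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((s - mu) * (t - mv) for s, t in zip(u, v))
    su = math.sqrt(sum((s - mu) ** 2 for s in u))
    sv = math.sqrt(sum((t - mv) ** 2 for t in v))
    return cov / (su * sv)


def screen(columns, y, keep, factor=None):
    """Keep the `keep` predictors with the largest |marginal correlation| with y.
    If factor scores are supplied, profile them out of every column and y first."""
    if factor is not None:
        columns = [profile_out(col, factor) for col in columns]
        y = profile_out(y, factor)
    scores = [abs(corr(col, y)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: -scores[j])[:keep]


# Toy data: 20 predictors driven by one common factor; only predictor 0 enters y.
random.seed(1)
n, p = 200, 20
f = [random.gauss(0, 1) for _ in range(n)]
X = [[2 * fi + random.gauss(0, 1) for fi in f] for _ in range(p)]
y = [x0 + 0.1 * random.gauss(0, 1) for x0 in X[0]]
print(screen(X, y, keep=3, factor=f))  # predictor 0 should be ranked first
```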

    Improved model identification for nonlinear systems using a random subsampling and multifold modelling (RSMM) approach

    In nonlinear system identification, the available observed data are conventionally partitioned into two parts: the training data, used for model identification, and the test data, used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning is convenient, and the associated model identification procedure is generally easy to implement. However, a model obtained from a single training dataset that has been partitioned only once may occasionally lack robustness and fail to generalise to future unseen data, because its performance may depend heavily on how the data partition is made. To overcome this drawback of hold-out partitioning, this study presents a new random subsampling and multifold modelling (RSMM) approach that produces less biased, or preferably unbiased, models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets) using a K-fold random subsampling method. Second, detect significant model terms and identify a common model structure that fits all K datasets using a newly proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters of the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.
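
    The three RSMM steps can be sketched for a one-input static model built from a small dictionary of candidate terms. Step 1 draws K random training/validation splits. For step 2 the sketch substitutes a plain greedy search that adds, one at a time, the term that most reduces the validation error summed over all K splits, so a single structure must serve every fold; this is a stand-in, not the multiple orthogonal search algorithm. For step 3 the per-fold least-squares estimates are simply averaged, which is one possible reading of multifold parameter estimation. All function names and parameters are hypothetical.

```python
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            ratio = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= ratio * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (adequate for a tiny sketch)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def random_subsamples(n, K, train_frac=0.7, seed=0):
    """Step 1: K random (training, validation) index splits of range(n)."""
    rng = random.Random(seed)
    splits = []
    for _ in range(K):
        idx = list(range(n))
        rng.shuffle(idx)
        cut = int(train_frac * n)
        splits.append((idx[:cut], idx[cut:]))
    return splits


def fit_fold(terms, xs, y, train, valid):
    """Fit the given terms on one training subset; return (validation MSE, params)."""
    theta = least_squares([[t(xs[i]) for t in terms] for i in train],
                          [y[i] for i in train])
    mse = sum((y[i] - sum(th * t(xs[i]) for th, t in zip(theta, terms))) ** 2
              for i in valid) / len(valid)
    return mse, theta


def rsmm(xs, y, candidates, K=5, max_terms=3, tol=1e-9, seed=0):
    """Greedy common-structure selection over K subsamples, then averaged parameters."""
    splits = random_subsamples(len(xs), K, seed=seed)
    chosen, best = [], float("inf")
    while len(chosen) < max_terms:
        scores = {}
        for name in candidates:
            if name not in chosen:
                terms = [candidates[c] for c in chosen + [name]]
                scores[name] = sum(fit_fold(terms, xs, y, tr, va)[0] for tr, va in splits)
        if not scores:
            break
        name = min(scores, key=scores.get)
        if best - scores[name] < tol:  # no meaningful improvement across folds: stop
            break
        chosen.append(name)
        best = scores[name]
    terms = [candidates[c] for c in chosen]
    thetas = [fit_fold(terms, xs, y, tr, va)[1] for tr, va in splits]
    return chosen, [sum(th[j] for th in thetas) / K for j in range(len(terms))]


xs = [i / 10 for i in range(-20, 21)]
y = [2 * x + 0.5 * x ** 2 for x in xs]  # noise-free toy system
candidates = {"1": lambda x: 1.0, "x": lambda x: x, "x^2": lambda x: x * x}
print(rsmm(xs, y, candidates))
```

    On this noise-free toy system the search should settle on the terms `x` and `x^2` with averaged parameters close to 2 and 0.5; with real data the fold-to-fold spread of the estimates is what the averaging is meant to smooth.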

    Distributed state estimation in sensor networks with randomly occurring nonlinearities subject to time delays

    This is the post-print version of the article; the official published version is available from the publisher. Copyright © 2012 ACM. This article is concerned with a new distributed state estimation problem for a class of dynamical systems in sensor networks. The target plant is described by a set of differential equations disturbed by a Brownian motion and by randomly occurring nonlinearities (RONs) subject to time delays. The RONs are investigated here to reflect the network-induced, randomly occurring regulation exerted by the delayed states on the current ones. Using the measurement outputs transmitted from the sensors, a distributed state estimator is designed to estimate the states of the target system, where each sensor can communicate with its neighboring sensors according to a given topology described by a directed graph. Because the state estimation is carried out in a distributed way, it is suitable for online implementation. By resorting to a Lyapunov functional combined with stochastic analysis techniques, several delay-dependent criteria are established that not only ensure that the estimation error is globally asymptotically stable in the mean square, but also guarantee the existence of the desired estimator gains, which can be expressed explicitly once certain matrix inequalities are solved. A numerical example is given to verify the designed distributed state estimators. This work was supported in part by the National Natural Science Foundation of China under Grants 61028008, 60804028 and 61174136, the Qing Lan Project of Jiangsu Province of China, the Project sponsored by SRF for ROCS of SEM of China, the Engineering and Physical Sciences Research Council (EPSRC) of the UK under Grant GR/S27658/01, the Royal Society of the UK, and the Alexander von Humboldt Foundation of Germany
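
    To make the setting concrete, the Python sketch below simulates a heavily simplified, discrete-time, scalar analogue: the plant is driven by Gaussian increments (a crude discretisation of Brownian noise) and by a delayed nonlinearity that occurs only when a Bernoulli variable with probability `p` is 1, and three sensors update their estimates using only their own and their in-neighbours' measurements over a directed graph. The nonlinearity `ron`, the sensor gains, the graph and the estimator gain are all hypothetical and hand-picked; in the paper the gains come from solving matrix inequalities, and nothing here reproduces its stability criteria.

```python
import math
import random


def ron(s):
    """Hypothetical nonlinearity acting on the delayed state (Lipschitz constant 0.1)."""
    return 0.1 * math.tanh(s)


def simulate(steps=200, a=0.95, p=0.4, delay=3, gain=0.25, seed=0):
    """Return, per step, the largest absolute estimation error over three sensors."""
    rng = random.Random(seed)
    c = [1.0, 0.8, 1.2]                            # hypothetical sensor output gains
    neighbors = {0: [0, 1], 1: [1, 2], 2: [2, 0]}  # directed graph, self-loops included
    xs = [5.0]                                     # plant state history
    xh = [[0.0] for _ in c]                        # estimate history of each sensor
    errors = []
    for k in range(steps):
        x, x_del = xs[-1], xs[max(0, k - delay)]
        y = [ci * x + 0.05 * rng.gauss(0, 1) for ci in c]  # noisy measurements
        beta = 1.0 if rng.random() < p else 0.0             # does the RON occur now?
        xs.append(a * x + beta * ron(x_del) + 0.05 * rng.gauss(0, 1))
        for i in range(len(c)):
            xi, xi_del = xh[i][-1], xh[i][max(0, k - delay)]
            # Each sensor corrects its estimate with its neighbours' measurements only.
            innovation = sum(gain * (y[j] - c[j] * xi) for j in neighbors[i])
            # The RON enters the estimator through its occurrence probability p.
            xh[i].append(a * xi + p * ron(xi_del) + innovation)
        errors.append(max(abs(xs[-1] - xh[i][-1]) for i in range(len(c))))
    return errors


errors = simulate()
print(errors[0], sum(errors[-50:]) / 50)  # large initial error, small steady-state error
```

    With these hand-picked values each sensor's linear error dynamics contract by a factor between 0.4 and 0.5 per step, so the initial error of about 5 decays to the noise level within a few dozen steps; other choices of gain or graph may not be stable.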

    Improved model identification for non-linear systems using a random subsampling and multifold modelling (RSMM) approach

    In non-linear system identification, the available observed data are conventionally partitioned into two parts: the training data, used for model identification, and the test data, used for model performance testing. This sort of 'hold-out' or 'split-sample' data partitioning is convenient, and the associated model identification procedure is generally easy to implement. However, a model obtained from a single training dataset that has been partitioned only once may occasionally lack robustness and fail to generalise to future unseen data, because its performance may depend heavily on how the data partition is made. To overcome this drawback of hold-out partitioning, this study presents a new random subsampling and multifold modelling (RSMM) approach that produces less biased, or preferably unbiased, models. The basic idea and the associated procedure are as follows. First, generate K training datasets (and also K validation datasets) using a K-fold random subsampling method. Second, detect significant model terms and identify a common model structure that fits all K datasets using a newly proposed common model selection approach, called the multiple orthogonal search algorithm. Finally, estimate and refine the model parameters of the identified common-structured model using a multifold parameter estimation method. The proposed method can produce robust models with better generalisation performance.