    Optimal model-free prediction from multivariate time series

    © 2015 American Physical Society. Forecasting a time series from multivariate predictors constitutes a challenging problem, especially using model-free approaches. Most techniques, such as nearest-neighbor prediction, quickly suffer from the curse of dimensionality and overfitting for more than a few predictors, which has limited their application mostly to the univariate case. Therefore, selection strategies are needed that harness the available information as efficiently as possible. Since the right combination of predictors often matters, ideally all subsets of possible predictors should be tested for their predictive power, but the exponentially growing number of combinations makes such an approach computationally prohibitive. Here a prediction scheme that overcomes this strong limitation is introduced. It utilizes a causal preselection step that drastically reduces the number of possible predictors to the most predictive set of causal drivers, making a globally optimal search scheme tractable. The information-theoretic optimality is derived and practical selection criteria are discussed. As demonstrated for multivariate nonlinear stochastic delay processes, the optimal scheme can even be less computationally expensive than commonly used suboptimal schemes like forward selection. The method suggests a general framework to apply the optimal model-free approach to select variables and subsequently fit a model to further improve a prediction or learn statistical dependencies. The performance of this framework is illustrated on a climatological index of the El Niño Southern Oscillation.
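    The exhaustive search that a small preselected predictor set makes tractable can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's scheme: the candidate columns are assumed to be already preselected (the causal preselection step itself is not shown), and each subset is scored by a leave-one-out nearest-neighbor squared error, a swapped-in criterion standing in for the information-theoretic criteria the abstract mentions. The names `knn_loo_error` and `best_subset` and the toy data are hypothetical.

```python
import itertools
import math
import random


def knn_loo_error(X, y, cols, k=5):
    """Leave-one-out mean squared error of k-nearest-neighbor prediction
    that uses only the predictor columns listed in `cols`."""
    pts = [[row[c] for c in cols] for row in X]  # project onto the chosen predictors
    n = len(y)
    total = 0.0
    for i in range(n):
        # Euclidean distance from sample i to every other sample in the chosen subspace
        neighbors = sorted(
            (math.dist(pts[i], pts[j]), y[j]) for j in range(n) if j != i
        )
        pred = sum(v for _, v in neighbors[:k]) / k  # average of the k nearest targets
        total += (y[i] - pred) ** 2
    return total / n


def best_subset(X, y, candidates, k=5):
    """Score every non-empty subset of the candidate columns and return the best one."""
    best_cols, best_err = None, float("inf")
    for r in range(1, len(candidates) + 1):
        for cols in itertools.combinations(candidates, r):
            err = knn_loo_error(X, y, cols, k)
            if err < best_err:
                best_cols, best_err = cols, err
    return best_cols, best_err


# Hypothetical toy data: the target depends on columns 0 and 2; column 1 is pure noise.
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [row[0] + row[2] for row in X]
cols, err = best_subset(X, y, candidates=[0, 1, 2])
print(cols, round(err, 4))
```

    The search scores all 2^p - 1 subsets, so it is feasible only while the candidate set stays small; forward selection would instead add one column at a time and can miss combinations that are predictive only jointly.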

    An Information Approach to Regularization Parameter Selection for the Solution of Ill-Posed Inverse Problems Under Model Misspecification

    Engineering problems are often ill-posed, i.e., they cannot be solved by conventional data-driven methods such as parametric linear and nonlinear regression or neural networks. The regularization methods used to solve ill-posed problems require an a priori choice of the regularization parameter. Several regularization parameter selection methods have been proposed in the literature, yet none is resistant to model misspecification. Since almost all models are incorrectly or only approximately specified, misspecification resistance is a valuable property for engineering applications. Each data-driven method is based on a statistical procedure that can perform well on one data set and fail on another. Therefore, another useful feature of a data-driven method is robustness. This dissertation proposes a methodology for developing misspecification-resistant and robust regularization parameter selection methods through the information complexity approach. The original contribution of the dissertation to the field of ill-posed inverse problems in engineering is a new robust regularization parameter selection method. This method is misspecification-resistant, i.e., it works consistently when the model is misspecified. The method also improves upon the information-based regularization parameter selection methods by correcting their inadequate penalization of estimation inaccuracy through the information complexity framework. Such an improvement makes the proposed regularization parameter selection method robust and reduces the risk of obtaining grossly underregularized solutions. A method of misspecification detection is proposed, based on the discrepancy between the proposed regularization parameter selection method and its correctly specified version. A detected misspecification indicates that the model may be inadequate for the particular problem and should be revised.
    The superior performance of the proposed regularization parameter selection method is demonstrated by practical examples. Data for the examples are from Carolina Power & Light's Crystal River Nuclear Power Plant and a TVA fossil power plant. The results of applying the proposed regularization parameter selection method to the data demonstrate that the method is robust, i.e., it does not produce grossly underregularized solutions, and that it performs well when the model is misspecified. This enables the proposed regularization parameter selection method to be implemented in autonomous diagnostic and monitoring systems.
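    For readers unfamiliar with regularization parameter selection, the basic loop can be sketched for Tikhonov (ridge) regularization. The sketch uses generalized cross-validation (GCV), a standard criterion swapped in here purely for illustration; it is not the dissertation's information-complexity method and offers no misspecification resistance. The function names, the Hilbert-type test matrix, and the parameter grid are hypothetical.

```python
import numpy as np


def tikhonov(A, b, lam):
    """Tikhonov solution x = argmin ||A x - b||^2 + lam * ||x||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)


def gcv_score(A, b, lam):
    """GCV(lam) = m * ||(I - H) b||^2 / trace(I - H)^2, H being the influence matrix."""
    m, n = A.shape
    H = A @ np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)
    resid = b - H @ b
    return m * float(resid @ resid) / (m - np.trace(H)) ** 2


def select_lambda(A, b, grid):
    """Return the grid value with the smallest GCV score."""
    scores = [gcv_score(A, b, lam) for lam in grid]
    return grid[int(np.argmin(scores))]


# Hypothetical ill-conditioned problem: a 20 x 10 Hilbert-type matrix with small noise.
m, n = 20, 10
A = 1.0 / (np.arange(m)[:, None] + np.arange(n)[None, :] + 1.0)
rng = np.random.default_rng(0)
x_true = np.ones(n)
b = A @ x_true + 1e-3 * rng.standard_normal(m)
grid = np.logspace(-8, 0, 17)
lam_star = select_lambda(A, b, grid)
x_hat = tikhonov(A, b, lam_star)
print(lam_star, np.linalg.norm(x_hat - x_true))
```

    Too small a parameter leaves the noise amplified (an underregularized solution); too large a parameter biases the solution toward zero. Any selection criterion, GCV or otherwise, is a rule for trading these off from the data alone.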

    Extending Mixture of Experts Model to Investigate Heterogeneity of Trajectories: When, Where and How to Add Which Covariates

    Full text link
    Researchers are usually interested in examining the impact of covariates when separating heterogeneous samples into more homogeneous latent classes. The majority of theoretical and empirical studies with such aims have focused on identifying covariates as predictors of class membership in the structural equation modeling framework. In other words, the covariates only indirectly affect the sample heterogeneity. However, covariates can also influence between-individual differences directly. This article presents a mixture model, known as a mixture-of-experts (MoE) model, that uses covariates to explain within-cluster and between-cluster heterogeneity simultaneously. This study aims to extend the MoE framework to investigate heterogeneity in nonlinear trajectories: to identify latent classes, covariates that predict cluster membership, and covariates that explain within-cluster differences in change patterns over time. Our simulation studies demonstrate that the proposed model generally estimates the parameters without bias and with good precision, and exhibits appropriate empirical coverage for a nominal 95% confidence interval. This study also proposes using structural equation model forests to shrink the covariate space of the proposed mixture model. We illustrate how to select covariates and construct the proposed model with longitudinal mathematics achievement data. Additionally, we demonstrate that the proposed mixture model can be further extended in the structural equation modeling framework by allowing the covariates that have direct effects to be time-varying.
    Comment: Draft version 1.7, 06/01/2021. This paper has not been peer reviewed. Please do not copy or cite without the author's permission.
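    The mixture-of-experts structure can be sketched as a likelihood for one individual: a gating model (multinomial logit) lets one covariate predict class membership, while expert models let another covariate shift the within-class trajectory. This is a simplified illustration, not the article's model: the trajectories here are linear in time rather than nonlinear, repeated measures are assumed conditionally independent given the class, and all parameter values and function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def component_terms(y_traj, times, x_gate, x_expert, params):
    """Return pi_k(x_gate) * f_k(y_traj | x_expert) for each latent class k."""
    # Gating: x_gate predicts class membership (between-cluster heterogeneity).
    pis = softmax([g0 + g1 * x_gate for g0, g1 in params["gate"]])
    terms = []
    for pi_k, (b0, b1, beta, sd) in zip(pis, params["experts"]):
        # Expert: x_expert shifts the class-specific intercept (within-cluster heterogeneity).
        lik = 1.0
        for t, y in zip(times, y_traj):
            lik *= normal_pdf(y, b0 + beta * x_expert + b1 * t, sd)
        terms.append(pi_k * lik)
    return terms


def moe_density(y_traj, times, x_gate, x_expert, params):
    """Marginal density of the trajectory, summed over latent classes."""
    return sum(component_terms(y_traj, times, x_gate, x_expert, params))


def posterior_class_probs(y_traj, times, x_gate, x_expert, params):
    """Posterior probability of each latent class given the trajectory and covariates."""
    terms = component_terms(y_traj, times, x_gate, x_expert, params)
    total = sum(terms)
    return [term / total for term in terms]


# Two hypothetical classes: slow growth and fast growth.
params = {
    "gate": [(0.0, 0.0), (-0.5, 1.0)],     # (intercept, slope on x_gate)
    "experts": [(10.0, 1.0, 0.5, 1.0),     # (intercept, slope, x_expert effect, sd)
                (10.0, 3.0, 0.5, 1.0)],
}
times = [0, 1, 2, 3, 4]
fast_traj = [10.5 + 3.0 * t for t in times]  # x_expert = 1.0 adds 0.5 to the intercept
post = posterior_class_probs(fast_traj, times, 0.2, 1.0, params)
print(post)
```

    Fitting such a model means maximizing the sum of log `moe_density` over individuals, typically with EM, where `posterior_class_probs` supplies the E-step weights.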

    Assessing model performance for counterfactual predictions

    Full text link
    Counterfactual prediction methods are required when a model will be deployed in a setting where treatment policies differ from the setting where the model was developed, or when the prediction question is explicitly counterfactual. However, estimating and evaluating counterfactual prediction models is challenging because one does not observe the full set of potential outcomes for all individuals. Here, we discuss how to tailor a model to a counterfactual estimand, how to assess the model's performance, and how to perform model and tuning parameter selection. We also provide identifiability results for measures of performance for a potentially misspecified counterfactual prediction model based on training and test data from the same (factual) source population. Last, we illustrate the methods using simulation and apply them to the task of developing a statin-naïve risk prediction model for cardiovascular disease.
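    One simple way to assess performance for a counterfactual estimand is inverse probability weighting. The sketch estimates the mean squared error of a model's predictions for the outcome under no treatment, using only untreated individuals reweighted by the inverse of their probability of being untreated. It assumes no unmeasured confounding, positivity, and a known propensity score; it is one weighting estimator chosen for illustration, not necessarily the article's preferred one, and the simulated data, function name, and propensity model are hypothetical.

```python
import random


def ipw_counterfactual_mse(y, a, preds, p_untreated):
    """Estimate E[(Y^{a=0} - g(X))^2] from observational data.

    Only untreated individuals (a == 0) contribute; each is weighted by
    1 / P(A = 0 | X) so that they stand in for the whole population."""
    n = len(y)
    total = 0.0
    for yi, ai, gi, pi in zip(y, a, preds, p_untreated):
        if ai == 0:
            total += (yi - gi) ** 2 / pi
    return total / n


# Hypothetical simulation: treatment lowers the outcome by 2 and is more common when x > 0.5.
rng = random.Random(1)
n = 20000
x = [rng.random() for _ in range(n)]
p0 = [0.3 if xi > 0.5 else 0.7 for xi in x]      # P(A = 0 | X), assumed known
a = [0 if rng.random() < pi else 1 for pi in p0]
y0 = [xi + rng.gauss(0.0, 1.0) for xi in x]      # untreated potential outcome
y = [y0i - 2.0 * ai for y0i, ai in zip(y0, a)]   # observed outcome
preds = x                                        # model g(X) = X targets Y^{a=0}

naive = sum((yi - gi) ** 2 for yi, gi in zip(y, preds)) / n
ipw = ipw_counterfactual_mse(y, a, preds, p0)
print(round(naive, 2), round(ipw, 2))  # true counterfactual MSE is 1.0 by construction
```

    The naive error computed on all observed outcomes mixes in treated individuals, whose outcomes the model was never meant to predict, so it overstates the counterfactual error; the weighted estimate targets the untreated potential outcome instead.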

    Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It

    Get PDF
    We empirically show that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection setting and in a Bayesian ridge regression setting. We use the standard linear model, which assumes homoskedasticity, whereas the data are heteroskedastic, and observe that the posterior puts its mass on ever more high-dimensional models as the sample size increases. To remedy the problem, we equip the likelihood in Bayes' theorem with an exponent called the learning rate, and we propose the Safe Bayesian method to learn the learning rate from the data. SafeBayes tends to select small learning rates as soon as the standard posterior is not 'cumulatively concentrated', and its results on our data are quite encouraging.
    Comment: 70 pages, 20 figures
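    The learning-rate idea can be illustrated in the simplest conjugate case: a single-coefficient model y = βx + noise with known noise variance σ² and a N(0, τ²) prior on β. Raising the Gaussian likelihood to the power η keeps the posterior Gaussian, with precision 1/τ² + η Σx²/σ² and mean (η Σxy/σ²) divided by that precision. This sketch only shows how η tempers the posterior; it does not implement SafeBayes's procedure for learning η from the data, and the function name and numbers are hypothetical.

```python
def generalized_posterior(xs, ys, eta, sigma2=1.0, tau2=1.0):
    """Posterior mean and variance of beta for y = beta * x + noise when the
    Gaussian likelihood is raised to the power eta (eta = 1 is standard Bayes)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    precision = 1.0 / tau2 + eta * sxx / sigma2  # prior precision + tempered data precision
    mean = (eta * sxy / sigma2) / precision
    return mean, 1.0 / precision


xs = [0.5, 1.0, 1.5, 2.0]
ys = [0.4, 1.1, 1.4, 2.1]
for eta in (1.0, 0.5, 0.0):
    print(eta, generalized_posterior(xs, ys, eta))
```

    Smaller learning rates down-weight the data relative to the prior, which widens the posterior and shrinks its mean toward zero; at η = 0 the posterior is simply the prior.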