
    An Information Approach to Regularization Parameter Selection for the Solution of Ill-Posed Inverse Problems Under Model Misspecification

    Engineering problems are often ill-posed, i.e. they cannot be solved by conventional data-driven methods such as parametric linear and nonlinear regression or neural networks. Regularization, a method used to solve ill-posed problems, requires an a priori choice of the regularization parameter. Several regularization parameter selection methods have been proposed in the literature, yet none is resistant to model misspecification. Since almost all models are incorrectly or only approximately specified, misspecification resistance is a valuable property for engineering applications. Each data-driven method is based on a statistical procedure that can perform well on one data set and fail on another. Therefore, another useful feature of a data-driven method is robustness. This dissertation proposes a methodology for developing misspecification-resistant and robust regularization parameter selection methods through the use of the information complexity approach. The original contribution of the dissertation to the field of ill-posed inverse problems in engineering is a new robust regularization parameter selection method. This method is misspecification-resistant, i.e. it works consistently when the model is misspecified. The method also improves upon the information-based regularization parameter selection methods by correcting their inadequate penalization of estimation inaccuracy through the use of the information complexity framework. This improvement makes the proposed regularization parameter selection method robust and reduces the risk of obtaining grossly underregularized solutions. A method of misspecification detection is proposed based on the discrepancy between the proposed regularization parameter selection method and its correctly specified version. A detected misspecification indicates that the model may be inadequate for the particular problem and should be revised. 
The superior performance of the proposed regularization parameter selection method is demonstrated by practical examples. Data for the examples are from Carolina Power & Light's Crystal River Nuclear Power Plant and a TVA fossil power plant. The results of applying the proposed regularization parameter selection method to the data demonstrate that the method is robust, i.e. does not produce grossly underregularized solutions, and performs well when the model is misspecified. This enables one to implement the proposed regularization parameter selection method in autonomous diagnostic and monitoring systems.

    Robust and Misspecification Resistant Model Selection in Regression Models with Information Complexity and Genetic Algorithms

    In this dissertation, we develop novel, computationally efficient model subset selection methods for multiple and multivariate linear regression models that are both robust and misspecification resistant. Our approach is a three-way hybrid method that employs the information-theoretic measure of complexity (ICOMP), computed on robust M-estimators, as the model subset selection criterion, integrated with genetic algorithms (GA) as the subset model search engine. Despite the rich literature on robust estimation techniques, bridging the theoretical and applied aspects of robust model subset selection has been somewhat neglected. A few information criteria in the multiple regression literature are robust. However, none of them is resistant to model misspecification, and none of them can be generalized to misspecified multivariate regression. In this dissertation, we introduce for the first time an information complexity (ICOMP) criterion that is both robust and misspecification resistant, filling this gap in the literature. More specifically, in multiple linear regression we introduce robust M-estimators with misspecification-resistant ICOMP and use the new information criterion as the fitness function in the GA to carry out the model subset selection. For multivariate linear regression, we derive the two-stage robust Mahalanobis distance (RMD) estimator and use this RMD estimator in the computation of the information criteria. The new information criteria are used as the fitness function in the GA to perform the model subset selection. Comparative studies on simulated data for both multiple and multivariate regression show that the robust and misspecification-resistant ICOMP outperforms the other robust information criteria and the non-robust ICOMP computed using OLS (or MLE) when the data contain outliers and the error terms in the model deviate from a normal distribution. 
Compared with all-possible-subsets model selection, the GA combined with the robust and misspecification-resistant information criteria proves to be an effective method that can quickly find a near-best subset, if not the best, without having to search the whole subset model space.

    Sparse Model Selection using Information Complexity

    This dissertation studies the application of information complexity to statistical model selection through three different projects. Specifically, we design statistical models that incorporate sparsity features to make the models more explanatory and computationally efficient. In the first project, we propose a Sparse Bridge Regression model for variable selection when the number of variables is much greater than the number of observations and the model may be misspecified. The model is demonstrated to have excellent explanatory power in high-dimensional data analysis through numerical simulations and real-world data analysis. The second project proposes a novel hybrid modeling method that uses a mixture of sparse principal component regression (MIX-SPCR) to segment high-dimensional time series data. Using the MIX-SPCR model, we empirically analyze the S&P 500 index data (from 1999 to 2019) and identify two key change points. The third project investigates the use of nonlinear features in the Sparse Kernel Factor Analysis (SKFA) method to derive the information criterion. Using a variety of wide datasets, we demonstrate the benefits of SKFA in the nonlinear representation and classification of data. The results obtained show the flexibility and utility of information complexity in such data modeling problems.

    Designs for Stated Preference Experiments

    We explore the use of different strategies for the construction of optimal choice experiments and their impact on the overall efficiency of the resulting design. We then evaluate how well these choice designs meet the desired characteristics of optimal choice designs (orthogonality, level balance, utility balance and minimum level overlap). We further explore the feasibility of using entropy as a secondary measure of design optimality. We find that current algorithms afford little flexibility for using this secondary measure. We further study the impact of misspecification of the assumed parameter values used in the creation of optimal choice designs. We find that the impact of misspecification varies widely with the discrepancy between the true and assumed parameter values. Further, we find that entropy becomes a more feasible secondary measure of design optimality if one considers the potential for misspecification of the parameter values. Current design and analysis strategies for stated preference experiments assume that compensatory decisions are made. We consider how different decision strategies may be represented by manipulating the assumed parameter values used in creating the choice designs. In this context, the consequences of misspecification of the decision strategy are also evaluated. Given the high prevalence of no-choice selections in stated preference experiments, we study how different measures of choice complexity affect the selection of the no-choice alternative. We conclude by suggesting a comprehensive strategy that should be followed in the creation of choice designs.

    Generalization of the causal effect of a given regimen in a network meta-analysis using AIPTW and TMLE

    This thesis aims to develop Augmented Inverse Probability of Treatment Weighting (AIPTW) and Targeted Maximum Likelihood Estimation (TMLE) in the setting of Individual Patient Data Network Meta-Analysis (IPD-NMA) of observational data. We also propose methods to estimate the Generalized Propensity Score (GPS), in order to estimate the causal effect of a given combination of treatments (a regimen) and generalize it to a global population. This research was motivated by a recent update of the IPD-NMA of Multidrug-Resistant Tuberculosis (MDR-TB), a respiratory infectious disease caused by a mycobacterial bacillus with a high rate of mortality. A notable complexity of this setting is that not all regimens were observed in all studies. Causal inference is defined as the study of the effect of treatments on an outcome. Although Randomized Controlled Trials (RCTs) are the gold standard for investigating cause and effect, known limitations mean that using them is not always feasible. Thus, the analysis of observational data is proposed, and developing methods that enable us to use the information from observational data is important. In addition, using the information coming from several individual studies allows us to evaluate associations between treatments and outcomes that are specific to subpopulations. Also, a network meta-analysis allows us to compare multiple regimens instead of only two. We estimate the rate of treatment success for a given regimen from the set of studies in which the regimen was available, and then generalize it to the whole source population. Theory and the results of a simulation study show that the developed methods are doubly robust; however, TMLE shows more robustness, especially when a newly proposed approach to estimating the GPS is used. The application yields estimates of the generalized treatment success rate between 50% and 61% for the regimen {Pyrazinamide, Kanamycin, Ofloxacin, Ethionamide, Cycloserine}, while the observed rate in the full data set was 59%.