51 research outputs found
Bayesian sequential design of computer experiments to estimate reliable sets
We consider an unknown multivariate function representing a system, such as a complex numerical simulator, that takes both deterministic and uncertain inputs. Our objective is to estimate the set of deterministic inputs leading to outputs whose probability (with respect to the distribution of the uncertain inputs) of belonging to a given set is controlled by a given threshold. To solve this problem, we propose a Bayesian strategy based on the Stepwise Uncertainty Reduction (SUR) principle to sequentially choose the points at which the function should be evaluated in order to approximate the set of interest. We illustrate its performance and practical interest in several numerical experiments.
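To make the quantity at stake concrete, the target set can be approximated by a brute-force nested Monte Carlo baseline, i.e. the costly approach that a sequential SUR strategy is designed to avoid. Everything below (the toy simulator `f`, the distribution of the uncertain input, the output set and the threshold) is an illustrative assumption, not the paper's test case:

```python
# Brute-force baseline for the set-estimation problem: for each deterministic
# input x on a grid, estimate P(f(x, U) in [low, high]) by Monte Carlo over
# the uncertain input U, then keep the x whose probability exceeds alpha.
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):
    """Toy stand-in for the expensive simulator (illustrative assumption)."""
    return x * np.sin(3.0 * x) + u

def prob_in_set(x, n_mc=10_000, low=-0.5, high=0.5):
    """P(f(x, U) in [low, high]) with U ~ N(0, 0.2^2), by plain Monte Carlo."""
    u = rng.normal(0.0, 0.2, size=n_mc)
    y = f(x, u)
    return np.mean((y >= low) & (y <= high))

alpha = 0.9
grid = np.linspace(0.0, 2.0, 201)
reliable_set = grid[np.array([prob_in_set(x) for x in grid]) >= alpha]
print(f"{reliable_set.size} of {grid.size} grid points are in the estimated set")
```

Each grid point costs `n_mc` simulator calls here; the SUR strategy instead spends a small evaluation budget only where it most reduces the uncertainty on the set.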
Some Bayesian insights for statistical tolerance analysis
The functionality of assembled products mostly relies on the manufacturer's ability to produce under given quality requirements. Parts that do not meet these requirements represent manufacturing waste, which can be the origin of substantial losses in terms of money and credibility. Quality control and defect detection are two key points of predictive process management. At the design stage, a statistical tolerance analysis can be performed to predict the process quality. This implies estimating a so-called defect probability, which quantifies the probability that the final assembly does not meet functional requirements. In general, this quantity depends on a number of process specifications (tolerances, capability levels) set a priori by the manufacturer, but also on the monitoring of the process itself, since the process parameters (mean shift value and standard deviation) vary statistically across batches. In this paper, we give an alternative point of view on an existing method, namely the Advanced Probability-based Tolerance Analysis of products (APTA), proposed in the literature to estimate the defect probability. This method, originally relying on a double-loop sampling strategy, is revisited within the Bayesian framework, and an augmented approach is proposed to estimate the defect probability more efficiently. The efficiency of the augmented approach for solving tolerancing problems with APTA is illustrated on a linear reference test case.
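The double-loop sampling idea that APTA starts from can be sketched as follows: an outer loop draws batch-level process parameters (mean shift, standard deviation), and an inner loop draws part dimensions and checks the functional requirement. The gap model, tolerance bounds, and all distributions below are illustrative assumptions, not those of the paper:

```python
# Double-loop Monte Carlo sketch of a defect probability: outer loop over
# batch process parameters, inner loop over part dimensions within the batch.
import numpy as np

rng = np.random.default_rng(1)

def defect_probability(n_outer=500, n_inner=2000):
    defects = 0.0
    for _ in range(n_outer):
        # Batch-level process parameters vary statistically (assumed laws).
        mean_shift = rng.normal(0.0, 0.01)   # drift of the process mean
        sigma = rng.uniform(0.02, 0.04)      # batch standard deviation
        # Part-level dimensions within the batch.
        d1 = rng.normal(10.0 + mean_shift, sigma, n_inner)
        d2 = rng.normal(9.9, 0.03, n_inner)
        gap = d1 - d2                        # functional characteristic
        # Functional requirement (assumed): gap must stay within [0.02, 0.18].
        defects += np.mean((gap < 0.02) | (gap > 0.18))
    return defects / n_outer

pd_hat = defect_probability()
print(f"estimated defect probability: {pd_hat:.4f}")
```

The nested structure is exactly what makes the plain double loop expensive; the augmented Bayesian approach discussed in the paper targets the same quantity with a single sampling level.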
The future of sensitivity analysis: an essential discipline for systems modeling and policy support
Sensitivity analysis (SA) is en route to becoming an integral part of mathematical modeling. The tremendous potential benefits of SA are, however, yet to be fully realized, both for advancing mechanistic and data-driven modeling of human and natural systems, and in support of decision making. In this perspective paper, a multidisciplinary group of researchers and practitioners revisit the current status of SA and outline research challenges with regard to both theoretical frameworks and their applications to solve real-world problems. Six areas are discussed that warrant further attention, including (1) structuring and standardizing SA as a discipline, (2) realizing the untapped potential of SA for systems modeling, (3) addressing the computational burden of SA, (4) progressing SA in the context of machine learning, (5) clarifying the relationship and role of SA with respect to uncertainty quantification, and (6) evolving the use of SA in support of decision making. An outlook for the future of SA is provided that underlines how SA must underpin a wide variety of activities to better serve science and society.
Reliability-oriented sensitivity analysis under probabilistic model uncertainty – Application to aerospace systems
Aerospace systems are complex engineering systems whose reliability must be guaranteed from an early design phase, especially given the potentially tremendous damage and costs that any failure could induce. Moreover, the management of the various sources of uncertainty, impacting either the behavior of these systems ("aleatory" uncertainty, due to the natural variability of physical phenomena) or their modeling and simulation ("epistemic" uncertainty, due to lack of knowledge and modeling choices), is a cornerstone of their reliability assessment. The uncertainty quantification methodology thus consists of several phases. First, one needs to model and propagate uncertainties through the computer model, which is treated as a "black box". Second, a quantity of interest relevant to the goal of the study, here a failure probability, has to be estimated. For highly safe systems, the sought failure probability is very low and may be costly to estimate. Third, a sensitivity analysis of the quantity of interest can be set up in order to better identify and rank the influential input sources of uncertainty. The probabilistic modeling of the input variables (epistemic uncertainty) may therefore strongly influence the failure probability estimate obtained during the reliability analysis, and the robustness of this estimate with respect to such uncertainty has to be investigated. This thesis addresses the problem of taking the probabilistic modeling uncertainty of the stochastic inputs into account. Within the probabilistic framework, a "bi-level" input uncertainty has to be modeled and propagated along all the steps of the uncertainty quantification methodology.
In this thesis, the uncertainties are modeled within a Bayesian framework in which the lack of knowledge about the distribution parameters is characterized by the choice of a prior probability density function. In a first phase, after propagation of the bi-level input uncertainty, the predictive failure probability is estimated and used as the reliability measure instead of the standard failure probability. In a second phase, a local reliability-oriented sensitivity analysis based on score functions is carried out to study the impact of the hyper-parameterization of the prior on the predictive failure probability estimate. Finally, a global reliability-oriented sensitivity analysis, based on Sobol indices of the failure indicator function adapted to the bi-level input uncertainty, is proposed. All the proposed methodologies are tested and challenged on a representative industrial aerospace test case simulating the fallout of an expendable space launcher.
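The predictive failure probability described above integrates the epistemic uncertainty on the distribution parameters out under their prior. A minimal bi-level sampling sketch follows; the limit state `g`, the prior on the input mean, and the aleatory law are all illustrative assumptions, not the thesis's launcher model:

```python
# Bi-level Monte Carlo sketch of a predictive failure probability:
# epistemic level (theta ~ prior) then aleatory level (X | theta),
# with failure defined as g(X) <= 0.
import numpy as np

rng = np.random.default_rng(2)

def g(x):
    """Toy limit state (illustrative assumption): failure when g(x) <= 0."""
    return 7.0 - x

n = 200_000
theta = rng.normal(0.0, 0.5, size=n)   # prior on the input mean (assumed)
x = rng.normal(theta, 2.0)             # aleatory input given theta
p_pred = np.mean(g(x) <= 0.0)          # predictive failure probability
print(f"predictive failure probability ~ {p_pred:.1e}")
```

For the very low probabilities typical of highly safe systems, crude Monte Carlo like this becomes prohibitive, which is why the thesis pairs the measure with dedicated estimation and sensitivity techniques.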
Statistical developments for target and conditional sensitivity analysis: application to safety studies for nuclear reactors
Numerical simulators are essential for understanding, modeling and predicting physical phenomena. However, the available information about some of the input variables is often limited or uncertain. Global sensitivity analysis (GSA) then aims at determining (qualitatively or quantitatively) how the variability of the inputs affects the model output. From reliability and risk management perspectives, however, GSA might be insufficient to capture the influence of the inputs on a restricted domain of the output (e.g., a distribution tail). To remedy this, we define and use in this work target sensitivity analysis (TSA) and conditional sensitivity analysis (CSA), which aim respectively at measuring the influence of the inputs on the occurrence of the critical event, and on the output within the critical domain (ignoring what happens outside). As illustrated in the applications, these two notions can differ widely. From existing GSA measures, we propose new operational tools for TSA and CSA. We first focus on the popular Sobol indices and show their practical limitations for both TSA and CSA. Then, the Hilbert-Schmidt Independence Criterion (HSIC), a dependence measure recently adapted for GSA purposes and well suited to small datasets, is considered. TSA and CSA adaptations of the Sobol and HSIC indices, and the associated statistical estimators, are defined. Alternative CSA Sobol indices are also defined to overcome the dependence between inputs induced by the conditioning. Moreover, to cope with the loss of information (especially when the critical domain is associated with a low probability) and to reduce the variability of the estimators, a transformation of the output using weight functions is proposed. These new TSA and CSA tools are tested and compared on analytical examples. The efficiency of the HSIC-based indices clearly appears, as well as the relevance of smooth relaxation. Finally, the latter indices are applied and interpreted on a nuclear engineering use case simulating a severe accident scenario on a pressurized water reactor.
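The HSIC dependence measure mentioned above has a compact biased (V-statistic) estimator, tr(KHLH)/(n-1)^2 with centered Gaussian Gram matrices; used on the indicator of the critical event rather than on the raw output, it gives a target-sensitivity screening in the spirit of TSA. The toy model, bandwidth choice, and critical threshold below are illustrative assumptions:

```python
# Biased HSIC estimator between an input sample and the indicator of a
# critical output event (a simple target-sensitivity screening).
import numpy as np

def gaussian_gram(v, bandwidth):
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def hsic(x, y):
    """Biased V-statistic HSIC: tr(K H L H) / (n - 1)^2."""
    n = x.size
    K = gaussian_gram(x, np.std(x) + 1e-12)   # bandwidth heuristic (assumed)
    L = gaussian_gram(y, np.std(y) + 1e-12)
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(3)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 3.0 * x1 + 0.1 * x2                       # toy model: x1 dominates
critical = (y > 2.0).astype(float)            # target event: output in the tail
print(hsic(x1, critical), hsic(x2, critical)) # x1 should score higher
```

Because the estimator only needs Gram matrices, it works on the small samples that remain once attention is restricted to a rare critical event, which is the setting the paper targets.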
Variance-based importance measures for machine learning model interpretability
Machine learning algorithms benefit from an unprecedented boost in the industrial world, in particular in support of decision-making for critical systems. However, their lack of "interpretability" remains a challenge to overcome in order to make these tools fully intelligible and auditable. This paper aims to survey and synthesize a panel of interpretability metrics (called "importance measures") whose aim is to quantify the impact of each predictor on the statistical model's output variance. It is shown that the choice of a relevant metric has to be guided by the constraints imposed by the data and the considered model (linear vs. nonlinear phenomenon of interest, input dimension, input dependence), together with the type of study the user wants to perform (detect influential variables, rank them, etc.). Finally, these metrics are estimated and analyzed on a public dataset so as to illustrate some of their theoretical and empirical properties.
Keywords: statistical learning, interpretability, sensitivity analysis, Shapley effects, Sobol' indices.
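Among the variance-based importance measures surveyed, the first-order Sobol' index S_i = Cov(Y, Y_i)/Var(Y) admits a simple pick-freeze estimator: evaluate the model on two independent input samples that share only column i. The linear toy model below is an illustrative assumption chosen so the analytic indices are known:

```python
# Pick-freeze estimator of first-order Sobol' indices on a linear toy model
# with independent standard normal inputs; analytic values are V_i / V with
# V_i = a_i^2 and V = sum(a_i^2) for coefficients a = (4, 2, 0.5).
import numpy as np

rng = np.random.default_rng(4)

def model(X):
    return 4.0 * X[:, 0] + 2.0 * X[:, 1] + 0.5 * X[:, 2]

n, d = 50_000, 3
A = rng.normal(size=(n, d))
B = rng.normal(size=(n, d))
y_a = model(A)

S = []
for i in range(d):
    ABi = B.copy()
    ABi[:, i] = A[:, i]            # "freeze" column i from A, re-pick the rest
    y_abi = model(ABi)             # shares only input i with y_a
    S.append(np.cov(y_a, y_abi)[0, 1] / np.var(y_a, ddof=1))

print(np.round(S, 3))              # analytic values: 16/20.25, 4/20.25, 0.25/20.25
```

Pick-freeze requires model re-evaluations, which is why the paper also discusses given-data alternatives when only a fixed learning sample is available.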
- …