
    Valid and efficient imprecise-probabilistic inference with partial priors, III. Marginalization

    As Basu (1977) writes, "Eliminating nuisance parameters from a model is universally recognized as a major problem of statistics," but more than 50 years after Basu wrote these words, the two mainstream schools of thought in statistics have yet to solve the problem. Fortunately, the two mainstream frameworks aren't the only options. This series of papers rigorously develops a new and very general inferential model (IM) framework for imprecise-probabilistic statistical inference that is provably valid and efficient, while simultaneously accommodating incomplete or partial prior information about the relevant unknowns when it's available. The present paper, Part III in the series, tackles the marginal inference problem. Part II showed that, for parametric models, the likelihood function naturally plays a central role; here, when nuisance parameters are present, the same principles suggest that the profile likelihood is the key player. When the likelihood factors nicely, so that the interest and nuisance parameters are perfectly separated, the valid and efficient profile-based marginal IM solution is immediate. But even when the likelihood doesn't factor nicely, the same profile-based solution remains valid and leads to efficiency gains. This is demonstrated in several examples, including the famous Behrens–Fisher and gamma mean problems, where I claim the proposed IM solution is the best available. Remarkably, the same profiling-based construction offers validity guarantees in prediction and non-parametric inference problems. Finally, I show how a broader view of this new IM construction can handle non-parametric inference on risk minimizers and draw a connection between non-parametric IMs and conformal prediction.
    Comment: Follow-up to arXiv:2211.14567. Feedback welcome at https://researchers.one/articles/23.09.0000
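    To fix ideas, here is a hedged sketch of the profiling step in generic notation (the symbols L_x, psi, lambda, R and pi_x are generic placeholders, not a verbatim statement of the paper's definitions, which also cover partial priors and the validification step): the profile relative likelihood for the interest parameter, with the nuisance parameter maximized out, and a corresponding marginal possibility contour.

```latex
% Hedged sketch in generic notation (not a verbatim statement of the paper's definitions):
% profile relative likelihood for the interest parameter psi, nuisance parameter lambda
% maximized out, followed by the corresponding profile-based marginal possibility contour.
\[
  R(x,\psi) = \frac{\sup_{\lambda} L_x(\psi,\lambda)}{\sup_{\psi',\lambda'} L_x(\psi',\lambda')},
  \qquad
  \pi_x(\psi) = \sup_{\lambda} \mathsf{P}_{(\psi,\lambda)}\bigl\{ R(X,\psi) \le R(x,\psi) \bigr\}.
\]
```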

    Naive possibilistic classifiers for imprecise or uncertain numerical data

    In real-world problems, input data may be pervaded with uncertainty. In this paper, we investigate the behavior of naive possibilistic classifiers, as a counterpart to naive Bayesian ones, for dealing with classification tasks in the presence of uncertainty. For this purpose, we extend possibilistic classifiers, which have recently been adapted to numerical data, in order to cope with uncertainty in data representation. Here the possibility distributions that are used are supposed to encode the family of Gaussian probability distributions compatible with the considered dataset. We consider two types of uncertainty: (i) the uncertainty associated with the class in the training set, which is modeled by a possibility distribution over class labels, and (ii) the imprecision pervading attribute values in the testing set, represented in the form of intervals for continuous data. Moreover, the approach takes into account the uncertainty about the estimation of the Gaussian distribution parameters due to the limited amount of data available. We first adapt the possibilistic classification model, previously proposed for the certain case, in order to accommodate uncertainty about class labels. Then, we propose an algorithm based on the extension principle to deal with imprecise attribute values. The reported experiments show the value of possibilistic classifiers for handling uncertainty in data. In particular, the probability-to-possibility transform-based classifier shows robust behavior when dealing with imperfect data.
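    A rough illustration of the kind of machinery involved, as a minimal Python sketch with hypothetical names (the paper's classifiers additionally handle class-label uncertainty, interval-valued attributes via the extension principle, and parameter-estimation uncertainty): a Gaussian probability-to-possibility transform combined naively across attributes.

```python
# Hedged sketch (not the paper's exact construction): a Gaussian-based
# probability-to-possibility transform and a naive possibilistic score.
import math

def gaussian_possibility(x: float, mu: float, sigma: float) -> float:
    """Possibility transform of N(mu, sigma^2) at x for a symmetric unimodal density:
    pi(x) = P(|X - mu| >= |x - mu|) = 2 * (1 - Phi(|x - mu| / sigma))."""
    z = abs(x - mu) / sigma
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def naive_possibilistic_score(x: list[float], params: list[tuple[float, float]]) -> float:
    """Combine per-attribute possibilities with a minimum t-norm
    (product is another common choice)."""
    return min(gaussian_possibility(xi, mu, sigma) for xi, (mu, sigma) in zip(x, params))

# Example: score an instance against class-conditional (mean, std) estimates.
class_params = [(1.0, 0.5), (4.0, 1.2)]
print(naive_possibilistic_score([1.2, 3.5], class_params))
```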

    Beyond probabilities: A possibilistic framework to interpret ensemble predictions and fuse imperfect sources of information

    Ensemble forecasting is widely used in medium-range weather prediction to account for the uncertainty that is inherent in the numerical prediction of high-dimensional, nonlinear systems with high sensitivity to initial conditions. Ensemble forecasting allows one to sample possible future scenarios in a Monte-Carlo-like approximation through small strategic perturbations of the initial conditions and, in some cases, stochastic parametrization schemes of the atmosphere–ocean dynamical equations. Results are generally interpreted in a probabilistic manner by turning the ensemble into a predictive probability distribution. Yet, due to model bias and dispersion errors, this interpretation is often not reliable and statistical postprocessing is needed to reach probabilistic calibration. This is all the more true for extreme events which, for dynamical reasons, cannot generally be associated with a significant density of ensemble members. In this work we propose a novel approach: a possibilistic interpretation of ensemble predictions, taking inspiration from possibility theory. This framework allows us to integrate, in a consistent manner, other imperfect sources of information, such as the insight about the system dynamics provided by the analogue method. We thereby show that probability distributions may not be the best way to extract the valuable information contained in ensemble prediction systems, especially for large lead times. Indeed, shifting to possibility theory provides more meaningful results without the need to resort to additional calibration, while maintaining or improving skill. Our approach is tested on an imperfect version of the Lorenz '96 model, and results for extreme-event prediction are compared against those given by a standard probabilistic ensemble dressing.
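    For intuition only, a minimal Python sketch of one simple way to read an equal-weight ensemble possibilistically, via the maximal-specificity probability-to-possibility transform of its histogram; this is an assumption-laden stand-in, not the construction actually proposed in the paper (which also fuses information from the analogue method).

```python
# Hedged sketch (one simple construction, not necessarily the paper's):
# turn an equal-weight ensemble into a possibility distribution over bins
# via the maximal-specificity probability-to-possibility transform.
import numpy as np

def ensemble_to_possibility(members: np.ndarray, bins: int = 10):
    """Histogram the ensemble, then set pi_i = sum of p_j over bins with p_j <= p_i."""
    counts, edges = np.histogram(members, bins=bins)
    p = counts / counts.sum()
    pi = np.array([p[p <= pi_val].sum() for pi_val in p])
    return edges, pi  # pi equals 1 on the most populated bin(s)

rng = np.random.default_rng(0)
edges, pi = ensemble_to_possibility(rng.normal(size=50), bins=8)
print(np.round(pi, 2))
```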

    The likelihood interpretation as the foundation of fuzzy set theory

    In order to use fuzzy sets in real-world applications, an interpretation for the values of membership functions is needed. The history of fuzzy set theory shows that the interpretation in terms of statistical likelihood is very natural, although the connection between likelihood and probability can be misleading. In this paper, the likelihood interpretation of fuzzy sets is reviewed: it makes fuzzy data and fuzzy inferences perfectly compatible with standard statistical analyses, and sheds some light on the central role played by the extension principle and α-cuts in fuzzy set theory. Furthermore, the likelihood interpretation justifies some of the combination rules of fuzzy set theory, including the product and minimum rules for the conjunction of fuzzy sets, as well as the probabilistic-sum and bounded-sum rules for the disjunction of fuzzy sets.
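    The combination rules named above have simple closed forms; a minimal Python sketch applying them pointwise to membership degrees:

```python
# Standard combination rules from fuzzy set theory, applied pointwise
# to membership degrees a, b in [0, 1].
def conj_min(a: float, b: float) -> float:          # minimum rule for conjunction
    return min(a, b)

def conj_product(a: float, b: float) -> float:      # product rule for conjunction
    return a * b

def disj_prob_sum(a: float, b: float) -> float:     # probabilistic sum for disjunction
    return a + b - a * b

def disj_bounded_sum(a: float, b: float) -> float:  # bounded sum for disjunction
    return min(1.0, a + b)

print(conj_min(0.7, 0.4), conj_product(0.7, 0.4),
      disj_prob_sum(0.7, 0.4), disj_bounded_sum(0.7, 0.4))
```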

    SPOCC: Scalable POssibilistic Classifier Combination -- toward robust aggregation of classifiers

    We investigate a problem in which each member of a group of learners is trained separately to solve the same classification task. Each learner has access to a training dataset (possibly with overlap across learners), but each trained classifier can be evaluated on a validation dataset. We propose a new approach to aggregating the learner predictions within the possibility theory framework. For each classifier prediction, we build a possibility distribution assessing how likely it is that the prediction is correct, using frequentist probabilities estimated on the validation set. The possibility distributions are aggregated using an adaptive t-norm that can accommodate dependency among, and poor accuracy of, the classifier predictions. We prove that the proposed approach possesses a number of desirable classifier-combination robustness properties.
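    A hedged Python sketch of the overall pipeline, with the paper's adaptive t-norm replaced by a plain product t-norm and the per-classifier possibility distributions built from hypothetical validation-set confusion frequencies via the standard maximal-specificity transform:

```python
# Hedged sketch of the general pipeline (the paper's adaptive t-norm is replaced
# here by a plain product t-norm, and the probability-to-possibility step is the
# standard maximal-specificity transform).
import numpy as np

def to_possibility(p: np.ndarray) -> np.ndarray:
    """Maximal-specificity transform: pi_k = sum of p_j over j with p_j <= p_k."""
    return np.array([p[p <= pk].sum() for pk in p])

def combine(conditionals: list[np.ndarray], predictions: list[int]) -> int:
    """conditionals[m][pred, k]: validation frequency of true class k given that
    classifier m predicted `pred`. Aggregate possibilities with a product t-norm."""
    agg = np.ones(conditionals[0].shape[1])
    for cond, pred in zip(conditionals, predictions):
        agg *= to_possibility(cond[pred])
    return int(np.argmax(agg))

# Toy example with 2 classifiers and 3 classes.
c1 = np.array([[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7]])
c2 = np.array([[0.6, 0.3, 0.1], [0.1, 0.7, 0.2], [0.2, 0.2, 0.6]])
print(combine([c1, c2], predictions=[0, 1]))   # fused class index
```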

    A novel framework for predicting patients at risk of readmission

    Uncertainty in decision-making about patients' risk of re-admission arises from non-uniform data and limited knowledge of health-system variables. Knowledge of the impact of risk factors will support better clinical decision-making and help reduce the number of patients admitted to hospital. Traditional approaches are not capable of accounting for the uncertain nature of the risk of hospital re-admission, and further problems arise from the large amount of uncertain information. Patients can be at high, medium or low risk of re-admission, and these strata have ill-defined boundaries. We believe that our model, which adapts the fuzzy regression method, offers a novel approach to handling uncertain data and uncertain relationships between health-system variables and the risk of re-admission. Because the boundaries of the risk bands are ill-defined, this approach allows clinicians to target individuals near those boundaries; targeting such individuals and providing them with proper care may help move patients from the high-risk to the low-risk band. In developing this algorithm, we aimed to help potential users assess patients at various risk-score thresholds and avoid readmission of high-risk patients through proper interventions. A model for predicting patients at high risk of re-admission will enable interventions to be targeted before costs have been incurred and health status has deteriorated. A risk-score cut-off level would flag patients and result in net savings even where intervention costs per patient are much higher. Preventing hospital re-admissions is important for patients, and our algorithm may also affect hospital income.

    A linear regression model for imprecise response

    A linear regression model with imprecise response and p real explanatory variables is analyzed. The imprecision of the response variable is described functionally by means of a certain kind of fuzzy set, the LR fuzzy set. LR fuzzy random variables are introduced to model the usual random experiments in which the characteristic observed on each outcome can be described by fuzzy numbers of a particular class, determined by three random values: the center, the left spread and the right spread. In fact, these constitute a natural generalization of interval data. To deal with the estimation problem, the space of LR fuzzy numbers is proved to be isometric to a closed and convex cone of R^3 with respect to a generalization of the most commonly used metric for LR fuzzy numbers. The expression of the estimators in terms of moments is established, and their limit distribution and asymptotic properties are analyzed and applied to the determination of confidence regions and hypothesis-testing procedures. The results are illustrated by means of some case studies.
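    As a much-simplified illustration of the data representation only (the paper's moment-based estimators work in a cone of R^3 under a generalized metric and keep the spreads non-negative), a Python sketch coding each LR fuzzy response as a (center, left spread, right spread) triple and fitting it component-wise by least squares:

```python
# Much-simplified hedged sketch: LR fuzzy responses coded as
# (center, left spread, right spread) triples and fit by component-wise least
# squares. This only conveys the data representation, not the paper's estimators.
import numpy as np

def fit_lr_fuzzy_regression(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """X: (n, p) real covariates; Y: (n, 3) columns = center, left spread, right spread.
    Returns a (p + 1, 3) coefficient matrix (intercept in row 0), one column per component."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return coef

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
centers = 1.0 + X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=30)
spreads = np.abs(0.5 + 0.05 * rng.normal(size=(30, 2)))   # left and right spreads
Y = np.column_stack([centers, spreads])
print(fit_lr_fuzzy_regression(X, Y).round(2))
```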

    On Sharp Identification Regions for Regression Under Interval Data

    The reliable analysis of interval data (coarsened data) is one of the most promising applications of imprecise probabilities in statistics. If one refrains from making untestable, and often materially unjustified, strong assumptions on the coarsening process, then the empirical distribution of the data is imprecise, and statistical models are, in Manski’s terms, partially identified. We first elaborate some subtle differences between two natural ways of handling interval data in the dependent variable of regression models, distinguishing between two different types of identification regions, called Sharp Marrow Region (SMR) and Sharp Collection Region (SCR) here. Focusing on the case of linear regression analysis, we then derive some fundamental geometrical properties of SMR and SCR, allowing a comparison of the regions and providing some guidelines for their canonical construction. Relying on the algebraic framework of adjunctions of two mappings between partially ordered sets, we characterize SMR as a right adjoint and as the monotone kernel of a criterion-function-based mapping, while SCR is interpretable as the corresponding monotone hull. Finally, we sketch some ideas on a compromise between SMR and SCR based on a set-domained loss function. This paper is an extended version of a shorter paper with the same title, which is conditionally accepted for publication in the Proceedings of the Eighth International Symposium on Imprecise Probability: Theories and Applications. In the present paper we have added proofs and a seventh chapter with a small Monte Carlo illustration, which would have made the original paper too long.
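    Purely for intuition, a Python sketch of the kind of Monte Carlo illustration mentioned above: sample selections of the interval-valued response, refit ordinary least squares, and collect the estimates. The resulting cloud only roughly traces the collection-of-estimates idea (interior selections under-represent the extremes); the paper itself characterizes SMR and SCR exactly rather than by sampling.

```python
# Hedged, purely illustrative sketch: Monte Carlo approximation of the set of
# OLS fits obtained when the interval-valued response is replaced by random
# selections inside its intervals.
import numpy as np

def sampled_collection(X: np.ndarray, y_lo: np.ndarray, y_hi: np.ndarray,
                       n_draws: int = 1000, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([np.ones(len(X)), X])
    betas = []
    for _ in range(n_draws):
        y = rng.uniform(y_lo, y_hi)                 # one selection of the response
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        betas.append(beta)
    return np.array(betas)                          # cloud of estimates

X = np.linspace(0, 1, 20)
y_lo, y_hi = 2 * X - 0.3, 2 * X + 0.3               # interval-valued response
cloud = sampled_collection(X[:, None], y_lo, y_hi)
print(cloud.min(axis=0).round(2), cloud.max(axis=0).round(2))
```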

    An informational distance for estimating the faithfulness of a possibility distribution, viewed as a family of probability distributions, with respect to data

    An acknowledged interpretation of possibility distributions in quantitative possibility theory is in terms of families of probabilities that are upper and lower bounded by the associated possibility and necessity measures. This paper proposes an informational distance function for possibility distributions that agrees with the above-mentioned view of possibility theory in both the continuous and the discrete case. In particular, we show that, given a set of data following a probability distribution, the optimal possibility distribution with respect to our informational distance is the distribution obtained as the result of the probability-possibility transformation that agrees with the maximal-specificity principle. It is also shown that when the optimal distribution is not available due to representation bias, maximizing this possibilistic informational distance provides more faithful results than approximating the probability distribution and then applying the probability-possibility transformation. We show that maximizing the possibilistic informational distance is equivalent to minimizing the squared distance to the unknown optimal possibility distribution. Two advantages of the proposed informational distance function are that (i) it does not require knowledge of the shape of the probability distribution underlying the data, and (ii) it amounts to summing the elementary terms corresponding to the informational distance between the considered possibility distribution and each piece of data. We detail the particular case of triangular and trapezoidal possibility distributions and show that any unimodal unknown probability distribution can be faithfully upper approximated by a triangular distribution obtained by optimizing the possibilistic informational distance.
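    To make the opening sentence concrete, a minimal Python sketch (finite case, hypothetical names) of the family-of-probabilities reading: a probability vector p belongs to the family encoded by a possibility distribution pi exactly when every event's probability is bounded by its possibility, which it suffices to check on the nested sets {x : pi(x) <= t}. This is background for the view adopted in the paper, not the paper's informational distance itself.

```python
# Hedged sketch: the "family of probabilities" reading of a possibility
# distribution on a finite space. p is in the family encoded by pi iff
# P(A) <= max_{x in A} pi(x) for every event A, which reduces to checking
# the nested sets {x : pi(x) <= t}.
import numpy as np

def dominated_by(p: np.ndarray, pi: np.ndarray) -> bool:
    """True iff the probability vector p is upper bounded by the possibility pi."""
    for t in np.unique(pi):
        if p[pi <= t].sum() > t + 1e-12:
            return False
    return True

p = np.array([0.5, 0.3, 0.2])
pi_opt = np.array([1.0, 0.5, 0.2])   # maximal-specificity transform of p
pi_bad = np.array([1.0, 0.3, 0.1])   # too specific: excludes p itself
print(dominated_by(p, pi_opt), dominated_by(p, pi_bad))
```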