59 research outputs found

    Parametric versus nonparametric: the fitness coefficient

    The fitness coefficient, introduced in this paper, results from a competition between parametric and nonparametric density estimators within the likelihood of the data. As illustrated on several real datasets, the fitness coefficient generally agrees with p-values but is easier to compute and interpret: it can be read as the proportion of data coming from the parametric model. Moreover, the fitness coefficient can be used to build a semiparametric compromise that improves inference over both the parametric and the nonparametric approaches. From a theoretical perspective, the fitness coefficient is shown to converge in probability to one if the model is true and to zero if the model is false. From a practical perspective, the utility of the fitness coefficient is illustrated on real and simulated datasets.

    An optimal tradeoff between explorations and repetitions in global sensitivity analysis for stochastic computer models

    Global sensitivity analysis often accompanies computer modeling to identify which factors of a model of interest are important. In particular, Sobol indices, naturally estimated by Monte Carlo methods, quantify the contribution of the inputs to the variability of the output. However, stochastic computer models raise difficulties. There is no unique definition of Sobol indices, and their estimation is difficult because a good balance must be found between repetitions of the computer code and explorations of the input space. The problem of finding an optimal tradeoff between explorations and repetitions is addressed. Two Sobol indices are considered, their estimators constructed and their asymptotic properties established. To find an optimal tradeoff between repetitions and explorations, a tractable error criterion, which is small when the inputs of the model are ranked correctly, is built and minimized under a fixed computing budget. Sobol estimates based on the resulting balance are then produced. Convergence rates are given and the method is shown to be asymptotically oracle. Numerical tests and a sensitivity analysis of a Susceptible-Infectious-Recovered (SIR) model are performed.
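    As a rough illustration of the Monte Carlo estimation of Sobol indices mentioned in this abstract, the sketch below uses a classical pick-freeze estimator on a deterministic toy function. The function `model`, the uniform inputs and the sample size are assumptions made for illustration only; the paper's estimators for stochastic models, and its tradeoff between repetitions and explorations, are not reproduced here.

```python
import random


def model(x1, x2):
    # Hypothetical toy model (not from the paper): Y = x1 + 0.5 * x2,
    # with independent inputs uniform on (0, 1).
    # Its exact first-order Sobol indices are 0.8 and 0.2.
    return x1 + 0.5 * x2


def first_order_sobol(f, index, n, rng):
    """Pick-freeze Monte Carlo estimate of the first-order Sobol index
    of input number `index` (0 or 1) for a two-input function f."""
    ys, ys_frozen = [], []
    for _ in range(n):
        x = [rng.random(), rng.random()]
        x_new = [rng.random(), rng.random()]
        x_new[index] = x[index]  # freeze the input of interest, resample the other
        ys.append(f(*x))
        ys_frozen.append(f(*x_new))
    # Pooled mean and variance use both output samples.
    mean = (sum(ys) + sum(ys_frozen)) / (2 * n)
    cov = sum(a * b for a, b in zip(ys, ys_frozen)) / n - mean ** 2
    var = sum(a * a + b * b for a, b in zip(ys, ys_frozen)) / (2 * n) - mean ** 2
    return cov / var


if __name__ == "__main__":
    rng = random.Random(0)
    for i in (0, 1):
        print(f"input {i}: estimated first-order index =",
              round(first_order_sobol(model, i, 20000, rng), 3))
```

    Each index costs 2n model evaluations here; with a stochastic model, part of that budget would have to go to repeated runs at the same input, which is the tradeoff the abstract refers to.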

    Construction et estimation de copules en grande dimension

    In recent decades, copulas have been used more and more in statistical modeling. Their popularity owes much to the fact that they allow the analysis of the margins to be separated from the analysis of the dependence structure induced by the underlying distribution. This makes non-Gaussian distributions easier to model and, in particular, makes it possible to take into account nonlinear dependencies between random variables. Finance and hydrology are two examples of scientific fields where the use of copulas is now standard. However, while many bivariate families exist in the literature, multivariate/high-dimensional copulas are much more difficult to construct. This thesis presents three contributions to copula modeling and inference, with an emphasis on high-dimensional problems. The first model is written as a product of bivariate copulas and is underlain by a tree structure in which each edge represents a bivariate copula. Hence, different pairs can be modeled with different dependence properties. The second is a factor model built on a nonparametric class of bivariate copulas. It exhibits a good balance between tractability and flexibility. This thesis also deals with the parametric inference of copula models in general: the asymptotic properties of a weighted least-squares estimator based on dependence coefficients are established. The models and methods have been applied to hydrological data (flow rates and rainfall).

    A parsimonious multivariate copula for tail dependence modeling

    Copulas are increasingly studied both in theory and in practice, as they are a convenient tool for constructing multivariate distribution functions. However, the literature essentially covers the bivariate case, while in applications the number of variables is much higher. Furthermore, when tail dependence must be taken into account, it is desirable to have enough flexibility in the tails while avoiding exponential growth in the number of parameters. We propose in this communication a one-factor model which exhibits this feature.

    Inference on extremal dependence in the domain of attraction of a structured Hüsler-Reiss distribution motivated by a Markov tree with latent variables

    A Markov tree is a probabilistic graphical model for a random vector indexed by the nodes of an undirected tree encoding conditional independence relations between variables. One possible limit distribution of partial maxima of samples from such a Markov tree is a max-stable Hüsler-Reiss distribution whose parameter matrix inherits its structure from the tree, each edge contributing one free dependence parameter. Our central assumption is that, upon marginal standardization, the data-generating distribution is in the max-domain of attraction of this Hüsler-Reiss distribution, an assumption much weaker than requiring that the data be generated according to a graphical model. Even if some of the variables are unobservable (latent), we show that the underlying model parameters are still identifiable if and only if every node corresponding to a latent variable has degree at least three. Three estimation procedures, based on the method of moments, maximum composite likelihood, and pairwise extremal coefficients, are proposed for use on multivariate peaks-over-thresholds data when some variables are latent. A typical application is a river network in the form of a tree where, at some locations, no data are available. We illustrate the model and the identifiability criterion on a data set of high water levels on the Seine, France, with two latent variables. The structured Hüsler-Reiss distribution is found to fit the observed extremal dependence patterns well. Since the parameters are identifiable, we are able to quantify tail dependence between locations for which there are no data. Comment: 31 pages, 17 figures.

    Constraining kernel estimators in semiparametric copula mixture models

    This paper presents a novel algorithm for performing inference and/or clustering in semiparametric copula-based mixture models. The algorithm replaces the standard kernel density estimator with a weighted version that takes into account the constraints placed on the underlying marginal densities. Lower misclassification error rates and better estimates are obtained in simulations. The pointwise consistency of the weighted kernel density estimator is established under an assumption on the rate of convergence of the sample maximum.
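    To make the idea of a weighted kernel density estimator concrete, here is a minimal sketch with a Gaussian kernel. How the paper chooses the weights so that the marginal constraints hold is not stated in the abstract; the weights, data and bandwidth below are placeholders assumed purely for illustration.

```python
import math


def weighted_kde(x, data, weights, bandwidth):
    """Gaussian kernel density estimate at point x.

    `weights` are assumed nonnegative and summing to one; uniform weights
    1/n give back the standard kernel density estimator.
    """
    total = 0.0
    for xi, wi in zip(data, weights):
        u = (x - xi) / bandwidth
        total += wi * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / bandwidth


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0]         # hypothetical observations
    weights = [0.2, 0.3, 0.5]      # hypothetical weights summing to one
    print(weighted_kde(1.5, data, weights, bandwidth=0.5))
```

    Because the weights sum to one and each kernel integrates to one, the estimate is still a probability density, whatever rule is used to pick the weights.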

    Semaine d'Etude Mathématiques et Entreprises 6 : Analyse statistique des défauts en électronique analogique

    We consider data from voltage measurements on analog electronic circuits. More precisely, the task is to analyze curves representing the evolution over time of the voltages at different nodes of an electronic circuit. Our objective is to propose an automated analysis of the quality of the curves. Specifically, we propose statistical data analysis methods capable of: -- identifying possible patterns in the curves (classification), -- isolating curves that exhibit "anomalies" (detection of suspicious curves).

    Weighted least-squares inference for multivariate copulas based on dependence coefficients

    The author, Gildas Mazo, is currently at INRA - Centre de Jouy-en-Josas - Unité MaIAGE. In this paper, we address the issue of estimating the parameters of general multivariate copulas, that is, copulas whose partial derivatives may not exist. To this end, we consider a weighted least-squares estimator based on dependence coefficients, and establish its consistency and asymptotic normality. The estimator's finite-sample performance is illustrated on simulations and a real dataset.
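    The following sketch illustrates, on a deliberately simple case, what a weighted least-squares fit to dependence coefficients can look like. It assumes an exchangeable Clayton copula, for which every pairwise Kendall's tau equals theta / (theta + 2), and it uses equal weights. The copula family, the weights and all function names are assumptions for illustration, not the estimator or the setting studied in the paper.

```python
import itertools
import random


def sample_clayton(n, d, theta, rng):
    """Draw n points from a d-dimensional Clayton copula (theta > 0)
    using the gamma frailty (Marshall-Olkin) construction."""
    rows = []
    for _ in range(n):
        v = rng.gammavariate(1.0 / theta, 1.0)
        rows.append([(1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta)
                     for _ in range(d)])
    return rows


def kendall_tau(x, y):
    """Empirical Kendall's tau (O(n^2), assumes no ties)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += 1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
    return 2.0 * s / (n * (n - 1))


def wls_theta(taus, weights):
    """Minimize sum_k w_k * (tau_k - theta / (theta + 2))^2 over theta > 0.

    The minimizer in g = theta / (theta + 2) is the weighted mean of the
    empirical taus; it is mapped back through theta = 2 g / (1 - g).
    """
    g = sum(w * t for w, t in zip(weights, taus)) / sum(weights)
    if not 0.0 < g < 1.0:
        raise ValueError("weighted mean of taus must lie in (0, 1)")
    return 2.0 * g / (1.0 - g)


if __name__ == "__main__":
    rng = random.Random(1)
    data = sample_clayton(400, 3, theta=2.0, rng=rng)
    cols = list(zip(*data))
    taus = [kendall_tau(cols[i], cols[j])
            for i, j in itertools.combinations(range(3), 2)]
    print("theta estimate:", wls_theta(taus, [1.0] * len(taus)))
```

    With several parameters or a richer family, the closed-form step would be replaced by a numerical minimization of the same weighted quadratic criterion.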

    Construction and estimation of high-dimensional copulas

    In recent decades, copulas have been used more and more in statistical modeling. Their popularity owes much to the fact that they allow the analysis of the margins to be separated from the analysis of the dependence structure induced by the underlying distribution. This makes non-Gaussian distributions easier to model and, in particular, makes it possible to take into account nonlinear dependencies between random variables. Finance and hydrology are two examples of scientific fields where the use of copulas is now standard. However, while many bivariate families exist in the literature, multivariate/high-dimensional copulas are much more difficult to construct. This thesis presents three contributions to copula modeling and inference, with an emphasis on high-dimensional problems. The first model is written as a product of bivariate copulas and is underlain by a tree structure in which each edge represents a bivariate copula. Hence, different pairs can be modeled with different dependence properties. The second is a factor model built on a nonparametric class of bivariate copulas. It exhibits a good balance between tractability and flexibility. This thesis also deals with the parametric inference of copula models in general: the asymptotic properties of a weighted least-squares estimator based on dependence coefficients are established. The models and methods have been applied to hydrological data (flow rates and rainfall).

    A semiparametric and location-shift copula-based mixture model

    The modeling of mixtures of distributions has long rested on Gaussian distributions and/or a conditional independence hypothesis. Only recently have researchers started to construct and study broader generic models that do not rely on these hypotheses. Some of these extensions use copulas as a tool to build flexible models, since they allow the dependence and the marginal distributions to be modeled separately. But this approach also has drawbacks. First, it greatly increases the number of choices the practitioner has to make; second, it carries a risk of marginal misspecification. This paper aims to overcome these limitations by presenting a copula-based mixture model which is semiparametric. Thanks to a location-shift hypothesis, semiparametric estimation is also feasible, which allows the model to adapt to the data without any modeling effort.