    The normal distribution in some constrained sample spaces

    Phenomena with a constrained sample space appear frequently in practice. This is the case, for example, with strictly positive data, or with compositional data such as percentages or proportions. If the natural measure of difference is not the absolute one, simple algebraic properties show that it is more convenient to work with a geometry different from the usual Euclidean geometry in real space, and with a measure different from the usual Lebesgue measure, leading to alternative models that better fit the phenomenon under study. The general approach is presented and illustrated using the normal distribution, both on the positive real line and on the D-part simplex. The original ideas of McAlister in his 1879 introduction of the lognormal distribution are recovered and updated.
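    As a sketch of the idea on the positive real line (not taken from the paper itself): if differences between positive numbers are measured relatively, through |ln x - ln y|, and the reference measure is taken as dx/x instead of the Lebesgue measure dx, then a density that is Gaussian in these terms coincides with the classical lognormal density once rewritten with respect to the Lebesgue measure:

    \[
    f_{+}(x)=\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}\right)\ \text{w.r.t. } \frac{dx}{x},
    \qquad
    f_{\mathrm{Leb}}(x)=\frac{1}{x\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}\right)\ \text{w.r.t. } dx .
    \]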

    Log-ratio methods in mixture models for compositional data sets

    When traditional methods are applied to compositional data, misleading and incoherent results can be obtained. Finite mixtures of multivariate distributions are becoming increasingly important. In this paper, traditional strategies to fit a mixture model to compositional data sets are revisited and the major difficulties are detailed. A new proposal using a mixture of distributions defined on orthonormal log-ratio coordinates is introduced. A real data set analysis is presented to illustrate and compare the different methodologies.
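    A minimal sketch of this kind of strategy, assuming a standard isometric log-ratio (ilr) transform and an off-the-shelf Gaussian mixture; the basis construction and the synthetic data below are illustrative, not the paper's own code:

        import numpy as np
        from scipy.stats import dirichlet
        from sklearn.mixture import GaussianMixture

        def ilr_basis(D):
            # Helmert-type orthonormal basis of the zero-sum hyperplane in R^D
            H = np.zeros((D - 1, D))
            for i in range(1, D):
                H[i - 1, :i] = 1.0 / i
                H[i - 1, i] = -1.0
                H[i - 1] *= np.sqrt(i / (i + 1.0))
            return H

        def ilr(X, basis):
            # isometric log-ratio coordinates of compositions (rows of X sum to 1)
            return np.log(X) @ basis.T

        # illustrative data: two groups of 3-part compositions
        X = np.vstack([dirichlet.rvs([8, 2, 2], size=200, random_state=1),
                       dirichlet.rvs([2, 2, 8], size=200, random_state=2)])

        Z = ilr(X, ilr_basis(X.shape[1]))      # coordinates live in R^(D-1)
        gm = GaussianMixture(n_components=2, random_state=0).fit(Z)
        print(gm.weights_, gm.means_)          # mixture fitted on the coordinates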

    Empirical α-β runout modelling of snow avalanches in the Catalan Pyrenees.

    A variation of the α-β model, a regression model that allows a deterministic prediction of the extreme runout to be expected in a given path, was applied to calculate avalanche runout in the Catalan Pyrenees. Present knowledge of major avalanche activity in this region and current mapping tools were used. The model was derived using a dataset of 97 'extreme' avalanches that occurred from the end of the 19th century to the beginning of the 21st century. A multiple linear regression model was obtained using three independent variables: the inclination of the avalanche path, its horizontal length and the area of the starting zone, with a good fit (R² = 0.81). A larger starting zone increases the runout and a greater path length reduces it. The new updated equation predicts avalanche runout for a return period of ∼100 years. To study which terrain variables explain the extreme values of the avalanche dataset, a comparative analysis of the variables that influence a longer or shorter runout was performed, treating the most extreme avalanches. The size of the avalanche path and the aspect of the starting zone showed a certain association with longer or shorter runouts.
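    A minimal sketch of this kind of runout regression, with invented illustrative numbers (the paper's 97-avalanche dataset and fitted coefficients are not reproduced here): the extreme runout angle is regressed on path inclination, horizontal path length and starting-zone area.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        # illustrative sample: [path inclination (deg), horizontal length (m), starting-zone area (ha)]
        X = np.array([[30.0, 1200.0,  5.0],
                      [25.0, 1800.0, 12.0],
                      [35.0,  900.0,  3.0],
                      [28.0, 1500.0,  8.0],
                      [32.0, 1100.0,  6.0]])
        alpha = np.array([24.0, 20.5, 27.0, 22.5, 25.0])   # extreme runout angle (deg), invented

        model = LinearRegression().fit(X, alpha)
        print("coefficients:", model.coef_, "intercept:", model.intercept_)
        print("R^2 on the illustrative sample:", model.score(X, alpha))

        # predicted extreme runout angle for a new, hypothetical path
        print(model.predict([[29.0, 1300.0, 7.0]]))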

    On the interpretation of differences between groups for compositional data

    Social policies are designed using information collected in surveys, such as the Catalan Time Use Survey. Accurate comparisons of time-use data among population groups are commonly analysed using statistical methods. The total daily time expended on different activities by a single person is equal to 24 hours. Because this type of data is compositional, its sample space has particular properties that statistical methods should respect. The critical points required to interpret differences between groups are provided and described in terms of log-ratio methods. These techniques facilitate the interpretation of the relative differences detected in multivariate and univariate analysis.
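    A minimal sketch of such a between-group comparison, assuming daily time-use rows summing to 24 hours are first re-expressed in log-ratio coordinates and then compared with ordinary univariate tools; the data and the specific test below are illustrative assumptions, not those of the survey analysis:

        import numpy as np
        from scipy import stats

        def clr(X):
            # centred log-ratio: log of each part relative to the geometric mean of the row
            L = np.log(X)
            return L - L.mean(axis=1, keepdims=True)

        rng = np.random.default_rng(0)
        # invented daily time use (hours) for two groups: [sleep, paid work, leisure, other]
        group_a = rng.dirichlet([9, 8, 4, 3], size=50) * 24
        group_b = rng.dirichlet([9, 6, 6, 3], size=50) * 24

        # differences are interpreted on the log-ratio scale, not on raw hours
        a, b = clr(group_a / 24), clr(group_b / 24)
        for j, part in enumerate(["sleep", "paid work", "leisure", "other"]):
            t, p = stats.ttest_ind(a[:, j], b[:, j])
            print(f"{part}: t = {t:.2f}, p = {p:.3f}   (relative, clr-scale difference)")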

    Characterizing major avalanche episodes in space and time in the twentieth and early twenty-first centuries in the Catalan Pyrenees

    With the aim of better understanding avalanche risk in the Catalan Pyrenees, the present work focuses on the analysis of major (or destructive) avalanches. For this purpose, major avalanche cartography was produced through exhaustive photointerpretation of several flights, winter and summer field surveys, and inquiries to the local population. Major avalanche events were used to quantify the magnitude of the episodes during which they occurred, and a Major Avalanche Activity Magnitude Index (MAAMI) was developed. This index is based on the number of major avalanches registered and their estimated frequency in a given time period, so it quantifies the magnitude of a major avalanche episode or winter. Furthermore, it permits comparison of the magnitude of major avalanche episodes within a given mountain range, or between mountain ranges, and, for a long enough period, it should allow the analysis of temporal trends. Major episodes from winter 1995/96 to 2013/14 were reconstructed, and their magnitude, frequency and extent were assessed. During the last 19 winters, the episodes of January 22-23 and February 6-8, 1996 were those with the highest MAAMI values, followed by January 30-31, 2003, January 29, 2006, and January 24-25, 2014. To analyze the whole twentieth century, a simplified MAAMI was defined to serve the same purpose with a less complete dataset. With less accuracy, the same parameters were obtained at winter-time resolution throughout the twentieth century. Again, the 1995/96 winter had the highest MAAMI value, followed by the 1971/72, 1974/75 and 1937/38 winter seasons. The analysis of the spatial extent of the different episodes allowed us to refine the demarcation of nivological regions and to improve our knowledge of the atmospheric patterns that cause major episodes and their climatic interpretation. In some cases, the results revealed the importance of considering a major avalanche episode as the outcome of a preparatory period followed by a triggering one.
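    Purely as an illustration of how such an index could be assembled (the abstract does not give the published MAAMI formula, so the weighting below is an assumption): an episode score built from the major avalanches registered, each weighted by how infrequent it is estimated to be.

        # Hypothetical sketch only: the weighting is an assumption, not the published MAAMI definition.
        def episode_magnitude(return_periods):
            # 'return_periods' lists the estimated return period (years) of each major
            # avalanche registered during one episode; rarer events contribute more.
            return sum(rp / 10.0 for rp in return_periods)

        episode_a = [100, 30, 30, 10, 10, 10]   # invented return periods
        episode_b = [30, 10, 10]
        print(episode_magnitude(episode_a), episode_magnitude(episode_b))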

    Reconstructing the Snow Avalanche of Coll de Pal 2018 (SE Pyrenees).

    Developments in mountain areas prone to natural hazards produce undesired impacts and damage. Disaster assessment is therefore mandatory to understand the physics of dangerous events and to make decisions that prevent hazardous situations. This work focuses on the practical implementation of methods and tools to assess a snow avalanche that affected a road at the Coll de Pal in 2018 (SE Pyrenees). This is a quite common situation on mountain roads, and the assessment has to focus especially on the avalanche-road interaction, on the return periods considered and on the dynamics of the phenomenon. The assessment presents the field recognition, snow and weather characterization and numerical modelling of the avalanche. Field campaigns revealed evidence of the avalanche trigger, runout trajectory and general behavior. An unstable snowpack, caused by a relatively large snowfall deposited some days earlier on a previous snowpack with weak layers, led to the avalanche release when an additional load was added by a strong wind-drift episode. A medium-sized (<2500 m³) soft slab avalanche, corresponding to a return period of 15-20 years, occurred and crossed the road at the Coll de Pal pass. The event was reproduced numerically by means of the 2D shallow water equations (SWE) based numerical tool Iber, with the aim of analyzing the avalanche behavior. Results of the simulation corresponded with the observations (runout trajectory and snow deposit); thus, relevant information about the avalanche dynamics could be obtained. Identified differences probably come from the terrain elevation data, which represent 'snow free' topography and do not consider the snowpack on the terrain.
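    For reference, the depth-averaged 2D shallow water equations that SWE-based tools of this kind solve can be written in a common conservative form (h flow depth, u and v depth-averaged velocities, g gravity, S0 and Sf bed and friction slopes; the specific friction closure used for snow in the study is not detailed in the abstract):

    \[
    \begin{aligned}
    &\frac{\partial h}{\partial t}+\frac{\partial (hu)}{\partial x}+\frac{\partial (hv)}{\partial y}=0,\\
    &\frac{\partial (hu)}{\partial t}+\frac{\partial}{\partial x}\!\left(hu^{2}+\tfrac{1}{2}gh^{2}\right)+\frac{\partial (huv)}{\partial y}=gh\,(S_{0x}-S_{fx}),\\
    &\frac{\partial (hv)}{\partial t}+\frac{\partial (huv)}{\partial x}+\frac{\partial}{\partial y}\!\left(hv^{2}+\tfrac{1}{2}gh^{2}\right)=gh\,(S_{0y}-S_{fy}).
    \end{aligned}
    \]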

    Models de distribució sobre el símplex (Distribution models on the simplex)

    Compositional data are vectors whose components represent proportions of some whole, and for this reason they are subject to the unit-sum constraint on their components. A suitable sample space for compositional data is therefore the unit simplex SD. The modelling of compositional data faces a major problem: the lack of sufficiently flexible models. In the eighties, Aitchison developed a methodology to work with compositional data that we have called the MOVE methodology, as it is based on transformations: compositions are transformed from SD to real space and the transformed data are modelled by a multivariate normal distribution. The additive log-ratio transformation gives rise to the additive logistic normal model, which exhibits rich algebraic properties. Unfortunately, a multivariate normal model sometimes cannot properly fit the transformed data, especially when they present some skewness; in addition, the additive logistic normal family is not closed under amalgamation (addition) of components. In 1996, Azzalini and Dalla Valle introduced the skew-normal distribution: a family of distributions on real space that includes the multivariate normal but has an extra shape parameter which allows the density to have some skewness. Emulating Aitchison, we combine the logistic-normal approach with the skew-normal distribution to define a new class of distributions on the simplex: the additive logistic skew-normal class. It is particularly suited to modelling compositional data sets whose transformed data present moderate skewness, and it therefore resolves one of the difficulties of the additive logistic normal distribution. We prove that this class has good algebraic properties. We also study the adequacy of the logistic skew-normal distribution for modelling amalgamations of additive logistic normal vectors; simulation studies illustrate the effect of the parameters of the initial additive logistic normal distribution on the distribution of the amalgam and show that, in some cases, the skew-normal model provides a reasonable fit for the log-ratio of the amalgam. A useful tool in the modelling of random vectors is the goodness-of-fit test. Unfortunately, goodness-of-fit tests for the skew-normal distribution are hard to find in the literature, so we develop such tests and complete the work with a power study using several alternative distributions. We follow the methodology of D'Agostino and Stephens, which consists in measuring the difference between the empirical distribution function (computed from the sample) and the theoretical distribution function (the skew-normal). The Euclidean space structure of the simplex suggests a further methodology, which we call the STAY approach because it is not based on transformations: working with the operations of SD is equivalent to applying the usual real-space operations to the coordinates of the compositions with respect to an orthonormal basis, so standard methods and results for real random vectors can be applied directly to these coordinates. On these coordinates we define the normal and skew-normal models on SD and compare them with the additive logistic normal and additive logistic skew-normal models, respectively. From a probabilistic point of view, the laws on SD defined using the STAY methodology are identical to those defined using the MOVE methodology, but the STAY methodology brings important changes; for example, it allows basic concepts of classical statistics, such as the expected value or the variance of a random composition, to be expressed directly on the simplex. As we have not found previous work in this direction in the literature, we propose an illustrative example in the univariate case: on the coordinates with respect to a unit basis we define the normal model on the positive real line and compare it with the lognormal model obtained via the logarithmic transformation.
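    A minimal sketch of the additive logistic skew-normal construction described above, assuming independent skew-normal log-ratio coordinates for simplicity (a full multivariate skew-normal with correlation would be needed in general); scipy's skewnorm plays the role of the univariate skew-normal law, and all parameter values are illustrative:

        import numpy as np
        from scipy.stats import skewnorm

        def alr_inverse(y):
            # additive logistic transformation: map y in R^(D-1) back to a D-part composition
            z = np.concatenate([np.exp(y), [1.0]])
            return z / z.sum()

        rng = np.random.default_rng(0)
        loc, scale, shape = np.array([0.5, -0.2]), np.array([1.0, 0.6]), np.array([4.0, -2.0])

        # sample alr coordinates from (independent) skew-normal laws, then map to the simplex
        samples = np.array([
            alr_inverse(skewnorm.rvs(a=shape, loc=loc, scale=scale, random_state=rng))
            for _ in range(1000)
        ])
        print(samples[:3], samples.sum(axis=1)[:3])   # each row is a composition summing to 1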

    Modelling count data using the logratio-normal-multinomial distribution

    The logratio-normal-multinomial distribution is a count data model resulting from compounding a multinomial distribution for the counts with a multivariate logratio-normal distribution for the multinomial event probabilities. However, the logratio-normal-multinomial probability mass function does not admit a closed-form expression and, consequently, numerical approximation is required for parameter estimation. In this work, different estimation approaches are introduced and evaluated. We conclude that estimation based on a quasi-Monte Carlo Expectation-Maximisation algorithm provides the best overall results. Building on this, the performances of the Dirichlet-multinomial and logratio-normal-multinomial models are compared through a number of examples using simulated and real count data.
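    A minimal sketch of why numerical approximation is needed and how a plain (rather than quasi-) Monte Carlo approximation of the probability mass function can be set up: probabilities are drawn from a logistic-normal law and the multinomial pmf is averaged over them. The parameter values and the use of the additive log-ratio parameterisation are illustrative assumptions.

        import numpy as np
        from scipy.stats import multinomial, multivariate_normal

        def logratio_normal_multinomial_pmf(x, mu, sigma, n_draws=20000, seed=0):
            # Monte Carlo approximation of P(X = x) for counts x (length D) when the
            # multinomial probabilities follow a logistic-normal law in alr coordinates.
            rng = np.random.default_rng(seed)
            y = np.atleast_2d(multivariate_normal.rvs(mean=mu, cov=sigma, size=n_draws, random_state=rng))
            z = np.hstack([np.exp(y), np.ones((n_draws, 1))])
            p = z / z.sum(axis=1, keepdims=True)          # simulated probability vectors
            n = int(np.sum(x))
            return np.mean([multinomial.pmf(x, n=n, p=pi) for pi in p])

        mu = np.array([0.2, -0.5])                         # illustrative alr-mean (D = 3 parts)
        sigma = np.array([[0.4, 0.1], [0.1, 0.3]])         # illustrative alr-covariance
        print(logratio_normal_multinomial_pmf([7, 2, 1], mu, sigma))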