
    Copula-Based Multivariate Hydrologic Frequency Analysis

    Multivariate frequency distributions are being increasingly recognized for their role in hydrological design and risk management. Conventional multivariate distributions are severely limited in that all constituent marginals have to come from the same distribution family. The copula method is a newly emerging approach for deriving multivariate distributions that overcomes this limitation. Use of the copula method in hydrological applications has begun only recently, and ascertaining the applicability of different copulas to combinations of various hydrological variables is currently an area of active research. Since there exists a variety of copulas capable of characterizing a broad range of dependence, the selection of appropriate copulas for different hydrological applications is a non-trivial task. This study evaluates the relative performance of various copulas and methods of parameter estimation, as well as of recently developed statistical inference procedures. Potential copulas for multivariate extreme flow and rainfall processes are then identified. Multivariate hydrological frequency analysis typically utilizes only the concurrent parts of observed records, leaving a large amount of non-concurrent information unused. Uncertainty in distribution parameter estimates can be reduced by simultaneously including such non-concurrent data in the analysis. A new copula-based “Composite Likelihood Approach” that allows all available multivariate data of varying lengths to be combined and analyzed in an integrated manner has been developed. This approach yields additional information, enhancing the precision of parameter estimates beyond what is obtained from either purely univariate or purely multivariate considerations. The approach can be advantageously employed in limited hydrological data situations to provide significant virtual augmentation of available record lengths by virtue of the increased precision of parameter estimates. The effectiveness of a copula selection framework that supports an a priori shortlisting of potentially viable copulas on the basis of dependence characteristics has been examined using several case studies pertaining to various extreme flow and rainfall variables. The benefits of the composite likelihood approach, in terms of significant improvement in the precision of parameter estimates of distributions commonly used in hydrology, such as the normal, Gumbel, gamma, and log-Pearson Type III, have been quantified.
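
    As an illustration of the kind of copula machinery involved (not code from the study itself), the following minimal Python sketch fits a bivariate Gumbel copula to simulated peak-flow and flood-volume data by inverting Kendall's tau; the variable names and sample data are assumptions made for the example.

        import numpy as np
        from scipy import stats

        def gumbel_copula_cdf(u, v, theta):
            # Bivariate Gumbel copula: C(u, v) = exp(-[(-ln u)^theta + (-ln v)^theta]^(1/theta))
            return np.exp(-((-np.log(u)) ** theta + (-np.log(v)) ** theta) ** (1.0 / theta))

        rng = np.random.default_rng(0)
        peak = rng.gumbel(100.0, 20.0, size=50)          # hypothetical annual peak flows
        volume = peak + rng.normal(0.0, 15.0, size=50)   # hypothetical flood volumes

        # Probability-integral transform via ranks gives pseudo-observations on (0, 1).
        u = stats.rankdata(peak) / (len(peak) + 1)
        v = stats.rankdata(volume) / (len(volume) + 1)

        # For the Gumbel copula, Kendall's tau = 1 - 1/theta, hence theta = 1/(1 - tau).
        tau, _ = stats.kendalltau(u, v)
        theta = 1.0 / (1.0 - tau)

        # Joint non-exceedance probability at the marginal medians.
        print(theta, gumbel_copula_cdf(0.5, 0.5, theta))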

    Estimation in generalized linear models and time series models with nonparametric correlation coefficients


    Two Educational Comparisons of Linear and Circular Statistics


    Approximations to non-central distributions and their applications

    In testing hypotheses involving noncentral distributions, percentage points are not always readily available and, when they are, are not very well tabulated except, perhaps, for small degrees of freedom and noncentralities. Consequently, for values that are not tabulated, interpolation, or, more likely, extrapolation of some kind is necessary, and the process can become tedious. For calculations involving the power of a test, charts of the power of the F-test and t-test are available, but readings taken from these charts may be accurate to only one decimal place. In situations like these, and in other cases, approximations are very useful and are sometimes as accurate as, if not more so than, values obtained by interpolation (or extrapolation) or values read from charts. This thesis is chiefly concerned with applications in which approximations to the noncentral chi-square, F, t and R distributions can be used. The approximations themselves, in most cases, are dealt with in a fair amount of detail to show the reader how they were obtained. Chapter 1 defines certain terms, used in subsequent chapters, with which the reader may be unfamiliar. Chapters 2-5 deal with the approximations and their applications. Each of these chapters is set out in the same way: Section I defines the noncentral distribution, Section II deals with the approximations, Section III compares the accuracy of the approximations with the exact values, and Section IV shows in which situations the approximations can be used.
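
    As a concrete example of the type of approximation surveyed (with parameter values chosen only for illustration), the sketch below implements Patnaik's classical moment-matched approximation of the noncentral chi-square by a scaled central chi-square, and compares it with the exact value from scipy.

        from scipy import stats

        def patnaik_cdf(x, k, lam):
            # Approximate P(X <= x) for X ~ noncentral chi-square(k, lam) by a
            # scaled central chi-square c * chi2(nu), matching the first two
            # moments: c = (k + 2*lam)/(k + lam), nu = (k + lam)**2 / (k + 2*lam).
            c = (k + 2.0 * lam) / (k + lam)
            nu = (k + lam) ** 2 / (k + 2.0 * lam)
            return stats.chi2.cdf(x / c, nu)

        k, lam, x = 5, 10.0, 20.0
        print(patnaik_cdf(x, k, lam))       # approximate probability
        print(stats.ncx2.cdf(x, k, lam))    # exact value for comparison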

    Extending generalized linear models with random effects and components of dispersion = [Gegeneraliseerde lineaire modellen met extra stochastische termen en bijbehorende variantiecomponenten]

    This dissertation was born out of a need for general and numerically feasible procedures for inference in variance components models for non-normal data. The methodology should be widely applicable within the institutes of the Agricultural Research Department (DLO) of the Dutch Ministry of Agriculture, Nature Management and Fisheries. Available methodology employing maximum likelihood estimation was, due to numerical limitations, too restricted with respect to the choice of random structures. Modification of the iteratively reweighted least squares (IRLS) algorithm, which is widely used for estimation in generalized linear models (GLMs), seemed a promising alternative to maximum likelihood.

    The class of generalized linear mixed models (GLMMs) studied in this dissertation is a straightforward extension of GLMs. The proposed estimation procedure for GLMMs, obtained by replacing least squares by linear mixed model (LMM) methodology, is a straightforward extension of the IRLS procedure for GLMs. The new procedure involves iterative use of restricted maximum likelihood (REML) and is referred to as iterative reweighted restricted maximum likelihood (IRREML). REML is an estimation procedure for ordinary normal-data LMMs, and software for it is widely available; in this thesis, facilities for REML in the statistical programming language Genstat 5 are employed. In each iteration step of IRREML, REML is applied to an approximate LMM for an artificial dependent variate. This variate and the corresponding residual weights, referred to as the "adjusted dependent variate" and the "iterative weights" (adhering to GLM terminology), are updated after each iteration. Numerical restrictions for IRREML are the same as for REML for ordinary normal-data mixed models and pertain to the size of the matrices to be inverted. These can be dealt with to a large extent by eliminating (absorbing) factors with a large number of levels. The estimation procedure, programmed in Genstat 5, is available through the Genstat Procedure Library of the Agricultural Mathematics Group (GLW-DLO). By now it has been widely used both within and outside the institutes of DLO.

    After the introduction in Chapter 1, inference for LMMs, with emphasis on REML, and for over-dispersed GLMs, illustrating maximum quasi-likelihood estimation, is discussed in Chapters 2 and 3.

    IRREML is introduced in Chapter 4. As can be seen from the discussion in that chapter, and from later chapters, a number of statisticians independently approached the estimation problem from different starting points, ending up with the same estimating equations. A Bayesian approach for prediction of random (genetic) effects for binary, binomial and ordinal data was presented as early as 1983 by Gianola and Foulley.

    In Chapter 5, a first attempt is made to assess the quality of IRREML by simulation. The simulated data were based on a practical problem involving carcass classification of cattle, in which the observations analysed were proportions of agreement between classifiers. Although the data set was large and highly unbalanced, a GLMM with four components of variance and an over-dispersion parameter could be fitted without problems. The simulation study included various procedures for the construction of confidence intervals and significance tests. These procedures, originally derived for LMMs under normality, were applied to the adjusted dependent variate in the last iteration step of IRREML. IRREML and the modified LMM procedures performed satisfactorily.

    In Chapter 6, the analysis of threshold models for binary and binomial data is considered. These threshold models are part of the class of GLMMs. A simulation study, mimicking an animal breeding experiment for binary data, indicated that IRREML may perform poorly when the number of observations per random effect is small. In terms of the animal breeding experiment: IRREML estimates of heritability may be considerably biased when the data set consists of a large number of small families. In contrast to other results in the literature, it was found that both under- and overestimation may occur, depending on the relative number of fixed effects in the model. In an animal breeding experiment, fixed effects usually represent a very large number of herds, years and seasons, which are all nuisance parameters, since interest centers on variance components and predicted random effects for animals (representing their genetic merit).

    In Chapter 7, IRREML is extended towards threshold models for ordinal data. Estimation includes additional shape parameters for a wide class of underlying distributions. For instance, heterogeneity of the residual variances of an underlying normal distribution may be modelled in terms of factors and covariates employing a logarithmic link function.

    In Chapter 8, the simulation study for binary data from Chapter 6 is extended and two methods for bias correction of variance component estimators are studied. Minimal dimensions of the data set are identified such that useful inference about components of variance is feasible.

    In Chapter 9, prediction of random effects in a model for normal data with heterogeneous variances is considered. In this model, both means and variances are expressed in terms of fixed and random effects, involving both additive and multiplicative effects. The estimation procedure was developed as a basis for a new national breeding evaluation method for Dutch dairy cattle and was implemented by the Dutch Cattle Syndicate in Arnhem in 1995. Data sets in the dairy industry are extremely large, and computational aspects were therefore very important; a data set comprising 12,629,403 milk records was analysed. Ideas behind IRREML were used to motivate the estimation procedure, and its performance was assessed by simulation.

    In Chapter 10, the relationship between estimation by IRREML and maximum likelihood (ML) estimation is discussed in some detail. Employing Laplace integration, IRREML may be shown to be an approximate ML procedure. The poor asymptotic properties of IRREML when the number of binary observations per random effect is limited and the number of random effects is large are illustrated by a simple over-dispersion model for binomial data. Since ML was seen to perform well, the Gibbs sampler, as a powerful numerical integrator for deriving approximate ML estimates, seems a promising technique for data sets of this kind.
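
    As a minimal illustration of the quantities IRREML builds on (not the Genstat implementation described above), the Python sketch below runs plain IRLS for a fixed-effects logistic GLM, forming the adjusted dependent variate and the iterative weights at each step; IRREML would replace the weighted least-squares solve with a REML fit of a linear mixed model to the same working variate. The simulated data and coefficients are assumptions for the example.

        import numpy as np

        rng = np.random.default_rng(1)
        n = 200
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * X[:, 1]))))

        beta = np.zeros(2)
        for _ in range(25):
            eta = X @ beta
            mu = 1.0 / (1.0 + np.exp(-eta))              # inverse logit link
            w = np.clip(mu * (1.0 - mu), 1e-8, None)     # iterative weights (dmu/deta)^2 / V(mu)
            z = eta + (y - mu) / w                       # adjusted dependent variate
            # Weighted least squares; IRREML fits z with an LMM via REML instead.
            beta = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        print(beta)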

    Goodness-of-fit statistics for location-scale distributions

    This dissertation is concerned with the problem of assessing the fit of a hypothesized parametric family of distributions to data. A nontraditional use of the chi-square and likelihood ratio statistics is considered in which the number of cells is allowed to increase as the sample size increases. A new goodness-of-fit statistic k^2, based on the Pearson correlation coefficient of the points of a P-P (percent versus percent) probability plot, is developed for testing departures from the normal, Gumbel, and exponential distributions. A statistic r^2, based on the Pearson correlation coefficient of the points on a Q-Q (quantile versus quantile) probability plot, is also considered. A new qualitative method based on the P-P probability plot is developed for assessing the goodness of fit of nonhypothesized probability models to data; this method is not limited to location-scale distributions. Curves were fitted through the Monte Carlo percentiles to obtain formulas for the percentiles of k^2 and r^2. An extensive Monte Carlo power comparison was performed for the normal, Gumbel, and exponential distributions. The statistics examined included those mentioned earlier, statistics based on the moments, statistics based on the empirical distribution function, and the commonly used Shapiro-Wilk statistic. The results of the power study are summarized, and general recommendations are given for the use of these statistics.
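
    In the spirit of the k^2 statistic described above (the exact plotting positions and estimators used in the dissertation may differ), a minimal Python sketch of a P-P-plot correlation statistic for a fitted normal distribution:

        import numpy as np
        from scipy import stats

        x = np.sort(stats.norm.rvs(loc=10.0, scale=2.0, size=100, random_state=0))
        n = len(x)
        p_emp = (np.arange(1, n + 1) - 0.5) / n                        # empirical percents
        p_fit = stats.norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))   # fitted percents

        r, _ = stats.pearsonr(p_emp, p_fit)
        k2 = r ** 2   # values near 1 indicate a good fit; small values signal departure
        print(k2)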

    Correlation Analysis of Biomedical Data = [Korelační analýza biomedicínských dat]

    In the correlation analysis of biomedical data, a common issue is the non-fulfilment of the assumption of normality. The main goal of the thesis is to provide a comprehensive theoretical background on correlation analysis, with a focus on quantitative variables that do not follow a normal distribution, and to apply the theoretical knowledge in the analysis of a real biomedical data set. R-project was used for the implementation.
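
    The thesis implemented its analysis in R; the sketch below is an equivalent minimal Python illustration of the standard rank-based alternatives to the Pearson coefficient, with a simulated lognormal "biomarker" standing in for a skewed biomedical variable (an assumption made for the example).

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        biomarker = rng.lognormal(mean=0.0, sigma=0.8, size=60)   # skewed, non-normal
        outcome = 2.0 * biomarker + rng.lognormal(sigma=0.5, size=60)

        print(stats.pearsonr(biomarker, outcome))    # assumes linearity and normality
        print(stats.spearmanr(biomarker, outcome))   # rank-based, monotone association
        print(stats.kendalltau(biomarker, outcome))  # rank-based, concordance counts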

    New copula models in quantitative finance


    A Monte Carlo Comparison of Tests for Multivariate Normality Based on Multivariate Skewness and Kurtosis.

    The assumption of multivariate normality (MVN) underlies many common parametric multivariate statistical procedures, and numerous tests have been defined for testing the assumption. Among these tests, those based on concepts of multivariate skewness and multivariate kurtosis hold special interest, since they appear to test for specific types of departures from MVN. This research uses Monte Carlo simulation to compare the performance of several MVN tests which are based on various definitions of multivariate skewness and kurtosis. Specifically, the tests are Mardia's (1970) b_{1,p} and b_{2,p}, Small's (1980) Q_1 and Q_2, and Srivastava's (1984) b_{1p} and b_{2p}. Two main issues are addressed. First, Mardia's tests are affine invariant, while those of Small and Srivastava are coordinate dependent. Conjectures are advanced regarding the conditions under which coordinate-dependent tests will perform better than affine-invariant tests and vice versa, and a Monte Carlo experiment is constructed to evaluate these conjectures. It is concluded that neither coordinate-dependent nor affine-invariant tests can be eliminated from consideration, since each type is strongly superior to the other under certain circumstances. These circumstances pertain to whether or not the third- and fourth-order moments involving more than one variable in the coordinate system have normal or non-normal values. The second issue concerns the distributional dependency of skewness tests. It is conjectured, in particular, that skewness tests based on third-order moments (which includes all skewness tests considered here) are highly distributionally dependent, with this dependency being related to the same distributional characteristic that determines kurtosis. It is further conjectured that this dependency remains of importance asymptotically. A Monte Carlo experiment is designed to evaluate these conjectures. The results confirm the dependency and show that it is not simply a small-sample problem. Based on this, it is concluded that skewness tests are not truly diagnostic; that is, they do not distinguish well between skewed and non-skewed distributions. In particular, skewness tests are likely to identify as skewed many non-skewed distributions with greater than MVN kurtosis, and they will fail to identify as skewed many skewed distributions with less than MVN kurtosis.
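
    For reference (using Gaussian test data chosen only for illustration), Mardia's b_{1,p} and b_{2,p} and their usual asymptotic reference distributions can be computed as in the following sketch:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        X = rng.multivariate_normal([0.0, 0.0, 0.0], np.eye(3), size=200)
        n, p = X.shape

        Xc = X - X.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))  # MLE covariance
        M = Xc @ S_inv @ Xc.T                # Mahalanobis cross-products m_ij

        b1p = (M ** 3).sum() / n ** 2        # multivariate skewness b_{1,p}
        b2p = (np.diag(M) ** 2).sum() / n    # multivariate kurtosis b_{2,p}

        # Asymptotic reference distributions under multivariate normality:
        # n*b1p/6 ~ chi-square with p(p+1)(p+2)/6 df; b2p ~ N(p(p+2), 8p(p+2)/n).
        df = p * (p + 1) * (p + 2) / 6
        p_skew = stats.chi2.sf(n * b1p / 6.0, df)
        z_kurt = (b2p - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
        print(b1p, p_skew, b2p, 2 * stats.norm.sf(abs(z_kurt)))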

    Robust correlation coefficient based on robust scale and location estimator

    The correlation coefficient is a common statistical tool for measuring the relationship between two variables, and the most frequently used is the Pearson correlation coefficient. This coefficient is powerful when the assumptions of linearity between the two variables and normality of the distribution are fulfilled. However, it does not perform well in the presence of outliers in the data, since its calculation uses the mean, which is known to be very sensitive to outliers. The Spearman rank correlation coefficient and Kendall's tau correlation coefficient are alternative solutions to this problem, but the use of ranks instead of the original observations in their calculation leads to a loss of useful information. For that reason, this study focuses on a robust correlation approach based on the median. The existing median-based correlation coefficient uses the median absolute deviation (MAD) as its scale estimator. Nevertheless, the MAD has low efficiency under the Gaussian distribution, and it only captures dispersion for symmetric distributions. This study therefore modified the median-based correlation using two approaches. First, within the same median-based correlation, this study proposed other robust scale estimators, namely MADn, Sn, and Qn. Second, this study changed the median-based correlation to a Hodges-Lehmann-based correlation and employed all the robust scale estimators, that is, the median, MAD, MADn, Sn, and Qn. The performance of the proposed procedures was evaluated under two simulation conditions, perfect and contaminated data, using three indicators: the correlation coefficient value, the average bias, and the standard error. The proposed procedures were validated using a real data set. The simulation results show that the Qn correlation coefficient and the Hodges-Lehmann-Qn correlation coefficient performed better under contaminated data than the Pearson correlation coefficient and other existing robust correlation coefficients. In conclusion, the Qn correlation coefficient and the Hodges-Lehmann-Qn correlation coefficient are good alternatives to the Pearson correlation coefficient when there are outliers in the data.
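
    A minimal sketch of one standard scale-based robust correlation of the general kind studied here: the Gnanadesikan-Kettenring identity evaluated with the MAD. Whether this matches the thesis's exact median-based estimator is an assumption, and any of the scale estimators mentioned above could be substituted for the MAD; the contaminated sample is illustrative.

        import numpy as np
        from scipy import stats

        def robust_corr(x, y, scale=stats.median_abs_deviation):
            # r = (S(u)^2 - S(v)^2) / (S(u)^2 + S(v)^2), where u and v are the
            # sum and difference of the robustly standardized variables.
            u = x / scale(x) + y / scale(y)
            v = x / scale(x) - y / scale(y)
            return (scale(u) ** 2 - scale(v) ** 2) / (scale(u) ** 2 + scale(v) ** 2)

        rng = np.random.default_rng(4)
        x = rng.normal(size=100)
        y = 0.8 * x + 0.6 * rng.normal(size=100)
        x[:5] += 10.0                      # plant a few outliers in one margin

        print(np.corrcoef(x, y)[0, 1])     # Pearson, distorted by the outliers
        print(robust_corr(x, y))           # MAD-based, much less affected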