
    Sphericity estimation bias for repeated measures designs in simulation studies

    In this study, we explored the accuracy of sphericity estimation and analyzed how the sphericity of covariance matrices may be affected when the latter are derived from simulated data. We analyzed the consequences that normal and nonnormal data generated from an unstructured population covariance matrix with low (ε = .57) and high (ε = .75) sphericity can have on the sphericity of the matrix that is fitted to these data. To this end, data were generated for four types of distributions (normal, slightly skewed, moderately skewed, and severely skewed or log-normal), four sample sizes (very small, small, medium, and large), and four numbers of levels of the within-subjects factor (K = 4, 6, 8, and 10). Normal data were generated using the Cholesky decomposition of the correlation matrix, whereas the Vale-Maurelli method was used to generate nonnormal data. The results indicate the extent to which sphericity is altered by recalculating the covariance matrix on the basis of simulated data. We concluded that bias is greater with spherical covariance matrices, nonnormal distributions, and small sample sizes, and that it increases with the value of K. An interaction was also observed between sample size and K: with very small samples, the observed bias was greater as the value of K increased.
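The normal-data generation step the abstract describes (multiplying standard normal draws by a Cholesky factor, then re-estimating sphericity from the fitted covariance matrix) can be sketched as follows. The covariance matrix, sample size, and seed below are illustrative, not the study's actual settings; sphericity is estimated with the standard Greenhouse-Geisser formula.

```python
import numpy as np

def gg_epsilon(S):
    """Greenhouse-Geisser sphericity estimate for a K x K covariance matrix."""
    K = S.shape[0]
    C = np.eye(K) - np.ones((K, K)) / K      # centering matrix
    lam = np.linalg.eigvalsh(C @ S @ C)      # eigenvalues of the double-centered matrix
    lam = lam[lam > 1e-10]                   # drop the null eigenvalue
    return lam.sum() ** 2 / ((K - 1) * (lam ** 2).sum())

# Illustrative unstructured covariance matrix (K = 4)
Sigma = np.array([[1.0, 0.8, 0.6, 0.4],
                  [0.8, 1.0, 0.8, 0.6],
                  [0.6, 0.8, 1.0, 0.8],
                  [0.4, 0.6, 0.8, 1.0]])
eps_pop = gg_epsilon(Sigma)                  # population sphericity

rng = np.random.default_rng(1)
L = np.linalg.cholesky(Sigma)                # Cholesky factor of the target matrix
X = rng.standard_normal((15, 4)) @ L.T       # very small sample: n = 15 normal rows
eps_hat = gg_epsilon(np.cov(X, rowvar=False))  # sphericity of the refitted matrix
```

Comparing `eps_hat` across many replications to `eps_pop` gives the bias the study quantifies; ε is bounded between 1/(K−1) and 1 by construction.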

    Transformations for multivariate statistics

    This paper derives transformations for multivariate statistics that eliminate asymptotic skewness, extending the results of Niki and Konishi (1986, Annals of the Institute of Statistical Mathematics 38, 371-383). Within the context of valid Edgeworth expansions for such statistics, we first derive the set of equations that such a transformation must satisfy and then propose a local solution that is sufficient up to the desired order. Application of these results yields two useful corollaries. First, it is possible to eliminate the first correction term in an Edgeworth expansion, thereby accelerating convergence to the leading-term normal approximation. Second, bootstrapping the transformed statistic can yield the same rate of convergence as the double, or prepivoted, bootstrap of Beran (1988, Journal of the American Statistical Association 83, 687-697) applied to the original statistic, implying a significant computational saving. The analytic results are illustrated by application to the family of exponential models, in which the transformation is seen to depend only upon the properties of the likelihood. The numerical properties are examined within a class of nonlinear regression models (logit, probit, Poisson, and exponential regressions), where the adequacy of the limiting normal and of the bootstrap (utilizing the k-step procedure of Andrews, 2002, Econometrica 70, 119-162) as distributional approximations is assessed.
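The paper's multivariate transformations are beyond an abstract-sized sketch, but the core idea — a transformation that removes leading-order skewness and so accelerates convergence to the normal limit — can be illustrated with the classical Wilson-Hilferty cube-root transform of a chi-square variable. This univariate example is chosen here for illustration only and is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5                                   # chi-square degrees of freedom
x = rng.chisquare(k, size=200_000)

def skew(a):
    """Sample skewness (third standardized moment)."""
    a = a - a.mean()
    return (a ** 3).mean() / (a ** 2).mean() ** 1.5

s_raw = skew(x)                         # chi-square is strongly right-skewed
s_wh = skew((x / k) ** (1 / 3))         # Wilson-Hilferty cube-root transform
```

The transformed variable is far closer to symmetric, so the normal approximation to its distribution is accurate at much smaller sample sizes — the same mechanism the paper exploits at higher order and in the multivariate setting.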

    cudaBayesreg: Parallel Implementation of a Bayesian Multilevel Model for fMRI Data Analysis

    Graphic processing units (GPUs) are rapidly gaining maturity as powerful general parallel computing devices. A key feature in the development of modern GPUs has been the advancement of the programming model and programming tools. Compute Unified Device Architecture (CUDA) is a software platform for massively parallel high-performance computing on Nvidia many-core GPUs. In functional magnetic resonance imaging (fMRI), the volume of data to be processed and the type of statistical analysis to be performed call for high-performance computing strategies. In this work, we present the main features of the R-CUDA package cudaBayesreg, which implements in CUDA the core of a Bayesian multilevel model for the analysis of brain fMRI data. The statistical model implements a Gibbs sampler for multilevel/hierarchical linear models with a normal prior. The main contribution to the increased performance comes from the use of separate threads for fitting the linear regression model at each voxel in parallel. The R-CUDA implementation of the Bayesian model proposed here significantly reduces the run time of the Markov chain Monte Carlo (MCMC) simulations used in Bayesian fMRI data analyses. Presently, cudaBayesreg is only configured for Linux systems with Nvidia CUDA support.
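The per-voxel computation the abstract describes — a Gibbs sampler for a linear model with a normal prior — can be sketched in serial form. This is a generic normal/inverse-gamma Gibbs sampler, not cudaBayesreg's actual CUDA kernel, and the design matrix, prior settings, and synthetic data below are invented for illustration; the package parallelizes this loop across voxels.

```python
import numpy as np

def gibbs_lm(y, X, n_iter=2000, tau2=100.0, a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler for y = X @ beta + e, e ~ N(0, sigma^2 I),
    with beta ~ N(0, tau2 I) and sigma^2 ~ Inv-Gamma(a0, b0)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sig2, draws = 1.0, []
    XtX, Xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # beta | sigma^2, y  ~  N(m, V)
        V = np.linalg.inv(XtX / sig2 + np.eye(p) / tau2)
        beta = rng.multivariate_normal(V @ (Xty / sig2), V)
        # sigma^2 | beta, y  ~  Inv-Gamma, drawn as 1 / Gamma
        r = y - X @ beta
        sig2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + r @ r / 2.0))
        draws.append(beta)
    return np.asarray(draws)

# One synthetic "voxel": 60 scans, intercept plus one regressor
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(60), rng.standard_normal(60)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.standard_normal(60)
post = gibbs_lm(y, X)[500:]          # discard burn-in draws
beta_hat = post.mean(axis=0)         # posterior mean of the coefficients
```

Because every voxel's sampler is independent given the hyperparameters, one GPU thread (or thread block) per voxel parallelizes naturally, which is the source of the speed-up the abstract reports.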

    Modeling and simulation of value-at-risk in the financial market area

    Value-at-Risk (VaR) is a statistical approach to measuring market risk. It is widely used by banks, securities firms, commodity and energy merchants, and other trading organizations. The main focus of this research is measuring and analyzing market risk by modeling and simulation of Value-at-Risk for portfolios in the financial market area. The objectives are (1) predicting possible future loss for a financial portfolio from VaR measurement, and (2) identifying how the distributions of the risk factors affect the distribution of the portfolio. Results from (1) and (2) provide valuable information for portfolio optimization and risk management. The model systems chosen for this study are multi-factor models that relate risk factors to the portfolio's value. Regression analysis techniques are applied to derive linear and quadratic multifactor models for the assets in the portfolio. Time series models, such as ARIMA and state-space, are used to forecast the risk factors of the portfolio. The Monte Carlo simulation process is developed to comprehensively simulate the risk factors according to the four major distributions used to describe data in the financial market: multivariate normal, multivariate t, multivariate skew-normal, and multivariate skew t. The distribution of the portfolio is characterized by combining the multifactor models with the Monte Carlo simulation process. Based on this characterization of the portfolio distribution, any VaR measure of the portfolio can be calculated. The results of the modeling and simulation show that (1) a portfolio may not have the same kind of distribution as the risk factors if the relationship between the portfolio and the risk factors is expressed as a quadratic function; (2) the normal distribution underestimates risk if the real data have a heavy tail and a high peak; and (3) diversification is the best investment strategy since it reduces the VaR by combining assets. The computational approach developed in this dissertation can be used for any VaR measurement in any area as long as the relationship between an asset and risk factors can be modeled and the joint distribution of risk factors can be characterized.
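The simulation step for a linear portfolio can be sketched minimally: draw risk factors from a multivariate normal and from a variance-matched multivariate t, push them through the portfolio weights, and read VaR off the loss quantile. The covariance matrix, weights, and degrees of freedom below are invented for illustration, and this is only the linear (not quadratic) case.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
Sigma = np.array([[0.04, 0.018],
                  [0.018, 0.09]])        # illustrative 2-factor covariance matrix
w = np.array([0.5, 0.5])                 # portfolio weights
L = np.linalg.cholesky(Sigma)

# (a) multivariate normal risk factors
r_norm = rng.standard_normal((n, 2)) @ L.T

# (b) multivariate t risk factors, rescaled to the SAME covariance matrix
nu = 4
scale = np.sqrt((nu - 2) / nu)           # matches the t covariance to Sigma
r_t = (rng.standard_normal((n, 2)) @ L.T) \
      * np.sqrt(nu / rng.chisquare(nu, n))[:, None] * scale

def var_99(returns, w):
    """99% Value-at-Risk: negative 1st percentile of portfolio P&L."""
    return -np.quantile(returns @ w, 0.01)

v_norm, v_t = var_99(r_norm, w), var_99(r_t, w)
```

Even with identical covariance, the heavy-tailed factors produce a larger 99% VaR, which is the abstract's finding (2): a normal model understates risk when the data have heavy tails.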

    Simulating High-Dimensional Multivariate Data using the bigsimr R Package

    It is critical to accurately simulate data when employing Monte Carlo techniques and evaluating statistical methodology. Measurements are often correlated and high dimensional in this era of big data, such as data obtained in high-throughput biomedical experiments. Due to the computational complexity and a lack of user-friendly software available to simulate these massive multivariate constructions, researchers resort to simulation designs that posit independence or perform arbitrary data transformations. To close this gap, we developed the Bigsimr Julia package with R and Python interfaces. This paper focuses on the R interface. These packages enable high-dimensional random vector simulation with arbitrary marginal distributions and dependency via a Pearson, Spearman, or Kendall correlation matrix. bigsimr contains high-performance features, including multi-core and graphical-processing-unit-accelerated algorithms to estimate correlation and compute the nearest correlation matrix. Monte Carlo studies quantify the accuracy and scalability of our approach, up to d = 10,000. We describe example workflows and apply them to a high-dimensional data set: RNA-sequencing data obtained from breast cancer tumor samples.
    Comment: 22 pages, 10 figures, https://cran.r-project.org/web/packages/bigsimr/index.htm
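The underlying idea — arbitrary margins tied together through a correlation matrix — can be sketched in a NumPy-only Gaussian-copula (NORTA-style) form. This is not bigsimr's API; the margins (exponential and lognormal) and the Spearman target are arbitrary choices for illustration.

```python
import numpy as np
from math import erf, sqrt

def gaussian_copula_pair(rho_s, n, rng):
    """Simulate (Exponential(1), standard lognormal) margins whose Spearman
    correlation targets rho_s, via a Gaussian copula (NORTA-style)."""
    r = 2.0 * np.sin(np.pi * rho_s / 6.0)      # latent Pearson corr. for Spearman target
    L = np.linalg.cholesky(np.array([[1.0, r], [r, 1.0]]))
    z = rng.standard_normal((n, 2)) @ L.T       # correlated latent normals
    u = 0.5 * (1.0 + np.vectorize(erf)(z[:, 0] / sqrt(2.0)))  # Phi(z) -> uniform
    x1 = -np.log1p(-u)                          # inverse CDF: Exponential(1)
    x2 = np.exp(z[:, 1])                        # monotone map: lognormal
    return x1, x2

def spearman(a, b):
    """Sample Spearman correlation via Pearson correlation of ranks."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return np.corrcoef(rank(a), rank(b))[0, 1]

rng = np.random.default_rng(7)
x1, x2 = gaussian_copula_pair(0.5, 100_000, rng)
```

Because Spearman correlation is invariant under the monotone marginal transforms, the latent adjustment `r = 2 sin(pi * rho_s / 6)` makes the output pair hit the requested rank correlation exactly in population; scaling this idea to thousands of dimensions efficiently is what the package provides.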

    A Statistical Evaluation of Algorithms for Independently Seeding Pseudo-Random Number Generators of Type Multiplicative Congruential (Lehmer-Class).

    To be effective, a linear congruential random number generator (LCG) should produce values that are (a) uniformly distributed on the unit interval (0,1) excluding endpoints and (b) substantially free of serial correlation. It has been found that many statistical methods produce inflated Type I error rates for correlated observations. Theoretically, independently seeding an LCG under the following conditions attenuates serial correlation: (a) simple random sampling of seeds, (b) non-replicate streams, (c) non-overlapping streams, and (d) non-adjoining streams. Accordingly, 4 algorithms (each satisfying at least 1 condition) were developed: (a) zero-leap, (b) fixed-leap, (c) scaled random-leap, and (d) unscaled random-leap. Note that the latter satisfied all 4 independent seeding conditions. To assess serial correlation, univariate and multivariate simulations were conducted at 3 equally spaced intervals for each algorithm (N=24) and measured using 3 randomness tests: (a) the serial correlation test, (b) the runs up test, and (c) the white noise test. A one-way balanced multivariate analysis of variance (MANOVA) was used to test 4 hypotheses: (a) omnibus, (b) contrast of unscaled vs. others, (c) contrast of scaled vs. others, and (d) contrast of fixed vs. others. The MANOVA assumptions of independence, normality, and homogeneity were satisfied. In sum, the seeding algorithms did not differ significantly from each other (omnibus hypothesis). For the contrast hypotheses, only the fixed-leap algorithm differed significantly from all other algorithms. Surprisingly, the scaled random-leap offered the least difference among the algorithms (theoretically this algorithm should have produced the second largest difference). Although not fully supported by the research design used in this study, it is thought that the unscaled random-leap algorithm is the best choice for independently seeding the multiplicative congruential random number generator. 
    Accordingly, suggestions for further research are proposed.
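The leap-based seeding idea — starting each stream a fixed number of draws ahead of the previous one so that streams are non-replicate, non-overlapping, and non-adjoining for runs shorter than the leap — can be sketched for the minimal-standard Lehmer generator. The parameters and leap size here are illustrative, not the dissertation's exact configuration.

```python
M = 2**31 - 1   # Mersenne-prime modulus of the minimal-standard Lehmer LCG
A = 16807       # multiplier

def lehmer_stream(seed, n):
    """n uniforms on (0, 1) from the recurrence x_{k+1} = A * x_k mod M."""
    out, x = [], seed
    for _ in range(n):
        x = (A * x) % M
        out.append(x / M)
    return out

def leap_seed(seed, leap):
    """State reached after 'leap' steps: A^leap * seed mod M,
    computed in O(log leap) by modular fast exponentiation."""
    return (pow(A, leap, M) * seed) % M

# Fixed-leap seeding: stream i starts 10,000 draws past stream i - 1
seeds = [12345]
for _ in range(3):
    seeds.append(leap_seed(seeds[-1], 10_000))
streams = [lehmer_stream(s, 5) for s in seeds]
```

Jumping ahead algebraically rather than by stepping the generator is what makes fixed- and random-leap seeding cheap even for very large leaps.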

    Advancing the iid Test Based on Integration across the Correlation Integral: Ranges, Competition, and Power

    This paper builds on Kočenda (2001) and extends it in two ways. First, two new intervals of the proximity parameter ε (over which the correlation integral is calculated) are specified. For these ε-ranges, new critical values for various lengths of the data sets are introduced, and Monte Carlo studies show that within the new ε-ranges the test is even more powerful than within the original ε-range. A sensitivity analysis of the critical values with respect to the choice of ε-range is also given. Second, a comparison with the existing results of the controlled competition of Barnett et al. (1997), as well as broad power tests on various nonlinear and chaotic data, is provided. The results of the comparison strongly favor our robust procedure and confirm the ability of the test to find nonlinear dependencies. An empirical comparison of the new ε-ranges with the original one shows that the test within the new ε-ranges is able to detect hidden patterns with much higher precision. Finally, new user-friendly and fast software is introduced.
    Keywords: chaos, nonlinear dynamics, correlation integral, Monte Carlo, single-blind competition, power tests, high-frequency economic and financial data
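The correlation integral at the heart of the test can be sketched directly: it is the fraction of pairs of m-histories of the series that lie within ε of each other. This is a generic implementation for short series (the embedding dimension and ε values are illustrative), not Kočenda's optimized software.

```python
import numpy as np

def correlation_integral(x, eps, m=2):
    """C_m(eps): fraction of pairs of m-histories of x within eps (max norm)."""
    n = len(x) - m + 1
    E = np.column_stack([x[i:i + n] for i in range(m)])         # delay embedding
    d = np.max(np.abs(E[:, None, :] - E[None, :, :]), axis=2)   # pairwise Chebyshev distances
    iu = np.triu_indices(n, k=1)                                # distinct pairs only
    return float(np.mean(d[iu] < eps))

rng = np.random.default_rng(0)
x = rng.standard_normal(500)                 # iid series: no hidden structure
c_small = correlation_integral(x, eps=0.5)
c_large = correlation_integral(x, eps=1.5)   # C(eps) is nondecreasing in eps
```

The test statistic is built from how C_m(ε) scales over a range of ε; for iid data this scaling has a known null distribution, and departures from it signal nonlinear dependence.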