321 research outputs found

    Approaches to Sample Size Determination for Multivariate Data: Applications to PCA and PLS-DA of Omics Data

    Sample size determination is a fundamental step in the design of experiments. Methods for sample size determination are abundant for univariate analysis methods, but scarce in the multivariate case. Omics data are multivariate in nature and are commonly investigated using multivariate statistical methods, such as principal component analysis (PCA) and partial least-squares discriminant analysis (PLS-DA). No simple approaches to sample size determination exist for PCA and PLS-DA. In this paper we will introduce important concepts and offer strategies for (minimally) required sample size estimation when planning experiments to be analyzed using PCA and/or PLS-DA.

    Bias-variance trade-off in continuous test norming

    In continuous test norming, the test score distribution is estimated as a continuous function of predictor(s). A flexible approach for norm estimation is the use of generalized additive models for location, scale, and shape. It is unknown how sensitive their estimates are to model flexibility and sample size. Generally, a flexible model that fits at the population level has smaller bias than its restricted nonfitting version, yet it has larger sampling variability. We investigated how model flexibility relates to bias, variance, and total variability in estimates of normalized z scores under empirically relevant conditions, involving the skew Student t and normal distributions as population distributions. We considered both transversal and longitudinal assumption violations. We found that models with too strict distributional assumptions yield biased estimates, whereas too flexible models yield increased variance. The skew Student t distribution, unlike the Box-Cox Power Exponential distribution, appeared problematic to estimate for normally distributed data. Recommendations for empirical norming practice are provided.

    Model selection in continuous test norming with GAMLSS

    To compute norms from reference group test scores, continuous norming is preferred over traditional norming. A suitable continuous norming approach for continuous data is the use of the Box–Cox Power Exponential model, which is found in the generalized additive models for location, scale, and shape. Applying the Box–Cox Power Exponential model for test norming requires model selection, but it is unknown how well this can be done with an automatic selection procedure. In a simulation study, we compared the performance of two stepwise model selection procedures combined with four model-fit criteria (Akaike information criterion, Bayesian information criterion, generalized Akaike information criterion (3), cross-validation), varying data complexity, sampling design, and sample size in a fully crossed design. The new procedure combined with one of the generalized Akaike information criteria was the most efficient model selection procedure (i.e., required the smallest sample size). The advocated model selection procedure is illustrated with norming data of an intelligence test.

    Comparison of Estimation Procedures for Multilevel AR(1) Models

    To estimate a time series model for multiple individuals, a multilevel model may be used. In this paper we compare two estimation methods for the autocorrelation in multilevel AR(1) models, namely Maximum Likelihood Estimation (MLE) and Bayesian Markov Chain Monte Carlo. Furthermore, we examine the difference between modeling fixed and random individual parameters. To this end, we perform a simulation study with a fully crossed design, in which we vary the length of the time series (10 or 25), the number of individuals per sample (10 or 25), the mean of the autocorrelation (-0.6 to 0.6 inclusive, in steps of 0.3) and the standard deviation of the autocorrelation (0.25 or 0.40). We found that the random estimators of the population autocorrelation show less bias and higher power, compared to the fixed estimators. As expected, the random estimators profit strongly from a higher number of individuals, while this effect is small for the fixed estimators. The fixed estimators profit slightly more from a higher number of time points than the random estimators. When possible, random estimation is preferred to fixed estimation. The difference between MLE and Bayesian estimation is nearly negligible. The Bayesian estimation shows a smaller bias, but MLE shows a smaller variability (i.e., standard deviation of the parameter estimates). Finally, better results are found for a higher number of individuals and time points, and for a lower individual variability of the autocorrelation. The effect of the size of the autocorrelation differs between outcome measures.

    A tutorial on regression-based norming of psychological tests with GAMLSS

    A norm-referenced score expresses the position of an individual test taker in the reference population, thereby enabling a proper interpretation of the test score. Such normed scores are derived from test scores obtained from a sample of the reference population. Typically, multiple reference populations exist for a test, namely when the norm-referenced scores depend on individual characteristic(s), such as age (and sex). To derive normed scores, regression-based norming has gained large popularity. The advantages of this method over traditional norming are its flexible nature, yielding potentially more realistic norms, and its efficiency, requiring potentially smaller sample sizes to achieve the same precision. In this tutorial, we introduce the reader to regression-based norming, using the generalized additive models for location, scale, and shape (GAMLSS). This approach has been useful in norm estimation of various psychological tests. We discuss the rationale of regression-based norming, theoretical properties of GAMLSS and their relationships to other regression-based norming models. Based on 6 steps, we describe how to: (a) design a normative study to gather proper normative sample data; (b) select a proper GAMLSS model for an empirical scale; (c) derive the desired normed scores for the scale from the fitted model, including those for a composite scale; and (d) visualize the results to achieve insight into the properties of the scale. Following these steps yields regression-based norms with GAMLSS for a psychological test, as we illustrate with normative data of the intelligence test IDS-2. The complete R code and data set are provided as online supplemental material.