
    Modeling general, specific, and method variance in personality measures: Results for ZKA-PQ and NEO-PI-R

    Reprinted by permission of SAGE Publications.
    Contemporary models of personality assume a hierarchical structure in which broader traits contain narrower traits. Individual differences in response styles also constitute a source of score variance. In this study, the bifactor model is applied to separate these sources of variance in personality subscores. The procedure is illustrated using data from two personality inventories: the NEO Personality Inventory-Revised (NEO-PI-R) and the Zuckerman-Kuhlman-Aluja Personality Questionnaire (ZKA-PQ). Including an acquiescence method factor generally improved fit to acceptable levels for the ZKA-PQ, but not for the NEO-PI-R. This effect was larger in subscales where the numbers of direct and reversed items were unbalanced. Loadings on the specific factors were usually smaller than loadings on the general factor. In some cases, part of the variance was due to domains other than the main one. This information is of particular interest to researchers, as it identifies which subscale scores have the most potential to increase predictive validity.
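The variance decomposition described above can be illustrated with a toy computation: under an orthogonal bifactor model, each item's common variance splits into the squared loadings on the general, specific, and method factors. All loadings below are invented for illustration; they are not values from the study.

```python
import numpy as np

# Hypothetical standardized loadings for 6 items (invented for illustration)
general = np.array([0.60, 0.55, 0.50, 0.65, 0.58, 0.52])       # general trait factor
specific = np.array([0.30, 0.35, 0.25, 0.20, 0.40, 0.30])      # narrow-trait factor
acquiescence = np.array([0.15, 0.15, 0.15, 0.15, 0.15, 0.15])  # method factor

# With orthogonal factors, each item's explained common variance
# is the sum of its squared loadings.
explained = general**2 + specific**2 + acquiescence**2

# Explained common variance (ECV) attributable to the general factor
ecv_general = (general**2).sum() / explained.sum()
print(f"ECV (general factor): {ecv_general:.2f}")
```

A high ECV for the general factor, as here, is the pattern the abstract describes: specific-factor loadings contribute comparatively little unique variance.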

    The use of IRT to investigate the relationships between the g factor and cognitive tasks

    Many authors have investigated the relations between the g factor and cognitive tasks in order to understand the nature of g (Jensen, 1998). Nevertheless, some methodological problems must be solved to determine whether individual differences in g reflect differences in cognitive processes. One of these is the low reliability of the scores on such tasks. On the grounds of the standard error of the ability estimates, we hypothesized that the correlation between the g factor and a working memory task would be larger in the group of subjects who were better measured (smaller standard error). The results support this hypothesis.

    Are fit indices really fit to estimate the number of factors with categorical variables? Some cautionary findings via Monte Carlo simulation

    This paper is not the copy of record and may not exactly replicate the authoritative document published in the APA journal. The final article is available at: http://dx.doi.org/10.1037/met0000064
    An early step in the process of construct validation consists of establishing the fit of an unrestricted "exploratory" factorial model for a prespecified number of common factors. For this initial unrestricted model, researchers have often recommended and used fit indices to estimate the number of factors to retain. Despite the logical appeal of this approach, little is known about the actual accuracy of fit indices in the estimation of data dimensionality. The present study aimed to reduce this gap by systematically evaluating the performance of four commonly used fit indices (the comparative fit index, CFI; the Tucker-Lewis index, TLI; the root mean square error of approximation, RMSEA; and the standardized root mean square residual, SRMR) in the estimation of the number of factors with categorical variables, and comparing it with what is arguably the current golden rule, Horn's (1965) parallel analysis. The results indicate that the CFI and TLI provide nearly identical estimations and are the most accurate fit indices, followed at a step below by the RMSEA, and then by the SRMR, which gives notably poor dimensionality estimates. Difficulties in establishing optimal cutoff values for the fit indices and the general superiority of parallel analysis, however, suggest that applied researchers are better served by complementing their theoretical considerations regarding dimensionality with the estimates provided by the latter method.
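Horn's parallel analysis, the benchmark method mentioned above, can be sketched compactly: retain as many factors as there are observed eigenvalues exceeding the mean eigenvalues obtained from random data of the same dimensions. This is a minimal sketch with invented demo data, not the study's simulation design (which used categorical variables and additional refinements).

```python
import numpy as np

def parallel_analysis(data, n_sims=100, seed=0):
    """Horn's parallel analysis: retain factors whose observed correlation-matrix
    eigenvalues exceed the mean eigenvalues from random normal data."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs_eig = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    rand_eig = np.zeros(p)
    for _ in range(n_sims):
        rand = rng.standard_normal((n, p))
        rand_eig += np.linalg.eigvalsh(np.corrcoef(rand, rowvar=False))[::-1]
    rand_eig /= n_sims
    return int(np.sum(obs_eig > rand_eig))

# Toy demo: two independent clusters of three correlated variables each
rng = np.random.default_rng(1)
f1, f2 = rng.standard_normal((2, 500, 1))
data = np.hstack([f1 + 0.5 * rng.standard_normal((500, 3)),
                  f2 + 0.5 * rng.standard_normal((500, 3))])
print(parallel_analysis(data))
```

With this strongly two-dimensional toy data the procedure recovers two factors.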

    Fixed item parameter calibration for the assessment of differential item functioning in computerized adaptive tests

    In computerized adaptive testing, pretest items are presented alongside operational items to renew the item bank. Pretest items are calibrated and analyzed for possible differential item functioning (DIF). Some difficulties arise from the large amount of missing responses, which can be avoided by using fixed item parameter calibration (FIPC; Kim, 2006) methods. In this study, we applied the multiple weights updating and multiple EM cycles method, with response imputation for non-administered items (as suggested by Lei, Chen, & Yu, 2006) and without it. The IRT likelihood ratio test (IRT-LRT) was used for DIF detection. The manipulated factors were type of DIF, DIF size, impact size, test length, and sample size. The results showed that the FIPC method is suitable for detecting large-size DIF in large samples. In the presence of impact, the use of imputation led to a bias in the effect-size measure of DIF.
    This research was partly supported by a grant from the Spanish Ministry of Education and Science [PSI2009-10341].

    Cheating detection methods in online tests

    Background: Unproctored Internet Tests (UIT) are vulnerable to cheating attempts by candidates seeking higher scores. To prevent this, subsequent procedures such as a verification test (VT) are carried out. This study compares five statistics used to detect cheating in Computerized Adaptive Tests (CATs): Guo and Drasgow's Z-test, the Adaptive Measure of Change (AMC), the Likelihood Ratio Test (LRT), the Score Test, and the Modified Signed Likelihood Ratio Test (MSLRT). Method: We simulated responses from honest and cheating candidates to the UIT and the VT. Honest candidates responded to both tests with their real ability level, while cheating candidates responded honestly only to the VT, and different degrees of cheating on the UIT were simulated. We applied the hypothesis tests and obtained Type I error and power rates. Results: Although we found differences in Type I error rates between some of the procedures, all of them except the Score Test stayed close to the nominal value. The power rates obtained point to the MSLRT's superiority in detecting cheating. Conclusions: We consider the MSLRT to be the best test, as it has the highest power and a suitable Type I error rate.
    This research was partially supported by Ministerio de Ciencia, Innovación y Universidades, Spain (Grant PSI2017-85022-P), the European Social Fund, and the Cátedra de Modelos y Aplicaciones Psicométricos (Instituto de Ingeniería del Conocimiento and Autonomous University of Madrid).
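The intuition behind the first of the compared statistics can be sketched as a standardized residual: compare the candidate's observed verification-test score with its expectation under the ability estimated from the unproctored test. This is a generic sketch of that idea under a 2PL model, with invented item parameters; the published Guo and Drasgow formulation may differ in detail.

```python
import numpy as np

def z_test(responses, theta_uit, b, a=None):
    """Standardized-residual sketch of a Z-test for score change:
    observed VT number-correct vs. its expectation given the UIT
    ability estimate, under a 2PL model (a=1 reduces to Rasch)."""
    a = np.ones_like(b) if a is None else a
    p = 1.0 / (1.0 + np.exp(-a * (theta_uit - b)))  # success probabilities
    expected, var = p.sum(), (p * (1 - p)).sum()
    return (responses.sum() - expected) / np.sqrt(var)

# Invented example: a candidate whose UIT estimate was theta = 1.5
# scores only 4/12 on the verification test, yielding a large negative z.
b = np.linspace(-2, 2, 12)                           # item difficulties (invented)
resp = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
print(round(z_test(resp, 1.5, b), 2))
```

A strongly negative z flags a VT performance far below what the unproctored score predicts, which is the cheating signature all five statistics target.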

    A new IRT-based standard setting method: application to eCat-Listening

    Background: Criterion-referenced interpretations of tests are highly necessary, which usually involves the difficult task of establishing cut scores. In contrast with other Item Response Theory (IRT)-based standard setting methods, a non-judgmental approach is proposed in this study, in which Item Characteristic Curve (ICC) transformations lead to the final cut scores. Method: eCat-Listening, a computerized adaptive test for the evaluation of English listening comprehension, was administered to 1,576 participants, and the proposed standard setting method was applied to classify them into the performance standards of the Common European Framework of Reference for Languages (CEFR). Results: The results showed a classification closely related to relevant external measures of English language proficiency, according to the CEFR. Conclusions: The proposed method is a practical and valid standard setting alternative for the interpretation of IRT-based tests.
    This research was partly supported by two grants from the Spanish Ministerio de Educación y Ciencia (projects PSI2008-01685 and PSI2009-10341).

    Optimal number of strata for the stratified methods in computerized adaptive testing

    Test security can be a major problem in computerized adaptive testing, as examinees can share information about the items they receive. Among the item selection rules proposed to alleviate this risk, stratified methods are among those that have received the most attention. In these methods, only low-discrimination items can be presented at the beginning of the test, and the mean information of the items increases as the test goes on. To do so, the item bank must be divided into several strata according to the information of the items. To date, there is no clear guidance about the optimal number of strata into which the item bank should be split. In this study, we simulate conditions with different numbers of strata, from 1 (no stratification) to a number of strata equal to the test length (maximum stratification), while manipulating, across its whole range, the maximum exposure rate that no item should surpass (rmax). In this way, we can plot the relation between test security and accuracy, making it possible to determine the number of strata that leads to better security while holding measurement accuracy constant. Our data indicate that the best option is to stratify into as many strata as possible.
    This research was supported by a grant from the Spanish Ministerio de Ciencia e Innovación (project number PSI2009-10341), and by a grant from the Fundación Universitaria Antonio Gargallo and the Obra Social de Ibercaja.
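The stratified selection scheme described above can be sketched as follows: sort the bank by discrimination, cut it into strata, and draw each successive item from a progressively higher stratum. This is a simplified illustration with invented parameters; real a-stratified designs also match difficulty to the current ability estimate within each stratum.

```python
import numpy as np

def a_stratified_pick(a_params, administered, t, test_length, n_strata):
    """Simplified a-stratified item selection: the bank is sorted by
    discrimination (a) and cut into n_strata strata; item t of the test
    is drawn from stratum t * n_strata // test_length, so early items
    are low-a and later items high-a."""
    order = np.argsort(a_params)                 # low to high discrimination
    strata = np.array_split(order, n_strata)
    k = min(t * n_strata // test_length, n_strata - 1)
    for item in strata[k]:
        if item not in administered:
            return int(item)
    raise ValueError("stratum exhausted")

# Invented bank of 20 items, 4 strata, 8-item test
a = np.random.default_rng(2).uniform(0.5, 2.0, 20)
administered, picked = set(), []
for t in range(8):
    item = a_stratified_pick(a, administered, t, 8, 4)
    administered.add(item)
    picked.append(item)
```

By construction, the first items administered carry lower discrimination than the last ones, which is what spreads item exposure across the bank.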

    Generational intelligence tests score changes in Spain: are we asking the right question?

    Generational intelligence test score gains were documented worldwide in the twentieth century. However, recent evidence suggests these gains are coming to an end in some world regions. Here we compare two cohorts of university freshmen. The first cohort (n = 311) was assessed in 1991, whereas the second cohort (n = 349) was assessed thirty years later (2022). Both cohorts completed the same intelligence battery of eight standardized speeded and power tests tapping reasoning (abstract and quantitative), language (vocabulary, verbal comprehension, and verbal meanings), rote calculation, and visuospatial relations. The results revealed a global gain of 3.5 IQ points, but also upward and downward changes at the test level. The 2022 cohort outperformed the 1991 cohort on reasoning (abstract and quantitative), verbal comprehension, and vocabulary, whereas the 1991 cohort outscored the 2022 cohort on rote calculation, visuospatial relations (mental rotation and identical figures), and verbal meanings. These findings support a key claim made by James Flynn: generational changes in the specific cognitive abilities and skills tapped by standardized tests should be expected without appreciable or substantive changes in the structure of the intelligence construct identified within generations. This conclusion is discussed with respect to the causal implications putatively derived from current psychometric models of intelligence.

    Exploratory factor analysis in validation studies: uses and recommendations

    Background: Exploratory Factor Analysis (EFA) is one of the most commonly used procedures in the social and behavioral sciences. However, it is also one of the most criticized, owing to the poor manner in which researchers usually apply it. The main goal of this study is to examine the relationship between the practices usually considered most appropriate and the actual decisions made by researchers. Method: The use of EFA is examined in 117 papers published between 2011 and 2012 in the three Spanish psychological journals with the highest impact over the previous five years. Results: The results show substantial rates of questionable decisions in conducting EFA, based on unjustified or mistaken choices regarding the methods of factor extraction, retention, and rotation. Conclusions: Overall, this review supports a set of guidelines on how to apply and report an EFA.

    Bi-factor exploratory structural equation modeling done right: using the SLiDapp application

    Background: Due to its flexibility and statistical properties, bi-factor Exploratory Structural Equation Modeling (bi-factor ESEM) has become an often-recommended tool in psychometrics. Unfortunately, most recent methods for approximating these structures, such as the SLiD algorithm, are not available in the leading software for performing ESEM (i.e., Mplus). To resolve this issue, we present a novel, user-friendly Shiny application for integrating the SLiD algorithm into bi-factor ESEM estimation in Mplus. Thus, a two-stage framework for conducting SLiD-based bi-factor ESEM in Mplus was developed. Method: This approach is presented in a step-by-step guide for applied researchers, showing the utility of the developed SLiDApp application. Using data from the Open-Source Psychometrics Project (N = 2,495), we conducted a bi-factor ESEM exploration of the Generic Conspiracist Beliefs Scale. We studied whether bi-factor modelling was appropriate and whether both the general and the group factors were related to each personality trait. Results: The application of the SLiD algorithm provided unique information about this factor structure and its ESEM structural parameters. Conclusions: The results illustrate the usefulness and validity of SLiD-based bi-factor ESEM, and how the proposed Shiny app makes these methods easier for applied researchers to use.
    This work was supported by Grant FPU15/03246 from the Ministry of Education, Culture and Sports (Spain), by the Cátedra de Modelos y Aplicaciones Psicométricos (Instituto de Ingeniería del Conocimiento and Autonomous University of Madrid), and by Grant PSI2017-85022-P from the Ministerio de Economía y Competitividad (Spain) and the Fondo Social Europeo.
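The SLiD algorithm itself is beyond a short sketch, but the classical Schmid-Leiman transformation on which such bifactor target-rotation approaches build is simple to show: general-factor loadings are the product of first-order and second-order loadings, and group-factor loadings are the first-order loadings scaled by the residual of the second-order factor. The higher-order solution below is invented for illustration.

```python
import numpy as np

# Invented higher-order solution: 6 items on 2 first-order factors,
# which load 0.7 and 0.6 on a second-order general factor.
first_order = np.array([[0.7, 0.0], [0.6, 0.0], [0.8, 0.0],
                        [0.0, 0.7], [0.0, 0.6], [0.0, 0.8]])
second_order = np.array([0.7, 0.6])

# Classical Schmid-Leiman transformation (not the SLiD algorithm itself)
general = first_order @ second_order                     # general-factor loadings
group = first_order * np.sqrt(1.0 - second_order**2)     # residualized group loadings

print(np.round(general, 3))
```

The resulting general/group loading matrix is the kind of target that iterative bifactor rotations, including SLiD-style procedures, refine against.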