Coefficients for tests from a decision theoretic point of view
From a decision theoretic point of view a general coefficient for tests, d, is derived. The coefficient is applied to three kinds of decision situations. First, the situation is considered in which a true score is estimated by a function of the observed score of a subject on a test (point estimation). Using the squared error loss function and Kelley’s formula for estimating the true score, it is shown that d equals the reliability coefficient from classical test theory. Second, the situation is considered in which the observed scores are split into more than two categories and different decisions are made for the categories (multiple decision). The general form of the coefficient is derived, and two loss functions suited to multiple decision situations are described. It is shown that for the loss function specifying constant losses for the various combinations of categories on the true and on the observed scores, the coefficient can be computed under the assumptions of the beta-binomial model. Third, the situation is considered in which the observed scores are split into only two categories and different decisions are made for each category (dichotomous decisions). Using a loss function that specifies constant losses for combinations of categories on the true and observed score and the assumption of an increasing regression function of t on x, it is shown that coefficient d equals Loevinger’s coefficient H between true and observed scores. The coefficient can be computed under the assumption of the beta-binomial model. Finally, it is shown that for a linear loss function and Kelley’s formula for the regression of the true score on the observed score, the coefficient equals the reliability coefficient of classical test theory
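As a minimal sketch of the point-estimation case, Kelley's formula regresses the observed score toward the group mean in proportion to the test's reliability. The function name and the sample numbers below are illustrative assumptions, not taken from the paper.

```python
def kelley_true_score(observed, group_mean, reliability):
    """Kelley's regression estimate of a true score.

    The estimate is a weighted average of the observed score and the
    group mean, with the reliability coefficient as the weight.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return reliability * observed + (1.0 - reliability) * group_mean


# Hypothetical values: observed score 30, group mean 20, reliability 0.8.
estimate = kelley_true_score(observed=30.0, group_mean=20.0, reliability=0.8)
print(round(estimate, 6))
```

With a reliability of 1 the estimate equals the observed score; with a reliability of 0 it collapses to the group mean.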
CAT for Personality Items Zeitschrift für Psychologie
Abstract. A computerized adaptive testing (CAT) procedure was simulated with ordinal polytomous personality data collected using a conventional paper-and-pencil testing format. An adapted Dutch version of the dominance scale of Gough and Heilbrun's Adjective Check List (ACL) was used. This version contained Likert response scales with five categories. Item parameters were estimated using Samejima's graded response model from the responses of 1,925 subjects. The CAT procedure was simulated using the responses of 1,517 other subjects. The value of the required standard error in the stopping rule of the CAT was manipulated. The relationship between CAT latent trait estimates and estimates based on all dominance items was studied. Additionally, the pattern of relationships between the CAT latent trait estimates and the other ACL scales was compared to that between latent trait estimates based on the entire item pool and the other ACL scales. The CAT procedure resulted in latent trait estimates qualitatively equivalent to latent trait estimates based on all items, while the number of items used was reduced substantially (at the stopping rule of 0.4, about 33% of the 36 items were used)
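The graded response model mentioned above can be sketched as follows: cumulative probabilities of responding in category k or higher follow a logistic curve, and category probabilities are differences of adjacent cumulative probabilities. The discrimination and threshold values are hypothetical and are not the estimates from the study.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under Samejima's graded response model.

    theta      -- latent trait value
    a          -- item discrimination
    thresholds -- increasing category thresholds (m - 1 values for m categories)
    """
    # P(X >= k | theta) for k = 0..m, with the boundary values 1 and 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # P(X = k) is the difference between adjacent cumulative probabilities.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


# Hypothetical five-category Likert item (four thresholds).
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-2.0, -0.5, 0.5, 2.0])
print([round(p, 3) for p in probs])
```

In a CAT simulation these probabilities feed the likelihood used to update the latent trait estimate after each administered item; testing stops once the standard error of the estimate falls below the chosen value.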
Educational measurement
The third edition of the volume Educational Measurement gives, like the previous two editions edited by Lindquist (1951) and Thorndike (1971), a comprehensive review of the state of the art of educational measurement. The volume is edited and introduced by R.L. Linn and is organized in three parts: (1) Theory and General Principles (chapters 2 through 7), (2) Construction, Administration, and Scoring (chapters 8 through 11), and (3) Applications (chapters 12 through 18). More than half of the pages are devoted to theory and general principles, and the emphasis of the review is also on this par
Conceptual notes on models for discrete polytomous item responses
The following types of discrete item responses are distinguished: nominal-dichotomous, ordinal-dichotomous, nominal-polytomous, and ordinal-polytomous. Bock (1972) presented a model for nominal-polytomous item responses that, when applied to dichotomous responses, yields Birnbaum’s (1968) two-parameter logistic model. Applying Bock’s model to ordinal-polytomous items leads to a conceptual problem. The ordinal nature of the response variable must be preserved; this can be achieved using three different methods. A number of existing models are derived using these three methods. The structure of these models is similar, but they differ in the interpretation and qualities of their parameters. Information, parameter invariance, log-odds differences invariance, and model violation also are discussed. Information and parameter invariance of dichotomous item response theory (IRT) also apply to polytomous IRT. Specific objectivity of the Rasch model for dichotomous items is a special case of log-odds differences invariance of polytomous items. Differential item functioning of dichotomous IRT is a special case of measurement model violation of polytomous IRT. Index terms: adjacent categories, continuation ratios, cumulative probabilities, differential item functioning, log-odds differences invariance, measurement model violation, parameter invariance, polytomous IRT models
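The reduction of Bock's nominal model to the two-parameter logistic model can be checked numerically. The sketch below assumes the usual multinomial-logit form of the nominal model, with one slope and one intercept per category; all parameter values are hypothetical.

```python
import math


def nominal_probs(theta, slopes, intercepts):
    """Category probabilities under a nominal (multinomial-logit) model."""
    logits = [a * theta + c for a, c in zip(slopes, intercepts)]
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def two_pl(theta, a, b):
    """Two-parameter logistic probability of the keyed response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Two categories: the 2PL discrimination is the slope difference and the
# difficulty is minus the intercept difference divided by that slope.
slopes, intercepts = [-0.6, 0.6], [0.3, -0.3]
a = slopes[1] - slopes[0]
b = -(intercepts[1] - intercepts[0]) / a

for theta in (-1.0, 0.0, 2.0):
    print(theta, nominal_probs(theta, slopes, intercepts)[1], two_pl(theta, a, b))
```

Only the differences between the category parameters are identified in the dichotomous case, which is why the two printed probabilities agree at every value of theta.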
Optimal cutting scores using a linear loss function
The situation is considered in which a total score on a test is used for classifying examinees into two categories: "accepted" (with scores above a cutting score on the test) and "not accepted" (with scores below the cutting score). A value on the latent variable is fixed in advance; examinees above this value are "suitable" and those below are "not suitable." Using a linear loss function, a procedure is described for computing a cutting score that minimizes the risk for the decision rule. The procedure is demonstrated with a criterion-referenced achievement test of elementary statistics administered to 167 students
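A rough sketch of the idea, not the paper's procedure: given a joint distribution of observed scores and latent values, the expected loss of every candidate cutting score can be computed and the minimum picked. The loss form (loss proportional to the signed distance between the latent value and the fixed value d), the function names, and the toy distribution are all assumptions made for illustration.

```python
def expected_loss(cut, joint, d, b_accept=1.0, b_reject=1.0):
    """Expected linear loss when examinees with score >= cut are accepted.

    joint maps (observed_score, latent_value) pairs to probabilities.
    Assumed loss: b_accept * (d - t) for accepting, b_reject * (t - d)
    for not accepting, where t is the latent value.
    """
    risk = 0.0
    for (x, t), p in joint.items():
        if x >= cut:
            risk += p * b_accept * (d - t)
        else:
            risk += p * b_reject * (t - d)
    return risk


def optimal_cut(joint, d, candidates):
    """Return the candidate cutting score with the smallest expected loss."""
    return min(candidates, key=lambda c: expected_loss(c, joint, d))


# Hypothetical joint distribution over scores 0..3 and latent values 0.2 / 0.8.
toy_joint = {
    (0, 0.2): 0.25,
    (1, 0.2): 0.10, (1, 0.8): 0.05,
    (2, 0.2): 0.05, (2, 0.8): 0.15,
    (3, 0.8): 0.40,
}
print(optimal_cut(toy_joint, d=0.5, candidates=range(0, 5)))
```

With equal slopes for both decisions, the minimizing cutting score is the lowest score at which the expected latent value given the score exceeds d.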
The internal and external optimality of decisions based on tests
In applied measurement, test scores are usually transformed to decisions. Analogous to classical test theory, the reliability of decisions has been defined as the consistency of decisions on a test and a retest or on two parallel tests. Coefficient kappa (Cohen, 1960) is used for assessing the consistency of decisions. This coefficient has been developed for assessing agreement between nominal scales. It is argued that the coefficient is not suited for assessing consistency of decisions. Moreover, it is argued that the concept consistency of decisions is not appropriate for assessing the quality of a decision procedure. It is proposed that the concept consistency of decisions be replaced by the concept optimality of the decision procedure. Two types of optimality are distinguished. The internal optimality is the risk of the decision procedure with respect to the true score the test is measuring. The external optimality is the risk of the decision procedure with respect to an external criterion. For assessing the optimality of a decision procedure, coefficient delta (van der Linden & Mellenbergh, 1978), which can be considered a standardization of the Bayes risk or expected loss, can be used. Two loss functions are dealt with: the threshold and the linear loss functions. Assuming psychometric theory, coefficient delta for internal optimality can be computed from empirical data for both the threshold and the linear loss functions. The computation of coefficient delta for external optimality needs no assumption of psychometric theory. For six tests coefficient delta as an index for internal optimality is computed for both loss functions; the results are compared with coefficient kappa for assessing the consistency of decisions with the same tests
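For reference, coefficient kappa for the consistency of decisions on a test and a retest can be sketched as below. The cross-classification table is hypothetical; the formula is the standard one, observed agreement corrected for the agreement expected from the marginals.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square cross-classification of two decisions.

    table[i][j] counts subjects placed in category i on the first
    occasion and in category j on the second.
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_margins = [sum(row) / total for row in table]
    col_margins = [sum(table[i][j] for i in range(size)) / total for j in range(size)]
    expected = sum(r * c for r, c in zip(row_margins, col_margins))
    return (observed - expected) / (1.0 - expected)


# Hypothetical accept / not-accept decisions on a test and a retest.
decisions = [[40, 10],
             [10, 40]]
print(round(cohen_kappa(decisions), 3))
```

Kappa treats the two occasions symmetrically and takes no account of true scores or losses, which is the limitation the abstract raises.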
The Theoretical Status of Latent Variables
This article examines the theoretical status of latent variables as used in modern test theory models. First, it is argued that a consistent interpretation of such models requires a realist ontology for latent variables. Second, the relation between latent variables and their indicators is discussed. It is maintained that this relation can be interpreted as a causal one but that in measurement models for interindividual differences the relation does not apply to the level of the individual person. To substantiate intraindividual causal conclusions, one must explicitly represent individual level processes in the measurement model. Several research strategies that may be useful in this respect are discussed, and a typology of constructs is proposed on the basis of this analysis. The need to link individual processes to latent variable models for interindividual differences is emphasized. Consider the following sentence: “Einstein would not have been able to come up with his e = mc² had he not possessed such an extraordinary intelligence.” What does this sentence express? It relates observable behavior (Einstein’s writing e = mc²) to an unobservable attribute (his extraordinary intelligence), and it does so by assigning to the unobservable attribute a causal role i