12 research outputs found

    The effect of response order on candidate viewing behaviour and item difficulty in a multiple-choice listening test

    Get PDF
    Studies from various disciplines have reported that the spatial location of options, in relation to processing order, affects which option is ultimately chosen. A large number of studies have found a primacy effect, that is, a tendency to prefer the first option. In this paper we report evidence that the position of the key in four-option multiple-choice (MC) listening test items may affect item difficulty and thereby potentially introduce construct-irrelevant variance. Two sets of analyses were undertaken. In Study 1 we explored 30 test takers’ processing via eye-tracking on listening items from the Aptis Test. An unexpected finding concerned the amount of processing devoted to the different response options on the MC questions, depending on their position. Based on this, in Study 2 we examined the direct effect of key position on item difficulty in a sample of 200 live Aptis items with around 6,000 test takers per item. The results suggest that the spatial location of the key in MC listening tests affects both the amount of processing it receives and the item’s difficulty. Given the widespread use of MC tasks in language assessments, these findings seem crucial, particularly for tests that randomize response order: candidates who by chance have many keys in the last position might be significantly disadvantaged.

    Re-examining the content validation of a grammar test: the (im)possibility of distinguishing vocabulary and structural knowledge

    No full text
    “Vocabulary and structural knowledge” (Grabe, 1991, p. 379) appears to be a key component of reading ability. However, is this component to be taken as a unitary one, or is structural knowledge a separate factor that can therefore also be tested in isolation in, say, a test of syntax? If syntax can be singled out (e.g. in order to investigate its contribution to reading ability), such a test of syntactic knowledge would require validation. The usefulness and reliability of expert judgments as a means of analysing the content or difficulty of test items in language assessment have been questioned for more than two decades. Still, groups of expert judges are often called upon, as they are perceived to be the only, or at least a very convenient, way of establishing key features of items. Such judgments, however, are particularly opaque and thus problematic when judges are required to make categorizations whose categories are only vaguely defined or are ontologically questionable in themselves. This is, for example, the case when judges are asked to classify the content of test items based on a distinction between lexis and syntax, a dichotomy that corpus linguistics has suggested cannot be maintained. The present paper scrutinizes a study by Shiotsu (2010) that employed expert judgments, on the basis of which claims were made about the relative significance of the components ‘syntactic knowledge’ and ‘vocabulary knowledge’ in reading in a second language. By both replicating and partially replicating Shiotsu’s (2010) content analysis, the paper problematizes not only the use of expert judgments but, more importantly, their usefulness in distinguishing between construct components that might, in fact, be difficult to distinguish anyway. This is particularly important for understanding and diagnosing learners’ strengths and weaknesses in reading in a second language.

    Moving the field of vocabulary assessment forward: The need for more rigorous test development and validation

    No full text
    Recently, a large number of vocabulary tests have been made available to language teachers, testers, and researchers. Unfortunately, most of them have been launched with inadequate validation evidence. The field of language testing has become increasingly rigorous in the area of test validation, but developers of vocabulary tests have generally not given validation sufficient attention in the past. This paper argues for more rigorous and systematic procedures for test development, starting from a more precise specification of the test’s purpose, intended test takers and educational context, the particular aspects of vocabulary knowledge being measured, and the way in which the test scores should be interpreted. It also calls for greater assessment literacy among vocabulary test developers, and greater support for the end users of the tests, for instance through the provision of detailed users’ manuals. Overall, the authors present what they feel are the minimum requirements for vocabulary test development and validation. They argue that the field should police itself more rigorously to ensure that these requirements are met or exceeded, and made explicit for those using vocabulary tests.

    Looking into listening: Using eye-tracking to establish the cognitive validity of the Aptis Listening Test

    No full text
    This study investigated the cognitive processing of 30 test-takers while they completed the Aptis Listening Test. Specifically, it examined whether test-takers’ cognitive processes and the types of information they used corresponded to those targeted at the different CEFR levels. To this end, a detailed analysis was conducted of test-takers’ verbal recalls, which were stimulated by a replay of their eye-traces recorded while they had been answering the items. The study also explored the usefulness of quantitative analyses of eye-tracking metrics captured during listening tests. The stimulated recall findings indicate that the Aptis Listening Test successfully taps into the range of cognitive processes and types of information intended by the test developers. The data also show, however, that the differences between the CEFR levels in relation to the intended cognitive processes could be more pronounced, and that the process of “discourse construction” could be more evident for B2 items. It is therefore suggested that a different item type could help elicit this type of higher-order processing. In terms of the types of information used by candidates, a clear difference and progression between the CEFR levels was observed for items answered correctly. The quantitative analysis of the eye-tracking metrics also yielded notable results. A linear mixed effects model analysis, with visit duration on the response options as the dependent variable, showed that test-takers looked at the response options of higher-level items significantly longer than at those of lower-level items. The results also showed that response options higher up on the screen were looked at significantly longer than response options lower down, regardless of item level. In addition, better readers focused on the response options significantly longer than poorer readers did.
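    The abstract above does not state which software or exact model specification the authors used. As an illustration only, the sketch below shows one plausible way to fit a linear mixed-effects model of the general shape described (visit duration on response options as the dependent variable, item level and on-screen option position as fixed effects, test-takers as a grouping factor) using statsmodels; the file and column names are hypothetical.

```python
# Illustrative sketch only, not the authors' analysis: a linear mixed-effects model
# with visit duration on response options as the dependent variable. File and column
# names (eye_tracking_visits.csv, visit_duration, item_level, option_position,
# participant) are assumptions for the example.
import pandas as pd
import statsmodels.formula.api as smf

# Long-format data: one row per participant x item x response option.
data = pd.read_csv("eye_tracking_visits.csv")

# Fixed effects: CEFR item level (categorical) and on-screen position of the option;
# random intercepts for participants (the grouping factor).
model = smf.mixedlm(
    "visit_duration ~ C(item_level) + option_position",
    data=data,
    groups=data["participant"],
)
result = model.fit()
print(result.summary())
```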