2 research outputs found

Explaining Differential Item Functions with different testing cultures

Author: Jurecka Astrid
Publication date: 19/11/2010

This dissertation examines items that measure foreign-language reading comprehension in several European countries. In particular, it analyses how different testing cultures affect the international comparability and validity of these items. The main hypothesis is that Differential Item Functioning (DIF; e.g. Holland & Wainer, 1993), i.e. variance in item difficulty caused by group membership, can be predicted from the groups' differing profiles of strengths and weaknesses in particular aspects of language. This rests on the assumption that different educational cultures emphasise different aspects of language in teaching. It is further assumed that this emphasis is reflected in the test items constructed in a country, in that certain difficulty-determining item characteristics (e.g. difficulty of vocabulary or grammar) are used more or less frequently during item construction. Significant differences in these characteristics among items constructed in different countries should therefore point to different testing-culture profiles. The main research question is: “Is there a relationship between Differential Item Functioning and indicators of national testing cultures in tasks measuring foreign-language reading comprehension in English and German?” The analyses were carried out on the data set of the European EBAFLS study (European Bank of Anchor Items for Foreign Language Skills; Fandel et al., 2007). In this study, data were collected from about 10,500 students in grades 9 to 11 in eight European countries in English, German, and French; the test items came from the various participating countries. 
Experts classified the items with respect to the various item characteristics using the “Dutch Grid” categorisation instrument (Alderson et al., 2006). This dissertation used the EBAFLS items and data sets measuring foreign-language reading comprehension for English (countries: France, Germany, Spain, Hungary) and German (countries: France, Netherlands, Hungary, Sweden). In a first step, the prerequisites for the analyses were checked: the Rasch scalability of the items within each country, the number of significant DIF parameters between the individual country pairs, and the existence of different testing-culture profiles among the participating countries, in the sense that item characteristics occur with different frequencies in items constructed in different countries. These prerequisites were found to be met. In addition, hypotheses about each country's expected strengths and weaknesses on items with particular cognitive-linguistic characteristics were derived from the testing-culture profiles. In the second step, correlations between the selected item characteristics and item difficulty were found within all countries. The item characteristics could therefore also be used in further analyses, i.e. to explain differences in item difficulty between countries. In the third step, pairwise DIF parameters (between two countries at a time) were first computed. To analyse relationships between DIF and the item characteristics, the characteristics were then correlated with the DIF parameters and entered as predictors of DIF in a multiple regression. The correlations ranged from r = -.47 to r = .47. 
Here a negative relationship means that the item has a lower difficulty for the focus group than for the reference group, so that the item characteristic is associated with DIF favourable to that group, and vice versa. In a final step, it was checked to what extent the direction of the relationships found matches the testing-culture profiles. If an item characteristic occurs significantly more often in the focus group's items than in the reference group's items, this should be accompanied by a negative, i.e. favourable, relationship between that characteristic and DIF, and vice versa. Of the significant correlations, 23 of 29 (English) and 25 of 34 (German) matched, in their direction, the hypotheses derived from the testing culture. Furthermore, between 21% and 49% of the DIF variance could be explained by predictors whose direction matched the assumptions derived from the testing-culture profiles. The main assumption that a relationship exists between Differential Item Functioning and testing cultures could thus be retained overall.

As more and more cross-national large-scale studies are conducted in the educational context, comparing their results in a culturally fair way, with particular attention to validity and fairness, becomes crucial. This is often done by analysing the test items for Differential Item Functioning (DIF). An item shows DIF if students from different countries who are at the same ability level have different probabilities of answering it correctly. In this case, the difference in item difficulty is due only to group membership and not to true ability differences. In this dissertation, I try to explain DIF through the different underlying testing cultures of different countries. 
It is hypothesized that test items partly reflect the testing culture of the country where they were constructed. In cross-national comparison studies especially, test items are usually constructed in the different participating countries. The idea that different testing cultures might be one source of differential item functioning has already been supported by several studies (e.g. Klieme & Baumert, 2001; Artelt & Baumert, 2004). This dissertation therefore addresses the question of whether DIF can be explained, at least in part, by different testing cultures. The data used for the analyses were taken from the European large-scale study EBAFLS (European Bank of Anchor Items for Foreign Language Skills; e.g. Fandel et al., 2007); the items were taken from the same study and were constructed within the participating countries. Additionally, language experts rated all EFL and GFL items with regard to cognitive-linguistic item characteristics (item type, location of information, authenticity, abstractness, vocabulary, grammar). For this dissertation, two samples of the EBAFLS study were analysed: one for German as a foreign language (students from France, the Netherlands, Hungary, and Sweden; n = 3170), and one for English as a foreign language (students from France, Germany, Hungary, and Spain; n = 4204). Depending on the country of origin, students were between 15 and 17 years old. In a first step, to determine characteristics of testing cultures, it was analysed whether the items of the different countries differ with regard to the frequencies or levels of difficulty-determining item characteristics such as difficulty of grammar, difficulty of vocabulary, or item type. It was assumed that the more often a certain characteristic appears within the items of a country, the more strongly it belongs to the testing culture of that group. Therefore, the presence of this characteristic should make an item easier for the respective group, and its absence should make the item harder. 
Results indicated that items constructed in different countries also differed significantly in the frequency of certain difficulty-determining item characteristics, which was interpreted as evidence of different testing cultures. In a second step, multiple regression analyses with item difficulty as the dependent variable and item characteristics as predictors were performed within each country. Item difficulties within each country were estimated with a 1PL IRT model. Results showed that item characteristics explained 25% to 39% of the variance in item difficulty for EFL items, and 43% to 58% for GFL items. Therefore, item characteristics could be used for further analyses. In a third step, DIF parameters were estimated for all pairs of countries per language using a one-parameter IRT model. Subsequently, correlations between DIF and item characteristics were computed for each pair of countries. Furthermore, a multiple linear regression was applied to explain DIF with cognitive-linguistic item characteristics as predictors. Significant correlations between item characteristics and DIF ranged from r = -.47 to r = .47. A negative correlation indicates that the item is less difficult (item difficulty is lower) for the focus group than for the reference group, and vice versa. In a last step, it was analysed whether significant correlations and predictors point in the expected direction, i.e. make an item more or less difficult for students depending on whether the item characteristic plays an important role within the respective testing culture or not. It was hypothesized that item characteristics which are important within a certain testing culture should make the item easier for those students compared to a group where this is not the case, and vice versa. Most of the significant correlations (English: 23 of 29; German: 25 of 34) pointed in the hypothesized direction, i.e. 
made an item easier or harder for students depending on whether the item characteristic was an important part of their testing culture or not. Furthermore, between 21% and 49% of DIF variance could be explained by predictors that were consistent with the different testing cultures. The results support the assumption that DIF can be explained, at least in part, by different testing cultures.
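
The pairwise DIF analysis described in this abstract can be illustrated with a minimal Python sketch. It is not the author's procedure: the difficulty values and the vocabulary ratings are hypothetical, the function names are invented for illustration, and mean-centring each country's difficulties to obtain a common scale is an assumption made here. The sign convention follows the abstract: a negative DIF value means the item is easier for the focus group than for the reference group.

```python
import math
from statistics import mean


def rasch_probability(ability: float, difficulty: float) -> float:
    """1PL (Rasch) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def center(values):
    """Subtract the mean so both countries share a common origin (assumption)."""
    m = mean(values)
    return [v - m for v in values]


def pairwise_dif(focus, reference):
    """Per-item DIF: centred focus difficulty minus centred reference difficulty."""
    return [f - r for f, r in zip(center(focus), center(reference))]


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical item difficulties (logits) for five items in two countries.
focus_difficulty = [-1.0, -0.5, 0.0, 0.4, 1.1]
reference_difficulty = [-1.2, -0.3, 0.1, 0.9, 1.5]
# Hypothetical expert rating of vocabulary difficulty for the same items.
vocabulary_rating = [1, 2, 2, 3, 4]

dif = pairwise_dif(focus_difficulty, reference_difficulty)
r = pearson_r(vocabulary_rating, dif)

# A negative r means that items with higher vocabulary ratings tend to be
# relatively easier for the focus group than for the reference group.
print([round(d, 2) for d in dif], round(r, 2))
```

With real data, the difficulties would come from a fitted IRT model per country, and the correlation would be computed for each country pair and each item characteristic.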

Prinzipien kohärenter Kommunikation

Author: Grommes Patrick
Publication venue: Humboldt-Universität zu Berlin, Philosophische Fakultät II
Publication date: 04/05/2007

The dissertation shows that the principles of coherent communication can be traced back to psycholinguistically grounded principles of text production. The common basis of text and dialogue production is the quaestio, which, as a guiding question, sets requirements for the structure of a text and also of a single utterance. In a text, the quaestio ensures coherence by linking the building blocks of the text at the conceptual level, and not solely through lexical or grammatical means. This means that establishing coherence is a cognitive achievement that is not only receptive but also requires attention to planning constraints during language production. First, the terms coherence and cohesion and various approaches to describing them are discussed. In addition, methods of dialogue analysis are compared. This discussion covers, among others, Rhetorical Structure Theory and Centering Theory. Because the work combines rather structural analyses with qualitative ones, methodological approaches to dialogue analysis such as Conversation Analysis are taken up, as are Clark’s social-psychological approach of joint actions and joint activities and Pickering and Garrod’s alignment theory. Ultimately, the quaestio model by Stutterheim is adopted, because it offers the widest explanatory framework from a psycholinguistic perspective. The main part of the dissertation is devoted to developing the model on the basis of authentic conversational data. Finally, principles of quaestio processing in dialogue are developed. Because different conversational situations are examined, this work provides an inventory of coherence principles together with their characteristic features, which not only permits the analysis of any further conversations but can also be used, for example, to develop communication routines. 
In this way, application perspectives of psycholinguistic research become apparent.

This doctoral thesis shows to what extent principles of coherent communication can be traced back to psycholinguistically founded principles of text production. The so-called quaestio forms the common basis of text and dialogue production. As an implicit underlying question, it sets preferences for the structure of a whole text as well as of a single utterance. The quaestio ensures the coherence of texts on a conceptual basis rather than merely through lexical or grammatical means. Thus, the production of coherence can be seen as a cognitive achievement not only of listeners, but also of speakers, who have to follow planning constraints. The thesis discusses the terms coherence and cohesion as well as descriptive approaches dealing with these terms. Additionally, methods of dialogue analysis are compared with each other. This discussion covers, for example, Rhetorical Structure Theory and Centering Theory. The thesis discusses diverse methodological approaches because it combines structural with qualitative analyses. Thus, approaches such as Conversation Analysis, Clark’s concept of joint actions and joint activities, and Pickering and Garrod’s alignment theory are also treated. In the end, the quaestio approach by Stutterheim is chosen, because it offers the widest explanatory framework from a psycholinguistic point of view. The main part of the thesis is dedicated to detailed analyses of real-life dialogue. In conclusion, principles of quaestio management in dialogues are proposed. Because the study treats a wide variety of interaction settings, it delivers a set of principles of coherence and their typical features that not only allows for analyses of any other set of dialogues, but may also support the development of communication routines. The thesis thus points to application scenarios for psycholinguistic research.