An Examination of Parameter Recovery Using Different Multiple Matrix Booklet Designs
Educational large-scale assessments examine students' achievement in various content
domains and thus provide key findings to inform educational research and evidence-based
educational policies. To this end, large-scale assessments involve hundreds of items to test
students' achievement in these domains. Administering all of these items to individual
students would overburden them, reduce participation rates, and consume too much time and
too many resources. Hence, multiple matrix sampling is used, in which the test items are distributed across
various test forms called "booklets", and each student is administered a booklet containing a
subset of items that can sensibly be answered within the allotted test timeframe. However,
there are numerous possibilities as to how these booklets can be designed, and the manner of booklet design can influence the precision of parameter recovery at both the global and subpopulation levels. One popular booklet design with many desirable characteristics is the
Balanced Incomplete 7-Block, or Youden squares, design. Extensions of this booklet design
are used in many large-scale assessments such as TIMSS and PISA. This doctoral project
examines the degree to which item and population parameters are recovered in real and
simulated data in relation to matrix sparseness, when using various balanced incomplete
block booklet designs. To this end, key factors (e.g., number of items, number of persons,
number of items per person, and the match between the distributions of item and person
parameters) are experimentally manipulated to learn how these factors affect the precision
with which these designs recover true population parameters. In doing so, the project expands
the empirical knowledge base on the statistical properties of booklet designs, which in turn
could help improve the design of future large-scale studies.
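The balance properties that make the 7-block design attractive can be illustrated with a small sketch. This is a hypothetical example (the cluster numbering and the choice of Python are illustrative assumptions, not taken from the project): it builds the classical 7-booklet, 7-cluster design with 3 clusters per booklet and verifies that every cluster appears equally often and every pair of clusters co-occurs in exactly one booklet, which is what allows all items to be linked on a common scale.

```python
from itertools import combinations
from collections import Counter

# Hypothetical illustration of a balanced incomplete block design:
# 7 booklets, 7 item clusters, 3 clusters per booklet (a Youden square layout).
booklets = [
    (1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 7),
    (5, 6, 1), (6, 7, 2), (7, 1, 3),
]

# Every cluster appears in exactly r = 3 booklets ...
cluster_counts = Counter(c for b in booklets for c in b)
assert all(n == 3 for n in cluster_counts.values())

# ... and every pair of clusters co-occurs in exactly one booklet (lambda = 1),
# so responses from different booklets can be linked via shared clusters.
pair_counts = Counter(
    pair for b in booklets for pair in combinations(sorted(b), 2)
)
assert all(n == 1 for n in pair_counts.values())
assert len(pair_counts) == 21  # all C(7,2) cluster pairs are covered
```

Each student thus answers only 3/7 of the item pool, while the design as a whole still covers every item and every item-pair linkage.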
Generally, the results show that for a typical large-scale assessment (with a sample size of at
least 3,000 students and more than 100 test items), population and item parameters are recovered accurately and without bias in the various multi-matrix booklet designs. This is
true both at the global population level and at the subgroup or sub-population levels. Further,
for such a large-scale assessment, the match between the distribution of person abilities and
the distribution of item difficulties is found to have no meaningful effect on the precision
with which person and item parameters are recovered, when using these multi-matrix booklet
designs.
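A parameter-recovery study of this kind can be sketched in miniature. The sketch below is an illustrative assumption throughout (the sample sizes echo the text, but the 7-block assignment and the crude logit calibration stand in for the project's actual estimation methods): it simulates Rasch responses, removes data by design so each student sees only 3 of 7 item clusters, and checks that the recovered item difficulties track the true ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 3000, 105  # sizes in line with a typical large-scale assessment

theta = rng.normal(0, 1, n_persons)  # true person abilities
b = rng.normal(0, 1, n_items)        # true item difficulties

# Rasch model: P(correct) = sigmoid(theta - b); generate a full response matrix
p = 1 / (1 + np.exp(-(theta[:, None] - b[None, :])))
full = rng.random((n_persons, n_items)) < p

# Missing by design: each person sees 3 of 7 item clusters (7-block design),
# so 4/7 of the matrix is structurally missing.
clusters = np.array_split(np.arange(n_items), 7)
booklets = [(1, 2, 4), (2, 3, 5), (3, 4, 6), (4, 5, 7),
            (5, 6, 1), (6, 7, 2), (7, 1, 3)]
seen = np.zeros((n_persons, n_items), dtype=bool)
for i in range(n_persons):
    for c in booklets[i % 7]:
        seen[i, clusters[c - 1]] = True

# Crude difficulty estimate from observed proportions correct; a logit
# transform stands in here for a full IRT calibration.
observed = np.where(seen, full, np.nan)
p_hat = np.nanmean(observed, axis=0)
b_hat = -np.log(p_hat / (1 - p_hat))

# Despite the sparseness, recovered difficulties correlate highly with truth
print(round(float(np.corrcoef(b, b_hat)[0, 1]), 2))
```

With 3,000 simulated students, roughly 1,286 responses remain per item, which is why the design-induced sparseness costs little precision.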
These results give further support to the use of multi-matrix booklet designs as a reliable test
abridgment technique in large-scale assessments, and for accurate measurement of
performance gaps between policy-relevant subgroups within populations. However, item position effects were not fully considered, and different results are possible if similar studies
are performed (a) with conditions involving items that poorly measure student abilities (e.g.,
with students having skewed ability distributions); or, (b) simulating conditions where there
is a lot of missing data because of non-response, instead of just missing by design. This
should be further investigated in future studies.
The assessment of students' achievement in various domains through large-scale assessments provides important insights for educational research and evidence-based educational policy. However, testing achievement in many content areas always requires the use of hundreds of items. If every single student were presented with all test items, this would place too great a burden on the students, who would consequently be less motivated to complete all tasks. Moreover, administering all items to the entire sample would be very time- and resource-intensive. For these reasons, large-scale assessments often rely on a multi-matrix design in which different test-booklet versions (so-called booklets) are randomly assigned to the test takers. These booklets do not contain all items but only a subset of the item pool, with only some of the items overlapping between the different booklets. This ensures that students can complete all items presented to them within the allotted testing time. However, there are numerous variants of how these booklets can be assembled, and the particular booklet design in turn affects the precision of parameter estimation at the population and subpopulation levels. One well-established booklet design is the Balanced-Incomplete-7-Block design, also called the Youden squares design, which is used in different forms in many large-scale assessments such as TIMSS and PISA. The present thesis uses both real and simulated data to examine the precision with which item and person parameters can be estimated under various balanced incomplete block designs, as a function of the proportion of values missing by design. To this end, various design parameters were varied (e.g., number of items, sample size, number of items per booklet, degree of match between item and person parameters), and it was then analyzed how these factors affect the precision with which population parameters are estimated. The thesis thus aims to expand the empirical knowledge of the statistical properties of booklet designs, thereby contributing to the improvement of future large-scale assessments.
The results of the present thesis showed that, for a typical large-scale assessment (with a sample size of at least 3,000 students and at least 100 items), person and item parameters were estimated precisely at both the population and subpopulation levels under all variants of the balanced incomplete block design that were used. It was further shown that, for samples of at least 3,000 students, the match between the achievement distribution and the distribution of item difficulties had no meaningful influence on the precision with which the various booklet designs estimated person and item parameters.
These results corroborate that, using multi-matrix designs, policy-relevant achievement differences between groups of students in the population can be estimated reliably and precisely. One limitation of the present study is that item position effects were not comprehensively considered. It therefore cannot be ruled out that the results would differ if (a) items were used that measure students' achievement poorly (e.g., given a skewed distribution of achievement scores), or (b) high proportions of missing values were present that were not produced by the multi-matrix design. This should be investigated in future studies.
A governance framework for algorithmic accountability and transparency
Algorithmic systems are increasingly being used as part of decision-making processes in both the public and private sectors, with potentially significant consequences for individuals, organisations and societies as a whole. Algorithmic systems in this context refer to the combination of algorithms, data and the interface process that together determine the outcomes that affect end users. Many types of decisions can be made faster and more efficiently using algorithms. A significant factor in the adoption of algorithmic systems for decision-making is their capacity to process large amounts of varied data sets (i.e. big data), which can be paired with machine learning methods in order to infer statistical models directly from the data. The same properties of scale, complexity and autonomous model inference however are linked to increasing concerns that many of these systems are opaque to the people affected by their use and lack clear explanations for the decisions they make. This lack of transparency risks undermining meaningful scrutiny and accountability, which is a significant concern when these systems are applied as part of decision-making processes that can have a considerable impact on people's human rights (e.g. critical safety decisions in autonomous vehicles; allocation of health and social service resources, etc.). This study develops policy options for the governance of algorithmic transparency and accountability, based on an analysis of the social, technical and regulatory challenges posed by algorithmic systems. Based on a review and analysis of existing proposals for governance of algorithmic systems, a set of four policy options are proposed, each of which addresses a different aspect of algorithmic transparency and accountability: 1. awareness raising: education, watchdogs and whistleblowers; 2. accountability in public-sector use of algorithmic decision-making; 3. regulatory oversight and legal liability; and 4. 
global coordination for algorithmic governance.
Determination and evaluation of clinically efficient stopping criteria for the multiple auditory steady-state response technique
Background: Although the auditory steady-state response (ASSR) technique utilizes objective statistical detection algorithms to estimate behavioural hearing thresholds, the audiologist still has to decide when to terminate ASSR recordings, introducing once more a certain degree of subjectivity.
Aims: The present study aimed at establishing clinically efficient stopping criteria for a multiple 80-Hz ASSR system.
Methods: In Experiment 1, data from 31 normal-hearing subjects were analyzed off-line to propose stopping rules. Consequently, ASSR recordings are stopped when (1) all 8 responses reach significance and significance can be maintained for 8 consecutive sweeps; (2) the mean noise levels are ≤4 nV (if, at this "≤4-nV" criterion, p-values are between 0.05 and 0.1, measurements are extended only once by 8 sweeps); and (3) a maximum of 48 sweeps is attained. In Experiment 2, these stopping criteria were applied to 10 normal-hearing and 10 hearing-impaired adults to assess their efficiency.
Results: The application of these stopping rules resulted in ASSR threshold values that were comparable to other multiple-ASSR research with normal-hearing and hearing-impaired adults. Furthermore, in 80% of the cases, ASSR thresholds could be obtained within a time-frame of 1 hour. Investigating the significant response amplitudes of the hearing-impaired adults through cumulative curves indicated that a noise-stop criterion higher than "≤4 nV" can probably be used.
Conclusions: The proposed stopping rules can be used in adults to determine accurate ASSR thresholds within an acceptable time-frame of about 1 hour. However, additional research with infants, and with adults with varying degrees and configurations of hearing loss, is needed to optimize these criteria.
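The three stopping rules can be sketched as a single decision function. This is a hedged illustration only: the function name, the per-sweep lists of p-values, and the rule ordering are assumptions, not the study's actual implementation.

```python
def should_stop(sweep, p_values_history, mean_noise_nv, extended=False):
    """Decide, after each sweep, whether to stop an 80-Hz multiple-ASSR recording.

    p_values_history: one list per completed sweep, holding the p-values of the
    8 simultaneously recorded responses (data structure is an assumption).
    extended: whether the one-time 8-sweep extension has already been used.
    """
    ALPHA, NOISE_STOP_NV, MAX_SWEEPS = 0.05, 4.0, 48

    # Rule 3: hard ceiling of 48 sweeps.
    if sweep >= MAX_SWEEPS:
        return True, "maximum of 48 sweeps reached"

    # Rule 1: all 8 responses significant, maintained over 8 consecutive sweeps.
    last8 = p_values_history[-8:]
    if len(last8) == 8 and all(all(p < ALPHA for p in ps) for ps in last8):
        return True, "all 8 responses significant for 8 consecutive sweeps"

    # Rule 2: mean noise level <= 4 nV. If any p-value sits between .05 and .1
    # at this point, extend once by 8 sweeps before stopping.
    if mean_noise_nv <= NOISE_STOP_NV:
        borderline = any(ALPHA <= p < 0.1 for p in p_values_history[-1])
        if borderline and not extended:
            return False, "borderline p-values: extend once by 8 sweeps"
        return True, "mean noise level <= 4 nV"

    return False, "continue recording"
```

For example, a recording with low noise but borderline p-values would be extended once, then stopped at the next evaluation.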
Reliability and validity of PROMIS measures administered by telephone interview in a longitudinal localized prostate cancer study
Purpose: To evaluate the reliability and validity of six PROMIS measures (anxiety, depression, fatigue, pain interference, physical function, and sleep disturbance) telephone-administered to a diverse, population-based cohort of localized prostate cancer patients. Methods: Newly diagnosed men were enrolled in the North Carolina Prostate Cancer Comparative Effectiveness and Survivorship Study. PROMIS measures were telephone-administered pre-treatment (baseline), and at 3 months and 12 months post-treatment initiation (N = 778). Reliability was evaluated using Cronbach's alpha. Dimensionality was examined with bifactor models and explained common variance (ECV). Ordinal logistic regression models were used to detect potential differential item functioning (DIF) for key demographic groups. Convergent and discriminant validity were assessed by correlations with the legacy instruments Memorial Anxiety Scale for Prostate Cancer and SF-12v2. Known-groups validity was examined by age, race/ethnicity, comorbidity, and treatment. Results: Each PROMIS measure had high Cronbach's alpha values (0.86–0.96) and was sufficiently unidimensional. Floor effects were observed for the anxiety, depression, and pain interference measures; ceiling effects were observed for physical function. No DIF was detected. Convergent validity was established with moderate to strong correlations between PROMIS and legacy measures (0.41–0.77) of similar constructs. Discriminant validity was demonstrated with weak correlations between measures of dissimilar domains (−0.20 to −0.31). PROMIS measures detected differences across age, race/ethnicity, and comorbidity groups; no differences were found by treatment. Conclusions: This study provides support for the reliability and construct validity of six PROMIS measures in prostate cancer, as well as the utility of telephone administration for assessing HRQoL in low-literacy and hard-to-reach populations
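Cronbach's alpha, the reliability coefficient reported above, can be computed with a few lines of code. The sketch below is illustrative only: the data are simulated (one latent trait plus noise), not the study's, and the function name is an assumption.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a persons x items matrix of scored responses.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Simulated example: five items each loading on a single latent trait,
# which yields high internal consistency (alpha near 0.9).
rng = np.random.default_rng(1)
trait = rng.normal(size=500)
data = trait[:, None] + rng.normal(scale=0.7, size=(500, 5))
print(round(cronbach_alpha(data), 2))
```

Values in the 0.86–0.96 range reported for the PROMIS measures indicate that the items within each measure vary together tightly relative to the total-score variance.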
The Use of ICT for the Assessment of Key Competences
This report assesses current trends in the area of ICT for learning and assessment in view of their value for supporting the assessment of Key Competences. Based on an extensive review of the literature, it provides an overview of current ICT-enabled assessment practices, with a particular focus on more recent developments that support the holistic assessment of Key Competences for Lifelong Learning in Europe. The report presents a number of relevant cases, discusses the potential of emerging technologies, and addresses innovation and policy issues for eAssessment. It covers both summative and formative assessment and examines how ICT can leverage the potential of more innovative assessment formats, such as peer assessment and portfolio assessment, and how more recent technological developments, such as Learning Analytics, could in the future foster assessment for learning. Reflecting on the use of different ICT tools and services for each of the eight Key Competences for Lifelong Learning, it derives policy options for further exploiting the potential of ICT for competence-based assessment.