Exploring parameter sensitivities of the land surface using a locally coupled land-atmosphere model
This paper presents a multicriteria analysis that explores the sensitivity of the land surface to changes in both land and atmospheric parameters, in terms of reproducing surface heat fluxes and ground temperature; for the land parameters, offline sensitivity analyses were also conducted for comparison to infer the influence of land-atmosphere interactions. A simple "one-at-a-time" sensitivity analysis was conducted first to filter out some insensitive parameters, followed by a multicriteria sensitivity analysis using the multiobjective generalized sensitivity analysis algorithm. The models used were the locally coupled National Center for Atmospheric Research (NCAR) single-column community climate model and the offline NCAR land surface model, driven and evaluated by a summer intensive operational period (IOP) data set from the southern Great Plains. As expected, the results show that land-atmosphere interactions (with or without land-atmosphere parameter interactions) can have significant influences on the sensitivity of the land surface to changes in the land parameters, and the single-criterion sensitivities can be significantly different from the multicriteria sensitivity. These findings are mostly model and data independent and can be generally useful, regardless of the model/data dependence of the sensitivities of individual parameters. The exceptionally high sensitivities of the selected atmospheric parameters in a multicriteria sense (and in particular for latent heat) call for adequate attention to the specification of effective values of these parameters in an atmospheric model. Overall, this study proposes an effective framework of multicriteria sensitivity analysis beneficial to future studies in the development and parameter estimation of other complex (offline or coupled) land surface models. Copyright 2004 by the American Geophysical Union.
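As a rough illustration of the "one-at-a-time" screening step mentioned in this abstract, the sketch below perturbs each parameter of a toy model in turn while holding the others at their default values, then flags parameters whose effect falls below a threshold. The function `toy_flux`, the parameter names, the 10% perturbation, and the 1% threshold are all hypothetical stand-ins; they are not the NCAR models or the settings used in the paper.

```python
from typing import Callable, Dict


def toy_flux(p: Dict[str, float]) -> float:
    """Hypothetical stand-in for a model output (not an NCAR model)."""
    # "dummy" deliberately has no influence on the output.
    return 100.0 * p["albedo"] + 5.0 * p["roughness"] ** 2 + 0.0 * p["dummy"]


def oat_sensitivity(
    model: Callable[[Dict[str, float]], float],
    defaults: Dict[str, float],
    rel_step: float = 0.1,
) -> Dict[str, float]:
    """Relative change in the output when one parameter is raised by rel_step."""
    base = model(defaults)
    scores = {}
    for name, value in defaults.items():
        perturbed = dict(defaults)              # all other parameters stay fixed
        perturbed[name] = value * (1.0 + rel_step)
        scores[name] = abs(model(perturbed) - base) / abs(base)
    return scores


# Assumed default values, chosen only for illustration.
defaults = {"albedo": 0.2, "roughness": 1.0, "dummy": 3.0}
scores = oat_sensitivity(toy_flux, defaults)

# Parameters below the (assumed) 1% threshold would be dropped before a
# more expensive multicriteria analysis.
insensitive = sorted(name for name, s in scores.items() if s < 0.01)
print(scores, insensitive)
```

A screening pass like this ignores parameter interactions, which is presumably why the abstract follows it with a multicriteria analysis.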
Combining Rasch and cluster analysis: a novel method for developing rheumatoid arthritis states for use in valuation studies
Purpose: Health states that describe an investigated condition are a crucial component of valuation studies. The health states need to be distinct, comprehensible, and data-driven. The objective of this study was to describe a novel application of Rasch and cluster analyses in the development of three rheumatoid arthritis health states.
Methods: The Stanford Health Assessment Questionnaire (HAQ) was subjected to Rasch analysis to select the items that best represent disability. K-means cluster analysis produced health states with the levels of the selected items. The pain and discomfort domain from the EuroQol-5D was incorporated at the final stage.
Results: The results demonstrate a methodology for reducing a dataset containing individual disease-specific scores to generate health states. The four selected HAQ items were bending down, climbing steps, lifting a cup to your mouth, and standing up from a chair.
Conclusions: Overall, the combined use of Rasch and cluster analysis has proved to be an effective technique for identifying the most important items and levels for the construction of health states.
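To make the clustering step more concrete, here is a minimal sketch of k-means (Lloyd's algorithm) applied to respondents' scores on four items, each scored from 0 to 3. The data are invented, and both the choice of the first k rows as starting centres and the rounding of centroids to item levels are illustrative assumptions; the abstract does not say how the authors initialised or post-processed their clusters.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Tuple[int, ...]], k: int, iters: int = 50):
    """Plain Lloyd's algorithm; starts from the first k rows (an assumption)."""
    centres = [tuple(float(v) for v in p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each respondent goes to the nearest centre.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points
        ]
        if new_labels == labels:        # converged: assignments unchanged
            break
        labels = new_labels
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centres, labels


# Invented scores on four items (columns), each scored 0 to 3.
scores = [
    (0, 0, 0, 0), (1, 2, 1, 1), (3, 3, 2, 3),
    (0, 1, 0, 0), (1, 1, 1, 2), (3, 2, 3, 3),
    (0, 0, 1, 0), (2, 2, 1, 1), (2, 3, 3, 3),
]
centres, labels = kmeans(scores, k=3)

# Round each centroid to the nearest item level to obtain a describable state.
states = [tuple(round(v) for v in centre) for centre in centres]
print(labels, states)
```

Each rounded centroid can then be read as one candidate health state, with one level per selected item.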
A Penalty Approach to Differential Item Functioning in Rasch Models
A new diagnostic tool for the identification of differential item functioning (DIF) is proposed. Classical approaches to DIF can consider only a few subpopulations, such as ethnic groups, when investigating whether the solution of an item depends on membership in a subpopulation. We propose an explicit model for differential item functioning that includes a set of variables, containing metric as well as categorical components, as potential candidates for inducing DIF. The ability to include a set of covariates entails that the model contains a large number of parameters. Regularized estimators, in particular penalized maximum likelihood estimators, are used to solve the estimation problem and to identify the items that induce DIF. It is shown that the method is able to detect items with DIF. Simulations and two applications demonstrate the applicability of the method.
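One plausible way to write down the kind of model this abstract describes is sketched below. The notation is an assumption made for illustration: $\theta_p$ is the ability of person $p$, $\beta_i$ the difficulty of item $i$, $\mathbf{x}_p$ the covariate vector of person $p$, and $\boldsymbol{\gamma}_i$ the item-specific DIF parameters. The abstract states only that penalized maximum likelihood is used; the grouped penalty on whole vectors $\boldsymbol{\gamma}_i$ shown here is an assumed choice, not a confirmed detail.

```latex
% Rasch model extended by item-specific covariate effects (assumed notation)
\log \frac{P(Y_{pi} = 1)}{P(Y_{pi} = 0)}
  = \theta_p - \beta_i - \mathbf{x}_p^{\top} \boldsymbol{\gamma}_i

% Penalized log-likelihood with tuning parameter \lambda \ge 0
\ell_{\lambda}(\boldsymbol{\theta}, \boldsymbol{\beta}, \boldsymbol{\gamma})
  = \ell(\boldsymbol{\theta}, \boldsymbol{\beta}, \boldsymbol{\gamma})
    - \lambda \sum_{i=1}^{I} \lVert \boldsymbol{\gamma}_i \rVert_2
```

Under such a penalty, an item whose estimate is $\hat{\boldsymbol{\gamma}}_i = \mathbf{0}$ follows the ordinary Rasch model, while items with nonzero estimates would be flagged as showing DIF.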
An Examination of Parameter Recovery Using Different Multiple Matrix Booklet Designs
Educational large-scale assessments examine students’ achievement in various content domains and thus provide key findings to inform educational research and evidence-based educational policies. To this end, large-scale assessments involve hundreds of items to test students’ achievement in various content domains. Administering all these items to every student would overburden them, reduce participation rates, and consume too much time and resources. Hence multiple matrix sampling is used, in which the test items are distributed into various test forms called “booklets”, and each student is administered one booklet containing a subset of items that can sensibly be answered during the allotted test time. However, there are numerous possibilities as to how these booklets can be designed, and the booklet design could influence the precision of parameter recovery at both the global and subpopulation levels. One popular booklet design with many desirable characteristics is the Balanced Incomplete 7-Block or Youden squares design. Extensions of this booklet design are used in many large-scale assessments such as TIMSS and PISA. This doctoral project examines the degree to which item and population parameters are recovered in real and simulated data in relation to matrix sparseness, when using various balanced incomplete block booklet designs. To this end, key factors (e.g., number of items, number of persons, number of items per person, and the match between the distributions of item and person parameters) are experimentally manipulated to learn how these factors affect the precision with which these designs recover true population parameters. In doing so, the project expands the empirical knowledge base on the statistical properties of booklet designs, which in turn could help improve the design of future large-scale studies.
Generally, the results show that for a typical large-scale assessment (with a sample size of at least 3,000 students and more than 100 test items), population and item parameters are recovered accurately and without bias in the various multi-matrix booklet designs. This is true both at the global population level and at the subgroup or sub-population levels. Further, for such a large-scale assessment, the match between the distribution of person abilities and the distribution of item difficulties is found to have an insignificant effect on the precision with which person and item parameters are recovered, when using these multi-matrix booklet designs.
These results give further support to the use of multi-matrix booklet designs as a reliable test abridgment technique in large-scale assessments, and for accurate measurement of performance gaps between policy-relevant subgroups within populations. However, item position effects were not fully considered, and different results are possible if similar studies are performed (a) with conditions involving items that poorly measure student abilities (e.g., with students having skewed ability distributions); or (b) simulating conditions where there is a lot of missing data because of non-response, instead of just missing by design. This should be further investigated in future studies.
Assessing students’ achievement in various domains through large-scale school achievement studies (so-called large-scale assessments) yields important findings for educational research and evidence-based education policy. However, testing achievement across many content areas always requires hundreds of items. If every test item were presented to every single student, this would place too great a burden on the students, and they would consequently be less motivated to work on all the items. In addition, administering all items to the entire sample would be very time- and resource-intensive. For these reasons, large-scale assessments often rely on a multi-matrix design in which different test booklet versions (so-called booklets) are randomly assigned to the test takers. These booklets do not contain all the items but only a subset of the item pool, with only some of the items overlapping between booklets. This ensures that students can work through all the items presented to them within the allotted testing time. However, there are numerous ways in which these booklets can be assembled. The booklet design in turn affects the accuracy of parameter estimation at the population and subpopulation levels. One proven booklet design is the Balanced Incomplete 7-Block design, also called the Youden squares design, which is used in various forms in many large-scale assessments such as TIMSS and PISA. Using both real and simulated data, the present work examines the accuracy with which item and person parameters can be estimated under various balanced incomplete block designs, as a function of the proportion of values missing by design. To this end, various design parameters were varied (e.g., number of items, sample size, number of items per booklet, degree of match between item and person parameters), and it was then analysed how these affect the accuracy of the estimation of population parameters. The present work thus aims to expand empirical knowledge about the statistical properties of booklet designs, thereby contributing to the improvement of future large-scale assessments.
The results showed that for a typical large-scale assessment (with a sample size of at least 3,000 students and at least 100 items), person and item parameters were estimated precisely at both the population and subpopulation levels with all variants of the balanced incomplete block design that were used. It was also shown that, for samples of at least 3,000 students, the match between the achievement distribution and the distribution of item difficulty had no meaningful influence on the accuracy with which the various booklet designs estimated person and item parameters.
The results corroborate that, using multi-matrix designs, policy-relevant achievement differences between groups of students in the population can be estimated reliably and precisely. One limitation of the present study is that item position effects were not comprehensively taken into account. Thus it cannot be ruled out that the results would differ if (a) items were used that estimate students’ achievement poorly (e.g., with a skewed distribution of achievement scores) or (b) there were high proportions of missing values not produced by the multi-matrix design. This should be examined in future studies.
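For readers unfamiliar with balanced incomplete block designs, the sketch below builds the textbook seven-booklet design by cyclically shifting the base set {0, 1, 3} modulo 7, and then checks the balance properties that make it a Youden square. Whether the thesis used exactly this construction is not stated in the abstract; the variable names and the idea of treating each number as a block (cluster) of items are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

N_BLOCKS = 7          # item blocks (clusters), numbered 0..6
BASE = (0, 1, 3)      # perfect difference set modulo 7

# Booklet s contains blocks s, s+1 and s+3 (mod 7), in that position order.
booklets = [tuple((b + shift) % N_BLOCKS for b in BASE) for shift in range(N_BLOCKS)]

# Balance check 1: how often each block is used across all booklets.
block_counts = Counter(block for booklet in booklets for block in booklet)

# Balance check 2: how often each pair of blocks shares a booklet.
pair_counts = Counter(
    frozenset(pair) for booklet in booklets for pair in combinations(booklet, 2)
)

# Youden property: how often each block occupies each booklet position.
position_counts = [
    Counter(booklet[pos] for booklet in booklets) for pos in range(len(BASE))
]

# Share of blocks a single student never sees (missing by design).
sparseness = 1 - len(BASE) / N_BLOCKS

for number, booklet in enumerate(booklets, start=1):
    print(f"Booklet {number}: blocks {booklet}")
print(f"Missing by design per student: {sparseness:.1%}")
```

In this design every block appears in three booklets, every pair of blocks appears together in exactly one booklet, and every block appears once in each position, so each student skips four of the seven blocks by design.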
- …