Item Response Theory for Peer Assessment
Peer assessment, an assessment method based on a constructivist approach, has become popular in recent years. However, its reliability depends on rater characteristics. For this reason, several item response models that incorporate rater parameters have been proposed. These models are expected to improve reliability if their parameters can be estimated accurately. When they are applied to actual peer assessment, however, parameter estimation accuracy is reduced for two reasons. 1) Because the models include higher-dimensional rater parameters, the number of rater parameters grows to two or more times the number of raters. 2) The accuracy of parameter estimation from sparse peer assessment data depends strongly on hand-tuned parameters, called hyperparameters. To solve these problems, this article proposes a new item response model for peer assessment that incorporates rater parameters while keeping their number as small as possible. It also proposes a parameter estimation method for the proposed model, based on a hierarchical Bayes model, that can learn the hyperparameters from data. Finally, the article demonstrates the effectiveness of the proposed method using results from simulation and actual data experiments.
Equating accuracy of performance tests based on item response models with rater characteristic parameters
In recent years, performance assessment has attracted attention as a way to measure the practical, higher-order abilities of examinees. A persistent difficulty, however, is that the accuracy of ability measurement depends strongly on rater and task characteristics. To resolve this problem, many item response theory (IRT) models that incorporate rater and task characteristic parameters have been proposed, and their effectiveness has been shown. In practice, scores obtained from one performance test often need to be compared with those obtained from different tests. Applying IRT models in such cases requires test equating, the statistical process of placing the model parameters estimated from each test on a common scale so that scores on different forms are comparable. To conduct equating, the tests must generally be designed to share some raters and performance tasks. The accuracy of the equating is then expected to depend on various conditions, including the number of common raters and tasks, the ability distribution of the examinees in each test, and the numbers of examinees, raters, and tasks. However, the effect of these factors on equating accuracy has not been clarified, and it has not been shown how tests should be designed to allow accurate equating. This study therefore evaluates the accuracy of performance test equating based on IRT models while varying the test design. From the results, we identify the factors that affect equating accuracy and present criteria for test designs that achieve high equating accuracy.
Item response theory robust to aberrant raters in peer assessment
With the spread of large-scale e-learning such as MOOCs, there is a growing need to use peer assessment to measure learner ability. However, the measurement accuracy of peer assessment depends strongly on rater characteristics. To address this problem, many item response theory (IRT) models that incorporate rater characteristic parameters have recently been proposed. Existing models, however, cannot necessarily capture the characteristics of aberrant raters, whose rating criteria differ extremely from those of other raters, so assessment accuracy decreases when such raters may be present. To resolve the problem, we propose a new IRT model that incorporates rater characteristic parameters corresponding to 1) severity, 2) consistency, and 3) range restriction. The proposed model has the following advantages. 1) Because it can represent rater characteristics flexibly, model fit to aberrant raters' data is expected to improve. 2) Because learner ability can be estimated in a way that reflects aberrant raters' characteristics more accurately, peer assessment accuracy is expected to be higher than with existing models even when aberrant raters exist. Through simulation and actual data experiments, we demonstrate the effectiveness of the proposed model.
Empirical comparison of item response theory models with rater's parameters
In various assessment contexts including entrance examinations, educational assessments, and personnel appraisal, performance assessment by raters has attracted much attention as a way to measure higher-order abilities of examinees. However, a persistent difficulty is that the ability measurement accuracy depends strongly on rater and task characteristics. To resolve this shortcoming, various item response theory (IRT) models that incorporate rater and task characteristic parameters have been proposed. However, because various models with different rater and task parameters exist, it is difficult to understand each model's features. Therefore, this study presents empirical comparisons of IRT models. Specifically, after reviewing and summarizing features of existing models, we compare their performance through simulation and actual data experiments.
Radiocarbon dating of Fugendake Volcano in Unzen, SW Japan
This article presents new radiocarbon ages for the lavas, pyroclastic flow, and lahar deposits that originated from the Fugendake and Mayuyama volcanoes of the Younger Unzen Volcano, SW Japan. Nine charcoal samples were collected from the lavas and pyroclastic flow deposits, and 17 soil samples from the underlying volcanic-related products. This data set, together with previously published ages (thermoluminescence, K-Ar, fission track, and 14C), yielded new information about the timing of Late Pleistocene eruptions and an improved understanding of the evolution of the Fugendake and Mayuyama volcanoes. Fugendake Volcano started to build within the scar of Myokendake around 29 cal ka BP, and its eruption products spread over the flank of Myokendake. The remarkable eruptions of Fugendake Volcano included the lava and pyroclastic flow deposits around 22, 17, 12, and 4.5 cal ka BP. Subsequent historical eruptions occurred in AD 1663, 1792, and 1991–1995. Developed on the eastern extension of Fugendake Volcano, Mayuyama Volcano was active during the building stage of Fugendake at 4.5 cal ka BP. This study also identified a pumice eruption at ~10 ka and 2 volcanic-related lahar deposits around 1.6 and 0.7 ka, which need to be addressed in future research
<Research Notes>Folk Taxonomy in the Coral Sea Area : A Case of Shirobe,Miyako Is. in Okinawa Pref.
Group optimization to maximize peer assessment accuracy using item response theory and integer programming
With the spread of large-scale e-learning environments such as MOOCs, peer assessment has become widely used to measure learner ability. When the number of learners increases, peer assessment is often conducted by dividing learners into multiple groups to reduce each learner's assessment workload. In such cases, however, the peer assessment accuracy depends on how the groups are formed. To resolve that difficulty, this study proposes a group formation method that maximizes peer assessment accuracy using item response theory and integer programming. Experimental results, however, show that the proposed method does not achieve sufficiently higher accuracy than random group formation. Therefore, this study further proposes an external rater assignment method that assigns a few outside-group raters to each learner after groups are formed using the proposed group formation method. Through simulation and actual data experiments, this study demonstrates that the proposed external rater assignment can substantially improve peer assessment accuracy.
Zur Differenz zwischen Religion und Religiosität bei jungen Menschen. Ein Problemaufriss
Streib H. Zur Differenz zwischen Religion und Religiosität bei jungen Menschen. Ein Problemaufriss. In: Meier U, Kropac U, König K, eds. Zwischen Religion und Religiosität. Regensburg: Pustet; 2015: 27-40
