42 research outputs found

    Item Response Theory for Peer Assessment

    As an assessment method grounded in a constructivist approach, peer assessment has become popular in recent years. A persistent problem, however, is that its reliability depends on rater characteristics. For this reason, several item response models that incorporate rater parameters have been proposed. These models can improve reliability, but only if their parameters are estimated accurately. When they are applied to actual peer assessment, estimation accuracy tends to degrade for two reasons: 1) because the models include multidimensional rater parameters, the number of rater parameters grows to two or more times the number of raters; and 2) the accuracy of estimation from sparse peer assessment data depends strongly on hand-tuned parameters called hyperparameters. To address these problems, this article proposes a new item response model for peer assessment that incorporates rater parameters while keeping their number as small as possible. It further proposes a parameter estimation method based on a hierarchical Bayes model that learns the hyperparameters from the data. Finally, the article demonstrates the effectiveness of the proposed method through simulation and actual data experiments.
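    A minimal sketch of how a single rater parameter can enter such a model (a generic generalized-partial-credit-style formulation for illustration, not the exact model proposed in the article): with learner ability \theta_j, task discrimination \alpha_i, task difficulty \beta_i, rater severity \rho_r, and category step parameters d_{im}, the probability that rater r gives learner j a score of k on task i can be written as

        P_{ijrk} = \frac{\exp \sum_{m=1}^{k} \alpha_i (\theta_j - \beta_i - \rho_r - d_{im})}
                        {\sum_{l=1}^{K} \exp \sum_{m=1}^{l} \alpha_i (\theta_j - \beta_i - \rho_r - d_{im})}.

    Keeping only the scalar \rho_r per rater keeps the number of rater parameters equal to the number of raters, rather than a multiple of it.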

    Equating accuracy of performance tests based on item response models with rater characteristic parameters

    近幎受隓者の実践的か぀高次の胜力を枬定する手法の䞀぀ずしおパフォヌマンス評䟡が泚目されおいる䞀方でパフォヌマンス評䟡の問題ずしお胜力枬定の粟床が評䟡者ずパフォヌマンス課題の特性に匷く䟝存する点が指摘されおきたこの問題を解決する手法ずしお近幎評䟡者ず課題の特性を衚すパラメヌタを付䞎した項目反応モデルが倚数提案されその有効性が瀺されおいる他方珟実の評䟡堎面では耇数回の異なるパフォヌマンステストの結果を比范するニヌズがしばしば生じるこのような堎合に項目反応モデルを適甚するためには個々のテスト結果から掚定されるモデルパラメヌタを同䞀尺床䞊に䜍眮付ける「等化」が必芁ずなる䞀般にパフォヌマンステストの等化を行うためにはテスト間で課題ず評䟡者の䞀郚が共通するように個々のテストを蚭蚈する必芁があるこのずき等化の粟床は共通課題や共通評䟡者の数各テストにおける受隓者の胜力特性分垃受隓者数・評䟡者数・課題数などの様々な条件に䟝存するず考えられるしかしこれたでこれらの芁因が等化粟床に䞎える圱響は明らかにされおおらずテストをどのように蚭蚈すれば高粟床な等化が可胜ずなるかは瀺されおこなかったそこで本研究では項目反応モデルをパフォヌマンス評䟡に適甚しお等化を行う堎合にその粟床に圱響を䞎える芁因を実隓により明らかにしその結果に基づき高い等化粟床を達成するために必芁なテストのデザむンに぀いお基準を瀺すIn various assessment contexts, performance assessment has attracted much attention to measure higher order abilities of examinees. However, a persistent difficulty is that the ability measurement accuracy depends strongly on rater and task characteristics. To resolve this problem, various item response theory (IRT) models that incorporate rater and task characteristic parameters have been proposed. On the other hand, scores obtained from a performance test is often compared to those obtained from different tests practically. For that purpose, test equating, which is the statistical process of determining comparable scores on different forms, is required. To conduct the test equating, each test must be formed to have common raters and performance tasks. In this case, accuracy of the equating depends on various settings including the number of common raters and tasks, the ability distribution assumed in each tests, the number of examinees, rater and tasks. However, no relevant studies have examined what factors affect the equating accuracy. For that reason, the study evaluates the accuracy of performance test equating based on the IRT models while changing the test design. From the result, we show the factors affecting the equating accuracy and give some designs providing high equating accuracy

    Empirical comparison of item response theory models with rater's parameters

    In various assessment contexts, including entrance examinations, educational assessment, and personnel appraisal, performance assessment by raters has attracted much attention as a way to measure examinees' higher-order abilities. A persistent difficulty is that the accuracy of ability measurement depends strongly on rater and task characteristics. To resolve this shortcoming, various item response theory (IRT) models that incorporate rater and task characteristic parameters have been proposed. However, because many models with different rater and task parameters exist, it is difficult to understand the features of each model. This study therefore presents empirical comparisons of these IRT models. Specifically, after reviewing and summarizing the features of existing models, we compare their performance through simulation and actual data experiments.
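    Such comparisons typically come down to ranking the candidate models by fit to the same rating data while penalizing their parameter counts. A rough sketch of that ranking step (the log-likelihoods, parameter counts, and model labels below are hypothetical, and this is not the exact evaluation protocol of the study):

        import numpy as np

        # Hypothetical fit results for three rater-parameterized IRT models on one data set.
        models = {
            "many-facet Rasch":         {"loglik": -5210.4, "n_params": 312},
            "MFRM + rater consistency": {"loglik": -5150.7, "n_params": 362},
            "MFRM + range restriction": {"loglik": -5171.9, "n_params": 512},
        }
        n_obs = 6000  # number of observed ratings

        for name, m in models.items():
            aic = -2 * m["loglik"] + 2 * m["n_params"]
            bic = -2 * m["loglik"] + m["n_params"] * np.log(n_obs)
            print(f"{name:26s}  AIC = {aic:8.1f}  BIC = {bic:8.1f}")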

    Item response theory robust to aberrant raters in peer assessment

    近幎MOOCsに代衚される倧芏暡eラヌニングの普及に䌎いピアアセスメントを孊習者の胜力枬定に甚いるニヌズが高たっおいる䞀方でピアアセスメントによる胜力枬定の課題ずしおその枬定粟床が評䟡者の特性に匷く䟝存する問題が指摘されおきたこの問題を解決する手法の䞀぀ずしお評䟡者特性パラメヌタを付䞎した項目反応モデルが近幎倚数提案されおいるしかし既存モデルでは評䟡基準が他の評䟡者ず極端に異なる“異質評䟡者”の特性を必ずしも衚珟できないため異質評䟡者が存圚する可胜性があるピアアセスメントに適甚したずき胜力枬定粟床が䜎䞋する問題が残るこの問題を解決するために本論文では1評䟡の厳しさ2䞀貫性3尺床範囲の制限に察応する評䟡者特性パラメヌタを付䞎した新たな項目反応モデルを提案する提案モデルの利点は次のずおりである1評䟡者の特性を柔軟に衚珟できるため異質評䟡者の採点デヌタに察するモデルのあおはたりを改善できる2異質評䟡者の圱響を正確に胜力枬定倀に反映できるため異質評䟡者が存圚するピアアセスメントにおいお既存モデルより高粟床な胜力枬定が期埅できる本論文ではシミュレヌション実隓ず実デヌタ実隓から提案モデルの有効性を瀺すItem response theory (IRT) model that incorporates rater characteristic parameters have recently been proposed to improve peer assessment accuracy. However, the assessment accuracy based on the models will be reduced when the number of aberrant raters increases because they can necessarily not capture those characteristics. To resolve the problem, we propose a new IRT model that incorporates rater characteristic parameters corresponding to severity, consistency, and range restriction. The proposed model has the following advantages. 1) The model fitting to aberrant raters\u27 data is expected to be improved because the proposed model can represent rater characteristics flexibly. 2) Peer assessment accuracy is expected to be improved even when aberrant raters exist because learner ability can be estimated as to reflect aberrant raters\u27 characteristics more accurately. Through simulation and actual data experiments, we demonstrate effectiveness of the proposed model

    Group optimization to maximize peer assessment accuracy using item response theory and integer programming

    With the widespread use of large-scale e-learning environments such as MOOCs, peer assessment has become a popular way to measure learner ability. When the number of learners is large, peer assessment is often conducted by dividing learners into multiple groups to reduce each learner's assessment workload. In such cases, however, the accuracy of peer assessment depends on how the groups are formed. To resolve this difficulty, this study proposes a group formation method that maximizes peer assessment accuracy using item response theory and integer programming. Experimental results, however, demonstrate that the proposed method does not achieve substantially higher accuracy than random group formation. This study therefore further proposes an external rater assignment method that assigns a few outside-group raters to each learner after the groups have been formed with the proposed method. Through simulation and actual data experiments, we show that the proposed external rater assignment substantially improves peer assessment accuracy.
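    The external rater assignment idea can be conveyed with a simplified greedy stand-in (the paper formulates the problem with integer programming; the heuristic, the information score, and all variable names below are only illustrative):

        import numpy as np

        # After groups are fixed, give each learner a few outside-group raters,
        # preferring raters whose estimated consistency contributes more information
        # about the learner's ability while balancing the extra rating load.
        rng = np.random.default_rng(0)
        n_learners, k_external = 12, 2
        group = np.repeat(np.arange(3), 4)           # 3 groups of 4; learner i also acts as rater i
        alpha = rng.lognormal(0.0, 0.3, n_learners)  # hypothetical rater consistency estimates
        load = np.zeros(n_learners)                  # extra ratings already assigned to each rater

        assignment = {}
        for j in range(n_learners):
            outside = [r for r in range(n_learners) if group[r] != group[j]]
            ranked = sorted(outside, key=lambda r: alpha[r] ** 2 - 0.5 * load[r], reverse=True)
            assignment[j] = ranked[:k_external]
            for r in assignment[j]:
                load[r] += 1

    An integer-programming formulation instead optimizes all assignments jointly under explicit workload constraints rather than greedily, one learner at a time.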

    A generalized many-facet Rasch model and its Bayesian estimation using Hamiltonian Monte Carlo

    Performance assessments, in which raters assess examinee performance on given tasks, suffer from the persistent difficulty that the accuracy of ability measurement depends on rater characteristics. To address this problem, various item response theory (IRT) models that incorporate rater characteristic parameters have been proposed. Conventional models partially consider three typical rater characteristics: severity, consistency, and range restriction. Each is important for improving model fit and measurement accuracy, especially as the diversity of raters increases, yet no model capable of representing all three simultaneously has been proposed. One obstacle to developing such a complex model is the difficulty of parameter estimation. Maximum likelihood estimation, which is used for most conventional models, generally yields unstable and inaccurate estimates in complex models. Bayesian estimation is expected to provide more robust estimates; although it incurs high computational costs, recent increases in computational power and the development of efficient Markov chain Monte Carlo (MCMC) algorithms make its use feasible. We therefore propose a new IRT model that can represent all three typical rater characteristics, formulated as a generalization of the many-facet Rasch model. We also develop a Bayesian estimation method for the proposed model using No-U-Turn Hamiltonian Monte Carlo, a state-of-the-art MCMC algorithm, and demonstrate the effectiveness of the proposed method through simulation and actual data experiments.
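    The estimation side can be illustrated with the basic Hamiltonian Monte Carlo building block that the No-U-Turn Sampler extends (a toy one-dimensional standard-normal target standing in for the actual posterior of the generalized many-facet Rasch model; the step size and trajectory length are arbitrary):

        import numpy as np

        def log_post(theta):            # toy log posterior: standard normal, up to a constant
            return -0.5 * theta ** 2

        def grad_log_post(theta):
            return -theta

        def hmc_step(theta, rng, step_size=0.1, n_leapfrog=20):
            p = rng.normal()                                       # draw an auxiliary momentum
            p_new = p + 0.5 * step_size * grad_log_post(theta)     # leapfrog half step
            theta_new = theta
            for _ in range(n_leapfrog):
                theta_new = theta_new + step_size * p_new
                p_new = p_new + step_size * grad_log_post(theta_new)
            p_new = p_new - 0.5 * step_size * grad_log_post(theta_new)  # correct the final half step
            log_accept = (log_post(theta_new) - 0.5 * p_new ** 2) - (log_post(theta) - 0.5 * p ** 2)
            return theta_new if np.log(rng.uniform()) < log_accept else theta

        rng = np.random.default_rng(0)
        theta, samples = 0.0, []
        for _ in range(2000):
            theta = hmc_step(theta, rng)
            samples.append(theta)

    NUTS automates the choice of trajectory length and, with step-size adaptation, makes this kind of sampler practical for the high-dimensional rater, task, and ability parameters described above.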

    Large-scale Bayesian network learning with the RAI algorithm using the Bayes factor

    Structure learning of Bayesian networks with asymptotic consistency is NP-hard. Search algorithms based on dynamic programming, A* search, and integer programming have been developed, but they remain limited to structures of roughly 60 nodes, and a fundamentally different approach is needed for large-scale structure learning. In the field of causal modeling, the constraint-based approach, which drastically reduces computational cost by combining conditional independence (CI) tests with edge orientation, has been proposed; the RAI algorithm is known as its most accurate, state-of-the-art method. However, the RAI algorithm relies on CI tests based on statistical hypothesis testing or on conditional mutual information. The accuracy of the former depends on the p-value and a user-specified significance level; as the sample size grows, p-values shrink and the null hypothesis of independence tends to be rejected erroneously. The accuracy of the latter depends strongly on the choice of threshold. Consequently, neither test guarantees that the true structure is learned asymptotically. In this paper, we incorporate into the RAI algorithm a CI test based on the Bayes factor, which has asymptotic consistency, thereby enabling structure learning for networks with several hundred nodes. Simulation experiments on several benchmark networks show that the proposed method provides the best empirical performance.
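    The key ingredient is the Bayes factor CI test. A self-contained sketch for discrete variables, using Dirichlet-multinomial marginal likelihoods with a symmetric prior (the prior strength, the zero decision threshold, and all names are assumptions of this sketch, not the paper's exact scoring):

        import numpy as np
        from scipy.special import gammaln

        def log_marginal(counts, alpha=1.0):
            # Log marginal likelihood of multinomial counts under a symmetric Dirichlet prior.
            counts = np.asarray(counts, dtype=float).ravel()
            a = np.full_like(counts, alpha)
            return (gammaln(a.sum()) - gammaln(a.sum() + counts.sum())
                    + np.sum(gammaln(a + counts) - gammaln(a)))

        def log_bayes_factor_ci(table):
            # table[z, x, y]: counts. Log Bayes factor of dependence vs. independence of X, Y given Z.
            log_dep = log_indep = 0.0
            for z in range(table.shape[0]):
                n_xy = table[z]
                log_dep += log_marginal(n_xy)                 # X and Y jointly multinomial given Z = z
                log_indep += log_marginal(n_xy.sum(axis=1))   # X alone given Z = z
                log_indep += log_marginal(n_xy.sum(axis=0))   # Y alone given Z = z
            return log_dep - log_indep

        table = np.random.default_rng(1).integers(0, 30, size=(2, 3, 3))  # hypothetical counts
        independent = log_bayes_factor_ci(table) < 0   # negative log BF: treat X and Y as independent given Z

    Because the Bayes factor is consistent, this decision converges to the correct one as the sample size grows, unlike a test run at a fixed significance level.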

    Group formation optimization using item response theory in peer assessment

    近幎瀟䌚構成䞻矩に基づく孊習評䟡法ずしおピアアセスメントが泚目されおいる䞀般にMOOCsのように孊習者数が倚い堎合のピアアセスメントは評䟡の負担を軜枛するために孊習者を耇数のグルヌプに分割しおグルヌプ内のメンバ同士で行うこずが倚いしかしこの堎合孊習者の胜力枬定粟床がグルヌプ構成の仕方に䟝存する問題が残るこの問題を解決するために本研究では項目反応理論を甚いお孊習者の胜力枬定粟床を最倧化するようにグルヌプを構成する手法を提案するしかし実隓の結果ランダムにグルヌプを構成した堎合ず比べ提案手法が必ずしも高い胜力枬定粟床を瀺すずは限らないこずが明らかずなったそこで本研究ではグルヌプ内の孊習者同士でのみ評䟡を行うずいう制玄を緩和し各孊習者に察しお少数のグルヌプ倖評䟡者を割り圓おる倖郚評䟡者遞択手法を提案するシミュレヌションず被隓者実隓から提案手法を甚いお数名の倖郚評䟡者を远加するこずでグルヌプ内の孊習者のみによる評䟡に比べ胜力枬定粟床が改善されるこずが確認されたAs an assessment method based on social constructivism, peer assessment has attracted much attention in recent years. When learners increase as in MOOCs, peer assessment is often conducted by dividing learners into groups. However, in this case, the accuracy of peer assessment depends on a way of forming groups. To optimize the accuracy, this study develops a group optimization method using item response theory. However, experimental results show that the method cannot sufficiently improve the accuracy compared to random groups. Therefore, the study further proposes an external rater selection method to assign a few appropriate outside-group raters to each learner. Experimental results demonstrate that the proposed method can sufficiently improve the accuracy

    A hidden Markov IRT model for dynamic assessment

    One of the most difficult problems in education is that learners cannot develop adequately if a teacher teaches them either too much or too little. Predicting each learner's level of understanding and the optimal degree of assistance is therefore an important task for teachers. To predict learner performance under scaffolding, a scaffolding system has been developed that uses item response theory (IRT) to present hints so that the predicted probability of a correct answer becomes optimal. Conventional IRT, however, does not model changes in learner ability during study, so it may fail to predict accurate correct-answer probabilities and, consequently, the optimal number of hints. This study proposes a new item response model that incorporates the process of ability change over time by assuming that ability evolves according to a hidden Markov process. The proposed model has a window-size parameter, which represents the number of tasks over which ability remains constant, and a variation parameter, which reflects the magnitude of ability change; because the optimal values of these parameters are estimated from data, the model is expected to capture learners' true ability changes and improve prediction accuracy. Experiments with actual data demonstrate the effectiveness of the proposal.
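    A generic sketch of the time-varying-ability structure described above (the exact transition distribution and likelihood in the paper may differ): with the response of learner j to the task presented at time t modeled as

        P(u_{jt} = 1 \mid \theta_{jt}) = \frac{1}{1 + \exp[-a_t(\theta_{jt} - b_t)]},

    the ability is held fixed within a window of w consecutive tasks and allowed to move between windows,

        \theta_{j,t+1} = \theta_{jt} \ \text{within the current window}, \qquad
        \theta_{j,t+1} \mid \theta_{jt} \sim N(\theta_{jt}, \tau^2) \ \text{when a new window begins},

    where the window size w and the variation parameter \tau are estimated from the data, so the predicted probability of a correct response tracks the learner's changing ability.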