42 research outputs found
Item Response Theory for Peer Assessment
As an assessment method grounded in a constructivist approach, peer assessment has become popular in recent years. However, its reliability depends on rater characteristics. For this reason, several item response models that incorporate rater parameters have been proposed. These models are expected to improve reliability if their parameters can be estimated accurately. In actual peer assessment, however, estimation accuracy tends to suffer for two reasons. 1) Because the models include higher-dimensional rater parameters, the number of rater parameters grows to two or more times the number of raters. 2) The accuracy of parameter estimation from sparse peer assessment data depends strongly on hand-tuned parameters, called hyperparameters. To solve these problems, this article proposes a new item response model for peer assessment that incorporates rater parameters while keeping their number as small as possible. It also proposes a parameter estimation method for the model based on a hierarchical Bayes model, which can learn the hyperparameters from data. Finally, the article demonstrates the effectiveness of the proposed method through simulation and actual data experiments.
Equating accuracy of performance tests based on item response models with rater characteristic parameters
In recent years, performance assessment has attracted much attention as a way to measure examinees' practical and higher-order abilities. However, a persistent difficulty is that ability measurement accuracy depends strongly on rater and task characteristics. To resolve this problem, various item response theory (IRT) models that incorporate rater and task characteristic parameters have been proposed, and their effectiveness has been shown. In practice, scores obtained from one performance test often need to be compared with those obtained from different tests. Applying IRT models in such cases requires test equating, the statistical process of placing the model parameters estimated from each test on a common scale so that scores on different forms are comparable. To conduct equating, the tests must be designed so that they share some raters and performance tasks. The accuracy of the equating is then expected to depend on various conditions, including the numbers of common raters and tasks, the ability distribution of the examinees in each test, and the numbers of examinees, raters, and tasks. However, no previous studies have clarified how these factors affect equating accuracy or how tests should be designed to achieve accurate equating. This study therefore evaluates the accuracy of performance test equating based on IRT models while varying the test design. Based on the results, we identify the factors that affect equating accuracy and present test design criteria that yield high equating accuracy.
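The abstract above does not say which linking method the study uses. As a rough illustration of what placing parameters on a common scale can look like, the sketch below applies the standard mean-sigma linking method to severity estimates of raters shared by two tests. The function names and all numbers are hypothetical and chosen only for illustration.

```python
import statistics

def mean_sigma_link(common_source, common_target):
    """Return (a, b) so that a * x + b maps the source scale onto the target scale.

    Both arguments hold estimates of the SAME common raters (or tasks),
    each obtained on its own test's scale.
    """
    a = statistics.pstdev(common_target) / statistics.pstdev(common_source)
    b = statistics.mean(common_target) - a * statistics.mean(common_source)
    return a, b

def transform(values, a, b):
    """Apply the linear linking transformation to location-type parameters."""
    return [a * v + b for v in values]

# Hypothetical severity estimates for three raters shared by both tests.
severity_test1 = [-0.5, 0.0, 0.5]   # estimated on test 1's scale (target)
severity_test2 = [0.0, 1.0, 2.0]    # same raters on test 2's scale (source)

a, b = mean_sigma_link(severity_test2, severity_test1)

# Hypothetical examinee abilities from test 2, moved onto test 1's scale.
abilities_test2 = [1.0, 3.0]
linked_abilities = transform(abilities_test2, a, b)
```

With more common raters or tasks, the mean and standard deviation used for linking are estimated from more values, which is one intuition for why the number of common elements could matter for equating accuracy.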
Empirical comparison of item response theory models with rater's parameters
In various assessment contexts including entrance examinations, educational assessments, and personnel appraisal, performance assessment by raters has attracted much attention to measure higher order abilities of examinees. However, a persistent difficulty is that the ability measurement accuracy depends strongly on rater and task characteristics. To resolve this shortcoming, various item response theory (IRT) models that incorporate rater and task characteristic parameters have been proposed. However, because various models with different rater and task parameters exist, it is difficult to understand each model's features. Therefore, this study presents empirical comparisons of IRT models. Specifically, after reviewing and summarizing features of existing models, we compare their performance through simulation and actual data experiments.
A support system for constructing paper structure based on an information-theoretic approach
The University of Electro-Communications 201
Item response theory robust to aberrant raters in peer assessment
With the spread of large-scale e-learning such as MOOCs, the need to use peer assessment to measure learner ability has grown. However, the measurement accuracy of peer assessment depends strongly on rater characteristics. Item response theory (IRT) models that incorporate rater characteristic parameters have recently been proposed to address this problem. However, existing models cannot necessarily represent the characteristics of aberrant raters, whose rating criteria differ extremely from those of other raters, so assessment accuracy declines as the number of aberrant raters increases. To resolve the problem, we propose a new IRT model that incorporates rater characteristic parameters corresponding to 1) severity, 2) consistency, and 3) range restriction. The proposed model has the following advantages. 1) Model fit to aberrant raters' data is expected to improve because the model can represent rater characteristics flexibly. 2) Peer assessment accuracy is expected to improve even when aberrant raters exist because learner ability can be estimated in a way that reflects aberrant raters' characteristics more accurately. Through simulation and actual data experiments, we demonstrate the effectiveness of the proposed model.
Group optimization to maximize peer assessment accuracy using item response theory and integer programming
With the widespread adoption of large-scale e-learning environments such as MOOCs, peer assessment has come into common use for measuring learner ability. When the number of learners increases, peer assessment is often conducted by dividing learners into multiple groups to reduce each learner's assessment workload. However, in such cases, the peer assessment accuracy depends on the method of forming groups. To resolve that difficulty, this study proposes a group formation method to maximize peer assessment accuracy using item response theory and integer programming. Experimental results, however, have demonstrated that the proposed method does not achieve sufficiently higher accuracy than random group formation. Therefore, this study further proposes an external rater assignment method that assigns a few outside-group raters to each learner after groups are formed using the proposed group formation method. Through simulation and actual data experiments, this study demonstrates that the proposed external rater assignment can substantially improve peer assessment accuracy.
A generalized many-facet Rasch model and its Bayesian estimation using Hamiltonian Monte Carlo
Performance assessments, in which raters assess examinee performance for given tasks, have a persistent difficulty in that ability measurement accuracy depends on rater characteristics. To address this problem, various item response theory (IRT) models that incorporate rater characteristic parameters have been proposed. Conventional models partially consider three typical rater characteristics: severity, consistency, and range restriction. Each is important for improving model fit and ability measurement accuracy, especially as the diversity of raters increases. However, no model capable of representing all three simultaneously has been proposed. One obstacle to developing such a complex model is the difficulty of parameter estimation. Maximum likelihood estimation, which is used in most conventional models, generally leads to unstable and inaccurate parameter estimates in complex models. Bayesian estimation is expected to provide more robust estimates. Although it incurs high computational costs, recent increases in computational capabilities and the development of efficient Markov chain Monte Carlo (MCMC) algorithms make its use feasible. We thus propose a new IRT model that can represent all three typical rater characteristics. The model is formulated as a generalization of the many-facet Rasch model. We also develop a Bayesian estimation method for the proposed model using No-U-Turn Hamiltonian Monte Carlo, a state-of-the-art MCMC algorithm. We demonstrate the effectiveness of the proposed method through simulation and actual data experiments.
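For orientation, the sketch below computes category probabilities under the conventional rating-scale many-facet Rasch model, the base model that the abstract says is generalized. It includes only a rater severity parameter. The consistency and range-restriction parameters of the proposed model are not shown because the abstract does not give their form. Function names and numeric values are illustrative assumptions.

```python
import math

def mfrm_category_probs(theta, task_difficulty, rater_severity, steps):
    """Category probabilities under a rating-scale many-facet Rasch model.

    steps[m] is the threshold for moving from category m to m + 1,
    so len(steps) + 1 probabilities are returned.
    """
    logits = [0.0]  # category 0 corresponds to an empty sum
    for d in steps:
        logits.append(logits[-1] + theta - task_difficulty - rater_severity - d)
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score(probs):
    """Expected rating category given the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))

# Hypothetical values: one examinee and task, rated by a lenient and a severe rater.
steps = [-1.0, 0.0, 1.0]
lenient = mfrm_category_probs(0.5, 0.0, -1.0, steps)
severe = mfrm_category_probs(0.5, 0.0, 1.0, steps)
```

In this base model, a more severe rater shifts every category boundary by the same amount, which lowers the expected score for the same examinee and task.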
Learning large-scale Bayesian networks with the RAI algorithm using the Bayes factor
Score-based structure learning of Bayesian networks with asymptotic consistency is NP-hard. Search algorithms based on dynamic programming, A* search, and integer programming have been developed, but they are still limited to structures of about 60 nodes, so large-scale structure learning calls for an entirely different approach. In the field of causal models, meanwhile, the constraint-based approach has been proposed, which greatly reduces computational cost by using conditional independence (CI) tests and edge orientation. Within this approach, the RAI algorithm is known as a state-of-the-art method with the highest accuracy. However, the RAI algorithm performs its CI tests with statistical hypothesis testing or conditional mutual information. The accuracy of the former depends on the p-value and a user-specified significance level, and because p-values shrink as the sample size grows, the null hypothesis can be rejected erroneously. The accuracy of the latter depends strongly on the choice of threshold. Neither test is therefore consistent, and there is no guarantee that the true structure is learned asymptotically. In this paper, we propose a constraint-based learning method that incorporates a CI test based on the Bayes factor, which is asymptotically consistent, into the RAI algorithm. This enables structure learning for large networks with several hundred nodes. Simulation experiments on several benchmark networks show that the proposed method empirically provides the best performance.
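The abstract does not give the exact form of its Bayes factor. One common construction for discrete data, shown below as an assumption rather than as the paper's method, compares the Dirichlet-multinomial marginal likelihood of X given Z alone against that of X given Z and Y, using a BDeu-style prior. The function names, the equivalent sample size `ess`, and the toy data sets are all hypothetical.

```python
import math
from collections import Counter

def log_bdeu(rows, child, parents, cards, ess=1.0):
    """Log marginal likelihood of `child` given `parents` under a BDeu prior."""
    q = math.prod(cards[p] for p in parents)  # number of parent configurations
    r = cards[child]                          # number of child states
    a_j, a_jk = ess / q, ess / (q * r)
    n_jk = Counter((tuple(row[p] for p in parents), row[child]) for row in rows)
    n_j = Counter()
    for (config, _state), n in n_jk.items():
        n_j[config] += n
    # Configurations and cells with zero counts contribute nothing to the score.
    score = sum(math.lgamma(a_j) - math.lgamma(a_j + n) for n in n_j.values())
    score += sum(math.lgamma(a_jk + n) - math.lgamma(a_jk) for n in n_jk.values())
    return score

def log_bayes_factor_ci(rows, x, y, z, cards, ess=1.0):
    """Log Bayes factor of 'x independent of y given z' against dependence.

    Positive values favor conditional independence.
    """
    independent = log_bdeu(rows, x, list(z), cards, ess)
    dependent = log_bdeu(rows, x, list(z) + [y], cards, ess)
    return independent - dependent

cards = {"x": 2, "y": 2, "z": 2}
# Toy data in which x always equals y (strong dependence).
dependent_rows = [{"x": v, "y": v, "z": 0} for v in (0, 1)] * 50
# Toy data in which every (x, y) combination is equally frequent.
independent_rows = [{"x": i, "y": j, "z": 0} for i in (0, 1) for j in (0, 1)] * 25

log_bf_dependent = log_bayes_factor_ci(dependent_rows, "x", "y", ["z"], cards)
log_bf_independent = log_bayes_factor_ci(independent_rows, "x", "y", ["z"], cards)
```

A test of this kind decides independence by the sign of the log Bayes factor, so it needs neither a significance level nor a mutual-information threshold.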
Group formation optimization using item response theory in peer assessment
As an assessment method based on social constructivism, peer assessment has attracted much attention in recent years. When the number of learners is large, as in MOOCs, peer assessment is often conducted by dividing learners into groups and having members of each group assess one another, which reduces the assessment workload. However, in this case, the accuracy of ability measurement depends on how the groups are formed. To resolve this problem, this study proposes a group formation method that uses item response theory to maximize the accuracy of ability measurement. Experimental results show, however, that the method does not necessarily achieve higher accuracy than random grouping. The study therefore relaxes the constraint that assessment takes place only among learners in the same group and proposes an external rater selection method that assigns a few outside-group raters to each learner. Simulation and experiments with human participants confirm that adding a few external raters with the proposed method improves ability measurement accuracy compared with assessment by group members alone.
Hidden Markov IRT model for dynamic assessment
One of the hardest problems in education is that a learner's development suffers whether the teacher gives too much help or too little. To scaffold a learner efficiently, a teacher should therefore predict each learner's level of understanding and the optimal degree of assistance. Scaffolding systems have been developed that use item response theory (IRT) to present hints so that the predicted probability of a correct answer reaches an optimal value. However, conventional IRT models do not consider changes in a learner's ability during study, so they may predict the probability of a correct answer inaccurately and, as a result, provide too much or too little assistance. We propose a new IRT model in which a learner's ability changes according to a hidden Markov process. The model has two new parameters: a window size, representing the period (number of tasks) over which the ability stays constant, and a change parameter, reflecting the degree of ability change. Because the optimal values of both are estimated from data, the model is expected to reflect the learner's true ability changes and improve prediction accuracy. Experimental results on actual data show that the proposed model improves the accuracy of predicting learners' performance.
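To make the two parameters concrete, the sketch below simulates an ability that stays constant for `window` tasks and then takes a Gaussian step whose spread is `change_sd`, with correct-answer probabilities given by a Rasch model. This is only an illustrative assumption about how a window size and a change parameter might interact. The abstract does not specify the transition distribution or the estimation procedure, and all names and values here are hypothetical.

```python
import math
import random

def simulate_abilities(n_tasks, window, change_sd, start=0.0, seed=0):
    """Simulate a latent ability path that changes only at window boundaries."""
    rng = random.Random(seed)
    theta, path = start, []
    for t in range(n_tasks):
        if t > 0 and t % window == 0:
            # The next ability depends only on the current one (Markov property).
            theta += rng.gauss(0.0, change_sd)
        path.append(theta)
    return path

def p_correct(theta, difficulty):
    """Rasch model probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# Hypothetical setting: 12 tasks, ability constant over windows of 4 tasks.
path = simulate_abilities(n_tasks=12, window=4, change_sd=0.5)
probs = [p_correct(theta, 0.0) for theta in path]
```

A conventional IRT model corresponds to the special case in which the ability never changes, for example `change_sd=0.0`, so every task is predicted from the same ability value.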