    Group optimization to maximize peer assessment accuracy using item response theory and integer programming

    With the widespread use of large-scale e-learning environments such as MOOCs, peer assessment has become a popular way to measure learner ability. When the number of learners increases, peer assessment is often conducted by dividing learners into multiple groups to reduce each learner's assessment workload. In such cases, however, peer assessment accuracy depends on how the groups are formed. To resolve that difficulty, this study proposes a group formation method that maximizes peer assessment accuracy using item response theory and integer programming. Experimental results, however, demonstrate that the proposed method does not achieve substantially higher accuracy than a random group formation method. Therefore, this study further proposes an external rater assignment method that assigns a few outside-group raters to each learner after groups are formed using the proposed group formation method. Through simulation and actual data experiments, this study demonstrates that the proposed external rater assignment can substantially improve peer assessment accuracy.
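
    The abstract does not give the formulation, but one plausible shape for such an integer program, in notation of my own choosing, is sketched below: x_{jg} = 1 places learner j in group g, y_{jrg} linearizes the product x_{jg} x_{rg}, and I_r(theta_j) is the Fisher information that rater r contributes about learner j's ability under the fitted IRT model. The paper's exact objective and constraints may differ.

```latex
% Sketch only; notation is mine, not necessarily the paper's.
\begin{aligned}
\max_{x,\,y}\quad & \sum_{g}\sum_{j}\sum_{r \neq j} I_r(\theta_j)\, y_{jrg} \\
\text{s.t.}\quad  & \sum_{g} x_{jg} = 1 \quad \forall j
                  & \text{(each learner in exactly one group)} \\
                  & \sum_{j} x_{jg} = m \quad \forall g
                  & \text{(groups of fixed size } m\text{)} \\
                  & y_{jrg} \le x_{jg}, \quad y_{jrg} \le x_{rg}
                  & \text{(} r \text{ rates } j \text{ only if both are in } g\text{)} \\
                  & x_{jg},\, y_{jrg} \in \{0,1\}
\end{aligned}
```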

    Group formation optimization using item response theory and integer programming for peer assessment

    In recent years, large-scale e-learning environments such as Massive Open Online Courses (MOOCs) have become increasingly popular. In such environments, peer assessment, which is mutual assessment among learners, has been used to evaluate reports and programming assignments. When the number of learners increases, as in MOOCs, peer assessment is often conducted by dividing learners into multiple groups to reduce the learners' assessment workload. In this case, however, the accuracy of peer assessment depends on how the groups are formed. To solve this problem, this study proposes a group optimization method based on item response theory (IRT) and integer programming. The proposed group optimization method is formulated as an integer programming problem that maximizes the Fisher information, a widely used index of ability assessment accuracy in IRT. Experimental results, however, show that the proposed method cannot sufficiently improve accuracy compared with random group formation. To overcome this limitation, this study introduces the concept of external raters, defined as peer raters who belong to different groups, and proposes an external rater selection method that assigns a few appropriate external raters to each learner after the groups have been formed using the proposed group optimization method. The proposed external rater selection method is formulated as an integer programming problem that maximizes the lower bound of the Fisher information of the learners' abilities as estimated from the external raters' scores. Experimental results using both simulated and real-world peer assessment data show that introducing external raters sufficiently improves accuracy, and that the proposed IRT-based external rater selection method improves ability assessment accuracy significantly more than random selection.
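
    A minimal sketch of the external rater selection step as an integer program, using PuLP. The information values, group structure, and workload cap below are made-up placeholders, and the objective simplifies the thesis's lower-bound criterion to a plain sum of Fisher information:

```python
# Hypothetical sketch of external rater selection with PuLP.
# info[(r, j)] stands in for the Fisher information rater r contributes
# to learner j's ability estimate under a fitted IRT model; here it is
# random placeholder data. The thesis maximizes a lower bound of the
# information; this sketch simplifies that to a plain sum.
import random

import pulp

random.seed(0)
n_learners, k_external, workload_cap = 12, 2, 4
group = {j: j % 3 for j in range(n_learners)}      # three pre-formed groups
info = {(r, j): random.random()
        for r in range(n_learners) for j in range(n_learners)}

prob = pulp.LpProblem("external_rater_selection", pulp.LpMaximize)
pairs = [(r, j) for r in range(n_learners) for j in range(n_learners)
         if group[r] != group[j]]                  # external raters only
x = pulp.LpVariable.dicts("x", pairs, cat="Binary")

# Objective: total information added by the chosen external raters.
prob += pulp.lpSum(info[r, j] * x[r, j] for (r, j) in pairs)

for j in range(n_learners):                        # k external raters each
    prob += pulp.lpSum(x[r, jj] for (r, jj) in pairs if jj == j) == k_external
for r in range(n_learners):                        # cap each rater's extra load
    prob += pulp.lpSum(x[rr, j] for (rr, j) in pairs if rr == r) <= workload_cap

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([(r, j) for (r, j) in pairs if x[r, j].value() == 1])
```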

    Psychometrics in Practice at RCEC

    A broad range of topics is dealt with in this volume: from combining the psychometric generalizability and item response theories to ideas for an integrated formative use of data-driven decision making, assessment for learning, and diagnostic testing. A number of chapters pay attention to computerized (adaptive) and classification testing. Other chapters treat the quality of testing in a general sense, but for topics like maintaining standards or the testing of writing ability, the quality of testing is dealt with more specifically. All authors are connected to RCEC as researchers. Each presents one of their current research topics, providing some insight into the focus of RCEC. The topics were selected and edited so that the book should be of special interest to educational researchers, psychometricians, and practitioners in educational assessment.

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.
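
    Of the five challenges, class imbalance is the simplest to illustrate in isolation. The sketch below uses inverse-frequency class weighting in scikit-learn, a generic remedy rather than any specific method from the review:

```python
# Illustration of one of the five challenges: class imbalance. Inverse-
# frequency class weighting is a generic remedy, not a method from the
# review itself.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)     # 95/5 class split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for weight in (None, "balanced"):              # unweighted vs. reweighted
    clf = LogisticRegression(class_weight=weight, max_iter=1000)
    clf.fit(X_tr, y_tr)
    print(weight, balanced_accuracy_score(y_te, clf.predict(X_te)))
```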

    Computational purification of individual tumor gene expression profiles leads to significant improvements in prognostic prediction.

    Tumor heterogeneity is a limiting factor in cancer treatment and in the discovery of biomarkers to personalize it. We describe a computational purification tool, ISOpure, to directly address the effects of variable normal tissue contamination in clinical tumor specimens. ISOpure uses a set of tumor expression profiles and a panel of healthy tissue expression profiles to generate a purified cancer profile for each tumor sample, along with an estimate of the proportion of RNA originating from cancerous cells. Applying ISOpure before identifying gene signatures leads to significant improvements in the prediction of prognosis and other clinical variables in lung and prostate cancer.
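
    ISOpure itself is a probabilistic model, but the core mixture intuition can be sketched with a toy two-component linear model solved by non-negative least squares. All profiles below are synthetic, and the decomposition is far simpler than the actual tool:

```python
# Toy illustration of the purification idea: treat each tumor profile as
# a non-negative mixture of a cancer profile and a healthy-tissue profile.
# ISOpure itself is a probabilistic model; this NNLS sketch with synthetic
# data only conveys the mixture intuition.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n_genes = 500
cancer = rng.gamma(2.0, 1.0, n_genes)          # hypothetical pure profiles
healthy = rng.gamma(2.0, 1.0, n_genes)

alpha_true = 0.7                                # true cancer-cell RNA fraction
tumor = (alpha_true * cancer + (1 - alpha_true) * healthy
         + rng.normal(0.0, 0.05, n_genes))

A = np.column_stack([cancer, healthy])
coef, _ = nnls(A, tumor)                        # non-negative mixing weights
alpha_hat = coef[0] / coef.sum()                # estimated cancer fraction
purified = (tumor - (1 - alpha_hat) * healthy) / alpha_hat
print(round(alpha_hat, 3))                      # should be close to 0.7
```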

    ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing

    Given the rapid ascent of large language models (LLMs), we study the question: (how) can large language models help in reviewing scientific papers or proposals? We first conduct pilot studies in which we find that (i) GPT-4 outperforms other LLMs (Bard, Vicuna, Koala, Alpaca, LLaMa, Dolly, OpenAssistant, StableLM), and (ii) prompting with a specific question (e.g., to identify errors) outperforms prompting to simply write a review. With these insights, we study the use of LLMs (specifically, GPT-4) for three tasks: 1. Identifying errors: we construct 13 short computer science papers, each with a deliberately inserted error, and ask the LLM to check the correctness of these papers. We observe that the LLM finds errors in 7 of them, spanning both mathematical and conceptual errors. 2. Verifying checklists: we task the LLM with verifying 16 closed-ended checklist questions in the respective sections of 15 NeurIPS 2022 papers. We find that across 119 {checklist question, paper} pairs, the LLM had an accuracy of 86.6%. 3. Choosing the "better" paper: we generate 10 pairs of abstracts, deliberately designing each pair so that one abstract is clearly superior to the other. The LLM, however, struggled to discern these relatively straightforward distinctions, committing errors in its evaluations for 6 of the 10 pairs. Based on these experiments, we think that LLMs have promising use as reviewing assistants for specific reviewing tasks, but not (yet) for complete evaluations of papers or proposals.
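
    A minimal sketch of the paper's "prompt with a specific question" insight, using the OpenAI Python client. The prompt wording and model choice are illustrative assumptions, not taken from the paper:

```python
# Sketch of the paper's "ask a specific question" finding: prompt the model
# to hunt for errors rather than to write a generic review. The prompt
# wording and model choice are illustrative, not taken from the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def find_errors(paper_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a careful reviewer of computer science papers."},
            {"role": "user",
             "content": ("Identify any mathematical or conceptual errors in the "
                         "following paper. Quote each suspect passage and explain "
                         "why it is wrong.\n\n" + paper_text)},
        ],
    )
    return response.choices[0].message.content
```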

    Empirical comparison of item response theory models with rater's parameters

    In various assessment contexts, including entrance examinations, educational assessments, and personnel appraisal, performance assessment by raters has attracted much attention as a way to measure examinees' higher-order abilities. However, a persistent difficulty is that the ability measurement accuracy depends strongly on rater and task characteristics. To resolve this shortcoming, various item response theory (IRT) models that incorporate rater and task characteristic parameters have been proposed. However, because various models with different rater and task parameters exist, it is difficult to understand each model's features. Therefore, this study presents empirical comparisons of IRT models. Specifically, after reviewing and summarizing the features of existing models, we compare their performance through simulation and actual data experiments.
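
    The abstract does not name the compared models, but one representative example of an IRT model with rater parameters is a many-facet Rasch model, in which the log-odds of a learner receiving score k rather than k-1 is theta_j - beta_i - rho_r - tau_k (ability minus task difficulty, rater severity, and step parameter). A small sketch with made-up parameter values:

```python
# A representative rater-parameterized IRT model (many-facet Rasch):
# the log-odds of score k over k-1 from rater r on task i is
#   theta_j - beta_i - rho_r - tau_k.
# Parameter values below are made up for illustration.
import numpy as np

def mfrm_probs(theta, beta, rho, tau):
    """Score-category probabilities for one learner/task/rater triple.

    theta: learner ability, beta: task difficulty, rho: rater severity,
    tau:   step parameters for score categories 1..K.
    """
    step_logits = theta - beta - rho - np.asarray(tau)
    cum = np.concatenate([[0.0], np.cumsum(step_logits)])  # log-numerators
    p = np.exp(cum - cum.max())                            # stable softmax
    return p / p.sum()

# A severe rater (rho = +1) shifts probability mass to lower categories.
print(mfrm_probs(theta=0.5, beta=0.0, rho=1.0, tau=[-1.0, 0.0, 1.0]))
print(mfrm_probs(theta=0.5, beta=0.0, rho=-1.0, tau=[-1.0, 0.0, 1.0]))
```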