Assessment in anatomy
From an educational perspective, assessment is a central problem, both for establishing competency and as a selection criterion for different professional purposes. Among the issues to be addressed are the methods of assessment and/or the types of tests, the range of scores, and the definition of honours degrees. The methods of assessment comprise such different forms as the spotter examination, short or long essay questions, short answer questions, true-false questions, single best answer questions, multiple choice questions, extended matching questions, and several forms of oral assessment such as the viva voce examination. Knowledge about these formats is important when assessing different educational objectives: objectives from the cognitive domain require different assessment instruments than objectives from the psychomotor domain or the affective domain. There is no golden rule for which type of assessment instrument or format best measures a given educational objective, but one has to accept that no single assessment instrument is capable of assessing objectives from all domains. Whereas the first two or three levels of progress can be assessed by well-structured written examinations such as multiple choice or multiple answer questions, higher levels of progress require other instruments, such as a thesis or direct observation. This is no issue at all for assessment tools in which students are required to select the appropriate answer from a given set of choices, as in true-false questions, MCQs, EMQs, etc.; in these cases, standard setting is done by the selection of the correct answer.
Automated Reading Passage Generation with OpenAI's Large Language Model
The widespread usage of computer-based assessments and individualized
learning platforms has resulted in an increased demand for the rapid production
of high-quality items. Automated item generation (AIG), the process of using
item models to generate new items with the help of computer technology, was
proposed to reduce reliance on human subject experts at each step of the
process. AIG has been used in test development for some time. Still, the use of
machine learning algorithms has introduced the potential to improve the
efficiency and effectiveness of the process greatly. The approach presented in
this paper utilizes OpenAI's latest transformer-based language model, GPT-3, to
generate reading passages. Existing reading passages were used in carefully
engineered prompts to ensure the AI-generated text has similar content and
structure to a fourth-grade reading passage. For each prompt, we generated
multiple passages; the final passage was selected according to its Lexile score
agreement with the original passage. In the final round, the selected passage
went through a simple revision by a human editor to ensure the text was free of
any grammatical and factual errors. All AI-generated passages, along with the
original passages, were evaluated by human judges according to their coherence,
appropriateness for fourth graders, and readability.
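The selection step this abstract describes (keeping the generated passage whose readability best matches the original) can be sketched as follows. This is not the authors' code: the actual Lexile analyzer is proprietary, so `score` is a placeholder for any callable mapping text to a numeric readability estimate, and `toy_score` below is only a crude stand-in (mean word length, not Lexile) for demonstration.

```python
def select_passage(original, candidates, score):
    """Return the candidate whose readability score is closest to the original's."""
    target = score(original)
    return min(candidates, key=lambda text: abs(score(text) - target))

def toy_score(text):
    """Stand-in readability proxy: mean word length. NOT a Lexile measure."""
    words = text.split()
    return sum(len(w) for w in words) / len(words)

original = "The fox ran over the old wooden bridge."
candidates = [
    "A quick fox crossed the creaky bridge.",
    "Notwithstanding considerable trepidation, the vulpine specimen traversed.",
]
best = select_passage(original, candidates, toy_score)
# `best` is the first candidate: its mean word length is far closer to the original's.
```

With a real readability scorer plugged in for `score`, the same one-line `min(..., key=...)` selection implements the "Lexile agreement" filter the abstract describes.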
Component skills of inferential processing in older readers
Thesis (M.S.)--Boston University. The ability to make inferences has been shown to be a crucial component of successful reading in older students. The current project investigates differences in comprehension of text-based (factual) and inferential information across grade levels and modalities, and seeks to determine which component language and reading skills are important in making inferences.
1,836 students in grades 6-12 were tested on a computerized battery of language subtests in the auditory and written modalities. Eleven subtests examining performance on lower levels of language were administered, in addition to a measure of factual and inferential discourse comprehension.
Results demonstrated that students performed better overall in the written modality. Students in older grades were consistently faster and more accurate. Vocabulary knowledge had the largest effect on performance on inferential questions in the written modality in middle school, while sentence-level skills were most important in high school. In the auditory modality, sentence-level skills were most predictive across question types and grade levels. Implications for theories of inferential processing and for teaching inferences within literacy education frameworks are discussed.
Building capacity in climate change policy analysis and negotiation: methods and technologies
Capacity building is often cited as the reason "we cannot just pour money into developing countries", and as the reason so many development projects fail: their design does not address local conditions. It is therefore a key technical and political concept in international development.
Some of the poorest countries in the world are also some of the most vulnerable to the impacts of climate change. Their vulnerability is in part due to a lack of capacity to plan and anticipate the effects of climate change on crops, water resources, urban electricity demand etc. What capacities do these countries lack to deal with climate change? How will they cope? What steps can they take to reduce their vulnerability?
This innovative and high-profile research project was part of a larger project (called C3D) and was conducted with non-governmental organisations in Senegal, South Africa and Sri Lanka. The research involved several participatory workshops and a questionnaire sent to all three research centres.
Proceedings of QG2010: The Third Workshop on Question Generation
These are the peer-reviewed proceedings of "QG2010, The Third Workshop on Question Generation". The workshop included a special track for "QGSTEC2010: The First Question Generation Shared Task and Evaluation Challenge".
QG2010 was held as part of The Tenth International Conference on Intelligent Tutoring Systems (ITS2010).
Evaluating Large Language Models: A Comprehensive Survey
Large language models (LLMs) have demonstrated remarkable capabilities across
a broad spectrum of tasks. They have attracted significant attention and been
deployed in numerous downstream applications. Nevertheless, akin to a
double-edged sword, LLMs also present potential risks. They could suffer from
private data leaks or yield inappropriate, harmful, or misleading content.
Additionally, the rapid progress of LLMs raises concerns about the potential
emergence of superintelligent systems without adequate safeguards. To
effectively capitalize on LLM capacities as well as ensure their safe and
beneficial development, it is critical to conduct a rigorous and comprehensive
evaluation of LLMs.
This survey endeavors to offer a panoramic perspective on the evaluation of
LLMs. We categorize the evaluation of LLMs into three major groups: knowledge
and capability evaluation, alignment evaluation and safety evaluation. In
addition to the comprehensive review on the evaluation methodologies and
benchmarks on these three aspects, we collate a compendium of evaluations
pertaining to LLMs' performance in specialized domains, and discuss the
construction of comprehensive evaluation platforms that cover LLM evaluations
on capabilities, alignment, safety, and applicability.
We hope that this comprehensive overview will stimulate further research
interests in the evaluation of LLMs, with the ultimate goal of making
evaluation serve as a cornerstone in guiding the responsible development of
LLMs. We envision that this will channel their evolution into a direction that
maximizes societal benefit while minimizing potential risks. A curated list of
related papers has been publicly available at
https://github.com/tjunlp-lab/Awesome-LLMs-Evaluation-Papers.
A Survey on Evaluation of Large Language Models
Large language models (LLMs) are gaining increasing popularity in both
academia and industry, owing to their unprecedented performance in various
applications. As LLMs continue to play a vital role in both research and daily
use, their evaluation becomes increasingly critical, not only at the task
level, but also at the society level for better understanding of their
potential risks. Over the past years, significant efforts have been made to
examine LLMs from various perspectives. This paper presents a comprehensive
review of these evaluation methods for LLMs, focusing on three key dimensions:
what to evaluate, where to evaluate, and how to evaluate. Firstly, we provide
an overview from the perspective of evaluation tasks, encompassing general
natural language processing tasks, reasoning, medical usage, ethics,
education, natural and social sciences, agent applications, and other areas.
Secondly, we answer the `where' and `how' questions by diving into the
evaluation methods and benchmarks, which serve as crucial components in
assessing the performance of LLMs. Then, we summarize the success and failure cases
of LLMs in different tasks. Finally, we shed light on several future challenges
that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to
researchers in the realm of LLMs evaluation, thereby aiding the development of
more proficient LLMs. Our key point is that evaluation should be treated as an
essential discipline to better assist the development of LLMs. We consistently
maintain the related open-source materials at:
https://github.com/MLGroupJLU/LLM-eval-survey.