1,446 research outputs found

    AGReE: A system for generating Automated Grammar Reading Exercises

    We describe the AGReE system, which takes user-submitted passages as input and automatically generates grammar practice exercises that can be completed while reading. Multiple-choice practice items are generated for a variety of grammar constructs: punctuation, articles, conjunctions, pronouns, prepositions, verbs, and nouns. We also conducted a large-scale human evaluation with around 4,500 multiple-choice practice items. For 95% of items, a majority of the five raters identified the correct answer, and for 85% of cases, raters agreed that there is only one correct answer among the choices. Finally, the error analysis shows that raters made the most mistakes on punctuation and conjunctions. Comment: Accepted to EMNLP 2022 Demonstration Track

    DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach

    Multiple choice questions (MCQs) are an efficient and common way to assess reading comprehension (RC). Every MCQ needs a set of distractor answers that are incorrect, but plausible enough to test student knowledge. Distractor generation (DG) models have been proposed, and their performance is typically evaluated using machine translation (MT) metrics. However, MT metrics often misjudge the suitability of generated distractors. We propose DISTO: the first learned evaluation metric for generated distractors. We validate DISTO by showing its scores correlate highly with human ratings of distractor quality. At the same time, DISTO ranks the performance of state-of-the-art DG models very differently from MT-based metrics, showing that MT metrics should not be used for distractor evaluation.

    Investigating the Pedagogical Content Knowledge of Georgia 6-12 Science Teachers in Relation to Conservation of Mass

    The Law of Conservation of Matter is a crosscutting concept in science that has implications for all scientific disciplines. Conservation of Matter concepts are interwoven into all middle school and high school science courses within both the Next Generation Science Standards (NGSS) and the Georgia Standards of Excellence (GSE). For students to become scientifically literate, teachers of science must be able to articulate the content accurately and anticipate student difficulties and misconceptions in understanding it. To ensure that students successfully learn this content, science teachers must possess both content knowledge (CK) and pedagogical content knowledge (PCK). Strengths and limitations in the CK and PCK of science instructors within various populations must be identified so that interventions can be designed to help these teachers improve and to enhance the PCK of the scientific community as a whole. This study used a mixed-methods design to investigate the correlation among content knowledge, pedagogical content knowledge, and instructor demographics, and to discover how teachers address student misconceptions in class. Middle school and high school science teachers in Georgia completed a concept inventory and semi-structured interviews relating to the concept of Conservation of Matter. The concept inventory data indicated no correlation between content knowledge and pedagogical content knowledge in the area of Conservation of Matter for these teachers. However, content knowledge and teaching an honors-level class were found to influence the pedagogical content knowledge score of these teachers. Interview data suggest that teacher misconceptions regarding Conservation of Matter exist within this population. These misconceptions specifically concerned the splitting of atoms during chemical reactions and matter cycling in biological systems.
Teachers were both proactive and reactive in responding to student misconceptions in class. The study also indicates that teachers alter their curriculum because of misconceptions. Although the modifications varied, including adding or changing activities, adding instructional time, and incorporating more discussion and questioning, a high percentage of the teachers interviewed did modify their curriculum because misconceptions were present. This study highlights the CK and PCK of teachers related to conservation of matter and can be used to develop interventions and professional development for teachers that support growth in these areas.

    Cloze Test and C-test Revisited: Appraising Collocational Competence on Second Language Learners’ Performance

    This study investigated the role of the text in EFL learners’ performance on three types of tests: the cloze test, the C-test, and the open-ended test. It compared these three test types in measuring the collocational knowledge of Iranian EFL learners. The research was quantitative, emphasizing the collection of numerical data. To this end, 84 Persian EFL learners were selected, both male and female, from intermediate and advanced proficiency groups. The results showed that advanced participants performed much better than their intermediate peers on all three tests and demonstrated greater collocational competence. The findings have implications for language learners, EFL instructors, and materials developers.

    The Impact of Changing the Size of Aircraft Radar Displays on Visual Search in the Cockpit

    Advances in sensor technology have enabled our fighter aircraft to find, fix, track, target, and engage (F2T2E) at greater distances, providing the operator with more data within the battlefield. Modern aircraft are designed with larger displays, while our legacy aircraft are being retrofitted with larger cockpit displays to show the increased data. While this modification has been shown to improve human performance on many cockpit tasks, the effect is often neither measured nor fully understood at a more generalizable level. This research outlines an approach to comparing human performance across two display sizes in future F-16 cockpits. The results show that increases in display size can increase search times under some circumstances, even when the displays include a large number of tracks, actually reducing human performance.

    EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

    We introduce EgoSchema, a very long-form video question-answering dataset and benchmark to evaluate the long video understanding capabilities of modern vision and language systems. Derived from Ego4D, EgoSchema consists of over 5000 human-curated multiple-choice question-answer pairs, spanning over 250 hours of real video data and covering a very broad range of natural human activity and behavior. For each question, EgoSchema requires the correct answer to be selected from five given options based on a three-minute-long video clip. While some prior works have proposed video datasets with long clip lengths, we posit that the length of the video clip alone does not truly capture the temporal difficulty of the video task being considered. To remedy this, we introduce temporal certificate sets, a general notion for capturing the intrinsic temporal understanding length associated with a broad range of video understanding tasks and datasets. Based on this metric, we find EgoSchema to have intrinsic temporal lengths over 5.7x longer than the second closest dataset and 10x to 100x longer than any other video understanding dataset. Further, our evaluation of several current state-of-the-art video and language models shows them to be severely lacking in long-term video understanding capabilities. Even models with several billions of parameters achieve QA accuracy of less than 33% (random is 20%) on the EgoSchema multi-choice question answering task, while humans achieve about 76% accuracy. We posit that EgoSchema, with its long intrinsic temporal structures and diverse complexity, would serve as a valuable evaluation probe for developing effective long-term video understanding systems in the future. Data and zero-shot model evaluation code are open-sourced for both public and commercial use under the Ego4D license at http://egoschema.github.io