1,377 research outputs found

    The effects of self-explanation to peers on student learning in physical science

    This study was undertaken to test whether the use of self-explanation to a peer would affect learning outcomes in the classroom. The outcomes of classes taught using the self-explanation technique were compared to outcomes from traditional lecture courses in lessons of comparable content. Great Scholars and traditional students in a sixth-grade physical science classroom setting were given pre- and post-tests in two units of study, matter and waves. In the matter unit, students participated in a lesson on density using traditional lecture and a lesson on changes in matter using self-explanation. In the waves unit, students utilized lecture instruction for a lesson on electromagnetic waves and self-explanation instruction for a lesson on sound waves. Pre-test scores, post-test scores, and learning gains were analyzed for each lesson across instructional treatments and class types. After the unit on waves, students were given an opinion survey to determine which instructional method they preferred using. Self-explanation had a significantly positive impact on learning gains for the Great Scholars students in the first unit of study. No detectable differences in gains were found in the second unit of study for either group of students. However, the opinion survey given after the second unit of study suggests that students experience greater enjoyment when using the self-explanation instructional technique. Larger sample sizes and experiments in other science disciplines may lead to a better understanding of how self-explanation to a peer impacts student learning.
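
    The thesis itself does not describe analysis code, so the following is only a minimal sketch of the kind of gain analysis the abstract mentions: computing per-student learning gains from pre- and post-test scores and comparing the two instructional treatments. The file name, column names, and the choice of a Welch's t-test are assumptions made for illustration, not details from the study.

        # Illustrative sketch only; the study's actual analysis is not published here.
        # Assumes a hypothetical CSV with columns: student, group ("Great Scholars" or
        # "traditional"), treatment ("lecture" or "self-explanation"), pre, post.
        import pandas as pd
        from scipy.stats import ttest_ind

        scores = pd.read_csv("matter_unit_scores.csv")      # hypothetical file name
        scores["gain"] = scores["post"] - scores["pre"]      # raw learning gain

        # Mean gain per class type and instructional treatment
        summary = scores.groupby(["group", "treatment"])["gain"].agg(["mean", "std", "count"])
        print(summary)

        # Compare gains between treatments within one class type (Welch's t-test,
        # chosen here for illustration; the thesis may have used a different test)
        gs = scores[scores["group"] == "Great Scholars"]
        lecture_gain = gs.loc[gs["treatment"] == "lecture", "gain"]
        self_exp_gain = gs.loc[gs["treatment"] == "self-explanation", "gain"]
        t_stat, p_value = ttest_ind(self_exp_gain, lecture_gain, equal_var=False)
        print(f"Welch t = {t_stat:.2f}, p = {p_value:.3f}")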

    Using Games to Teach English in Chinese High School Classroom

    In the 20th century, English played an important role in international communication as an international language. English is a bridge between countries' economies, cultures, and trade. However, current English education in Chinese high schools is still test-oriented, which is ineffective and leaves students tired of it. Moreover, teachers also have trouble engaging students in class. The purpose of this project is to create a curriculum for high school English teachers in China to use games to teach English language skills. Krashen's (1982) Theory of Second Language Acquisition contains five main hypotheses that support this project. The project includes twenty-three activities to improve students' five language skills: vocabulary, listening, reading, speaking, and writing.

    "Teach AI How to Code": Using Large Language Models as Teachable Agents for Programming Education

    This work investigates large language models (LLMs) as teachable agents for learning by teaching (LBT). LBT with teachable agents helps learners identify their knowledge gaps and discover new knowledge. However, teachable agents require expensive programming of subject-specific knowledge. While LLMs as teachable agents can reduce the cost, LLMs' over-competence as tutees discourages learners from teaching. We propose a prompting pipeline that restrains LLMs' competence and makes them initiate "why" and "how" questions for effective knowledge-building. We combined these techniques into TeachYou, an LBT environment for algorithm learning, and AlgoBo, an LLM-based tutee chatbot that can simulate misconceptions and unawareness prescribed in its knowledge state. Our technical evaluation confirmed that our prompting pipeline can effectively configure AlgoBo's problem-solving performance. Through a between-subject study with 40 algorithm novices, we also observed that AlgoBo's questions led to knowledge-dense conversations (effect size=0.73). Lastly, we discuss design implications, cost-efficiency, and personalization of LLM-based teachable agents.
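
    The paper's actual prompting pipeline is not reproduced in the abstract, so the sketch below only illustrates the general idea it describes: encoding a prescribed knowledge state (including gaps) into a tutee's system prompt and instructing the model to respond with "why"/"how" follow-up questions. The `chat` callable, the prompt wording, and the function names are assumptions for illustration, not the TeachYou/AlgoBo implementation.

        # Minimal sketch of a "restrained tutee" prompt, assuming a generic chat
        # completion function `chat(messages) -> str` supplied by whatever LLM API
        # is available. This is not the authors' pipeline.
        from typing import Callable, Dict, List

        def build_tutee_system_prompt(knowledge_state: Dict[str, bool]) -> str:
            """Encode prescribed knowledge and gaps into a system prompt."""
            known = [k for k, v in knowledge_state.items() if v]
            unknown = [k for k, v in knowledge_state.items() if not v]
            return (
                "You are a student learning algorithms from the user.\n"
                f"You already understand: {', '.join(known) or 'nothing yet'}.\n"
                f"You do NOT yet understand and must not use: {', '.join(unknown) or 'nothing'}.\n"
                "Never solve the problem beyond what your knowledge state allows.\n"
                "After each explanation from the user, ask one 'why' or 'how' "
                "question about the part you understand least."
            )

        def tutee_turn(chat: Callable[[List[dict]], str],
                       history: List[dict],
                       knowledge_state: Dict[str, bool],
                       learner_message: str) -> str:
            """One tutee reply, keeping the conversation history up to date."""
            messages = [{"role": "system", "content": build_tutee_system_prompt(knowledge_state)}]
            messages += history + [{"role": "user", "content": learner_message}]
            reply = chat(messages)  # call the underlying LLM
            history += [{"role": "user", "content": learner_message},
                        {"role": "assistant", "content": reply}]
            return reply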

    Know Thy Strengths: Comprehensive Dialogue State Tracking Diagnostics

    Recent works that revealed the vulnerability of dialogue state tracking (DST) models to distributional shifts have made holistic comparisons on robustness and qualitative analyses increasingly important for understanding their relative performance. We present our findings from standardized and comprehensive DST diagnoses, which have previously been sparse and uncoordinated, using our toolkit, CheckDST, a collection of robustness tests and failure mode analytics. We discover that different classes of DST models have clear strengths and weaknesses: generation models are more promising for handling language variety, while span-based classification models are more robust to unseen entities. Prompted by this discovery, we also compare checkpoints from the same model and find that the standard practice of selecting checkpoints using validation loss/accuracy is prone to overfitting and that each model class has distinct patterns of failure. Lastly, we demonstrate how our diagnoses motivate a pre-finetuning procedure with non-dialogue data that offers comprehensive improvements to generation models by alleviating the impact of distributional shifts through transfer learning.
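
    CheckDST is a released toolkit, but its API is not shown in the abstract, so the snippet below is only a hedged illustration of one kind of robustness test it is described as performing: checking whether a DST model predicts the same dialogue state for an utterance and a perturbed variant (a paraphrase or an unseen-entity substitution). The `predict_state` callable and the example pairs are placeholders, not CheckDST code.

        # Illustration of a consistency-style robustness check for dialogue state
        # tracking, in the spirit of (but not identical to) CheckDST.
        # `predict_state` stands in for a real DST model returning {slot: value}.
        from typing import Callable, Dict, List, Tuple

        def consistency_rate(predict_state: Callable[[str], Dict[str, str]],
                             pairs: List[Tuple[str, str]]) -> float:
            """Fraction of original/perturbed utterance pairs with identical predicted states."""
            consistent = sum(
                1 for original, perturbed in pairs
                if predict_state(original) == predict_state(perturbed)
            )
            return consistent / len(pairs) if pairs else 0.0

        # Hypothetical perturbation pairs: a paraphrase and an unseen-entity swap.
        example_pairs = [
            ("book a table at 7 pm for two", "reserve a table for two at 7 pm"),
            ("find a hotel in cambridge", "find a hotel in eastwick"),
        ]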

    Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models

    Anonymity of both natural and legal persons in court rulings is a critical aspect of privacy protection in the European Union and Switzerland. With the advent of LLMs, concerns about large-scale re-identification of anonymized persons are growing. In accordance with the Federal Supreme Court of Switzerland, we explore the potential of LLMs to re-identify individuals in court rulings by constructing a proof-of-concept using actual legal data from the Swiss Federal Supreme Court. Following the initial experiment, we constructed an anonymized Wikipedia dataset as a more rigorous testing ground to further investigate the findings. Alongside the introduction and application of the new task of re-identifying people in texts, we also introduce new metrics to measure performance. We systematically analyze the factors that influence successful re-identification, identifying model size, input length, and instruction tuning among the most critical determinants. Despite high re-identification rates on Wikipedia, even the best LLMs struggled with court decisions. We attribute this difficulty to the lack of test datasets, the need for substantial training resources, and the sparsity of the information used for re-identification. In conclusion, this study demonstrates that re-identification using LLMs may not be feasible for now, but, as the proof-of-concept on Wikipedia showed, it might become possible in the future. We hope that our system can help enhance confidence in the security of anonymized decisions, thus making courts more confident about publishing their decisions.
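
    The paper defines its own metrics for the new re-identification task; those definitions are not given in the abstract, so the helper below only sketches one plausible way such performance could be scored: top-k accuracy over ranked candidate names per anonymized document. The data layout and function name are assumptions for illustration.

        # Hypothetical scoring helper for a re-identification task. `predictions`
        # maps a document id to a ranked list of candidate names produced by an
        # LLM; `gold` maps the document id to the true (de-anonymized) name.
        # This is an assumption-laden sketch, not the paper's metric.
        from typing import Dict, List

        def top_k_reid_accuracy(predictions: Dict[str, List[str]],
                                gold: Dict[str, str],
                                k: int = 5) -> float:
            hits = 0
            for doc_id, true_name in gold.items():
                ranked = predictions.get(doc_id, [])
                if true_name.lower() in (name.lower() for name in ranked[:k]):
                    hits += 1
            return hits / len(gold) if gold else 0.0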

    Utilizing NWEA MAP Data to Create Scaffolded and Differentiated Instruction That Advances Student Mastery of Literary Standards and Deepens Student Understanding of Literary Texts

    The research question addressed in this project was: How can NWEA MAP data be utilized to create scaffolded and differentiated instruction that advances student mastery of literary standards and deepens student understanding of literary texts? This question was addressed by creating a literary unit that aligns with the NWEA MAP learning continuum. The unit integrates differentiation strategies and scaffolding techniques to help students of all levels successfully master literary standards and deepen their understanding of literary texts. The author documents the related research literature used to construct the unit and describes the details of the unit. Additionally, the author describes her successes implementing the unit in her own 8th-grade classroom. Major conclusions of the project are: 1) differentiation strategies can be used to increase student engagement; and 2) utilizing testing data in the classroom, while time consuming, is a valuable practice for helping students of all levels master evidence-based analysis.
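
    The capstone describes a curriculum unit rather than code, but as a rough illustration of "utilizing testing data" for differentiation, the sketch below groups students into scaffolding tiers by an NWEA MAP reading RIT score. The cut points, tier labels, and column names are invented for the example; real groupings would come from the MAP learning continuum and grade-level norms.

        # Hypothetical grouping of students into differentiation tiers by MAP
        # reading RIT score. Cut points and labels are invented for illustration.
        import pandas as pd

        students = pd.DataFrame({
            "name": ["Ava", "Ben", "Cleo", "Dan"],
            "rit_reading": [205, 218, 226, 238],
        })

        bins = [0, 210, 225, 300]                       # example cut points only
        labels = ["scaffolded", "on-level", "extension"]
        students["tier"] = pd.cut(students["rit_reading"], bins=bins, labels=labels)
        print(students)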

    GenAIPABench: A Benchmark for Generative AI-based Privacy Assistants

    Privacy policies inform users about the data management practices of organizations. Yet, their complexity often renders them largely incomprehensible to the average user, necessitating the development of privacy assistants. With the advent of generative AI (genAI) technologies, there is untapped potential to enhance privacy assistants in answering user queries effectively. However, the reliability of genAI remains a concern due to its propensity for generating incorrect or misleading information. This study introduces GenAIPABench, a novel benchmarking framework designed to evaluate the performance of Generative AI-based Privacy Assistants (GenAIPAs). GenAIPABench comprises: 1) a comprehensive set of questions about an organization's privacy policy and a data protection regulation, along with annotated answers for several organizations and regulations; 2) a robust set of evaluation metrics for assessing the accuracy, relevance, and consistency of the generated responses; and 3) an evaluation tool that generates appropriate prompts to introduce the system to the privacy document, along with different variations of the privacy questions, to evaluate its robustness. We use GenAIPABench to assess the potential of three leading genAI systems to become GenAIPAs: ChatGPT, Bard, and Bing AI. Our results demonstrate significant promise in genAI capabilities in the privacy domain while also highlighting challenges in managing complex queries, ensuring consistency, and verifying source accuracy.
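
    GenAIPABench's released tooling is not shown in the abstract, so the loop below is only a schematic of how such a benchmark might be driven: give the assistant the privacy document, ask each question and its paraphrased variants, and collect answers for later scoring against the annotated references. The `ask_assistant` callable and the question data layout are assumptions, not the GenAIPABench implementation.

        # Schematic benchmark loop, not the GenAIPABench code. `ask_assistant`
        # is a placeholder for any genAI privacy assistant under test, e.g. a
        # thin wrapper around a chat API.
        from typing import Callable, Dict, List

        def run_benchmark(ask_assistant: Callable[[str, str], str],
                          policy_text: str,
                          questions: List[Dict]) -> List[Dict]:
            """questions: [{"id": ..., "question": ..., "paraphrases": [...]}, ...]"""
            results = []
            for item in questions:
                for variant in [item["question"]] + item.get("paraphrases", []):
                    answer = ask_assistant(policy_text, variant)
                    results.append({"id": item["id"], "variant": variant, "answer": answer})
            return results

        # The collected answers would then be scored against annotated reference
        # answers for accuracy and relevance, and compared across variants of the
        # same question to estimate consistency.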