11 research outputs found

    Matching Exemplar as Next Sentence Prediction (MeNSP): Zero-shot Prompt Learning for Automatic Scoring in Science Education

    Developing models to automatically score students' written responses to science problems is critical for science education. However, collecting and labeling sufficient student responses for training models is time-consuming and costly. Recent studies suggest that pre-trained language models (PLMs) can be adapted to downstream tasks without fine-tuning by using prompts. However, no research has employed such a prompt approach in science education. As student responses are presented in natural language, framing the scoring procedure as a next sentence prediction task using prompts can skip the costly fine-tuning stage. In this study, we developed a zero-shot approach to automatically score student responses via Matching Exemplars as Next Sentence Prediction (MeNSP). This approach employs no training samples. We first applied MeNSP to score three assessment tasks of scientific argumentation and found machine-human scoring agreement with Cohen's Kappa ranging from 0.30 to 0.57 and F1 scores ranging from 0.54 to 0.81. To improve the performance, we extended our research to the few-shot setting, either randomly selecting labeled student responses or manually constructing responses to fine-tune the models. We found that one task's performance improved with more samples (Cohen's Kappa from 0.30 to 0.38 and F1 score from 0.54 to 0.59); for the other two, scoring performance did not improve. We also found that randomly selected few-shot examples performed better than the human expert-crafted approach. This study suggests that MeNSP can yield referable automatic scoring for student responses while significantly reducing the cost of model training. This method can benefit low-stakes classroom assessment practices in science education. Future research should further explore the applicability of MeNSP to different types of assessment tasks in science education and improve the model performance. Comment: 10+3 pages
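
The agreement figures in the abstract above are Cohen's Kappa and F1. As a minimal sketch of how those two metrics are computed for binary scores, the following uses made-up labels (the `human` and `machine` lists are illustrative assumptions, not data from the paper), and the paper may average F1 across classes differently.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def f1_binary(truth, predicted, positive=1):
    """F1 score for the positive class of a binary labeling."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, predicted))
    fp = sum(t != positive and p == positive for t, p in zip(truth, predicted))
    fn = sum(t == positive and p != positive for t, p in zip(truth, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (hypothetical, not from the study).
human = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
machine = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]

kappa = cohens_kappa(human, machine)
f1 = f1_binary(human, machine)
print(f"kappa={kappa:.3f}  f1={f1:.3f}")
```

Kappa discounts the agreement two raters would reach by chance, which is why it can be much lower than raw accuracy or F1 on the same labels.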

    Black-box Backdoor Defense via Zero-shot Image Purification

    Backdoor attacks inject poisoned samples into the training data, resulting in the misclassification of the poisoned input during a model's deployment. Defending against such attacks is challenging, especially for real-world black-box models where only query access is permitted. In this paper, we propose a novel defense framework against backdoor attacks through Zero-shot Image Purification (ZIP). Our framework can be applied to poisoned models without requiring internal information about the model or any prior knowledge of the clean/poisoned samples. Our defense framework involves two steps. First, we apply a linear transformation (e.g., blurring) to the poisoned image to destroy the backdoor pattern. Then, we use a pre-trained diffusion model to recover the missing semantic information removed by the transformation. In particular, we design a new reverse process by using the transformed image to guide the generation of high-fidelity purified images, which works in zero-shot settings. We evaluate our ZIP framework on multiple datasets with different types of attacks. Experimental results demonstrate the superiority of our ZIP framework compared to state-of-the-art backdoor defense baselines. We believe that our results will provide valuable insights for future defense methods for black-box models. Our code is available at https://github.com/sycny/ZIP. Comment: Accepted by NeurIPS 202
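
The first step described above is a linear transformation such as blurring. The sketch below is not the ZIP implementation; it is a plain box blur on a tiny grayscale grid, assuming a hypothetical single-pixel "trigger", to illustrate how blurring spreads out a localized pattern and why the operation is linear.

```python
def box_blur(image, radius=1):
    """Mean filter; edge pixels are averaged over their in-bounds neighbours."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out


def add_images(a, b):
    """Pixel-wise sum of two equally sized images."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Hypothetical 5x5 image: a smooth gradient plus one bright trigger pixel.
content = [[float(y + x) for x in range(5)] for y in range(5)]
trigger = [[0.0] * 5 for _ in range(5)]
trigger[2][2] = 9.0

blurred_trigger = box_blur(trigger)
peak_after_blur = max(max(row) for row in blurred_trigger)

# Linearity: blurring the sum equals the sum of the blurred parts.
blur_of_sum = box_blur(add_images(content, trigger))
sum_of_blurs = add_images(box_blur(content), blurred_trigger)
print(peak_after_blur)
```

Here the trigger's peak intensity drops from 9.0 to 1.0, while the smooth content changes far less; recovering the detail lost in this step is what the abstract assigns to the pre-trained diffusion model.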

    Towards Personalized Cold-Start Recommendation with Prompts

    Recommender systems play a crucial role in helping users discover information that aligns with their interests based on their past behaviors. However, developing personalized recommendation systems becomes challenging when historical records of user-item interactions are unavailable, leading to what is known as the system cold-start recommendation problem. This issue is particularly prominent in start-up businesses or platforms with insufficient user engagement history. Previous studies focus on user or item cold-start scenarios, where systems can make recommendations for new users or items but are still trained with historical user-item interactions in the same domain, which cannot solve our problem. To bridge the gap, our research introduces an innovative and effective approach, capitalizing on the capabilities of pre-trained language models. We transform the recommendation process into sentiment analysis of natural-language text containing information about user profiles and item attributes, where the sentiment polarity is predicted with prompt learning. By harnessing the extensive knowledge housed within language models, the prediction can be made without historical user-item interaction records. A benchmark is also introduced to evaluate the proposed method under the cold-start setting, and the results demonstrate the effectiveness of our method. To the best of our knowledge, this is the first study to tackle the system cold-start recommendation problem. The benchmark and implementation of the method are available at https://github.com/JacksonWuxs/PromptRec. Comment: 8 pages, 2 figures

    Applying large language models and chain-of-thought for automatic scoring

    This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment tasks (three binomial and three trinomial) with 1,650 student responses, we employed six prompt engineering strategies to automatically score student responses. The six strategies combined zero-shot or few-shot learning with CoT, either alone or alongside item stem and scoring rubrics, developed based on a novel approach, WRVRT (prompt writing, reviewing, validating, revising, and testing). Results indicated that few-shot (acc = 0.67) outperformed zero-shot learning (acc = 0.60), with a 12.6% increase. CoT, when used without item stem and scoring rubrics, did not significantly affect scoring accuracy (acc = 0.60). However, CoT prompting paired with contextual item stems and rubrics proved to be a significant contributor to scoring accuracy (13.44% increase for zero-shot; 3.7% increase for few-shot). We found a more balanced accuracy across different proficiency categories when CoT was used with a scoring rubric, highlighting the importance of domain-specific reasoning in enhancing the effectiveness of LLMs in scoring tasks. We also found that GPT-4 demonstrated superior performance over GPT-3.5 in various scoring tasks when combined with the single-call greedy sampling or ensemble voting nucleus sampling strategy, showing an 8.64% difference. In particular, the single-call greedy sampling strategy with GPT-4 outperformed other approaches. This study also demonstrates the potential of LLMs in facilitating explainable and interpretable automatic scoring, emphasizing that CoT enhances accuracy and transparency, particularly when used with item stem and scoring rubrics.
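
The abstract above mentions an ensemble voting strategy over sampled model outputs but does not say how votes are aggregated. As a generic illustration only, the sketch below takes the most frequent score among several sampled scores; the tie-breaking rule (prefer the lower score) and the example values are assumptions, not details from the study.

```python
from collections import Counter


def majority_vote(scores):
    """Return the most frequent score; ties go to the lowest score (an assumption)."""
    if not scores:
        raise ValueError("need at least one sampled score")
    counts = Counter(scores)
    top_count = max(counts.values())
    return min(label for label, count in counts.items() if count == top_count)


# Hypothetical scores from five sampled calls for one student response.
sampled_scores = [2, 2, 1, 0, 2]
final_score = majority_vote(sampled_scores)
print(final_score)
```

A single greedy call corresponds to a list with one element, in which case the function simply returns that score.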

    Artificial General Intelligence (AGI) for Education

    Artificial general intelligence (AGI) has gained global recognition as a future technology due to the emergence of breakthrough large language models and chatbots such as GPT-4 and ChatGPT, respectively. AGI aims to replicate human intelligence through computer systems and is one of the critical technologies with the potential to revolutionize the field of education. Conventional AI models are typically designed for a limited range of tasks, demand significant amounts of domain-specific data for training, and may not always consider intricate interpersonal dynamics in education. In contrast, AGI, driven by the recent large pre-trained models, represents a significant leap in the capability of machines to perform tasks that require human-level intelligence, such as reasoning, problem-solving, decision-making, and even understanding human emotions and social interactions. This work reviews AGI's key concepts, capabilities, scope, and potential within future education, including setting educational goals, designing pedagogy and curriculum, and performing assessments. We also provide rich discussions of various ethical issues in education faced by AGI and how AGI will affect human educators. The development of AGI necessitates interdisciplinary collaborations between educators and AI engineers to advance research and application efforts. Comment: Review Paper on AGI for Education

    Crocin Improves the Endothelial Function Regulated by Kca3.1 Through ERK and Akt Signaling Pathways

    Background/Aims: Based on the protective effect of crocin against cardiovascular diseases, we hypothesize that crocin could improve endothelial function through activating the eNOS (endothelial nitric oxide synthase)/NO pathway and/or the intermediate-conductance Ca2+-activated K+ channels (KCa3.1). Methods: In this study, rat aortic rings were used to assess the regulatory effect of crocin on vascular tone, and the endothelial vasodilators nitric oxide, prostacyclin, and KCa3.1 were analyzed for effects of crocin. The expression profiles of p-eNOS, total-eNOS, p-ERK, total-ERK, p-Akt, total-Akt, KCa3.1, CD31, thrombomodulin, ICAM-1 and VCAM-1 were tested by western blotting. KCa3.1 was also analyzed by qPCR and immunofluorescence staining. Fluorescence and confocal microscopy were used to determine NO generation and intracellular Ca2+. Both EdU and MTT assays were used to evaluate cell viability. Cellular migration was assessed using a transwell assay. Results: Crocin relaxed pre-contracted artery rings through either NO or KCa3.1, but not PGI, in an endothelium-dependent manner. Furthermore, crocin increased p-eNOS and total-eNOS expression and NO production, as well as intracellular Ca2+, in both HUVECs and HUAECs (Human Umbilical Artery Endothelial cells). Crocin also stimulated the expression of CD31, thrombomodulin and vascular cell adhesion molecule 1 (VCAM-1), and increased cellular proliferation and migration in vitro. Interestingly, we determined for the first time that blocking or silencing KCa3.1 inhibited the crocin-induced upregulation of p-eNOS and total-eNOS. Correspondingly, the KCa3.1 inhibitor TRAM-34 also reduced the expression of CD31, thrombomodulin and VCAM-1, and diminished intracellular Ca2+, cellular proliferation and migration. Finally, crocin stimulated the expression of p-ERK, total-ERK, p-Akt and total-Akt; however, suppression of MEK and Akt inhibited this expression profile in endothelial cells. Conclusion: In the present study, these data strongly support the hypothesis that crocin could improve endothelial function through stimulation of the eNOS/NO pathway and other endothelial markers. This functional improvement is regulated by KCa3.1 via the MEK/ERK and PI3K/Akt signaling pathways.

    Delayed PCI is not beneficial for STEMI patients with impaired renal function: a retrospective cohort study

    Background: Preexisting impaired renal function (IRF) and contrast-induced nephropathy (CIN) after percutaneous coronary intervention (PCI) in patients with ST-segment elevation myocardial infarction (STEMI) are important prognostic parameters, but it is unknown whether delayed PCI is still beneficial for STEMI patients with IRF. Methods: A retrospective single-center cohort study was performed in 164 patients who presented at least 12 h after symptom onset and were diagnosed with STEMI and IRF. They were assigned to two groups, receiving either PCI plus optimal medical therapy (OMT) or OMT alone. Clinical outcomes at 30 days and 1 year were compared between the two groups, and the hazard ratio for survival was analyzed using a Cox regression model. A power analysis indicated that 34 patients in each group were required for a power of 90% at a significance level of 0.05. Results: The 30-day mortality was significantly lower in the PCI group (n = 126) than in the non-PCI group (n = 38) (11.1% versus 28.9%, P = 0.018), while there was no significant difference in the 1-year mortality and incidence of cardiovascular comorbidities between the two groups. Cox regression analysis showed that patients with IRF did not gain a survival benefit from receiving PCI (P = 0.267). Conclusions: Delayed PCI is not beneficial for one-year clinical outcomes in STEMI patients with IRF.