13 research outputs found
Matching Exemplar as Next Sentence Prediction (MeNSP): Zero-shot Prompt Learning for Automatic Scoring in Science Education
Developing models to automatically score students' written responses to
science problems is critical for science education. However, collecting and
labeling sufficient student responses for training models is time-consuming and
costly. Recent studies suggest that pre-trained language models (PLMs) can be
adapted to downstream tasks with prompts, without fine-tuning. However, no
research has employed such a prompt-based approach in science education. Because
student responses are expressed in natural language, framing the scoring
procedure as a next sentence prediction task using prompts can skip the costly
fine-tuning stage. In this study, we developed a zero-shot approach to
automatically score student responses via Matching Exemplars as Next Sentence
Prediction (MeNSP). This approach requires no training samples. We first apply
MeNSP to scoring three assessment tasks of scientific argumentation and find
machine-human scoring agreements with Cohen's Kappa ranging from 0.30 to 0.57
and F1 scores ranging from 0.54 to 0.81. To improve performance, we extend our
research to the few-shot setting, either randomly selecting labeled student
responses or manually constructing responses to fine-tune the models. We find
that one task's performance improves with more samples (Cohen's Kappa from 0.30
to 0.38, and F1 score from 0.54 to 0.59); for the other two tasks, scoring
performance does not improve. We also find that randomly selected few-shot
examples perform better than the human expert-crafted ones. This study suggests
that MeNSP can yield referable automatic scoring for student responses while
significantly reducing the cost of model training. This method can benefit
low-stakes classroom assessment practices in science education. Future research
should further explore the applicability of MeNSP to different types of
assessment tasks in science education and improve model performance.
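To make the zero-shot setup concrete, the sketch below pairs a student response with scored exemplar responses and uses a pretrained BERT next-sentence-prediction head to pick the best-matching exemplar. The abstract does not give the paper's exact templates or matching rule, so the pairing scheme, exemplar texts, and checkpoint here are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of scoring-as-next-sentence-prediction, assuming a pretrained BERT
# NSP head; exemplar texts, pairing order, and checkpoint are illustrative only.
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")
model.eval()

def score_response(student_response, scored_exemplars):
    """Return the score of the exemplar the student response most plausibly 'follows'."""
    best_score, best_prob = None, -1.0
    for exemplar_text, score in scored_exemplars:
        inputs = tokenizer(exemplar_text, student_response,
                           return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits  # index 0 = "is next sentence"
        prob_is_next = torch.softmax(logits, dim=-1)[0, 0].item()
        if prob_is_next > best_prob:
            best_prob, best_score = prob_is_next, score
    return best_score  # zero-shot: no parameters are updated

exemplars = [("The claim is supported by evidence from the data table.", 2),
             ("The claim restates the question without giving evidence.", 0)]
print(score_response("My claim is backed by the graph, which shows a clear trend.",
                     exemplars))
```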
Black-box Backdoor Defense via Zero-shot Image Purification
Backdoor attacks inject poisoned samples into the training data, resulting in
the misclassification of the poisoned input during a model's deployment.
Defending against such attacks is challenging, especially for real-world
black-box models where only query access is permitted. In this paper, we
propose a novel defense framework against backdoor attacks through Zero-shot
Image Purification (ZIP). Our framework can be applied to poisoned models
without requiring internal information about the model or any prior knowledge
of the clean/poisoned samples. Our defense framework involves two steps. First,
we apply a linear transformation (e.g., blurring) on the poisoned image to
destroy the backdoor pattern. Then, we use a pre-trained diffusion model to
recover the missing semantic information removed by the transformation. In
particular, we design a new reverse process by using the transformed image to
guide the generation of high-fidelity purified images, which works in zero-shot
settings. We evaluate our ZIP framework on multiple datasets with different
types of attacks. Experimental results demonstrate the superiority of our ZIP
framework compared to state-of-the-art backdoor defense baselines. We believe
that our results will provide valuable insights for future defense methods for
black-box models. Our code is available at https://github.com/sycny/ZIP.
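As a rough illustration of the two-step idea (a linear transform followed by diffusion-guided restoration), the sketch below blurs the input and then runs a guided reverse-diffusion loop. The checkpoint, blur parameters, and guidance rule are placeholders and are simpler than the reverse process designed in the paper.

```python
# Schematic sketch of a blur-then-restore purification loop; not the authors' code.
import torch
from torchvision.transforms import GaussianBlur
from diffusers import DDPMPipeline

pipe = DDPMPipeline.from_pretrained("google/ddpm-cifar10-32")  # placeholder checkpoint
unet, scheduler = pipe.unet, pipe.scheduler
blur = GaussianBlur(kernel_size=9, sigma=3.0)  # step 1: linear transform destroys trigger

def purify(poisoned, num_steps=50, guidance=0.5):
    observed = blur(poisoned)                      # transformed image used as guidance
    scheduler.set_timesteps(num_steps)
    x = torch.randn_like(poisoned)                 # start the reverse process from noise
    for t in scheduler.timesteps:
        with torch.no_grad():
            eps = unet(x, t).sample                # predicted noise at step t
        x = scheduler.step(eps, t, x).prev_sample  # one reverse-diffusion step
        # Simplified guidance: nudge the sample toward agreement with the blurred
        # observation so clean semantics are kept (the paper's guidance differs).
        x = x - guidance * (blur(x) - observed)
    return x                                       # purified image, obtained zero-shot

purified = purify(torch.randn(1, 3, 32, 32))       # stand-in for a poisoned image batch
```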
Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era
Explainable AI (XAI) refers to techniques that provide human-understandable
insights into the workings of AI models. Recently, the focus of XAI is being
extended towards Large Language Models (LLMs) which are often criticized for
their lack of transparency. This extension calls for a significant
transformation in XAI methodologies for two reasons. First, many existing XAI
methods cannot be directly applied to LLMs due to their complexity and advanced
capabilities. Second, as LLMs are increasingly deployed across diverse
industry applications, the role of XAI shifts from merely opening the "black
box" to actively enhancing the productivity and applicability of LLMs in
real-world settings. Meanwhile, unlike traditional machine learning models that
are passive recipients of XAI insights, the distinct abilities of LLMs can
reciprocally enhance XAI. Therefore, in this paper, we introduce Usable XAI in
the context of LLMs by analyzing (1) how XAI can benefit LLMs and AI systems,
and (2) how LLMs can contribute to the advancement of XAI. We introduce 10
strategies, presenting the key techniques for each and discussing their
associated challenges. We also provide case studies to demonstrate how to
obtain and leverage explanations. The code used in this paper can be found at:
https://github.com/JacksonWuxs/UsableXAI_LLM.
Towards Personalized Cold-Start Recommendation with Prompts
Recommender systems play a crucial role in helping users discover information
that aligns with their interests based on their past behaviors. However,
developing personalized recommendation systems becomes challenging when
historical records of user-item interactions are unavailable, leading to what
is known as the system cold-start recommendation problem. This issue is
particularly prominent in start-up businesses or platforms with insufficient
user engagement history. Previous studies focus on user or item cold-start
scenarios, where systems can make recommendations for new users or items but
are still trained on historical user-item interactions in the same domain, and
thus cannot solve our problem. To bridge this gap, our research introduces an
innovative and effective approach, capitalizing on the capabilities of
pre-trained language models. We transform the recommendation process into
sentiment analysis of natural-language descriptions containing user profile and
item attribute information, where the sentiment polarity is predicted with
prompt learning. By harnessing the extensive knowledge housed within language models,
the prediction can be made without historical user-item interaction records. A
benchmark is also introduced to evaluate the proposed method under the
cold-start setting, and the results demonstrate the effectiveness of our
method. To the best of our knowledge, this is the first study to tackle the
system cold-start recommendation problem. The benchmark and implementation of
the method are available at https://github.com/JacksonWuxs/PromptRec.
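For intuition about the prompt-learning formulation, the sketch below verbalizes a user profile and item attributes into a template and lets a masked language model fill in a sentiment word. The template, verbalizer words, and checkpoint are assumptions for illustration and do not reproduce the released PromptRec implementation.

```python
# A minimal sketch of prompt-based cold-start recommendation: verbalize the user and
# item, then compare the masked-LM probabilities of illustrative sentiment words.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def recommend_score(user_profile, item_attributes):
    prompt = (f"The user is {user_profile}. The item is {item_attributes}. "
              f"Overall, the user feels {tokenizer.mask_token} about this item.")
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    good = logits[tokenizer.convert_tokens_to_ids("good")]
    bad = logits[tokenizer.convert_tokens_to_ids("bad")]
    return (good - bad).item()  # higher = more positive sentiment = recommend

print(recommend_score("a college student who enjoys sci-fi movies",
                      "a space-opera novel with strong world building"))
```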
Applying large language models and chain-of-thought for automatic scoring
This study investigates the application of large language models (LLMs), specifically GPT-3.5 and GPT-4, with Chain-of-Thought (CoT) prompting in the automatic scoring of student-written responses to science assessments. We focused on overcoming the challenges of accessibility, technical complexity, and lack of explainability that have previously limited the use of artificial intelligence-based automatic scoring tools among researchers and educators. With a testing dataset comprising six assessment tasks (three binomial and three trinomial) with 1,650 student responses, we employed six prompt engineering strategies to automatically score student responses. The six strategies combined zero-shot or few-shot learning with CoT, either alone or alongside item stem and scoring rubrics, developed based on a novel approach, WRVRT (prompt writing, reviewing, validating, revising, and testing). Results indicated that few-shot learning (acc = 0.67) outperformed zero-shot learning (acc = 0.60), a 12.6% increase. CoT, when used without item stem and scoring rubrics, did not significantly affect scoring accuracy (acc = 0.60). However, CoT prompting paired with contextual item stems and rubrics proved to be a significant contributor to scoring accuracy (13.44% increase for zero-shot; 3.7% increase for few-shot). We found a more balanced accuracy across different proficiency categories when CoT was used with a scoring rubric, highlighting the importance of domain-specific reasoning in enhancing the effectiveness of LLMs in scoring tasks. We also found that GPT-4 demonstrated superior performance over GPT-3.5 in various scoring tasks when combined with the single-call greedy sampling or ensemble voting nucleus sampling strategy, showing an 8.64% difference. In particular, the single-call greedy sampling strategy with GPT-4 outperformed other approaches. This study also demonstrates the potential of LLMs in facilitating explainable and interpretable automatic scoring, emphasizing that CoT enhances accuracy and transparency, particularly when used with item stems and scoring rubrics.
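To illustrate the general shape of such a prompt (the study's WRVRT-developed prompts, item stems, and rubrics are not reproduced in the abstract, so the item, rubric, and template below are hypothetical placeholders), a single-call greedy-sampling chain-of-thought scoring request might look like this:

```python
# Illustrative sketch of a zero-shot CoT scoring call with item stem and rubric in the
# prompt; the item, rubric, and instructions are placeholders, not the study's prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ITEM_STEM = "Explain why an ice cube melts faster on a metal plate than on a wooden one."
RUBRIC = ("Score 2: identifies conduction and compares thermal conductivity. "
          "Score 1: mentions heat transfer without comparing the materials. "
          "Score 0: no relevant mechanism.")

def score_response(student_response: str) -> str:
    messages = [
        {"role": "system", "content": "You are a science assessment scorer."},
        {"role": "user", "content": (
            f"Item stem: {ITEM_STEM}\nScoring rubric: {RUBRIC}\n"
            f"Student response: {student_response}\n"
            "Think step by step about how the response meets the rubric, "
            "then give the final score on the last line as 'Score: <number>'.")},
    ]
    reply = client.chat.completions.create(
        model="gpt-4", messages=messages, temperature=0)  # single-call greedy sampling
    return reply.choices[0].message.content

print(score_response("Metal conducts heat to the ice faster than wood does."))
```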
Artificial General Intelligence (AGI) for Education
Artificial general intelligence (AGI) has gained global recognition as a
future technology due to the emergence of breakthrough large language models
and chatbots such as GPT-4 and ChatGPT, respectively. AGI aims to replicate
human intelligence through computer systems, which is one of the critical
technologies having the potential to revolutionize the field of education.
In contrast, conventional AI models are typically designed for a limited range
of tasks, demand significant amounts of domain-specific data for training, and
may not always capture the intricate interpersonal dynamics in education. AGI, driven
by the recent large pre-trained models, represents a significant leap in the
capability of machines to perform tasks that require human-level intelligence,
such as reasoning, problem-solving, decision-making, and even understanding
human emotions and social interactions. This work reviews AGI's key concepts,
capabilities, scope, and potential within future education, including setting
educational goals, designing pedagogy and curriculum, and performing
assessments. We also provide rich discussions of the various ethical issues that
AGI raises in education and of how AGI will affect human educators. The development
of AGI necessitates interdisciplinary collaborations between educators and AI
engineers to advance research and application efforts.
Crocin Improves the Endothelial Function Regulated by KCa3.1 Through ERK and Akt Signaling Pathways
Background/Aims: Based on the protective effect of crocin against cardiovascular diseases, we hypothesized that crocin could improve endothelial function by activating the eNOS (endothelial nitric oxide synthase)/NO pathway and/or the intermediate-conductance Ca2+-activated K+ channel (KCa3.1). Methods: In this study, rat aortic rings were used to assess the regulatory effect of crocin on vascular tone, and the endothelial vasodilators nitric oxide, prostacyclin, and KCa3.1 were analyzed for effects of crocin. The expression profiles of p-eNOS, total-eNOS, p-ERK, total-ERK, p-Akt, total-Akt, KCa3.1, CD31, thrombomodulin, ICAM-1 and VCAM-1 were tested by western blotting. KCa3.1 was also analyzed by qPCR and immunofluorescence staining. Fluorescence and confocal microscopy were used to determine NO generation and intracellular Ca2+. Both EdU and MTT assays were used to evaluate cell viability. Cellular migration was assessed using a transwell assay. Results: Crocin relaxed pre-contracted artery rings through either NO or KCa3.1, but not prostacyclin, in an endothelium-dependent manner. Furthermore, crocin increased p-eNOS and total-eNOS expression, NO production, and intracellular Ca2+ in both HUVECs and HUAECs (human umbilical vein and human umbilical artery endothelial cells, respectively). Crocin also stimulated the expression of CD31, thrombomodulin and vascular cell adhesion molecule 1 (VCAM-1), and increased cellular proliferation and migration in vitro. Interestingly, we determined for the first time that blocking or silencing KCa3.1 inhibited the crocin-induced upregulation of p-eNOS and total-eNOS. Correspondingly, the KCa3.1 inhibitor TRAM-34 also reduced the expression of CD31, thrombomodulin and VCAM-1, and diminished intracellular Ca2+, cellular proliferation and migration. Finally, crocin stimulated the expression of p-ERK, total-ERK, p-Akt and total-Akt; however, suppression of MEK and Akt inhibited this expression profile in endothelial cells. Conclusion: These data strongly support the hypothesis that crocin can improve endothelial function through stimulation of the eNOS/NO pathway and other endothelial markers. This functional improvement is regulated by KCa3.1 via the MEK/ERK and PI3K/Akt signaling pathways.
Delayed PCI is not beneficial for STEMI patients with impaired renal function: a retrospective cohort study
Background: Preexisting impaired renal function (IRF) and contrast-induced nephropathy (CIN) after percutaneous coronary intervention (PCI) in patients with ST-segment elevation myocardial infarction (STEMI) are important prognostic parameters, but it is unknown whether delayed PCI is still beneficial for STEMI patients with IRF. Methods: A retrospective single-center cohort study was performed in 164 patients who presented at least 12 h after symptom onset and were diagnosed with STEMI and IRF. They were assigned to two groups to receive either PCI plus optimal medical therapy (OMT) or OMT alone. Clinical outcomes at 30 days and 1 year were compared between the two groups, and hazard ratios for survival were analyzed using a Cox regression model. A power analysis indicated that 34 patients per group were required to achieve 90% power at a significance level of 0.05. Results: The 30-day mortality was significantly lower in the PCI group (n = 126) than in the non-PCI group (n = 38) (11.1% versus 28.9%, P = 0.018), while there was no significant difference in 1-year mortality or in the incidence of cardiovascular comorbidities between the two groups. Cox regression analysis showed that patients with IRF did not gain a survival benefit from PCI (P = 0.267). Conclusions: Delayed PCI does not improve one-year clinical outcomes for STEMI patients with IRF.
Influence of Cu/Mg ratio and content on heat-resistance of Al–Cu–Mg alloys
Heat-resistant Al alloys used in fields such as aerospace and transportation have attracted increasing attention in recent years. Within Al alloy families, Al–Cu–Mg alloys have shown promising heat-resistance properties. This work investigates the influence of Cu/Mg ratio and content on the heat resistance of Al–Cu–Mg alloys, based on Al–4.5Cu–2.5Mg (referred to as alloy A), Al–4.0Cu–2.2Mg (alloy B) and Al–4.5Cu–1.6Mg (alloy C). Alloys A and B had approximately the same Cu/Mg ratio, and they exhibited nearly identical hardness retention rates during exposure at 200 °C: after 200 h, the rate was ∼75 %, though alloy A showed higher hardness (105 vs. 102 HBW) due to its higher Cu and Mg contents. In contrast, alloy C, with a higher Cu/Mg ratio, was less heat-resistant, with a hardness retention rate of ∼70.5 % after 200 h of exposure. Nano-sized S′ (Al2CuMg) precipitates were the main strengthening phase in all three alloys. In addition, micron- and submicron-sized Al2CuMg particles formed as the Cu and Mg contents increased; these contributed substantially to the yield strength of the T6 heat-treated alloys but only slightly after exposure at 200 °C for 200 h. The degradation of mechanical properties during heat exposure can be attributed to the transformation and coarsening of the S′ precipitates. In alloys with a lower Cu/Mg ratio, excess Mg dissolved in the Al matrix reduced the Cu solubility in α-Al and slowed the diffusion flux of Cu, thus inhibiting coarsening of the Al2CuMg phase.