CodeCoT and Beyond: Learning to Program and Test like a Developer
In natural language processing, transformer-based large language models
(LLMs) like GPT-x models developed by OpenAI have revolutionized the landscape.
Despite their impressive capabilities, these models often encounter challenges
when handling tasks that differ from their training data, resulting in
compromised performance. To address this, few-shot learning has emerged as a
valuable technique, allowing LLMs to adapt with minimal task-specific data. One
innovative strategy, known as Chain-of-Thought Prompting (CoT), has been
introduced to guide LLMs in revealing cognitive processes during multi-step
reasoning. In this paper, we propose Code Chain-of-Thought (CodeCoT), which
consists of two components: the Vanilla CodeCoT and the Self-exam CodeCoT. The
latter incorporates self-examination, empowering the model to iteratively
generate code, formulate test cases, and refine its outputs. Specifically, the
model generates test examples for the code it is asked to implement. If the
code fails on these test examples, the model regenerates it based on the
erroneous code and the associated error types.
Through comprehensive experiments, we observed that both techniques
significantly enhance code generation accuracy across various LLM variants. Our
evaluation results reveal that CodeCoT improves code generation
effectiveness, including an unprecedented pass@1 accuracy of 79.27% using the
Self-exam CodeCoT approach with the gpt-3.5-turbo-0613 model on the HumanEval
dataset.
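The Self-exam CodeCoT loop described in the abstract above (generate code, run it against test examples, regenerate from the failing code and its error type) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `self_exam`, `run_tests`, and `stub_generate` are hypothetical names, the test examples are passed in rather than produced by a model, and the stub stands in for an LLM call.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types (not from the paper): a generator maps (task, feedback)
# to source code, and a test case is an (arguments, expected result) pair.
Generator = Callable[[str, Optional[str]], str]
TestCase = Tuple[tuple, object]


def run_tests(source: str, func_name: str, tests: List[TestCase]) -> Optional[str]:
    """Run the candidate code; return an error description, or None if all tests pass."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # only run trusted or sandboxed code this way
        func = namespace[func_name]
        for args, expected in tests:
            result = func(*args)
            if result != expected:
                return (f"AssertionError: {func_name}{args} returned "
                        f"{result!r}, expected {expected!r}")
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def self_exam(task: str, func_name: str, generate: Generator,
              tests: List[TestCase], max_rounds: int = 3) -> Tuple[str, bool]:
    """Generate code, test it, and regenerate from the error until tests pass."""
    feedback: Optional[str] = None
    source = ""
    for _ in range(max_rounds):
        source = generate(task, feedback)
        error = run_tests(source, func_name, tests)
        if error is None:
            return source, True
        # Feed the erroneous code and its error type back to the generator.
        feedback = f"Previous code:\n{source}\nError: {error}"
    return source, False


def stub_generate(task: str, feedback: Optional[str]) -> str:
    """Stand-in for an LLM: the first attempt is buggy, the retry is fixed."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


final_source, passed = self_exam(
    "add two numbers", "add", stub_generate, [((1, 2), 3), ((0, 0), 0)]
)
print(passed)
```

In this toy run the first candidate fails the test `add(1, 2) == 3`, the error message is fed back, and the second candidate passes.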
FMT: Removing Backdoor Feature Maps via Feature Map Testing in Deep Neural Networks
Deep neural networks have been widely used in many critical applications,
such as autonomous vehicles and medical diagnosis. However, their security is
threatened by backdoor attacks, which are achieved by adding artificial patterns
to specific training data. Existing defense strategies primarily focus on using
reverse engineering to reproduce the backdoor trigger generated by attackers
and subsequently repair the DNN model by adding the trigger into inputs and
fine-tuning the model with ground-truth labels. However, when the trigger
generated by the attackers is complex and invisible, the defender cannot
successfully reproduce the trigger. Consequently, the DNN model will not be
repaired since the trigger is not effectively removed.
In this work, we propose Feature Map Testing (FMT). Different from existing
defense strategies, which focus on reproducing backdoor triggers, FMT tries to
detect the backdoor feature maps, which are trained to extract backdoor
information from the inputs. After detecting these backdoor feature maps, FMT
will erase them and then fine-tune the model with a secure subset of training
data. Our experiments demonstrate that, first, compared to existing defense
strategies, FMT can effectively reduce the Attack Success Rate (ASR) even
against the most complex and invisible attack triggers. Second, unlike
conventional defense methods that tend to exhibit low Robust Accuracy (RA, i.e.,
the model's accuracy on the poisoned data), FMT achieves higher RA, indicating
its superiority in maintaining model performance while mitigating the effects
of backdoor attacks (e.g., FMT obtains 87.40% RA on CIFAR10). Third, compared
to existing feature map pruning techniques, FMT can cover more backdoor feature
maps (e.g., FMT removes 83.33% of backdoor feature maps from the model in the
CIFAR10 & BadNet scenario).
Comment: 12 pages, 4 figures
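The two metrics named in the abstract above can be illustrated with a small sketch. It assumes that ASR is the fraction of trigger-carrying inputs classified as the attacker's target label, and that RA is the fraction of those same inputs classified as their true label. The function names and the toy data are hypothetical, and the paper's exact evaluation protocol may differ (for example, in whether inputs whose true label already equals the target are excluded).

```python
from typing import Sequence


def attack_success_rate(predictions: Sequence[int], target_label: int) -> float:
    """Fraction of poisoned inputs predicted as the attacker's target label."""
    return sum(p == target_label for p in predictions) / len(predictions)


def robust_accuracy(predictions: Sequence[int], true_labels: Sequence[int]) -> float:
    """Fraction of poisoned inputs still predicted as their true label."""
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(predictions)


# Toy data (assumed): predictions on five poisoned inputs, attacker target is 7.
preds = [7, 7, 3, 7, 1]
labels = [2, 4, 3, 5, 1]

print(attack_success_rate(preds, target_label=7))  # 0.6
print(robust_accuracy(preds, labels))              # 0.4
```

A successful defense in this sense would push the first number toward zero while raising the second.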
Feature Map Testing for Deep Neural Networks
Due to the widespread application of deep neural networks (DNNs) in
safety-critical tasks, deep learning testing has drawn increasing attention.
During the testing process, test cases that have been fuzzed or selected using
test metrics are fed into the model to find fault-inducing test units (e.g.,
neurons and feature maps, activating which will almost certainly result in a
model error) and report them to the DNN developer, who subsequently repairs
them (e.g., by retraining the model with test cases). Current test metrics,
however, are primarily concerned with the neurons, which means that test cases
that are discovered either by guided fuzzing or selection with these metrics
focus on detecting fault-inducing neurons while failing to detect
fault-inducing feature maps.
In this work, we propose DeepFeature, which tests DNNs from the feature map
level. During testing, DeepFeature scrutinizes every internal
feature map in the model and identifies vulnerabilities whose repair can
improve the model's overall performance. Exhaustive
experiments are conducted to demonstrate that (1) DeepFeature is a strong tool
for detecting the model's vulnerable feature maps; (2) DeepFeature's test case
selection has a high fault detection rate and can detect more types of
faults (comparing DeepFeature to coverage-guided selection techniques, the
fault detection rate is increased by 49.32%); and (3) DeepFeature's fuzzer also
outperforms current fuzzing techniques and generates valuable test cases more
efficiently.
Comment: 12 pages, 5 figures. arXiv admin note: text overlap with
arXiv:2307.1101
Hybrid ceramics-based cancer theranostics
Cancer is a major threat to human lives. Early detection and precisely targeted therapies for cancer are the most effective ways to reduce the difficulties (e.g., side effects, low survival rate, etc.) in treating cancer. To enable effective cancer detection and treatment, ceramic biomaterials have been intensively and extensively investigated owing to their good biocompatibility, high bioactivity, suitable biodegradability and other distinctive properties that are required for medical devices in oncology. Through hybridization with other materials and loading of imaging agents and therapeutic agents, nanobioceramics can form multifunctional nanodevices to simultaneously provide diagnostic and therapeutic functions for cancer patients, and these nanodevices are known as hybrid ceramics-based cancer theranostics. In this review, the recent developments of hybrid ceramics-based cancer theranostics, including key aspects such as their preparation, biological evaluation and applications, are summarized and discussed. The challenges and future perspectives for the clinical translation of hybrid ceramics-based cancer theranostics are also discussed. It is believed that the potential of hybrid ceramic nanoparticles as cancer theranostics is high and that the future of these theranostics is bright despite the difficulties along the way to their clinical translation.
Bias Assessment and Mitigation in LLM-based Code Generation
Utilizing state-of-the-art Large Language Models (LLMs), automatic code
generation models play a pivotal role in enhancing the productivity and
efficiency of software development coding procedures. As the adoption of LLMs
becomes more widespread in software coding ecosystems, a pressing issue has
emerged: does the generated code contain social biases, such as those related
to age, gender, and race? This issue concerns the integrity, fairness, and
ethical foundation of software applications that depend on the code generated
by these models, yet is under-explored in the literature. This paper presents a
novel bias assessment framework that is specifically designed for code
generation tasks. Based on this framework, we conduct an extensive evaluation
on the bias of nine state-of-the-art LLM-based code generation models. Our
findings reveal that 31.45% to 79.93% of the code functions generated by the
evaluated code generation models are biased, and that the functionality of
9.68% to 37.37% of the code functions is affected by the bias. This means that
biases not only exist in code generation models but, in some cases, directly
affect the functionality of the generated code, posing risks of unintended and
possibly harmful software behaviors. To mitigate bias in code generation
models, we propose three mitigation strategies, which can decrease the biased
code ratio to a very low level of 0.4% to 4.57%.
Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing
Deep Neural Networks (DNNs) have been widely deployed in software to address
various tasks (e.g., autonomous driving, medical diagnosis). However, they
could also produce incorrect behaviors that result in financial losses and even
threaten human safety. To reveal the incorrect behaviors in DNNs and repair
them, DNN developers often collect rich unlabeled datasets from the natural
world and label them to test the DNN models. However, properly labeling a large
number of unlabeled datasets is a highly expensive and time-consuming task.
To address the above-mentioned problem, we propose NSS, Neuron Sensitivity
guided test case Selection, which can reduce the labeling time by selecting
valuable test cases from unlabeled datasets. NSS leverages the internal
neuron's information induced by test cases to select valuable test cases, which
have high confidence in causing the model to behave incorrectly. We evaluate
NSS with four widely used datasets and four well-designed DNN models compared
to SOTA baseline methods. The results show that NSS performs well in assessing
the test cases' probability of fault triggering and model improvement
capabilities. Specifically, compared with baseline approaches, NSS obtains a
higher fault detection rate (e.g., when selecting 5% of the test cases from the
unlabeled dataset in the MNIST & LeNet1 experiment, NSS obtains an 81.8% fault
detection rate, 20% higher than the baselines).
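The selection step and the fault detection rate reported in the abstract above can be sketched as follows. The sketch assumes each unlabeled test case already has a numeric score (higher meaning more likely to trigger a fault) and takes the fault detection rate to be the share of selected cases that actually cause a misprediction. How NSS computes its neuron sensitivity score is not given in the abstract, so the scores here are made-up inputs and the function names are hypothetical.

```python
from typing import List, Sequence


def select_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Return indices of the highest-scoring test cases (at least one)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def fault_detection_rate(selected: Sequence[int], is_fault: Sequence[bool]) -> float:
    """Share of the selected test cases that trigger a model error."""
    return sum(is_fault[i] for i in selected) / len(selected)


# Toy data (assumed): ten unlabeled test cases with hypothetical scores,
# plus ground truth on whether each one is mispredicted by the model.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.6, 0.05, 0.4, 0.5]
is_fault = [True, False, True, False, False, False, True, False, False, True]

chosen = select_top_fraction(scores, fraction=0.5)
print(chosen)                                  # [0, 2, 4, 6, 9]
print(fault_detection_rate(chosen, is_fault))  # 0.8
```

Only the selected cases need labeling, which is where the saving in labeling effort would come from.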
Transforming Research on Recreational Ecosystem Services into Applications and Governance
The science-practice gap has recently been discussed as a critical challenge restricting sustainable growth and development in all facets of our society, including explorations of Recreational Ecosystem Services (RES). To better explore how well the scientific study of RES and its application are connected, this paper aims to synthesize empirical evidence based on an in-depth and systematic literature review. We found that studies of RES have not been effectively translated into the decision-making and long-term planning of our cities. From 2005 to 2020, only 13% of studies referred to specific applications, and about 40% of papers mentioned no applications or practical implications for their research. However, RES research has many potential applications, which can be categorised into six main aspects. In terms of non-spatial improvement: improved monetary benefits (40%), non-monetary benefits (30%); in terms of spatial improvement: space with high recreational potential or degradation (7%), the relation between supply and demand (7%); and cross-service governance (16%). After combining the results of various studies, we developed a framework starting from applicable problems and their solutions, which can incorporate the outcomes of RES research while systematically narrowing down the research questions and methods. The framework offers a starting point that further research can modify and improve in bridging science-practice gaps in RES studies.
National Natural Science Foundation of China. Peer Reviewed
Greenhouse gas emissions in a subtropical jasmine plantation managed with straw combined with industrial and agricultural wastes
The effects of straw alone or combined with industrial and agricultural wastes as fertilizers on greenhouse gas (GHG) emissions are still poorly known in cropland areas. Here, we studied the effects of 3.5 Mg ha−1 straw and 3.5 Mg ha−1 straw combined with 8 Mg ha−1 of diverse wastes on GHG emissions in a subtropical Jasminum sambac plantation in southeastern China. There were five treatments in a completely randomized block design: control, straw only, straw + biochar, straw + steel slag, and straw + gypsum slag. Emissions of carbon dioxide were generally higher in the treatments with waste than in the control or straw-only treatments, whereas the contrary pattern was observed in CH4 and N2O emission rates. Moreover, the total global warming potentials (GWPs) were not significantly higher in most of the amended treatments as compared to the control and straw-only treatments. In relation to the treatment with only straw, GWPs were 9.4% lower when steel slag was used. This finding could be a consequence of the amount of Fe added by steel slag, which would limit and inhibit the emissions of GHGs and their transport from soil to atmosphere. Our results showed that the application of slags did not increase the emission of GHGs and that the combination of straw with steel slag or biochar could be more effective than straw alone for controlling GHG emissions and improving soil C and nutrient provision.