279 research outputs found

    CodeCoT and Beyond: Learning to Program and Test like a Developer

    In natural language processing, transformer-based large language models (LLMs) like GPT-x models developed by OpenAI have revolutionized the landscape. Despite their impressive capabilities, these models often encounter challenges when handling tasks that differ from their training data, resulting in compromised performance. To address this, few-shot learning has emerged as a valuable technique, allowing LLMs to adapt with minimal task-specific data. One innovative strategy, known as Chain-of-Thought Prompting (CoT), has been introduced to guide LLMs in revealing cognitive processes during multi-step reasoning. In this paper, we propose Code Chain-of-Thought (CodeCoT), which consists of two components: the Vanilla CodeCoT and the Self-exam CodeCoT. The latter incorporates self-examination, empowering the model to iteratively generate code, formulate test cases, and refine its outputs. Specifically, the model generates test examples corresponding to the code it is tasked to implement. If the code fails on the test examples, the model regenerates the code based on the erroneous code and the associated error types. Through comprehensive experiments, we observed that both techniques significantly enhance code generation accuracy across various LLM variants. Our evaluation results reveal that CodeCoT improves code generation effectiveness, including an unprecedented pass@1 accuracy of 79.27% using the Self-exam CodeCoT approach with the gpt-3.5-turbo-0613 model on the HumanEval dataset.
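The self-examination loop described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `self_exam`, `run_tests`, and the two `toy_*` functions are hypothetical names, and the toy functions stand in for real LLM calls so that the example runs on its own.

```python
from typing import Callable, Optional, Tuple


def run_tests(code: str, tests: str) -> Optional[str]:
    """Run generated code and tests in a fresh namespace.

    Returns the name of the raised exception type, or None if all tests pass.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate implementation
        exec(tests, namespace)  # run the model-written test examples
    except Exception as exc:
        return type(exc).__name__
    return None


def self_exam(
    task: str,
    generate_code: Callable[[str, Optional[str], Optional[str]], str],
    generate_tests: Callable[[str], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Generate code, test it, and regenerate from the failing code and error type."""
    code = generate_code(task, None, None)
    tests = generate_tests(task)
    for retries in range(max_rounds):
        error_type = run_tests(code, tests)
        if error_type is None:
            return code, retries
        # Feed the erroneous code and its error type back to the generator.
        code = generate_code(task, code, error_type)
    return code, max_rounds


# Hypothetical stand-ins for an LLM: the first attempt is deliberately wrong.
def toy_generate_code(task: str, previous_code: Optional[str], error_type: Optional[str]) -> str:
    if previous_code is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


def toy_generate_tests(task: str) -> str:
    return "assert add(2, 3) == 5\n"


final_code, retries = self_exam("add two numbers", toy_generate_code, toy_generate_tests)
print(retries)
print(final_code)
```

In a real setting the generated code would be executed in a sandbox rather than with a bare `exec`, since model output is untrusted.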

    FMT: Removing Backdoor Feature Maps via Feature Map Testing in Deep Neural Networks

    Deep neural networks (DNNs) have been widely used in many critical applications, such as autonomous vehicles and medical diagnosis. However, their security is threatened by backdoor attacks, which are mounted by adding artificial patterns to specific training data. Existing defense strategies primarily focus on using reverse engineering to reproduce the backdoor trigger generated by attackers and subsequently repair the DNN model by adding the trigger into inputs and fine-tuning the model with ground-truth labels. However, when the trigger generated by the attackers is complex and invisible, the defender cannot successfully reproduce it. Consequently, the DNN model will not be repaired, since the trigger is not effectively removed. In this work, we propose Feature Map Testing (FMT). Different from existing defense strategies, which focus on reproducing backdoor triggers, FMT tries to detect the backdoor feature maps, which are trained to extract backdoor information from the inputs. After detecting these backdoor feature maps, FMT erases them and then fine-tunes the model with a secure subset of training data. Our experiments demonstrate that, first, compared to existing defense strategies, FMT can effectively reduce the Attack Success Rate (ASR) even against the most complex and invisible attack triggers. Second, unlike conventional defense methods that tend to exhibit low Robust Accuracy (RA, i.e., the model's accuracy on the poisoned data), FMT achieves higher RA, indicating its superiority in maintaining model performance while mitigating the effects of backdoor attacks (e.g., FMT obtains 87.40% RA on CIFAR10). Third, compared to existing feature map pruning techniques, FMT can cover more backdoor feature maps (e.g., FMT removes 83.33% of backdoor feature maps from the model in the CIFAR10 & BadNet scenario). Comment: 12 pages, 4 figures
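The two metrics named in this abstract can be illustrated with a small sketch. RA follows the abstract's own wording (accuracy on poisoned data); the ASR definition used here (the share of poisoned inputs classified as the attacker's target label) is the commonly used one and is an assumption, since the abstract does not spell it out. The function names and the toy predictions are hypothetical.

```python
from typing import Sequence


def attack_success_rate(preds: Sequence[int], target_label: int) -> float:
    """Share of poisoned inputs that the model assigns to the attacker's target label."""
    return sum(p == target_label for p in preds) / len(preds)


def robust_accuracy(preds: Sequence[int], true_labels: Sequence[int]) -> float:
    """Share of poisoned inputs that the model still assigns to their true label."""
    return sum(p == t for p, t in zip(preds, true_labels)) / len(preds)


# Toy predictions on five poisoned inputs; label 0 is the assumed attack target.
poisoned_preds = [0, 3, 5, 0, 7]
true_labels = [2, 3, 5, 1, 7]

asr = attack_success_rate(poisoned_preds, target_label=0)
ra = robust_accuracy(poisoned_preds, true_labels)
print(asr, ra)
```

A successful defense in this framing pushes ASR down while keeping RA high.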

    Feature Map Testing for Deep Neural Networks

    Due to the widespread application of deep neural networks (DNNs) in safety-critical tasks, deep learning testing has drawn increasing attention. During the testing process, test cases that have been fuzzed or selected using test metrics are fed into the model to find fault-inducing test units (e.g., neurons and feature maps, activating which will almost certainly result in a model error) and report them to the DNN developer, who subsequently repairs them (e.g., by retraining the model with test cases). Current test metrics, however, are primarily concerned with neurons, which means that test cases discovered either by guided fuzzing or by selection with these metrics focus on detecting fault-inducing neurons while failing to detect fault-inducing feature maps. In this work, we propose DeepFeature, which tests DNNs at the feature map level. During testing, DeepFeature scrutinizes every internal feature map in the model and identifies vulnerabilities that can be repaired to increase the model's overall performance. Exhaustive experiments are conducted to demonstrate that (1) DeepFeature is a strong tool for detecting the model's vulnerable feature maps; (2) DeepFeature's test case selection has a high fault detection rate and can detect more types of faults (compared to coverage-guided selection techniques, DeepFeature increases the fault detection rate by 49.32%); and (3) DeepFeature's fuzzer also outperforms current fuzzing techniques and generates valuable test cases more efficiently. Comment: 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:2307.1101

    Hybrid ceramics-based cancer theranostics

    Cancer is a major threat to human lives. Early detection and precisely targeted therapy are the most effective ways to reduce the difficulties (e.g., side effects and low survival rate) in treating cancer. To enable effective cancer detection and treatment, ceramic biomaterials have been intensively and extensively investigated owing to their good biocompatibility, high bioactivity, suitable biodegradability and other distinctive properties that are required for medical devices in oncology. Through hybridization with other materials and loading of imaging agents and therapeutic agents, nanobioceramics can form multifunctional nanodevices that simultaneously provide diagnostic and therapeutic functions for cancer patients; these nanodevices are known as hybrid ceramics-based cancer theranostics. In this review, the recent developments of hybrid ceramics-based cancer theranostics, including key aspects such as their preparation, biological evaluation and applications, are summarized and discussed. The challenges and future perspectives for the clinical translation of hybrid ceramics-based cancer theranostics are also discussed. It is believed that the potential of hybrid ceramic nanoparticles as cancer theranostics is high and that the future of these theranostics is bright despite the difficulties along the way to their clinical translation.

    Bias Assessment and Mitigation in LLM-based Code Generation

    Utilizing state-of-the-art Large Language Models (LLMs), automatic code generation models play a pivotal role in enhancing the productivity and efficiency of software development. As the adoption of LLMs becomes more widespread in software coding ecosystems, a pressing issue has emerged: does the generated code contain social biases, such as those related to age, gender, and race? This issue concerns the integrity, fairness, and ethical foundation of software applications that depend on the code generated by these models, yet it is under-explored in the literature. This paper presents a novel bias assessment framework that is specifically designed for code generation tasks. Based on this framework, we conduct an extensive evaluation of the bias of nine state-of-the-art LLM-based code generation models. Our findings reveal that 31.45% to 79.93% of the code functions generated by the evaluated models are biased, and that the functionality of 9.68% to 37.37% of the code functions is affected by the bias. Biases therefore not only exist in code generation models but, in some cases, directly affect the functionality of the generated code, posing risks of unintended and possibly harmful software behaviors. To mitigate bias in code generation models, we propose three mitigation strategies, which can decrease the biased code ratio to a very low level of 0.4% to 4.57%.

    Neuron Sensitivity Guided Test Case Selection for Deep Learning Testing

    Deep Neural Networks (DNNs) have been widely deployed in software to address various tasks (e.g., autonomous driving, medical diagnosis). However, they can also produce incorrect behaviors that result in financial losses and even threaten human safety. To reveal the incorrect behaviors in DNNs and repair them, DNN developers often collect rich unlabeled datasets from the natural world and label them to test the DNN models. However, properly labeling a large number of unlabeled datasets is a highly expensive and time-consuming task. To address this problem, we propose NSS, Neuron Sensitivity guided test case Selection, which can reduce the labeling time by selecting valuable test cases from unlabeled datasets. NSS leverages the internal neuron information induced by test cases to select valuable test cases, that is, those with a high likelihood of causing the model to behave incorrectly. We evaluate NSS with four widely used datasets and four well-designed DNN models, and compare it to SOTA baseline methods. The results show that NSS performs well in assessing the test cases' probability of fault triggering and model improvement capabilities. Specifically, compared with baseline approaches, NSS obtains a higher fault detection rate (e.g., when selecting 5% of the test cases from the unlabeled dataset in the MNIST & LeNet1 experiment, NSS obtains an 81.8% fault detection rate, 20% higher than the baselines).
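The selection-and-evaluation step described in this abstract can be sketched as below. The abstract does not define how the neuron sensitivity score is computed, so the scores here are placeholder numbers; the fault detection rate is assumed to be the share of selected test cases that the model mispredicts, and `select_top_fraction` and `fault_detection_rate` are hypothetical helper names.

```python
from typing import List, Sequence


def select_top_fraction(scores: Sequence[float], fraction: float) -> List[int]:
    """Return indices of the highest-scoring test cases, keeping the given fraction."""
    k = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def fault_detection_rate(selected: Sequence[int], is_fault: Sequence[bool]) -> float:
    """Share of selected test cases that trigger a misprediction (assumed definition)."""
    return sum(is_fault[i] for i in selected) / len(selected)


# Placeholder per-test-case scores; a real run would derive them from neuron activations.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2, 0.6, 0.4, 0.5, 0.05]
# Whether each test case is mispredicted, known only after labeling.
is_fault = [True, False, True, False, False, False, True, False, False, False]

selected = select_top_fraction(scores, fraction=0.5)
rate = fault_detection_rate(selected, is_fault)
print(selected, rate)
```

Only the selected cases need to be labeled, which is where the labeling-time saving described in the abstract would come from.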

    Transforming Research on Recreational Ecosystem Services into Applications and Governance

    The science-practice gap has recently been discussed as a critical challenge restricting sustainable growth and development in all facets of our society, including explorations of Recreation Ecosystem Services (RES). To explore how well the scientific study of RES and its application are connected, this paper synthesizes empirical evidence from an in-depth and systematic literature review. We found that studies of RES have not been effectively translated into the decision-making and long-term planning of our cities. From 2005 to 2020, only 13% of studies referred to specific applications, and about 40% of papers mentioned no applications or practical implications of their research. However, RES research has many potential applications, which can be categorised into six main aspects. In terms of non-spatial improvement: improved monetary benefits (40%) and non-monetary benefits (30%); in terms of spatial improvement: space with high recreational potential or degradation (7%) and the relation between supply and demand (7%); and cross-service governance (16%). After combining the results of various studies, we developed a framework starting from applicable problems and their solutions, which can incorporate the outcomes of RES research while systematically narrowing down the research questions and methods. The framework offers a starting point for further research, which can modify and improve it in bridging science-practice gaps in RES studies. National Natural Science Foundation of China. Peer Reviewed

    Greenhouse gas emissions in a subtropical jasmine plantation managed with straw combined with industrial and agricultural wastes

    The effects of straw alone or combined with industrial and agricultural wastes as fertilizers on greenhouse gas (GHG) emissions are still poorly known in cropland areas. Here, we studied the effects of 3.5 Mg ha−1 straw and 3.5 Mg ha−1 straw combined with 8 Mg ha−1 of diverse wastes on GHG emissions in a subtropical Jasminum sambac plantation in southeastern China. There were five treatments in a completely randomized block design: control, straw only, straw + biochar, straw + steel slag, and straw + gypsum slag. Emissions of carbon dioxide were generally higher in the treatments with waste than in the control or straw-only treatments, whereas the opposite pattern was observed in CH4 and N2O emission rates. Moreover, the total global warming potentials (GWPs) were not significantly higher in most of the amended treatments than in the control and straw-only treatments. Relative to the straw-only treatment, GWPs were 9.4% lower when steel slag was used. This finding could be a consequence of the amount of Fe added by steel slag, which would limit and inhibit the emissions of GHGs and their transport from soil to atmosphere. Our results showed that the application of slags did not increase the emission of GHGs and that the combination of straw with steel slag or biochar could be more effective than straw alone for controlling GHG emissions and improving soil C and nutrient provision.