Silent Vulnerable Dependency Alert Prediction with Vulnerability Key Aspect Explanation
Open-source software is widely used for its convenience. For various
reasons, open-source maintainers often fix vulnerabilities silently,
leaving users unaware of the updates and exposed to threats. Previous work
focuses on black-box binary detection of silent dependency alerts, which
suffers from high false-positive rates and leaves users to analyze and
interpret the AI predictions themselves. Explainable AI has emerged as a
valuable complement to black-box AI models, providing details in various
forms to explain AI decisions. Since no existing technique can discover
silent dependency alerts both promptly and explainably, we propose a
framework that combines an encoder-decoder model with a binary detector to
provide explainable silent dependency alert prediction. Our model generates
four types of vulnerability key aspects (vulnerability type, root cause,
attack vector, and impact) to enhance the trustworthiness of, and users'
acceptance of, alert predictions. Through experiments with several models
and inputs, we confirm that CodeBERT with both commit messages and code
changes achieves the best results. Our user study shows that explainable
alert predictions help users find silent dependency alerts more easily than
black-box predictions do. To the best of our knowledge, this is the first
work to apply Explainable AI to silent dependency alert prediction, opening
the door to related domains.
Detecting differences across multiple instances of code clones
Clone detectors find similar code fragments (i.e., instances of code clones) and report large numbers of them for industrial systems. To maintain or manage code clones, developers often have to investigate differences among multiple cloned code fragments. However, existing program differencing techniques compare only two code fragments at a time. Developers then have to manually combine several pairwise differencing results. In this paper, we present an approach to automatically detecting differences across multiple clone instances. We have implemented our approach as an Eclipse plugin and evaluated its accuracy with three Java software systems. Our evaluation shows that our algorithm has precision over 97.66% and recall over 95.63% in three open-source Java projects. We also conducted a user study with 18 developers to evaluate the usefulness of our approach on eight clone-related refactoring tasks. Our study shows that our approach can significantly improve developers' performance in refactoring decisions, refactoring details, and task completion time on clone-related refactoring tasks. Automatically detecting differences across multiple clone instances also opens opportunities for building practical applications of code clones in software maintenance, such as auto-generation of application skeletons and intelligent simultaneous code editing.
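The core idea of diffing more than two fragments at once can be sketched as follows, assuming the clone instances have already been token-aligned. The paper handles the general alignment problem; in this sketch, equal-length token lists stand in for aligned fragments, and the function name is hypothetical.

```python
# Minimal sketch of multi-instance differencing: walk the aligned token
# positions of N clone instances at once and report every position where
# the instances disagree, instead of combining pairwise diffs by hand.

def diff_across_clones(instances: list[list[str]]) -> list[tuple[int, list[str]]]:
    """Return (position, per-instance tokens) for each aligned position
    where not all instances carry the same token."""
    diffs = []
    for pos, tokens in enumerate(zip(*instances)):
        if len(set(tokens)) > 1:  # instances disagree at this position
            diffs.append((pos, list(tokens)))
    return diffs

# Three clone instances differing only in identifier choices:
clones = [["x", "=", "a", "+", "1"],
          ["y", "=", "a", "+", "1"],
          ["z", "=", "b", "+", "1"]]
# diff_across_clones(clones) reports positions 0 and 2 as the diverging slots.
```

A single pass over the aligned columns replaces the quadratic number of pairwise comparisons a developer would otherwise have to merge manually.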
Pop Quiz! Do Pre-trained Code Models Possess Knowledge of Correct API Names?
Recent breakthroughs in pre-trained code models, such as CodeBERT and Codex,
have shown their superior performance in various downstream tasks. The
correctness and unambiguity of API usage among these code models are crucial
for achieving desirable program functionalities, requiring them to learn
various API fully qualified names structurally and semantically. Recent studies
reveal that even state-of-the-art pre-trained code models struggle with
suggesting the correct APIs during code generation. However, the reasons for
such poor API usage performance are barely investigated. To address this
challenge, we propose using knowledge probing as a means of interpreting code
models, which uses cloze-style tests to measure the knowledge stored in models.
Our comprehensive study examines a code model's capability of understanding API
fully qualified names from two different perspectives: API call and API import.
Specifically, we reveal that current code models struggle with understanding
API names, with pre-training strategies significantly affecting the quality of
API name learning. We demonstrate that natural language context can assist code
models in locating Python API names and generalize Python API name knowledge to
unseen data. Our findings provide insights into the limitations and
capabilities of current pre-trained code models, and suggest that incorporating
API structure into the pre-training process can improve automated API usage and
code representations. This work advances code intelligence practice and
offers direction for future studies. All experiment
results, data and source code used in this work are available at
\url{https://doi.org/10.5281/zenodo.7902072}
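The cloze-style probing idea from this abstract is simple to sketch: mask one segment of an API fully qualified name and check whether a model recovers it. Here `query` is a hypothetical stand-in for a masked-LM call (e.g. to CodeBERT); a toy lookup keeps the sketch self-contained, and all names are assumptions.

```python
# Cloze-style knowledge probing over API fully qualified names (FQNs):
# mask one dotted segment and score exact-match recovery.

def make_cloze(fqn: str, mask_index: int, mask_token: str = "<mask>") -> str:
    """Turn e.g. 'os.path.join' with mask_index=1 into 'os.<mask>.join'."""
    parts = fqn.split(".")
    parts[mask_index] = mask_token
    return ".".join(parts)

def probe_api_knowledge(query, fqns: list[str], mask_index: int) -> float:
    """Fraction of masked FQN segments the model recovers exactly.
    `query` maps a cloze prompt to the model's top prediction."""
    hits = 0
    for fqn in fqns:
        if query(make_cloze(fqn, mask_index)) == fqn.split(".")[mask_index]:
            hits += 1
    return hits / len(fqns)

# A toy "model" backed by a dict lookup, standing in for a masked LM:
toy_model = {"os.<mask>.join": "path", "json.<mask>": "dumps"}.get
```

Varying `mask_index` is what lets the probe separate the two perspectives the paper studies: masking the leading segment probes import knowledge, while masking a trailing segment probes call knowledge.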
Let's Discover More API Relations: A Large Language Model-based AI Chain for Unsupervised API Relation Inference
APIs have intricate relations that can be described in text and represented
as knowledge graphs to aid software engineering tasks. Existing relation
extraction methods have limitations, such as a limited API text corpus and
sensitivity to the characteristics of the input text. To address these limitations,
we propose utilizing large language models (LLMs) (e.g., GPT-3.5) as a neural
knowledge base for API relation inference. This approach leverages the entire
Web used to pre-train LLMs as a knowledge base and is insensitive to the
context and complexity of input texts. To ensure accurate inference, we design
our analytic flow as an AI Chain with three AI modules: API FQN Parser, API
Knowledge Extractor, and API Relation Decider. The accuracies of the API FQN
Parser and API Relation Decider modules are 0.81 and 0.83, respectively. Using
the generative capacity of the LLM and our approach's inference capability, we
achieve an average F1 score of 0.76 across the three datasets, significantly
higher than the state-of-the-art method's average F1 score of 0.40. Compared
to a CoT-based method, our AI Chain design improves inference reliability by
67%, and the AI-crowd-intelligence strategy enhances the robustness of our
approach by 26%.
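The three-module AI chain can be sketched as plain function composition. Each module would wrap a separate LLM prompt in the paper; here `llm` is a hypothetical callable and every prompt string is an assumption made for illustration, so the chaining itself is the point, not the prompts.

```python
# Sketch of an AI chain: three LLM-backed modules composed in sequence,
# each consuming the previous module's output.

def api_fqn_parser(llm, mention: str) -> str:
    """Module 1: resolve an API mention to a fully qualified name."""
    return llm(f"Resolve this API mention to its fully qualified name: {mention}")

def api_knowledge_extractor(llm, fqn: str) -> str:
    """Module 2: pull the LLM's stored knowledge about the API."""
    return llm(f"Summarize what the API {fqn} does:")

def api_relation_decider(llm, knowledge_a: str, knowledge_b: str, relation: str) -> bool:
    """Module 3: decide whether the candidate relation holds."""
    answer = llm(f"Does relation '{relation}' hold given: {knowledge_a} / {knowledge_b}? yes/no")
    return answer.strip().lower() == "yes"

def infer_relation(llm, mention_a: str, mention_b: str, relation: str) -> bool:
    """Chain the modules: parse both mentions, extract knowledge, decide."""
    fqn_a = api_fqn_parser(llm, mention_a)
    fqn_b = api_fqn_parser(llm, mention_b)
    return api_relation_decider(
        llm,
        api_knowledge_extractor(llm, fqn_a),
        api_knowledge_extractor(llm, fqn_b),
        relation,
    )
```

Splitting the task into narrow modules is what distinguishes an AI chain from a single chain-of-thought prompt: each step can be evaluated and swapped independently.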
Mining implicit design templates for actionable code reuse
National Research Foundation (NRF) Singapore
A key ‘foxy’ aroma gene is regulated by homology-induced promoter indels in the iconic juice grape ‘Concord’
‘Concord’, the most well-known juice grape, with parentage from the North American grape species Vitis labrusca L., possesses a special ‘foxy’ aroma that results predominantly from the accumulation of methyl anthranilate (MA) in berries. This aroma, however, is often perceived as an undesirable attribute by wine consumers and is rarely noticeable in the common table and wine grape species V. vinifera. Here we discovered homology-induced promoter indels as a major genetic mechanism for species-specific regulation of a key ‘foxy’ aroma gene, anthraniloyl-CoA:methanol acyltransferase (AMAT), which is responsible for MA biosynthesis. We found the absence of a 426-bp and/or a 42-bp sequence in AMAT promoters to be highly associated with high levels of AMAT expression and MA accumulation in ‘Concord’ and other V. labrusca-derived grapes. These promoter variants, all with direct and inverted repeats, were further confirmed in more than 1,300 Vitis germplasm accessions. Moreover, the functional impact of these indels was validated in transgenic Arabidopsis. Superimposed on the promoter regulation, large structural changes, including exonic insertion of a retrotransposon, were present at the AMAT locus in some V. vinifera grapes. Elucidation of the genetic regulation of AMAT advances our understanding of the ‘foxy’ aroma trait and makes it genetically trackable and amenable to grapevine breeding.
The GARP/MYB-related grape transcription factor AQUILO improves cold tolerance and promotes the accumulation of raffinose family oligosaccharides
Grapevine (Vitis vinifera L.) is a widely cultivated fruit crop whose growth and productivity are greatly affected by low temperatures. On the other hand, wild Vitis species represent valuable genetic resources of natural stress tolerance. We have isolated and characterized a MYB-like gene encoding a putative GARP-type transcription factor from Amur grape (V. amurensis), designated VaAQUILO. AQUILO (AQ) is induced by cold in both V. amurensis and V. vinifera, and its overexpression results in significantly improved cold tolerance both in transgenic Arabidopsis and in Amur grape calli. In Arabidopsis, ectopic expression of VaAQ increased antioxidant enzyme activities and up-regulated reactive oxygen species (ROS) scavenging-related genes. Comparative mRNA sequencing profiling of 35S:VaAQ Arabidopsis plants suggests that this transcription factor is related to phosphate homeostasis, like its closest Arabidopsis homologues AtHRS1 and AtHHO2. However, when cold stress is imposed, AQ is tightly associated with the cold-responsive pathway and with raffinose family oligosaccharides (RFOs), as observed by the up-regulation of galactinol synthase (GoLS) and raffinose synthase genes. Gene co-expression network (GCN) and cis-regulatory element (CRE) analyses in grapevine indicated that AQ potentially regulates VvGoLS genes. Increased RFO content was confirmed in both transgenic Arabidopsis and Amur grape calli overexpressing VaAQ. Taken together, our results imply that AQ improves cold tolerance by promoting the accumulation of osmoprotectants.
- …