25 research outputs found

Development of an Open-Air Ozone Concentration Control System and Estimation of the Regional-Scale Impact of Ozone on Wheat Production

Author: TANG HAOYE
Publication venue: Graduate School of Agricultural and Life Sciences
Publication date: 07/06/2013
Field of study

Degree type: Doctorate by dissertation. University of Tokyo (東京大学)

Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation

Author: Chen Zhenghan
Ezzini Saad
Kim Kisub
Klein Jacques
Tang Xunzhu
Tian Haoye
Publication venue
Publication date: 12/12/2023
Field of study

In the face of growing vulnerabilities found in open-source software, the need to identify discreet security patches has become paramount. The lack of consistency in how software providers handle maintenance often leads to the release of security patches without comprehensive advisories, leaving users vulnerable to unaddressed security risks. To address this pressing issue, we introduce a novel security patch detection system, LLMDA, which capitalizes on Large Language Models (LLMs) and code-text alignment methodologies for patch review, data enhancement, and feature combination. Within LLMDA, we initially utilize LLMs for examining patches and expanding the data of PatchDB and SPI-DB, two security patch datasets from recent literature. We then use labeled instructions to direct our LLMDA, differentiating patches based on security relevance. Following this, we apply a PTFormer to merge patches with code, formulating hybrid attributes that encompass both the innate details and the interconnections between the patches and the code. This distinctive combination method allows our system to capture more insights from the combined context of patches and code, hence improving detection precision. Finally, we devise a probabilistic batch contrastive learning mechanism to augment the capability of our LLMDA in discerning security patches. The results reveal that LLMDA significantly surpasses state-of-the-art techniques in detecting security patches, underscoring its promise in fortifying software maintenance.

arXiv.org e-Print Archive

Patch-CLIP: A Patch-Text Pre-Trained Model

Author: Bissyande Tegawende F.
Chen Zhenghan
Ezzini Saad
Klein Jacques
Tang Xunzhu
Tian Haoye
Publication venue
Publication date: 19/10/2023
Field of study

In recent years, patch representation learning has emerged as a necessary research direction for exploiting the capabilities of machine learning in software generation. These representations have driven significant performance enhancements across a variety of tasks involving code changes. While the progress is undeniable, a common limitation among existing models is their specialization: they predominantly excel either in predictive tasks, such as security patch classification, or in generative tasks, such as patch description generation. This dichotomy is further exacerbated by a prevalent dependency on potentially noisy data sources. Specifically, many models utilize patches integrated with Abstract Syntax Trees (ASTs) that, unfortunately, may contain parsing inaccuracies, thus acting as a suboptimal source of supervision. In response to these challenges, we introduce PATCH-CLIP, a novel pre-training framework for patches and natural language text. PATCH-CLIP deploys a triple-loss training strategy for 1) patch-description contrastive learning, which makes it possible to separate patches and descriptions in the embedding space, 2) patch-description matching, which ensures that each patch is associated with its description in the embedding space, and 3) patch-description generation, which ensures that the patch embedding is effective for generation. These losses are implemented for joint learning to achieve good performance in both predictive and generative tasks involving patches. Empirical evaluations focusing on patch description generation demonstrate that PATCH-CLIP sets a new state of the art, consistently outperforming prior approaches on metrics such as BLEU, ROUGE-L, METEOR, and Recall.

arXiv.org e-Print Archive
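
The patch-description contrastive objective named in the Patch-CLIP abstract can be illustrated with a generic CLIP-style symmetric loss. This is only a sketch under assumptions: the abstract does not give the exact formulation, so the temperature value, the use of cosine similarity, and the function names `cosine`, `softmax_cross_entropy`, and `contrastive_loss` are all hypothetical. Embeddings are plain Python lists, where row i of each list is assumed to be a matching patch/description pair.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of a softmax over logits against one target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]

def contrastive_loss(patch_embs, text_embs, temperature=0.07):
    """Symmetric in-batch contrastive loss (assumed form, not the paper's)."""
    n = len(patch_embs)
    # sims[i][j]: scaled similarity between patch i and description j
    sims = [[cosine(p, t) / temperature for t in text_embs] for p in patch_embs]
    # Patch-to-text direction: each patch should pick its own description
    loss_p2t = sum(softmax_cross_entropy(sims[i], i) for i in range(n)) / n
    # Text-to-patch direction: each description should pick its own patch
    cols = [[sims[i][j] for i in range(n)] for j in range(n)]
    loss_t2p = sum(softmax_cross_entropy(cols[j], j) for j in range(n)) / n
    return 0.5 * (loss_p2t + loss_t2p)

aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In this toy batch the loss is near zero when each patch embedding coincides with its own description and large when the pairs are swapped, which is the behavior a contrastive term is meant to reward.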

App Review Driven Collaborative Bug Finding

Author: Bissyande Tegawendé F.
Klein Jacques
Kong Pingfan
Liu Kui
Tang Xunzhu
Tian Haoye
Publication venue
Publication date: 23/01/2023
Field of study

Software development teams generally welcome any effort to expose bugs in their code base. In this work, we build on the hypothesis that mobile apps from the same category (e.g., two web browser apps) may be affected by similar bugs in their evolution process. It is therefore possible to transfer the experience of one historical app to quickly find bugs in its new counterparts. This has been referred to as collaborative bug finding in the literature. Our novelty is that we guide the bug finding process by considering that existing bugs have been hinted at within app reviews. Concretely, we design the BugRMSys approach to recommend bug reports for a target app by matching historical bug reports from apps in the same category with user app reviews of the target app. We experimentally show that this approach enables us to quickly expose and report dozens of bugs for targeted apps such as Brave (web browser app). BugRMSys's implementation relies on DistilBERT to produce natural language text embeddings. Our pipeline considers similarities between bug reports and app reviews to identify relevant bugs. We then focus on the app review as well as potential reproduction steps in the historical bug report (from a same-category app) to reproduce the bugs. Overall, after applying BugRMSys to six popular apps, we were able to identify, reproduce and report 20 new bugs: among these, 9 reports have already been triaged, 6 were confirmed, and 4 have been fixed by official development teams.

arXiv.org e-Print Archive
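
The matching step described in the BugRMSys abstract (comparing bug report embeddings with app review embeddings) might look roughly like the following. This is a minimal sketch, not the authors' pipeline: the embeddings are supplied as plain lists rather than produced by DistilBERT, and the function name `match_reviews`, the cosine measure, and the `threshold` value are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_reviews(report_embs, review_embs, threshold=0.8):
    """Pair historical bug reports with target-app reviews.

    Returns (report_index, review_index, similarity) tuples whose
    similarity is at least `threshold` (a hypothetical cut-off),
    sorted from most to least similar.
    """
    pairs = []
    for i, report in enumerate(report_embs):
        for j, review in enumerate(review_embs):
            score = cosine(report, review)
            if score >= threshold:
                pairs.append((i, j, score))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs

# Toy vectors standing in for text embeddings
reports = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
reviews = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
for report_idx, review_idx, score in match_reviews(reports, reviews):
    print(report_idx, review_idx, round(score, 3))
```

Each surviving pair is a candidate: a review of the target app that resembles a bug already reported for a same-category app, which a developer could then try to reproduce.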

Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation

Author: Chen Zhenghan
Ezzini Saad
Kim Kisub
Klein Jacques
Tang Xunzhu
Tian Haoye
Publication venue
Publication date: 01/12/2023
Field of study

Lancaster E-Prints

Learning to Represent Patches

Author: Bissyande Tegawende F.
Chen Zhenghan
Ezzini Saad
Habib Andrew
Kabore Abdoul Kader
Klein Jacques
Pian Weiguo
Tang Xunzhu
Tian Haoye
Publication venue
Publication date: 31/08/2023
Field of study

Patch representation is crucial in automating various software engineering tasks, like determining patch accuracy or summarizing code changes. While recent research has employed deep learning for patch representation, focusing on token sequences or Abstract Syntax Trees (ASTs), these approaches often miss the change's semantic intent and the context of modified lines. To bridge this gap, we introduce a novel method, Patcherizer. It delves into the intentions of context and structure, merging the surrounding code context with two innovative representations. These capture the intention in code changes and the intention in AST structural modifications pre- and post-patch. This holistic representation aptly captures a patch's underlying intentions. Patcherizer employs graph convolutional neural networks for structural intention graph representation and transformers for intention sequence representation. We evaluated the versatility of Patcherizer's embeddings in three areas: (1) patch description generation, (2) patch accuracy prediction, and (3) patch intention identification. Our experiments demonstrate the representation's efficacy across all tasks, outperforming state-of-the-art methods. For example, in patch description generation, Patcherizer excels, showing an average boost of 19.39% in BLEU, 8.71% in ROUGE-L, and 34.03% in METEOR scores.

arXiv.org e-Print Archive

Innovative Nitrogen Management and Innovative Fertilization Technologies in the Intensively Used Rice-Wheat Cropping Systems of Southeastern China

Author: Cai Zucong
Han Yong
Hofmeier Maximilian
Nieder Rolf
Roelcke Marco
Tang Haoye
Publication venue
Publication date: 01/09/2009
Field of study

As part of an interdisciplinary German-Chinese research consortium, field trials for demonstration purposes were set up in two counties of Jiangsu Province in southeastern China, starting with the 2008/09 winter wheat crop. Across three treatments ("Standard", "Reduced", and a zero plot), only the amount of mineral nitrogen (N) fertilizer was varied. The results after the winter wheat harvest show that the "Reduced" treatment suffered no decline in yield. In addition, compared with the "Standard" variant, residual Nmin contents in the soil after harvest decreased by almost 40%.

DBGPrints Repository

Patch-CLIP: A Patch-Text Pre-Trained Model

Author: Bissyande Tegawende F.
Chen Zhenghan
Ezzini Saad
Klein Jacques
Tang Xunzhu
Tian Haoye
Publication venue
Publication date: 19/10/2023
Field of study

Lancaster E-Prints

Multilevel Semantic Embedding of Software Patches: A Fine-to-Coarse Grained Approach Towards Security Patch Detection

Author: Bissyande Tegawende F.
Chen Zhenghan
Ezzini Saad
Klein Jacques
Song Yewei
Tang Xunzhu
Tian Haoye
Publication venue
Publication date: 01/08/2023
Field of study

The growth of open-source software has increased the risk of hidden vulnerabilities that can affect downstream software applications. This concern is further exacerbated by software vendors' practice of silently releasing security patches without explicit warnings or common vulnerability and exposure (CVE) notifications. This lack of transparency leaves users unaware of potential security threats, giving attackers an opportunity to take advantage of these vulnerabilities. In the complex landscape of software patches, grasping the nuanced semantics of a patch is vital for ensuring secure software maintenance. To address this challenge, we introduce a multilevel Semantic Embedder for security patch detection, termed MultiSEM. This model harnesses word-centric vectors at a fine-grained level, emphasizing the significance of individual words, while the coarse-grained layer adopts entire code lines for vector representation, capturing the essence and interrelation of added or removed lines. We further enrich this representation by assimilating patch descriptions to obtain a holistic semantic portrait. This combination of multi-layered embeddings offers a robust representation, balancing word complexity, understanding code-line insights, and patch descriptions. In our evaluation of MultiSEM on security patch detection, the results demonstrate its superiority, outperforming state-of-the-art models with promising margins: a 22.46% improvement on PatchDB and 9.21% on SPI-DB in terms of the F1 metric.

Lancaster E-Prints
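
The MultiSEM abstract reports its gains in terms of the F1 metric. For reference, F1 is the harmonic mean of precision and recall; a minimal sketch of how it is computed for a binary security/non-security labeling follows. The function name `f1_score` and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels: 1 = security patch, 0 = non-security patch
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(f1_score(y_true, y_pred))
```

In the toy example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and F1 is also 2/3.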

Hyperbolic Code Retrieval: A Novel Approach for Efficient Code Search Using Hyperbolic Space Embeddings

Author: Bissyande Tegawende F.
Chen Zhenghan
Ezzini Saad
Klein Jacques
Song Yewei
Tang Xunzhu
Tian Haoye
Publication venue
Publication date: 01/08/2023
Field of study

Within the realm of advanced code retrieval, existing methods have primarily relied on intricate matching and attention-based mechanisms. However, these methods often lead to computational and memory inefficiencies, posing a significant challenge to their real-world applicability. To tackle this challenge, we propose a novel approach, the Hyperbolic Code QA Matching (HyCoQA). This approach leverages the unique properties of hyperbolic space to express connections between code fragments and their corresponding queries, thereby obviating the necessity for intricate interaction layers. The process commences with a reimagining of the code retrieval challenge, framed within a question-answering (QA) matching framework, constructing a dataset with triple matches characterized as \texttt{}. These matches are subsequently processed via a static BERT embedding layer, yielding initial embeddings. Thereafter, a hyperbolic embedder transforms these representations into hyperbolic space, calculating distances between the codes and descriptions. The process concludes by implementing a scoring layer on these distances and leveraging hinge loss for model training. Notably, the design of HyCoQA inherently facilitates self-organization, allowing for the automatic detection of embedded hierarchical patterns during the learning phase. Experimentally, HyCoQA showcases remarkable effectiveness in our evaluations: an average performance improvement of 3.5% to 4% compared to state-of-the-art code retrieval techniques.

Lancaster E-Prints
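
The HyCoQA abstract describes computing distances in hyperbolic space and training with a hinge loss. The sketch below shows one common way to do this, using the Poincaré ball model. The abstract does not say which hyperbolic model, scoring layer, or margin the authors use, so the Poincaré distance formula, the `margin` value, and the function names `poincare_distance` and `hinge_loss` are assumptions for illustration only.

```python
import math

def poincare_distance(u, v, eps=1e-9):
    """Distance between two points inside the unit Poincare ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    Both points are assumed to have Euclidean norm below 1.
    """
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = max((1.0 - sq_u) * (1.0 - sq_v), eps)  # guard near the boundary
    return math.acosh(1.0 + 2.0 * sq_diff / denom)

def hinge_loss(pos_distance, neg_distance, margin=1.0):
    """Margin loss: a matching code/query pair should be closer than a
    non-matching pair by at least `margin` (a hypothetical value)."""
    return max(0.0, margin + pos_distance - neg_distance)

query = [0.0, 0.0]
matching_code = [0.1, 0.0]
other_code = [0.8, 0.0]
d_pos = poincare_distance(query, matching_code)
d_neg = poincare_distance(query, other_code)
print(hinge_loss(d_pos, d_neg))
```

Because distances grow rapidly toward the boundary of the ball, points near the edge end up far from everything else, which is one reason hyperbolic embeddings are often used for hierarchical data.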