
    Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach

    While code generation has been widely used in various software development scenarios, the quality of the generated code is not guaranteed. This is a particular concern in the era of large language model (LLM)-based code generation, where an LLM, a complex and powerful black-box model, is instructed by a high-level natural language specification, namely a prompt, to generate code. Nevertheless, effectively evaluating and explaining the code generation capability of LLMs is inherently challenging, given the complexity of these models and their lack of transparency. Inspired by recent progress in causality analysis and its application in software engineering, this paper proposes a causality analysis-based approach to systematically analyze the causal relations between LLM input prompts and the generated code. To handle the technical challenges involved, we first propose a novel causal graph-based representation of the prompt and the generated code, established over fine-grained, human-understandable concepts in the input prompts. The resulting causal graph is then used to identify the causal relations between the prompt and the derived code. We illustrate the insights our framework can provide by studying three popular LLMs under twelve prompt adjustment strategies. The results demonstrate the potential of our technique to provide insights into LLM effectiveness and to aid end-users in understanding predictions. Additionally, we show that our approach provides actionable insights for improving the quality of LLM-generated code by properly calibrating the prompt.
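    As a rough illustration of the causal-analysis idea (not the paper's actual pipeline), the Python sketch below estimates the effect of one hypothetical prompt concept on generated-code quality by stratifying on the remaining concepts; the concept names and the quality model are invented for demonstration.

```python
# Hypothetical illustration: estimate the causal effect of one binary
# prompt concept ("has_io_example") on generated-code quality while
# stratifying on the other annotated concepts. Concept names and the
# quality model are invented; the paper's pipeline is more elaborate.
import random
from collections import defaultdict

random.seed(0)

def annotate():
    """Stand-in for concept annotation: fine-grained, human-understandable
    binary concepts extracted from a prompt."""
    return {
        "has_io_example": random.random() < 0.5,
        "names_algorithm": random.random() < 0.5,
        "states_constraints": random.random() < 0.5,
    }

def code_quality(c):
    """Stand-in for querying the LLM and scoring its output
    (e.g., pass@1 on unit tests)."""
    return (0.3 + 0.25 * c["has_io_example"]
            + 0.1 * c["names_algorithm"] + random.gauss(0, 0.05))

# Group observations by the non-treatment concepts so that comparing
# groups approximates intervening on "has_io_example" alone.
strata = defaultdict(lambda: {True: [], False: []})
for _ in range(2000):
    c = annotate()
    key = (c["names_algorithm"], c["states_constraints"])
    strata[key][c["has_io_example"]].append(code_quality(c))

effects = [sum(g[True]) / len(g[True]) - sum(g[False]) / len(g[False])
           for g in strata.values() if g[True] and g[False]]
print(f"estimated effect of adding an I/O example: {sum(effects) / len(effects):+.3f}")
```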

    A novel buckwheat protein with a beneficial effect in atherosclerosis was purified from Fagopyrum tataricum (L.) Gaertn.

    Buckwheat seeds contain many functional compounds that benefit patients with cardiovascular disease. In this research, a water-soluble buckwheat protein was isolated and purified using a DEAE-Sepharose anion-exchange column and Sephadex G-75 gel chromatography. The isolated buckwheat protein fractions exhibited hypocholesterolemic activity in a HepG2 cell model and prominent bile acid salt-binding activity in an in vitro assay. The antioxidative activity of the protein fractions with hypolipidemic effects was assessed in a free radical scavenging experiment. The buckwheat protein fraction with the strongest hypolipidemic and free radical scavenging activity was named WSBWP; its molecular weight was estimated by SDS-PAGE electrophoresis to be 38 kDa. It could become a potential candidate for the treatment of atherosclerosis.

    Engineering Clostridium Strain to Accept Unmethylated DNA

    The medically and biotechnologically important genus Clostridium is difficult to manipulate genetically due to its restriction and modification (RM) systems. We identified and engineered the RM system of a model clostridial species, C. acetobutylicum, with the aim of allowing the host to accept unmethylated DNA efficiently. The gene CAC1502, putatively encoding the type II restriction endonuclease Cac824I, was identified in the genome of C. acetobutylicum DSM1731 and disrupted using the ClosTron system, which is based on group II intron insertion. The resulting strain, SMB009, lost type II restriction endonuclease activity and can be transformed with unmethylated DNA as efficiently as with methylated DNA. The strategy reported here makes it easy to genetically modify clostridial species using unmethylated DNA, which will help advance the understanding of clostridial physiology at the molecular level.
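    For illustration only: assuming the commonly cited 5'-GCNGC-3' recognition sequence for Cac824I, a short scan like the Python sketch below can flag whether a shuttle plasmid carries sites that a restriction-proficient wild-type host would cut when the DNA is unmethylated; the plasmid fragment shown is made up.

```python
# Illustration only: scan a plasmid for putative Cac824I sites, assuming
# the commonly cited 5'-GCNGC-3' recognition sequence. The motif equals
# its own reverse complement, so scanning one strand suffices.
import re

def count_cac824i_sites(seq: str) -> int:
    """Count (possibly overlapping) GCNGC motifs in a DNA sequence."""
    return sum(1 for _ in re.finditer(r"(?=GC[ACGT]GC)", seq.upper()))

plasmid = "ATGCAGCGGCTAGCTGCAGCATGCAGCTTGCCGCAT"  # hypothetical fragment
n = count_cac824i_sites(plasmid)
print(f"{n} putative Cac824I site(s); unmethylated DNA carrying such sites "
      "would be restricted by a wild-type host but accepted by SMB009.")
```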

    DAA: A Delta Age AdaIN operation for age estimation via binary code transformer

    Naked-eye recognition of age is usually based on comparison with the ages of others. However, this idea is ignored in computer vision tasks because it is difficult to obtain representative contrast images for each age. Inspired by transfer learning, we designed the Delta Age AdaIN (DAA) operation to obtain the feature difference from each age: it derives a style map for each age from learned values representing the mean and standard deviation. We use the binary code of the age, a natural number, as the input to the transfer, yielding continuous age feature information. The two groups of values learned in the binary code mapping correspond to the mean and standard deviation of the comparison ages. In summary, our method consists of four parts: the FaceEncoder, DAA operation, binary code mapping, and AgeDecoder modules. After obtaining the delta ages via the AgeDecoder, we take the average of all comparison ages and delta ages as the predicted age. Compared with state-of-the-art methods, our method achieves better performance with fewer parameters on multiple facial age datasets.
    Comment: Accepted by CVPR 2023; 8 pages, 3 figures.
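    A minimal, untrained PyTorch sketch of the DAA idea as summarized above; the 8-bit age encoding, feature size, and linear stand-ins for the FaceEncoder and AgeDecoder are assumptions for illustration, not the paper's architecture.

```python
# Minimal, untrained sketch of Delta Age AdaIN (DAA). The 8-bit age code,
# feature size, and linear FaceEncoder/AgeDecoder stand-ins are invented;
# only the AdaIN-style statistic transfer follows the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

def age_to_binary(age: int, bits: int = 8) -> torch.Tensor:
    """Encode an age (a natural number) as a fixed-width binary vector."""
    return torch.tensor([(age >> i) & 1 for i in range(bits)], dtype=torch.float32)

class DeltaAgeAdaIN(nn.Module):
    def __init__(self, feat_dim: int = 64, bits: int = 8):
        super().__init__()
        # Binary code mapping: learn a (mean, std) style per comparison age.
        self.to_mean = nn.Linear(bits, feat_dim)
        self.to_std = nn.Linear(bits, feat_dim)

    def forward(self, feat: torch.Tensor, age_code: torch.Tensor) -> torch.Tensor:
        mu = feat.mean(dim=1, keepdim=True)
        sigma = feat.std(dim=1, keepdim=True) + 1e-5
        # AdaIN: renormalize the content feature with the age's style stats.
        return F.softplus(self.to_std(age_code)) * (feat - mu) / sigma + self.to_mean(age_code)

# Usage: style the face feature toward every comparison age, decode a
# delta age from each styled feature, and average (age + delta).
daa, age_decoder = DeltaAgeAdaIN(), nn.Linear(64, 1)  # AgeDecoder stand-in
face_feat = torch.randn(1, 64)                        # FaceEncoder stand-in
preds = [a + age_decoder(daa(face_feat, age_to_binary(a).unsqueeze(0))).item()
         for a in range(100)]
print(f"predicted age ≈ {sum(preds) / len(preds):.1f}  (untrained, illustrative)")
```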

    On the Feasibility of Specialized Ability Stealing for Large Language Code Models

    Recent progress in large language code models (LLCMs) has led to a dramatic surge in their use in software development. Nevertheless, it is widely known that training a well-performing LLCM requires substantial human effort for data collection and high-quality annotation. Additionally, the training dataset may be proprietary (or only partially open to the public), and the training process is often conducted on a large-scale GPU cluster at high cost. Inspired by the recent success of imitation attacks in stealing computer vision and natural language models, this work launches the first imitation attack on LLCMs: by querying a target LLCM with carefully designed queries and collecting the outputs, the adversary can train an imitation model that closely mimics the behavior of the target LLCM. We systematically investigate the effectiveness of imitation attacks under different query schemes and different LLCM tasks. We also design novel methods to polish the LLCM outputs, resulting in an effective imitation training process. We summarize our findings and the lessons harvested in this study to help better depict the attack surface of LLCMs. Our research contributes to the growing body of knowledge on imitation attacks and defenses in deep neural models, particularly in the domain of code-related tasks.
    Comment: 11 pages.
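    The query-collect-polish pipeline might be sketched as below; query_target_llcm is a hypothetical stand-in for the victim model's API, and the syntax-validity filter is just one plausible instance of the paper's output-polishing step.

```python
# Schematic sketch of the imitation-attack data pipeline. query_target_llcm
# is a hypothetical stand-in for the victim model's API, and the syntax
# filter is one plausible instance of output polishing, not the paper's method.
import ast
import json

def query_target_llcm(prompt: str) -> str:
    """Hypothetical: query the target code model and return its output."""
    return f"def solution():\n    # response to: {prompt}\n    return None\n"

def polish(code: str) -> str | None:
    """Keep only outputs that parse as valid Python."""
    try:
        ast.parse(code)
        return code
    except SyntaxError:
        return None

queries = [
    "Write a function that reverses a linked list.",
    "Write a function that checks whether a string is a palindrome.",
]

with open("imitation_train.jsonl", "w") as f:
    for q in queries:
        out = polish(query_target_llcm(q))
        if out is not None:
            # (query, output) pairs become supervision for the imitation model.
            f.write(json.dumps({"prompt": q, "completion": out}) + "\n")
```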

    Memristive Non-Volatile Memory Based on Graphene Materials

    Resistive random access memory (RRAM), considered one of the most promising next-generation non-volatile memory (NVM) devices and a representative memristor technology, has demonstrated great potential as an artificial synapse for neuromorphic systems and artificial intelligence (AI), owing to advantages such as fast operation speed, low power consumption, and high device density. Graphene and related materials (GRMs), especially graphene oxide (GO), acting as active materials for RRAM devices, are considered a promising alternative to other materials, including metal oxides and perovskites. Herein, an overview of GRM-based RRAM devices is provided, with a discussion of the properties of GRMs, the main operation mechanisms underlying resistive switching (RS) behavior, a figure-of-merit (FoM) summary, and prospects for GRM-based RRAM devices. With excellent physical and chemical properties, such as a high intrinsic Young's modulus (1.0 TPa), good tensile strength (130 GPa), excellent carrier mobility (2.0 × 10⁵ cm²·V⁻¹·s⁻¹), high thermal conductivity (5000 W·m⁻¹·K⁻¹), and superior electrical conductivity (1.0 × 10⁶ S·m⁻¹), GRMs can act as both electrodes and resistive switching media in RRAM devices. In addition, a GRM-based interface between electrode and dielectric can limit atomic diffusion in the dielectric and suppress surface effects. A substantial body of research indicates that GRMs may play a significant role in enabling the large-scale commercialization of RRAM devices.
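    The resistive-switching behavior such devices exhibit can be illustrated with the textbook linear ion-drift memristor model (a generic model, not one fitted to graphene-oxide devices); the Python sketch below drives it with a sine wave and prints points along the resulting pinched current-voltage hysteresis loop.

```python
# Textbook linear ion-drift memristor model (not a GO-specific device model).
# Parameters follow the generic HP-style formulation; a 1 Hz sinusoidal drive
# traces the pinched current-voltage hysteresis typical of RS behavior.
import math

R_ON, R_OFF = 100.0, 16e3      # low/high resistance states (ohm)
D = 10e-9                      # film thickness (m)
MU_V = 1e-14                   # ion mobility (m^2 V^-1 s^-1)
V0, FREQ, DT = 1.0, 1.0, 1e-3  # drive amplitude (V), frequency (Hz), step (s)

w = 0.1 * D                    # width of the doped region (state variable)
for step in range(2000):
    t = step * DT
    v = V0 * math.sin(2 * math.pi * FREQ * t)
    r = R_ON * (w / D) + R_OFF * (1 - w / D)  # series model of the film
    i = v / r
    w = min(max(w + MU_V * (R_ON / D) * i * DT, 0.0), D)  # linear drift, clamped
    if step % 400 == 0:
        print(f"t={t:5.3f} s  V={v:+.2f} V  I={i * 1e3:+7.3f} mA  R={r / 1e3:6.2f} kOhm")
```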

    Refining Decompiled C Code with Large Language Models

    A C decompiler converts an executable into source code. The recovered C source code, once re-compiled, is expected to produce an executable with the same functionality as the original. With over twenty years of development, C decompilers have been widely used in production to support reverse engineering. Despite this prosperous development, it is widely acknowledged that decompiler outputs are mainly intended for human consumption and are not suitable for automatic recompilation: a substantial amount of manual effort is often required to fix decompiler outputs before they can be recompiled and executed properly. This paper is motivated by the recent success of large language models (LLMs) in comprehending dense corpora of natural language. To alleviate the tedious, costly, and often error-prone manual effort of fixing decompiler outputs, we investigate the feasibility of using LLMs to augment decompiler outputs, thus delivering recompilable decompilation. Unlike previous efforts that focus on augmenting decompiler outputs with higher readability (e.g., recovering type/variable names), we focus on augmenting them with recompilability, i.e., generating code that can be recompiled into an executable with the same functionality as the original. We conduct a pilot study to characterize the obstacles to recompiling the outputs of the de facto commercial C decompiler, IDA-Pro. We then propose a two-step, hybrid approach to augmenting decompiler outputs with LLMs. We evaluate our approach on a set of popular C test cases and show that it can deliver a recompilation success rate of over 75% with moderate effort, whereas none of IDA-Pro's original outputs can be recompiled. We conclude with a discussion of the limitations of our approach and promising future research directions.
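    The overall fix-and-recompile loop might look like the Python sketch below; ask_llm_to_fix is a hypothetical stand-in for the LLM call (here it merely prepends common headers, one plausible class of fixes), and the paper's actual two-step hybrid approach is more involved.

```python
# Sketch of a decompile-fix-recompile loop in the spirit of the paper.
# ask_llm_to_fix is a hypothetical stand-in for the LLM call; requires
# gcc on PATH. Not the paper's actual two-step hybrid approach.
import os
import subprocess
import tempfile

def ask_llm_to_fix(c_source: str, compiler_errors: str) -> str:
    """Hypothetical LLM repair step: here, just prepend common headers."""
    return "#include <stdint.h>\n#include <string.h>\n" + c_source

def try_compile(c_source: str) -> str | None:
    """Return None on success, otherwise the compiler's error output."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "candidate.c")
        with open(src, "w") as fh:
            fh.write(c_source)
        proc = subprocess.run(["gcc", "-c", src, "-o", os.path.join(d, "candidate.o")],
                              capture_output=True, text=True)
        return None if proc.returncode == 0 else proc.stderr

def refine(decompiled: str, max_rounds: int = 3) -> str | None:
    """Iteratively repair decompiler output until it recompiles."""
    candidate = decompiled
    for _ in range(max_rounds):
        errors = try_compile(candidate)
        if errors is None:
            return candidate  # recompilable decompilation achieved
        candidate = ask_llm_to_fix(candidate, errors)
    return None

# Fails as-is (uint32_t undeclared), recompiles after the "LLM" adds headers.
demo = "uint32_t add(uint32_t a, uint32_t b) { return a + b; }"
print("recompilable" if refine(demo) else "still broken")
```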