AdaCCD: Adaptive Semantic Contrasts Discovery based Cross Lingual Adaptation for Code Clone Detection
Code Clone Detection, which aims to retrieve functionally similar programs
from large code bases, has been attracting increasing attention. Modern
software often involves a diverse range of programming languages. However,
current code clone detection methods are generally limited to only a few
popular programming languages due to insufficient annotated data as well as
their own model design constraints. To address these issues, we present AdaCCD,
a novel cross-lingual adaptation method that can detect code clones in a new
language without any annotations in that language. AdaCCD leverages
language-agnostic code representations from pre-trained programming language
models and proposes an Adaptively Refined Contrastive Learning framework to
transfer knowledge from resource-rich languages to resource-poor languages. We
evaluate the cross-lingual adaptation results of AdaCCD by constructing a
multilingual code clone detection benchmark consisting of 5 programming
languages. AdaCCD achieves significant improvements over other baselines, and
it is even comparable to supervised fine-tuning.
Comment: 10 pages
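The abstract does not spell out the Adaptively Refined Contrastive Learning objective. As a hedged illustration of the contrastive building block such frameworks typically rest on, here is a minimal in-batch InfoNCE loss over code embeddings; the function name, temperature, and batch construction are illustrative assumptions, not AdaCCD's actual formulation:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """Generic in-batch InfoNCE contrastive loss over embeddings.

    Each anchor (e.g. a code snippet's embedding) is pulled toward its
    positive (row-matched clone) and pushed away from every other positive
    in the batch, which acts as an in-batch negative. Illustrative sketch,
    not the paper's adaptively refined objective.
    """
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair for each anchor sits on the diagonal
    return -np.mean(np.diag(log_probs))
```

In a cross-lingual setting, the anchors and positives would come from a shared, language-agnostic encoder, so the same loss applies regardless of the source language of each snippet.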
MAP-SNN: Mapping Spike Activities with Multiplicity, Adaptability, and Plasticity into Bio-Plausible Spiking Neural Networks
Spiking Neural Network (SNN) is considered more biologically realistic and
power-efficient as it imitates the fundamental mechanism of the human brain.
Recently, backpropagation (BP) based SNN learning algorithms that utilize deep
learning frameworks have achieved good performance. However,
bio-interpretability is partially neglected in those BP-based algorithms.
Toward bio-plausible BP-based SNNs, we consider three properties in modeling
spike activities: Multiplicity, Adaptability, and Plasticity (MAP). In terms of
multiplicity, we propose a Multiple-Spike Pattern (MSP) with multiple spike
transmission to strengthen model robustness in discrete time-iteration. To
realize adaptability, we adopt Spike Frequency Adaption (SFA) under MSP to
decrease spike activities for improved efficiency. For plasticity, we propose a
trainable convolutional synapse that models spike response current to enhance
the diversity of spiking neurons for temporal feature extraction. The proposed
SNN model achieves competitive performance on the neuromorphic datasets N-MNIST
and SHD. Furthermore, experimental results demonstrate that the three proposed
aspects are significant for the iterative robustness, spike efficiency, and
temporal feature extraction capability of spike activities. In summary, this
work proposes a feasible scheme for bio-inspired spike activities with MAP,
offering a new neuromorphic perspective on embedding biological characteristics
into spiking neural networks.
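The adaptability property can be made concrete with a toy discrete-time leaky integrate-and-fire neuron whose threshold rises after each spike, the usual mechanism behind spike frequency adaptation. The time constants, reset rule, and parameter values below are assumptions for illustration, not the paper's exact neuron model:

```python
def lif_sfa(input_current, v_th=1.0, tau_v=0.9, tau_a=0.95, beta=0.3):
    """Discrete-time leaky integrate-and-fire neuron with spike frequency
    adaptation (SFA): each spike raises an adaptive threshold component
    that decays over time, lowering the firing rate under sustained input.
    Illustrative sketch only.
    """
    v, a = 0.0, 0.0
    spikes = []
    for i in input_current:
        v = tau_v * v + i                # leaky membrane integration
        s = 1 if v >= v_th + a else 0    # fire against the adaptive threshold
        if s:
            v = 0.0                      # hard reset after a spike
            a += beta                    # adaptation builds up with each spike
        a *= tau_a                       # adaptation decays every step
        spikes.append(s)
    return spikes
```

Under a constant driving current, setting `beta=0` disables adaptation and yields a fixed firing rate, while a positive `beta` progressively lengthens the interspike interval, which is exactly the reduction in spike activity the abstract attributes to SFA.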
CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code
Automatically generating function summaries for binaries is an extremely
valuable but challenging task, since it involves translating the execution
behavior and semantics of the low-level language (assembly code) into
human-readable natural language. However, most current work on understanding
assembly code is oriented toward generating function names, which are often so
heavily abbreviated that they remain confusing. To bridge this gap, we focus on
generating complete summaries for binary functions, especially for stripped
binaries (which, in practice, lack symbol tables and debug information). To fully
exploit the semantics of assembly code, we present a control flow graph and
pseudo code guided binary code summarization framework called CP-BCS. CP-BCS
utilizes a bidirectional instruction-level control flow graph and pseudo code
that incorporates expert knowledge to learn the comprehensive binary function
execution behavior and logic semantics. We evaluate CP-BCS on 3 different
binary optimization levels (O1, O2, and O3) for 3 different computer
architectures (X86, X64, and ARM). The evaluation results demonstrate CP-BCS is
superior and significantly improves the efficiency of reverse engineering.
Comment: EMNLP 2023 Main Conference
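The bidirectional instruction-level control flow graph can be illustrated with a toy builder over a simplified assembly listing: forward (successor) edges capture where execution can go, and backward (predecessor) edges give the reverse view. The instruction tuple format and mnemonics here are hypothetical simplifications, not CP-BCS's actual representation:

```python
def build_bidirectional_cfg(instructions):
    """Toy instruction-level CFG builder for a simplified assembly listing.

    `instructions` is a list of (addr, mnemonic, operand) tuples, where
    jump mnemonics carry a target address as the operand. Returns successor
    and predecessor edge maps, i.e. the forward and backward views that a
    bidirectional CFG combines. Illustrative sketch only.
    """
    succ = {addr: set() for addr, _, _ in instructions}
    pred = {addr: set() for addr, _, _ in instructions}
    for idx, (addr, mnem, op) in enumerate(instructions):
        nxt = instructions[idx + 1][0] if idx + 1 < len(instructions) else None
        if mnem == "jmp":                  # unconditional jump: target only
            succ[addr].add(op)
        elif mnem.startswith("j"):         # conditional jump: target + fallthrough
            succ[addr].add(op)
            if nxt is not None:
                succ[addr].add(nxt)
        elif mnem != "ret" and nxt is not None:
            succ[addr].add(nxt)            # ordinary instruction falls through
        for t in succ[addr]:
            pred[t].add(addr)
    return succ, pred
```

A real binary-analysis pipeline would recover this graph with a disassembler; the point of the sketch is only that each instruction's in-edges and out-edges together describe the execution behavior the summarizer learns from.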
Improving Long Tailed Document-Level Relation Extraction via Easy Relation Augmentation and Contrastive Learning
Toward real-world information extraction scenarios, research on relation
extraction is advancing to document-level relation extraction (DocRE). Existing
approaches for DocRE aim to extract relations by encoding various information
sources in the long context via novel model architectures. However, the
inherent long-tailed distribution problem of DocRE is overlooked by prior work.
We argue that mitigating this long-tailed distribution problem is crucial for
DocRE in real-world scenarios. Motivated by this problem, we propose an Easy
Relation Augmentation (ERA) method that improves DocRE by enhancing performance
on tailed relations. In addition, we propose a novel contrastive learning
framework on top of ERA, i.e., ERACL, which further improves model performance
on tailed relations and achieves overall DocRE performance competitive with the
state of the art.
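The abstract does not describe ERA's exact mechanism, so the following is only a generic illustration of the underlying idea of rebalancing a long-tailed relation distribution: tail-class relation representations are oversampled with small perturbations so the model sees them more often. The function, threshold, and noise scale are hypothetical, not the paper's method:

```python
import numpy as np

def augment_tail_relations(features, labels, threshold=5,
                           noise_scale=0.05, seed=0):
    """Hypothetical sketch: oversample relation feature vectors belonging
    to tail classes (fewer than `threshold` examples) by duplicating them
    with small Gaussian noise, partially rebalancing a long-tailed label
    distribution before training.
    """
    rng = np.random.default_rng(seed)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    aug_x, aug_y = list(features), list(labels)
    for x, y in zip(features, labels):
        if counts[y] < threshold:        # tail class: add one noisy copy
            aug_x.append(x + rng.normal(scale=noise_scale, size=x.shape))
            aug_y.append(y)
    return np.stack(aug_x), aug_y
```

A contrastive objective like ERACL's would then treat the original and perturbed copies of a tail-relation representation as positive pairs, which is one plausible reading of how augmentation and contrastive learning compose here.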
Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization
Automatically generating human-readable text describing the functionality of
a program is the goal of source code summarization. Although Neural Language
Models achieve strong performance in this field, an emerging trend is
combining neural models with external knowledge. Most previous approaches rely
on the sentence-level retrieval and combination paradigm (retrieval of similar
code snippets and use of the corresponding code and summary pairs) on the
encoder side. However, this paradigm is coarse-grained and cannot directly take
advantage of the high-quality retrieved summary tokens on the decoder side. In
this paper, we explore a fine-grained token-level retrieval-augmented mechanism
on the decoder side to help the vanilla neural model generate a better code
summary. Furthermore, to mitigate the limitation of token-level retrieval on
capturing contextual code semantics, we propose to integrate code semantics
into summary tokens. Extensive experiments and human evaluation reveal that our
token-level retrieval-augmented approach significantly improves performance and
is more interpretable.
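Token-level retrieval augmentation on the decoder side can be illustrated in the spirit of kNN-LM-style interpolation: the model's next-token distribution is blended with an empirical distribution over tokens retrieved from similar contexts. The function name, the interpolation weight `lam`, and the use of raw token counts are assumptions for this sketch, not Tram's actual mechanism:

```python
import numpy as np

def interpolate_token_distribution(model_probs, retrieved_tokens,
                                   vocab_size, lam=0.3):
    """Sketch of decoder-side token-level retrieval augmentation: blend the
    model's next-token distribution with an empirical distribution over
    token ids retrieved from similar contexts. Illustrative only.
    """
    retrieval_probs = np.zeros(vocab_size)
    for tok in retrieved_tokens:          # retrieved summary token ids
        retrieval_probs[tok] += 1.0
    if retrieval_probs.sum() == 0:        # nothing retrieved: fall back
        return model_probs
    retrieval_probs /= retrieval_probs.sum()
    return (1 - lam) * model_probs + lam * retrieval_probs
```

Because the blend happens per decoding step, tokens that appear often among the retrieved summaries get a direct probability boost, which is the fine-grained advantage the abstract contrasts with coarse sentence-level retrieval on the encoder side.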