AdaCCD: Adaptive Semantic Contrasts Discovery based Cross Lingual Adaptation for Code Clone Detection
Code Clone Detection, which aims to retrieve functionally similar programs
from large code bases, has been attracting increasing attention. Modern
software often involves a diverse range of programming languages. However,
current code clone detection methods are generally limited to only a few
popular programming languages due to insufficient annotated data as well as
their own model design constraints. To address these issues, we present AdaCCD,
a novel cross-lingual adaptation method that can detect code clones in a new
language without any annotations in that language. AdaCCD leverages
language-agnostic code representations from pre-trained programming language
models and proposes an Adaptively Refined Contrastive Learning framework to
transfer knowledge from resource-rich languages to resource-poor languages. We
evaluate the cross-lingual adaptation results of AdaCCD by constructing a
multilingual code clone detection benchmark consisting of 5 programming
languages. AdaCCD achieves significant improvements over other baselines, and
it is even comparable to supervised fine-tuning.
Comment: 10 pages
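The abstract does not spell out the Adaptively Refined Contrastive Learning objective. As a hedged illustration of the contrastive building block such frameworks typically rest on, here is a minimal in-batch InfoNCE loss over code embeddings; the function name, temperature, and batch construction are illustrative assumptions, not AdaCCD's actual formulation:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.07):
    """Generic in-batch InfoNCE contrastive loss over embeddings.

    Each anchor (e.g. a code snippet's embedding) is pulled toward its
    positive (row-matched clone) and pushed away from every other positive
    in the batch, which acts as an in-batch negative. Illustrative sketch,
    not the paper's adaptively refined objective.
    """
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair for each anchor sits on the diagonal
    return -np.mean(np.diag(log_probs))
```

In a cross-lingual setting, the anchors and positives would come from a shared, language-agnostic encoder, so the same loss applies regardless of the source language of each snippet.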
MAP-SNN: Mapping Spike Activities with Multiplicity, Adaptability, and Plasticity into Bio-Plausible Spiking Neural Networks
Spiking Neural Network (SNN) is considered more biologically realistic and
power-efficient as it imitates the fundamental mechanism of the human brain.
Recently, backpropagation (BP) based SNN learning algorithms that utilize deep
learning frameworks have achieved good performance. However,
bio-interpretability is partially neglected in those BP-based algorithms.
Toward bio-plausible BP-based SNNs, we consider three properties in modeling
spike activities: Multiplicity, Adaptability, and Plasticity (MAP). In terms of
multiplicity, we propose a Multiple-Spike Pattern (MSP) with multiple spike
transmission to strengthen model robustness in discrete time-iteration. To
realize adaptability, we adopt Spike Frequency Adaption (SFA) under MSP to
decrease spike activities for improved efficiency. For plasticity, we propose a
trainable convolutional synapse that models spike response current to enhance
the diversity of spiking neurons for temporal feature extraction. The proposed
SNN model achieves competitive performance on the neuromorphic datasets N-MNIST
and SHD. Furthermore, experimental results demonstrate that the three proposed
aspects are significant for the iterative robustness, spike efficiency, and
temporal feature extraction capability of spike activities. In summary, this
work proposes a feasible scheme for bio-inspired spike activities with MAP,
offering a new neuromorphic perspective on embedding biological characteristics
into spiking neural networks.
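The adaptability property can be made concrete with a toy discrete-time leaky integrate-and-fire neuron whose threshold rises after each spike, the usual mechanism behind spike frequency adaptation. The time constants, reset rule, and parameter values below are assumptions for illustration, not the paper's exact neuron model:

```python
def lif_sfa(input_current, v_th=1.0, tau_v=0.9, tau_a=0.95, beta=0.3):
    """Discrete-time leaky integrate-and-fire neuron with spike frequency
    adaptation (SFA): each spike raises an adaptive threshold component
    that decays over time, lowering the firing rate under sustained input.
    Illustrative sketch only.
    """
    v, a = 0.0, 0.0
    spikes = []
    for i in input_current:
        v = tau_v * v + i                # leaky membrane integration
        s = 1 if v >= v_th + a else 0    # fire against the adaptive threshold
        if s:
            v = 0.0                      # hard reset after a spike
            a += beta                    # adaptation builds up with each spike
        a *= tau_a                       # adaptation decays every step
        spikes.append(s)
    return spikes
```

Under a constant driving current, setting `beta=0` disables adaptation and yields a fixed firing rate, while a positive `beta` progressively lengthens the interspike interval, which is exactly the reduction in spike activity the abstract attributes to SFA.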
CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code
Automatically generating function summaries for binaries is an extremely
valuable but challenging task, since it involves translating the execution
behavior and semantics of the low-level language (assembly code) into
human-readable natural language. However, most current work on understanding
assembly code is oriented toward generating function names, which are often so
heavily abbreviated that they remain confusing. To bridge this gap, we focus on
generating complete summaries for binary functions, especially for stripped
binaries (which, in practice, lack symbol tables and debug information). To fully
exploit the semantics of assembly code, we present a control flow graph and
pseudo code guided binary code summarization framework called CP-BCS. CP-BCS
utilizes a bidirectional instruction-level control flow graph and pseudo code
that incorporates expert knowledge to learn the comprehensive binary function
execution behavior and logic semantics. We evaluate CP-BCS on 3 different
binary optimization levels (O1, O2, and O3) for 3 different computer
architectures (X86, X64, and ARM). The evaluation results demonstrate CP-BCS is
superior and significantly improves the efficiency of reverse engineering.
Comment: EMNLP 2023 Main Conference
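The bidirectional instruction-level control flow graph can be illustrated with a toy builder over a simplified assembly listing: forward (successor) edges capture where execution can go, and backward (predecessor) edges give the reverse view. The instruction tuple format and mnemonics here are hypothetical simplifications, not CP-BCS's actual representation:

```python
def build_bidirectional_cfg(instructions):
    """Toy instruction-level CFG builder for a simplified assembly listing.

    `instructions` is a list of (addr, mnemonic, operand) tuples, where
    jump mnemonics carry a target address as the operand. Returns successor
    and predecessor edge maps, i.e. the forward and backward views that a
    bidirectional CFG combines. Illustrative sketch only.
    """
    succ = {addr: set() for addr, _, _ in instructions}
    pred = {addr: set() for addr, _, _ in instructions}
    for idx, (addr, mnem, op) in enumerate(instructions):
        nxt = instructions[idx + 1][0] if idx + 1 < len(instructions) else None
        if mnem == "jmp":                  # unconditional jump: target only
            succ[addr].add(op)
        elif mnem.startswith("j"):         # conditional jump: target + fallthrough
            succ[addr].add(op)
            if nxt is not None:
                succ[addr].add(nxt)
        elif mnem != "ret" and nxt is not None:
            succ[addr].add(nxt)            # ordinary instruction falls through
        for t in succ[addr]:
            pred[t].add(addr)
    return succ, pred
```

A real binary-analysis pipeline would recover this graph with a disassembler; the point of the sketch is only that each instruction's in-edges and out-edges together describe the execution behavior the summarizer learns from.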
Improving Long Tailed Document-Level Relation Extraction via Easy Relation Augmentation and Contrastive Learning
Toward real-world information extraction scenarios, research on relation
extraction is advancing to document-level relation extraction (DocRE). Existing
approaches for DocRE aim to extract relations by encoding various information
sources in the long context via novel model architectures. However, the
inherent long-tailed distribution problem of DocRE is overlooked by prior work.
We argue that mitigating this long-tailed distribution problem is crucial for
DocRE in real-world scenarios. Motivated by this problem, we propose an Easy
Relation Augmentation (ERA) method that improves DocRE by enhancing performance
on tailed relations. In addition, we propose a novel contrastive learning
framework on top of ERA, i.e., ERACL, which further improves model performance
on tailed relations and achieves overall DocRE performance competitive with the
state of the art.
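The abstract does not describe ERA's exact mechanism, so the following is only a generic illustration of the underlying idea of rebalancing a long-tailed relation distribution: tail-class relation representations are oversampled with small perturbations so the model sees them more often. The function, threshold, and noise scale are hypothetical, not the paper's method:

```python
import numpy as np

def augment_tail_relations(features, labels, threshold=5,
                           noise_scale=0.05, seed=0):
    """Hypothetical sketch: oversample relation feature vectors belonging
    to tail classes (fewer than `threshold` examples) by duplicating them
    with small Gaussian noise, partially rebalancing a long-tailed label
    distribution before training.
    """
    rng = np.random.default_rng(seed)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    aug_x, aug_y = list(features), list(labels)
    for x, y in zip(features, labels):
        if counts[y] < threshold:        # tail class: add one noisy copy
            aug_x.append(x + rng.normal(scale=noise_scale, size=x.shape))
            aug_y.append(y)
    return np.stack(aug_x), aug_y
```

A contrastive objective like ERACL's would then treat the original and perturbed copies of a tail-relation representation as positive pairs, which is one plausible reading of how augmentation and contrastive learning compose here.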
Tram: A Token-level Retrieval-augmented Mechanism for Source Code Summarization
Automatically generating human-readable text describing the functionality of
a program is the goal of source code summarization. Although Neural Language
Models achieve strong performance in this field, an emerging trend is
combining neural models with external knowledge. Most previous approaches rely
on the sentence-level retrieval and combination paradigm (retrieval of similar
code snippets and use of the corresponding code and summary pairs) on the
encoder side. However, this paradigm is coarse-grained and cannot directly take
advantage of the high-quality retrieved summary tokens on the decoder side. In
this paper, we explore a fine-grained token-level retrieval-augmented mechanism
on the decoder side to help the vanilla neural model generate a better code
summary. Furthermore, to mitigate the limitation of token-level retrieval on
capturing contextual code semantics, we propose to integrate code semantics
into summary tokens. Extensive experiments and human evaluation reveal that our
token-level retrieval-augmented approach significantly improves performance and
is more interpretable.
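Token-level retrieval augmentation on the decoder side can be illustrated in the spirit of kNN-LM-style interpolation: the model's next-token distribution is blended with an empirical distribution over tokens retrieved from similar contexts. The function name, the interpolation weight `lam`, and the use of raw token counts are assumptions for this sketch, not Tram's actual mechanism:

```python
import numpy as np

def interpolate_token_distribution(model_probs, retrieved_tokens,
                                   vocab_size, lam=0.3):
    """Sketch of decoder-side token-level retrieval augmentation: blend the
    model's next-token distribution with an empirical distribution over
    token ids retrieved from similar contexts. Illustrative only.
    """
    retrieval_probs = np.zeros(vocab_size)
    for tok in retrieved_tokens:          # retrieved summary token ids
        retrieval_probs[tok] += 1.0
    if retrieval_probs.sum() == 0:        # nothing retrieved: fall back
        return model_probs
    retrieval_probs /= retrieval_probs.sum()
    return (1 - lam) * model_probs + lam * retrieval_probs
```

Because the blend happens per decoding step, tokens that appear often among the retrieved summaries get a direct probability boost, which is the fine-grained advantage the abstract contrasts with coarse sentence-level retrieval on the encoder side.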