Class-Incremental Grouping Network for Continual Audio-Visual Learning
Continual learning is a challenging problem in which models need to be
trained on non-stationary data across sequential tasks for class-incremental
learning. While previous methods have focused on using either regularization or
rehearsal-based frameworks to alleviate catastrophic forgetting in image
classification, they are limited to a single modality and cannot learn compact
class-aware cross-modal representations for continual audio-visual learning. To
address this gap, we propose a novel class-incremental grouping network (CIGN)
that can learn category-wise semantic features to achieve continual
audio-visual learning. Our CIGN leverages learnable audio-visual class tokens
and audio-visual grouping to continually aggregate class-aware features.
Additionally, it utilizes class token distillation and continual grouping to
prevent forgetting parameters learned from previous tasks, thereby improving
the model's ability to capture discriminative audio-visual categories. We
conduct extensive experiments on VGGSound-Instruments, VGGSound-100, and
VGG-Sound Sources benchmarks. Our experimental results demonstrate that the
CIGN achieves state-of-the-art audio-visual class-incremental learning
performance. Code is available at https://github.com/stoneMo/CIGN.
Comment: ICCV 2023. arXiv admin note: text overlap with arXiv:2303.1705
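The grouping idea described above can be illustrated roughly as follows. This is a minimal numpy sketch, not the authors' implementation: the function names, shapes, the soft-assignment formulation, and the KL-style distillation stand-in are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def group_features(class_tokens, features):
    """Softly assign fused audio-visual features to learnable class tokens
    and aggregate one class-aware embedding per token.

    class_tokens: (C, d) learnable class tokens
    features:     (N, d) fused audio-visual features
    returns grouped (C, d) embeddings and the (N, C) assignment matrix
    """
    logits = features @ class_tokens.T            # (N, C) similarities
    assign = softmax(logits, axis=1)              # soft assignment per feature
    # weighted mean of the features claimed by each class token
    grouped = (assign.T @ features) / (assign.sum(axis=0)[:, None] + 1e-8)
    return grouped, assign

def token_distillation_loss(assign_old, assign_new, eps=1e-8):
    """KL-style distillation keeping the new model's assignments close to
    those of the model trained on previous tasks (a stand-in for the
    paper's class-token distillation)."""
    return float((assign_old * (np.log(assign_old + eps)
                                - np.log(assign_new + eps))).sum(axis=1).mean())

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 16))    # 4 classes seen so far (hypothetical)
feats = rng.normal(size=(10, 16))    # 10 fused audio-visual features
grouped, assign = group_features(tokens, feats)
```

Each class token acts as a query that pulls together the features belonging to its category, which is what lets the network keep per-class structure as new tasks arrive.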
Audio-Visual Class-Incremental Learning
In this paper, we introduce audio-visual class-incremental learning, a
class-incremental learning scenario for audio-visual video recognition. We
demonstrate that joint audio-visual modeling can improve class-incremental
learning, but current methods fail to preserve semantic similarity between
audio and visual features as the number of incremental steps grows. Furthermore, we observe
that audio-visual correlations learned in previous tasks can be forgotten as
incremental steps progress, leading to poor performance. To overcome these
challenges, we propose AV-CIL, which incorporates Dual-Audio-Visual Similarity
Constraint (D-AVSC) to maintain both instance-aware and class-aware semantic
similarity between audio-visual modalities and Visual Attention Distillation
(VAD) to retain previously learned audio-guided visual attentive ability. We
create three audio-visual class-incremental datasets, AVE-Class-Incremental
(AVE-CI), Kinetics-Sounds-Class-Incremental (K-S-CI), and
VGGSound100-Class-Incremental (VS100-CI) based on the AVE, Kinetics-Sounds, and
VGGSound datasets, respectively. Our experiments on AVE-CI, K-S-CI, and
VS100-CI demonstrate that AV-CIL significantly outperforms existing
class-incremental learning methods in audio-visual class-incremental learning.
Code and data are available at: https://github.com/weiguoPian/AV-CIL_ICCV2023.
Comment: Accepted at ICCV 2023.
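The two losses described in the abstract can be sketched as below. This is an illustrative numpy sketch under assumed shapes and names; the actual D-AVSC and VAD formulations in the paper may differ, and the contiguous-label assumption is mine.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1)
                              * np.linalg.norm(b, axis=-1) + eps)

def d_avsc_loss(audio, visual, labels):
    """Dual similarity constraint: an instance-aware term pulls each audio
    embedding toward its paired visual embedding, and a class-aware term
    pulls it toward the visual prototype of its class.
    Assumes labels are contiguous integers 0..K-1."""
    inst = 1.0 - cosine(audio, visual)                       # instance-aware
    protos = np.stack([visual[labels == c].mean(axis=0)
                       for c in range(labels.max() + 1)])
    cls = 1.0 - cosine(audio, protos[labels])                # class-aware
    return float(inst.mean() + cls.mean())

def vad_loss(attn_old, attn_new):
    """Visual attention distillation: keep the new model's audio-guided
    visual attention maps close to those of the previous-step model."""
    return float(((attn_old - attn_new) ** 2).mean())

rng = np.random.default_rng(1)
labels = np.array([0, 0, 1, 1, 2, 2])
visual = rng.normal(size=(6, 8))
audio = visual + 0.01 * rng.normal(size=(6, 8))   # nearly aligned pairs
loss = d_avsc_loss(audio, visual, labels)
```

Well-aligned audio-visual pairs yield a small loss, while misaligned ones are penalized at both the instance and the class level.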
LaFiCMIL: Rethinking Large File Classification from the Perspective of Correlated Multiple Instance Learning
Transformer-based models, such as BERT, have revolutionized various language
tasks, but still struggle with large file classification due to their input
length limit (e.g., 512 tokens). Despite several attempts to alleviate this
limitation, no method consistently excels across all benchmark datasets,
primarily because they can only extract partial essential information from the
input file. Additionally, they fail to adapt to the varied properties of
different types of large files. In this work, we tackle this problem from the
perspective of correlated multiple instance learning. The proposed approach,
LaFiCMIL, serves as a versatile framework applicable to various large file
classification tasks, covering binary, multi-class, and multi-label
classification across domains including Natural Language
Processing, Programming Language Processing, and Android Analysis. To evaluate
its effectiveness, we employ eight benchmark datasets pertaining to Long
Document Classification, Code Defect Detection, and Android Malware Detection.
Leveraging BERT-family models as feature extractors, our experimental results
demonstrate that LaFiCMIL achieves new state-of-the-art performance across all
benchmark datasets. This is largely attributable to its capability of scaling
BERT up to nearly 20K tokens, running on a single Tesla V-100 GPU with 32G of
memory.
Comment: 12 pages; updated results; manuscript revision.
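The multiple-instance view of a large file can be sketched as follows: split the file into 512-token chunks (the "instances"), embed each, and pool with attention into one bag representation. This is a hedged numpy sketch; the chunking size is taken from the abstract, but the gated-attention pooling and all names are assumptions, and a random projection stands in for the BERT-family feature extractor.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def chunk(token_ids, size=512):
    """Split a long token sequence into instances of at most `size` tokens,
    mirroring how an extractor with a 512-token limit sees a large file
    as a bag of chunks."""
    return [token_ids[i:i + size] for i in range(0, len(token_ids), size)]

def attention_pool(instance_feats, w):
    """Attention-style MIL pooling: score each chunk, normalize the scores,
    and return the weighted bag embedding used for classification."""
    scores = np.tanh(instance_feats) @ w        # one score per chunk
    alpha = softmax(scores, axis=0)             # attention over instances
    return alpha @ instance_feats               # (d,) bag representation

rng = np.random.default_rng(2)
token_ids = list(range(1300))                   # a "large file" of 1300 tokens
chunks = chunk(token_ids)                       # 512 + 512 + 276 tokens
# stand-in for per-chunk embeddings from a BERT-family extractor
instance_feats = rng.normal(size=(len(chunks), 32))
bag = attention_pool(instance_feats, rng.normal(size=32))
```

Because only one chunk is pushed through the extractor at a time, memory scales with chunk size rather than file size, which is consistent with the near-20K-token capacity reported above.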
Learning to Represent Patches
Patch representation is crucial in automating various software engineering
tasks, like determining patch accuracy or summarizing code changes. While
recent approaches have employed deep learning for patch representation, focusing
on token sequences or Abstract Syntax Trees (ASTs), they often miss the
change's semantic intent and the context of modified lines. To bridge this gap,
we introduce a novel method, Patcherizer. It captures the intentions behind
both the context and the structure of a patch, merging the surrounding code
context with two novel representations: one for the intention in the code
changes and one for the intention in the AST structural modifications before
and after the patch. This holistic
representation aptly captures a patch's underlying intentions. Patcherizer
employs graph convolutional neural networks for structural intention graph
representation and transformers for intention sequence representation. We
evaluated Patcherizer's embeddings' versatility in three areas: (1) Patch
description generation, (2) Patch accuracy prediction, and (3) Patch intention
identification. Our experiments demonstrate the representation's efficacy
across all tasks, outperforming state-of-the-art methods. For example, in patch
description generation, Patcherizer excels, showing an average boost of 19.39%
in BLEU, 8.71% in ROUGE-L, and 34.03% in METEOR scores.
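The two-view design (graph convolutions over the AST-change graph, a sequence encoder over the token changes) can be sketched minimally as below. This is an assumed numpy sketch, not Patcherizer itself: the single GCN layer, the mean-pool-and-concatenate merge, and all names are illustrative choices.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One symmetric-normalized graph convolution over an AST-like graph:
    relu(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats @ weight, 0.0)

def patch_embedding(seq_feats, adj, node_feats, weight):
    """Merge the sequence-intention and graph-intention views by
    mean-pooling each and concatenating into a single patch vector."""
    graph_feats = gcn_layer(adj, node_feats, weight)
    return np.concatenate([seq_feats.mean(axis=0), graph_feats.mean(axis=0)])

rng = np.random.default_rng(3)
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)     # toy 3-node AST-change graph
node_feats = rng.normal(size=(3, 8))         # AST node features
seq_feats = rng.normal(size=(5, 8))          # token-level change features
vec = patch_embedding(seq_feats, adj, node_feats, rng.normal(size=(8, 8)))
```

The resulting vector can then feed any downstream head, e.g. a decoder for patch description generation or a classifier for correctness prediction.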
Predicting Patch Correctness Based on the Similarity of Failing Test Cases
Towards predicting patch correctness in APR, we propose a simple but novel
hypothesis on how the link between the patch behaviour and failing test
specifications can be drawn: similar failing test cases should require similar
patches. We then propose BATS, an unsupervised learning-based system to predict
patch correctness by checking patch Behaviour Against failing Test
Specification. BATS exploits deep representation learning models for code and
patches: for a given failing test case, the yielded embedding is used to
compute similarity metrics in the search for historical similar test cases in
order to identify the associated applied patches, which are then used as a
proxy for assessing generated patch correctness. Experimentally, we first
validate our hypothesis by assessing whether ground-truth developer patches
cluster together in the same way that their associated failing test cases are
clustered. Then, after collecting a large dataset of 1278 plausible patches
(written by developers or generated by some 32 APR tools), we use BATS to
predict correctness: BATS achieves an AUC between 0.557 and 0.718 and a recall
between 0.562 and 0.854 in identifying correct patches. Compared against
previous work, we demonstrate that our approach outperforms state-of-the-art
performance in patch correctness prediction, without the need for large labeled
patch datasets in contrast with prior machine learning-based approaches. While
BATS is constrained by the availability of similar test cases, we show that it
can still be complementary to existing approaches: used in conjunction with a
recent approach implementing supervised learning, BATS improves the overall
recall in detecting correct patches. We finally show that BATS can be
complementary to the state-of-the-art PATCH-SIM dynamic approach for identifying
correct patches for APR tools.
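The retrieval-then-compare idea ("similar failing tests should require similar patches") can be sketched as follows. This is a hedged numpy sketch of the scoring step only; `bats_score`, the k-nearest-neighbor retrieval, and the mean-cosine aggregation are my assumptions, not the system's exact procedure.

```python
import numpy as np

def cosine_matrix(a, b, eps=1e-8):
    a_n = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b_n = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a_n @ b_n.T

def bats_score(test_emb, patch_emb, hist_test_embs, hist_patch_embs, k=3):
    """Score a candidate patch: retrieve the k historical failing tests most
    similar to the new failing test, then measure how similar the candidate
    patch is to the patches that fixed them. Higher scores suggest
    the candidate is correct."""
    sims = cosine_matrix(test_emb[None, :], hist_test_embs)[0]     # (H,)
    nearest = np.argsort(-sims)[:k]                                # k neighbors
    patch_sims = cosine_matrix(patch_emb[None, :],
                               hist_patch_embs[nearest])[0]
    return float(patch_sims.mean())

rng = np.random.default_rng(4)
hist_tests = rng.normal(size=(20, 16))     # embeddings of historical tests
hist_patches = rng.normal(size=(20, 16))   # embeddings of their patches
test_emb = hist_tests[0] + 0.01 * rng.normal(size=16)   # close to test 0
score = bats_score(test_emb, hist_patches[0], hist_tests, hist_patches, k=1)
```

Being unsupervised, the scorer needs only the historical test/patch embeddings, which is why it avoids the large labeled patch datasets that supervised approaches require, at the cost of depending on similar tests existing in the history.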
MetaTPTrans: A Meta Learning Approach for Multilingual Code Representation Learning
Representation learning of source code is essential for applying machine
learning to software engineering tasks. Learning code representation across
different programming languages has been shown to be more effective than
learning from single-language datasets, since more training data from
multi-language datasets improves the model's ability to extract
language-agnostic information from source code. However, existing
multi-language models overlook the language-specific information that is
crucial for downstream tasks when training on multi-language datasets, focusing
only on learning parameters shared among the different languages. To
address this problem, we propose MetaTPTrans, a meta learning approach for
multilingual code representation learning. MetaTPTrans generates different
parameters for the feature extractor according to the specific programming
language of the input source code snippet, enabling the model to learn both
language-agnostic and language-specific information. Experimental results show
that MetaTPTrans improves the F1 score of state-of-the-art approaches
significantly by up to 2.40 percentage points for code summarization, a
language-agnostic task; and the prediction accuracy of Top-1 (Top-5) by up to
7.32 (13.15) percentage points for code completion, a language-specific task.
Comment: Technical report.
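The mechanism of generating extractor parameters per language can be sketched as a small hypernetwork. This numpy sketch is an assumption-laden illustration, not MetaTPTrans: the toy sizes, names, and the single generated weight matrix are mine; the real model conditions a Transformer-based extractor.

```python
import numpy as np

D_IN, D_OUT, D_LANG = 8, 8, 4    # toy dimensions (assumptions)

def generate_extractor_params(lang_emb, meta_w, meta_b):
    """Meta learner: map a language embedding to the weight matrix of the
    feature extractor, so each programming language gets language-specific
    parameters while the meta learner itself is shared across languages."""
    return (lang_emb @ meta_w + meta_b).reshape(D_IN, D_OUT)

def extract(code_feats, lang_emb, meta_w, meta_b):
    """Apply the language-conditioned extractor to token features."""
    w = generate_extractor_params(lang_emb, meta_w, meta_b)
    return np.maximum(code_feats @ w, 0.0)       # relu feature map

rng = np.random.default_rng(5)
meta_w = rng.normal(size=(D_LANG, D_IN * D_OUT))   # shared meta parameters
meta_b = rng.normal(size=D_IN * D_OUT)
python_emb = rng.normal(size=D_LANG)               # learned language embeddings
java_emb = rng.normal(size=D_LANG)
code_feats = rng.normal(size=(6, D_IN))            # token features of a snippet
out_py = extract(code_feats, python_emb, meta_w, meta_b)
out_java = extract(code_feats, java_emb, meta_w, meta_b)
```

The same snippet embedded under two language embeddings yields different features, which is the lever that lets one multilingual model keep language-specific information while still sharing the meta learner's capacity.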