Neural Motifs: Scene Graph Parsing with Global Context
We investigate the problem of producing structured graph representations of
visual scenes. Our work analyzes the role of motifs: regularly appearing
substructures in scene graphs. We present new quantitative insights on such
repeated structures in the Visual Genome dataset. Our analysis shows that
object labels are highly predictive of relation labels but not vice versa. We
also find that there are recurring patterns even in larger subgraphs: more than
50% of graphs contain motifs involving at least two relations. Our analysis
motivates a new baseline: given object detections, predict the most frequent
relation between object pairs with the given labels, as seen in the training
set. This baseline improves on the previous state-of-the-art by an average of
3.6% relative improvement across evaluation settings. We then introduce Stacked
Motif Networks, a new architecture designed to capture higher order motifs in
scene graphs that further improves over our strong baseline by an average 7.1%
relative gain. Our code is available at github.com/rowanz/neural-motifs.
Comment: CVPR 2018 camera ready
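The frequency baseline described above can be sketched in a few lines: count, per (subject, object) label pair, which relation occurs most often in training, and always predict that one. The triples below are illustrative stand-ins, not actual Visual Genome statistics.

```python
from collections import Counter, defaultdict

# Hypothetical training triples (subject_label, relation, object_label);
# in the paper these counts come from the Visual Genome training set.
train_triples = [
    ("man", "wearing", "hat"),
    ("man", "wearing", "hat"),
    ("man", "holding", "hat"),
    ("dog", "on", "grass"),
    ("dog", "on", "grass"),
]

# Relation frequencies per (subject, object) label pair.
freq = defaultdict(Counter)
for subj, rel, obj in train_triples:
    freq[(subj, obj)][rel] += 1

def predict_relation(subj_label, obj_label):
    """Predict the most frequent training relation for this label pair."""
    counts = freq.get((subj_label, obj_label))
    return counts.most_common(1)[0][0] if counts else None

print(predict_relation("man", "hat"))    # "wearing" (2 vs 1 occurrences)
print(predict_relation("dog", "grass"))  # "on"
```

Despite its simplicity, conditioning only on detected object labels is what the abstract reports as already beating the prior state of the art.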
Delving into Commit-Issue Correlation to Enhance Commit Message Generation Models
Commit message generation (CMG) is a challenging task in automated software
engineering that aims to generate natural language descriptions of code changes
for commits. Previous methods all start from the modified code snippets,
outputting commit messages through template-based, retrieval-based, or
learning-based models. While these methods can summarize what is modified from
the perspective of code, they struggle to provide the reasons behind a commit.
The correlation between commits and issues, which could be a critical factor in
generating rational commit messages, remains unexplored.
In this work, we delve into the correlation between commits and issues from
the perspective of dataset and methodology. We construct the first dataset
anchored on combining correlated commits and issues. The dataset consists of an
unlabeled commit-issue parallel part and a labeled part in which each example
is provided with human-annotated rational information in the issue.
Furthermore, we propose \tool (\underline{Ex}traction, \underline{Gro}unding,
\underline{Fi}ne-tuning), a novel paradigm that can introduce the correlation
between commits and issues into the training phase of models. To evaluate
whether it is effective, we perform comprehensive experiments with various
state-of-the-art CMG models. The results show that compared with the original
models, the performance of \tool-enhanced models is significantly improved.
Comment: ASE2023 accepted paper
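A prerequisite for a commit-issue parallel dataset like the one described above is pairing each commit with the issue(s) it addresses. A minimal sketch, assuming issues are referenced in commit messages with the common "#123" / "closes #123" convention (the paper's actual mining procedure and schema may differ):

```python
import re

# Hypothetical issue-reference pattern; field names are illustrative.
ISSUE_REF = re.compile(r"#(\d+)")

def link_commit_to_issues(commit_message, issues_by_id):
    """Return the issue records whose IDs appear in the commit message."""
    ids = {int(m) for m in ISSUE_REF.findall(commit_message)}
    return [issues_by_id[i] for i in sorted(ids) if i in issues_by_id]

issues = {
    42: {"id": 42, "title": "NPE on empty input"},
    7: {"id": 7, "title": "Add retry logic"},
}
linked = link_commit_to_issues("Fix crash, closes #42 (see also #99)", issues)
print([i["title"] for i in linked])  # ['NPE on empty input']
```

Unresolvable references (like #99 above) are simply dropped; a real pipeline would also need to handle cross-repository links and issues mentioned only in pull-request discussions.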
A study on the impact of pre-trained model on Just-In-Time defect prediction
Previous researchers conducting Just-In-Time (JIT) defect prediction tasks
have primarily focused on the performance of individual pre-trained models,
without exploring the relationship between different pre-trained models as
backbones. In this study, we build six models: RoBERTaJIT, CodeBERTJIT,
BARTJIT, PLBARTJIT, GPT2JIT, and CodeGPTJIT, each with a distinct pre-trained
model as its backbone. We systematically explore the differences and
connections between these models. Specifically, we investigate the performance
of the models when using Commit code and Commit message as inputs, as well as
the relationship between training efficiency and model distribution among these
six models. Additionally, we conduct an ablation experiment to explore the
sensitivity of each model to inputs. Furthermore, we investigate how the models
perform in zero-shot and few-shot scenarios. Our findings indicate that each
model based on a different backbone shows improvements, and when the backbones'
pre-trained models are similar, the training resources they consume are much
closer. We also observe that Commit code plays a significant role
in defect detection, and different pre-trained models demonstrate better defect
detection ability with a balanced dataset under few-shot scenarios. These
results provide new insights for optimizing JIT defect prediction tasks using
pre-trained models and highlight the factors that require more attention when
constructing such models. Additionally, CodeGPTJIT and GPT2JIT achieved better
performance than DeepJIT and CC2Vec on the two datasets respectively under 2000
training samples. These findings emphasize the effectiveness of
transformer-based pre-trained models in JIT defect prediction tasks, especially
in scenarios with limited training data.
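The finding that balanced data helps in few-shot settings suggests a simple sampling step before fine-tuning: draw an equal number of defective and clean commits. A minimal sketch with hypothetical data (the study's actual datasets and sampling procedure are not specified here):

```python
import random

def balanced_few_shot(commits, n_per_class, seed=0):
    """Draw n_per_class defective and n_per_class clean commits, shuffled.

    `commits` is a hypothetical list of (features, is_defective) pairs.
    """
    rng = random.Random(seed)
    defective = [c for c in commits if c[1]]
    clean = [c for c in commits if not c[1]]
    sample = rng.sample(defective, n_per_class) + rng.sample(clean, n_per_class)
    rng.shuffle(sample)
    return sample

# Illustrative corpus: 10 defective, 20 clean commits.
data = [(f"commit-{i}", i % 3 == 0) for i in range(30)]
subset = balanced_few_shot(data, 5)
print(len(subset), sum(1 for _, y in subset if y))  # 10 5
```

With, say, 2000 balanced samples, the resulting subset would play the role of the limited-data training set the abstract evaluates.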
Learning to Represent Patches
Patch representation is crucial in automating various software engineering
tasks, like determining patch accuracy or summarizing code changes. While
recent research has employed deep learning for patch representation, focusing
on token sequences or Abstract Syntax Trees (ASTs), these approaches often miss
the change's semantic intent and the context of modified lines. To bridge this gap,
we introduce a novel method, Patcherizer. It delves into the intentions of
context and structure, merging the surrounding code context with two innovative
representations. These capture the intention in code changes and the intention
in AST structural modifications pre and post-patch. This holistic
representation aptly captures a patch's underlying intentions. Patcherizer
employs graph convolutional neural networks for structural intention graph
representation and transformers for intention sequence representation. We
evaluated Patcherizer's embeddings' versatility in three areas: (1) Patch
description generation, (2) Patch accuracy prediction, and (3) Patch intention
identification. Our experiments demonstrate the representation's efficacy
across all tasks, outperforming state-of-the-art methods. For example, in patch
description generation, Patcherizer excels, showing an average boost of 19.39%
in BLEU, 8.71% in ROUGE-L, and 34.03% in METEOR scores.
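A patch encoder that reasons about pre- and post-patch states first needs to recover both versions from the diff. A minimal sketch of that preprocessing step, assuming unified-diff input (the sequence and graph encoders themselves are beyond a short example):

```python
def split_patch(diff_text):
    """Return (pre_lines, post_lines) reconstructed from a unified diff body."""
    pre, post = [], []
    for line in diff_text.splitlines():
        # Skip file headers and hunk markers.
        if line.startswith(("+++", "---", "@@")):
            continue
        if line.startswith("+"):
            post.append(line[1:])
        elif line.startswith("-"):
            pre.append(line[1:])
        else:
            # Context lines belong to both versions.
            ctx = line[1:] if line.startswith(" ") else line
            pre.append(ctx)
            post.append(ctx)
    return pre, post

diff = """@@ -1,2 +1,2 @@
 def area(r):
-    return 3.14 * r * r
+    return math.pi * r * r
"""
pre, post = split_patch(diff)
print(pre)   # ['def area(r):', '    return 3.14 * r * r']
print(post)  # ['def area(r):', '    return math.pi * r * r']
```

The two recovered versions can then be tokenized for a sequence encoder or parsed into ASTs whose pre/post differences feed a graph encoder, mirroring the two branches the abstract describes.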
A Full-fledged Commit Message Quality Checker Based on Machine Learning
Commit messages (CMs) are an essential part of version control. By providing
important context in regard to what has changed and why, they strongly support
software maintenance and evolution. But writing good CMs is difficult and often
neglected by developers. So far, there is no tool suitable for practice that
automatically assesses how well a CM is written, including its meaning and
context. Since this task is challenging, we ask the research question: how well
can the CM quality, including semantics and context, be measured with machine
learning methods? By considering all rules from the most popular CM quality
guideline, creating datasets for those rules, and training and evaluating
state-of-the-art machine learning models to check those rules, we can answer
the research question with: sufficiently well for practice, with the lowest
F score of 82.9%, for the most challenging task. We develop a full-fledged
open-source framework that checks all these CM quality rules. It is useful for
research, e.g., automatic CM generation, but most importantly for software
practitioners to raise the quality of CMs and thus the maintainability and
evolution speed of their software.
Comment: published at COMPSAC'2
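Some rules in popular commit-message guidelines are purely mechanical and need no ML at all; only the semantic rules (imperative mood, explaining the "why") require the trained models. A sketch of the mechanical subset, assuming the widely cited conventions of a 50-character capitalized subject with no trailing period and a blank line before the body:

```python
def check_commit_message(msg):
    """Return a list of mechanical rule violations for a commit message."""
    lines = msg.splitlines()
    subject = lines[0] if lines else ""
    issues = []
    if len(subject) > 50:
        issues.append("subject exceeds 50 characters")
    if subject.endswith("."):
        issues.append("subject ends with a period")
    if subject and not subject[0].isupper():
        issues.append("subject is not capitalized")
    if len(lines) > 1 and lines[1].strip():
        issues.append("missing blank line between subject and body")
    return issues

print(check_commit_message("fix bug."))
# ['subject ends with a period', 'subject is not capitalized']
```

A full checker like the framework described above would combine checks of this kind with learned classifiers for the rules that depend on meaning and context.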
Organization and Usage of Learning Objects within Personal Computers
Research report of the ProLearn Network of Excellence (IST 507310), Deliverable 7.6.
To promote the integration of Desktop-related Knowledge Management and Technology
Enhanced Learning, this deliverable aims at increasing the awareness of Desktop
research within the Professional Learning community and at familiarizing
e-Learning researchers with the state of the art in the relevant areas of
Personal Information Management (PIM), as well as with currently ongoing
activities and some of the regular PIM publication venues.