Multi-granular Software Annotation using File-level Weak Labelling
One of the most time-consuming tasks for developers is the comprehension of
new code bases. An effective approach to aid this process is to label source
code files with meaningful annotations, which can help developers understand
the content and functionality of a code base more quickly. However, most existing
solutions for code annotation focus on project-level classification: manually
labelling individual files is time-consuming, error-prone and hard to scale.
The work presented in this paper aims to automate the annotation of files by
leveraging project-level labels, and to use the file-level annotations to
annotate items at larger levels of granularity, for example, packages and the
whole project.
We propose a novel approach that annotates source code files using weak
labelling followed by hierarchical aggregation. We investigate
whether this approach is effective in achieving multi-granular annotations of
software projects, which can aid developers in understanding the content and
functionalities of a code base more quickly.
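The aggregation step can be illustrated with a minimal sketch. It assumes each file already carries a dictionary of label scores produced by some weak labelling step, and it aggregates by simple averaging over a file's parent directory. The function names, the path-based notion of a package, and the averaging rule are illustrative assumptions, not the method evaluated in the paper.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def mean_scores(score_dicts):
    """Average a list of {label: score} dicts; a missing label counts as 0."""
    totals = defaultdict(float)
    for scores in score_dicts:
        for label, value in scores.items():
            totals[label] += value
    return {label: total / len(score_dicts) for label, total in totals.items()}


def aggregate(file_scores):
    """Propagate file-level label scores to packages and to the project.

    file_scores maps a file path to a {label: score} dict (hypothetical
    input format). A package is assumed to be the file's parent directory.
    """
    by_package = defaultdict(list)
    for path, scores in file_scores.items():
        by_package[str(PurePosixPath(path).parent)].append(scores)

    package_scores = {pkg: mean_scores(dicts) for pkg, dicts in by_package.items()}
    project_scores = mean_scores(list(file_scores.values()))
    return package_scores, project_scores


def top_labels(scores, k=3):
    """Return the k highest-scoring labels, ties broken alphabetically."""
    return sorted(scores, key=lambda label: (-scores[label], label))[:k]
```

Averaging is only one possible choice here: it keeps a label visible at the project level in proportion to how many files support it, whereas a maximum or a vote would weight rare but strong file-level signals differently.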
Our evaluation combines human assessment and automated metrics
to assess the annotations' quality. Our approach correctly annotated 50% of
files and more than 50% of packages. Moreover, the information captured at the
file-level allowed us to identify, on average, three new relevant labels for
any given project.
We can conclude that the proposed approach is a convenient and promising way
to generate noisy (not precise) annotations for files. Furthermore,
hierarchical aggregation effectively preserves the information captured at
the file level, which can then be propagated to packages and the overall project
itself.
Comment: Accepted at the Journal of Empirical Software Engineering