The ground truth effect: investigating SZZ variants in Just-in-Time vulnerability prediction

Cannavale, Alfonso; Iannone, Emanuele; Di Lillo, Gianluca; Palomba, Fabio; De Lucia, Andrea

conference paper, other

oai:tore.tuhh.de:11420/58017

Authors: Alfonso Cannavale
Emanuele Iannone
Gianluca Di Lillo
Fabio Palomba
Andrea De Lucia
Publication date: 1 September 2026
Publisher: 'Springer Fachmedien Wiesbaden GmbH'

Abstract

Just-in-Time (JIT) vulnerability prediction is critical for proactively securing software, yet its effectiveness relies heavily on the quality of the ground truth used to train models. This ground truth is commonly established using variants of the SZZ algorithm to identify vulnerability-contributing commits (VCCs). However, the impact of choosing a specific SZZ variant on model performance remains largely unexplored. In this study, we systematically investigate the effect of eight SZZ variants on JIT vulnerability prediction across seven open-source Java projects. Our findings reveal that the choice of SZZ variant is a non-trivial factor. Models trained on datasets labeled by variants such as B-SZZ, V-SZZ, and VCC-SZZ achieve strong and stable predictive performance, with median MCC scores often exceeding 0.50. In contrast, variants such as L-SZZ and R-SZZ produce models that perform no better than random chance, with median MCC scores close to 0.0. This performance gap demonstrates that an inappropriate SZZ variant can invalidate prediction models, underscoring the necessity of a principled approach to defining ground truth.
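
The abstract reports results as Matthews correlation coefficient (MCC) scores, where values near 0.0 indicate chance-level prediction. As a minimal sketch of how that metric behaves, the snippet below computes MCC from confusion-matrix counts. The function name `mcc` and all counts are illustrative assumptions; none of them are taken from the paper or its datasets.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0.0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for 1000 commits (NOT results from the paper).
# A classifier that separates VCCs from clean commits reasonably well:
informative = mcc(tp=40, tn=900, fp=30, fn=30)

# A classifier that flags 10% of commits regardless of their true class:
chance_like = mcc(tp=5, tn=855, fp=95, fn=45)

print(f"informative: {informative:.2f}")
print(f"chance-like: {chance_like:.2f}")
```

In the second case the flagged fraction is the same for vulnerable and clean commits, so the numerator `tp * tn - fp * fn` vanishes and MCC is zero, which is what "no better than random chance" means for this metric.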

TUHH Open Research (TORE) (Techn. Univ. Hamburg)

Last updated on 22/10/2025

This paper was published in TUHH Open Research (TORE) (Techn. Univ. Hamburg).
