The growing popularity of deep neural networks (DNNs), which often require
computationally expensive training and access to vast amounts of data, calls
for accurate authorship verification methods to deter unlawful dissemination of
the models and to identify the source of a leak. In DNN watermarking, the owner
may have access to the full network (white-box) or only be able to extract
information from its outputs to queries (black-box). A watermarked model may
combine both approaches, so that black-box evidence can be gathered first to
justify gaining access to the network. Although there has been limited research on white-box
watermarking that considers traitor tracing, this problem is yet to be explored
in the black-box scenario. In this paper, we propose a black-and-white-box
watermarking method that opens the door to collusion-resistant traitor tracing
in the black-box setting by exploiting the properties of Tardos codes, making it
possible to identify the source of the leak before access to the model is granted. While
experimental results show that the method can successfully identify traitors,
even when further attacks have been performed, we also discuss its limitations
and open problems for traitor tracing in the black-box setting.

Comment: This work has been submitted to the IEEE International Workshop on
Information Forensics and Security (WIFS) 2023 for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.