Smart contracts are small programs on the blockchain that often handle
valuable assets. Vulnerabilities in smart contracts can be costly, as repeated
incidents have shown. Countermeasures are in high demand and include best
practice recommendations as well as tools supporting development, program
verification, and post-deployment analysis. Many tools focus on detecting the
absence or presence of a subset of the known vulnerabilities, delivering
results of varying quality. Most comparative tool evaluations resort to
selecting a handful of tools and testing them against each other. In the best
case, the evaluation is based on a small ground truth. For Ethereum, there
are commendable efforts by several author groups to manually classify
contracts. However, a comprehensive ground truth is still lacking. In this
work, we construct a unified ground truth from publicly available, manually
checked benchmark sets for Ethereum smart contracts. We
develop a method to unify these sets. Additionally, we devise strategies for
matching entries that pertain to the same contract, such that we can determine
overlaps and disagreements between the sets and consolidate the disagreements.
Finally, we assess the quality of the included ground truth sets. Our work
reduces inconsistencies, redundancies, and incompleteness while increasing the
number of data points and heterogeneity.