Declarations by nuclear fuel cycle facilities on nuclear material inventories and transfers are independently verified by the IAEA. These verification activities usually rely on a sampling plan designed to achieve a specified probability of detecting falsification of operator reports. Currently, the IAEA’s sampling plans assume item-by-item tests, in which the difference between the reported and the measured value of each item selected for verification is compared to a threshold. If a difference exceeds this threshold, an “alarm” occurs and the cause of the difference is investigated further. In the present paper, we analyse sampling plans in which, in addition to the usual item-by-item tests, a stratum difference statistic of the verified items is applied as a test statistic. The stratum difference statistic is considered because it is more effective than the item-by-item tests at detecting bias defect falsifications. We therefore investigate the effectiveness of sampling plans in which both tests are applied, measured by the achieved detection probability, and analyse whether sample sizes could be reduced while still achieving the required detection probability.
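The two kinds of test described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the thresholds, the example numbers, and in particular the form of the stratum difference statistic (the summed operator-inspector differences of the n verified items, scaled by N/n to the whole stratum of N items) are assumptions made here for illustration only.

```python
def item_alarms(reported, measured, item_threshold):
    """Item-by-item test: indices of verified items whose absolute
    reported-minus-measured difference exceeds the threshold."""
    return [
        i
        for i, (r, m) in enumerate(zip(reported, measured))
        if abs(r - m) > item_threshold
    ]


def stratum_difference(reported, measured, stratum_size):
    """Assumed form of the stratum difference statistic: the sum of the
    differences over the n verified items, extrapolated to all N items."""
    n = len(reported)
    total_difference = sum(r - m for r, m in zip(reported, measured))
    return stratum_size / n * total_difference


# Hypothetical stratum of N = 50 items, of which n = 5 are verified.
# Every reported value is slightly overstated (a small, spread-out falsification).
reported = [10.0, 10.0, 10.0, 10.0, 10.0]
measured = [9.8, 9.9, 9.7, 9.8, 9.8]

alarms = item_alarms(reported, measured, item_threshold=0.5)   # hypothetical threshold
d_stat = stratum_difference(reported, measured, stratum_size=50)
stratum_alarm = abs(d_stat) > 5.0                               # hypothetical threshold
```

In this made-up example no single difference exceeds the item threshold, so the item-by-item test raises no alarm, while the extrapolated stratum difference (about 10) exceeds its threshold. This is the kind of small, spread-out discrepancy for which a stratum-level statistic is expected to help.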