Non-technical losses (NTL) such as electricity theft cause significant harm
to our economies; in some countries they may account for up to 40% of the total
electricity distributed. Detecting NTLs requires costly on-site inspections.
Accurate prediction of NTLs for customers using machine learning is therefore
crucial. To date, related research largely ignores that the two classes of
regular and non-regular customers are highly imbalanced and that NTL proportions
may change, and it mostly considers small data sets, which often prevents
the results from being deployed in production. In this paper, we present a comprehensive approach
to assess three NTL detection models (Boolean rules, fuzzy logic and
Support Vector Machine) for different NTL proportions in large real-world
data sets of hundreds of thousands of customers. This work has produced appreciable results that are
about to be deployed in a leading industry solution. We believe that the
considerations and observations made in this contribution are necessary for
future smart meter research in order to report its effectiveness on
imbalanced and large real-world data sets.

Comment: Proceedings of the Seventh IEEE Conference on Innovative Smart Grid
Technologies (ISGT 2016)