2 research outputs found
Detection of Rare Events: Cluster Based Preprocessing of the Training Set: The Case on Complaints for Invoice Time Series
Detection of rare events is a major problem when dealing with unbalanced data. In the application of machine learning tools, data is split into training and test samples, and preprocessing is applied to the training set with the aim of obtaining a more balanced sample. In this paper we discuss preprocessing methods applied to heterogeneous data clustered with respect to expected anomaly types. We propose a method for deciding on oversampling and under-sampling from each cluster, based on the variability of the items in each cluster, using Principal Component Analysis. The method is applied to the problem of detecting anomalies in a time series of invoices, with an average complaint rate on the order of 10^-4.
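As a rough illustration of cluster-wise resampling driven by PCA variability, the sketch below measures each cluster's variability as the sum of its PCA eigenvalues and gives each cluster a share of the sampling budget proportional to that value. The abstract does not state the actual decision rule, so the proportional allocation, the function names `pca_variability` and `resample_by_cluster`, and the `total` budget parameter are all assumptions made for this example, not the authors' method.

```python
import numpy as np

def pca_variability(X):
    """Total variance of X, computed as the sum of its PCA eigenvalues."""
    if len(X) < 2:
        return 0.0
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centered data, divided by (n - 1),
    # are the eigenvalues of the sample covariance matrix.
    s = np.linalg.svd(Xc, compute_uv=False)
    return float(np.sum(s ** 2) / (len(X) - 1))

def resample_by_cluster(X, labels, total, seed=None):
    """Return row indices of a resampled training set.

    Assumed rule (not from the paper): each cluster receives a share of
    `total` proportional to its PCA variability, with at least one row.
    """
    rng = np.random.default_rng(seed)
    clusters = np.unique(labels)
    var = np.array([pca_variability(X[labels == c]) for c in clusters])
    if var.sum() == 0:
        weights = np.full(len(clusters), 1.0 / len(clusters))
    else:
        weights = var / var.sum()
    chosen = []
    for c, w in zip(clusters, weights):
        idx = np.flatnonzero(labels == c)
        n = max(1, int(round(w * total)))
        # Drawing more rows than the cluster holds means oversampling
        # (with replacement); drawing fewer means under-sampling.
        chosen.append(rng.choice(idx, size=n, replace=n > len(idx)))
    return np.concatenate(chosen)
```

Under this assumed rule, a small but highly variable cluster is oversampled, while a large homogeneous cluster is under-sampled. Because every cluster keeps at least one row, the returned set can slightly exceed `total`.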
Detection of Expenditure Trends in the Telecommunication Sector
In the telecommunication sector, particularly in the cellular phone service area, customer expenditures have been in the areas of voice, short messages, and internet usage, leading to a pattern of more or less regular monthly bills. Recently, telecommunication companies have started to associate retail stores with their billed commercial activities, resulting in unusual variations in the monthly payment sequences of their customers. In the present work we propose a method for detecting retail expenditure in monthly bills. We then code the information of the discretized version into a binary hierarchical tree and classify the sequences as positive or negative with respect to complaint potential.