    Large scale anomaly detection in mixed numerical and categorical input spaces

© 2019. This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article Eiras-Franco, C., Martínez-Rego, D., Guijarro-Berdiñas, B., Alonso-Betanzos, A., Bahamonde, A. (2019) ‘Large scale anomaly detection in mixed numerical and categorical input spaces’ has been accepted for publication in Information Sciences, 487, pp. 115-127. The Version of Record is available online at https://doi.org/10.1016/j.ins.2019.03.013.

Abstract: This work presents the ADMNC method, designed to tackle anomaly detection in large-scale problems whose input variables are a mixture of categorical and numerical types. A flexible parametric probability measure is fitted to the input data, so that points with low likelihood values can be flagged as anomalies. The main contribution of this method is that, to cope with the mixed nature of the variables, we factorize the joint probability measure into two parts: the marginal density of the continuous variables and the conditional probability of the categorical variables given the continuous part of the feature vector. The result is a model trained through a maximum likelihood objective function, optimized with stochastic gradient descent, that yields an effective and scalable algorithm. Compared with other well-known anomaly detection algorithms over several datasets, ADMNC is observed both to offer top-level accuracy on datasets that are out of reach for the most effective existing methods and to scale well to very large datasets.
This makes it a powerful tool for a problem of growing interest that currently lacks suitable scalable algorithms.

This research has been financially supported in part by the Spanish Ministerio de Economía y Competitividad (research projects TIN 2015-65069-C2, both 1-R and 2-R), by the Xunta de Galicia (Grants GRC2014/035 and ED431G/01) and by the European Union Regional Development Funds.
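The factorization described in the abstract, log p(x, c) = log p(x) + log p(c | x), can be illustrated with a small scoring sketch. It is not the ADMNC implementation: the abstract does not state the parametric families used, so this sketch assumes a diagonal Gaussian for the continuous marginal and an independent softmax (multinomial logistic) model for each categorical variable given the continuous vector. The class name `MixedLikelihoodScorer` and its parameters are hypothetical. Under these assumptions the two factors share no parameters, so maximizing the joint log-likelihood splits into two independent problems: the Gaussian has a closed-form maximum-likelihood solution, and the softmax weights are fitted with stochastic gradient descent.

```python
import math
import random


class MixedLikelihoodScorer:
    """Hypothetical sketch: log p(x, c) = log p(x) + sum_j log p(c_j | x).

    Assumptions (ours, not the paper's): p(x) is a diagonal Gaussian and each
    categorical variable c_j has its own softmax model conditioned on x.
    """

    def __init__(self, n_levels, lr=0.1, epochs=30, seed=0):
        self.n_levels = n_levels  # number of levels of each categorical variable
        self.lr = lr
        self.epochs = epochs
        self.rng = random.Random(seed)

    def _standardize(self, x):
        return [(v - m) / math.sqrt(s) for v, m, s in zip(x, self.mean, self.var)]

    @staticmethod
    def _softmax(rows, z):
        # Each row holds len(z) weights followed by a bias term.
        logits = [sum(w * v for w, v in zip(row, z)) + row[-1] for row in rows]
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, X, C):
        n, d = len(X), len(X[0])
        # Continuous marginal: closed-form maximum-likelihood diagonal Gaussian.
        self.mean = [sum(x[i] for x in X) / n for i in range(d)]
        self.var = [max(sum((x[i] - self.mean[i]) ** 2 for x in X) / n, 1e-6)
                    for i in range(d)]
        # Categorical conditionals: SGD on the negative log-likelihood.
        self.W = [[[0.0] * (d + 1) for _ in range(k)] for k in self.n_levels]
        order = list(range(n))
        for _ in range(self.epochs):
            self.rng.shuffle(order)
            for idx in order:
                z = self._standardize(X[idx])
                for j, rows in enumerate(self.W):
                    probs = self._softmax(rows, z)
                    for level, row in enumerate(rows):
                        # Gradient of -log p(c_j | x) w.r.t. this level's logit.
                        g = probs[level] - (1.0 if C[idx][j] == level else 0.0)
                        for i in range(d):
                            row[i] -= self.lr * g * z[i]
                        row[d] -= self.lr * g
        return self

    def log_likelihood(self, x, c):
        ll = sum(-0.5 * (math.log(2 * math.pi * s) + (v - m) ** 2 / s)
                 for v, m, s in zip(x, self.mean, self.var))
        z = self._standardize(x)
        for j, rows in enumerate(self.W):
            # Floor guards against log(0) if a probability underflows.
            ll += math.log(max(self._softmax(rows, z)[c[j]], 1e-300))
        return ll

    def score(self, x, c):
        # Higher score = lower likelihood = more anomalous.
        return -self.log_likelihood(x, c)


# Usage on synthetic data where the category is tied to the sign of x[0].
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
C = [[0 if x[0] < 0 else 1] for x in X]
model = MixedLikelihoodScorer(n_levels=[2]).fit(X, C)
typical = model.score([1.5, 0.0], [1])  # plausible values and category
flipped = model.score([1.5, 0.0], [0])  # plausible values, unlikely category
far = model.score([8.0, 8.0], [1])      # rare numerical values
```

Ranking by negative log-likelihood lets a record be flagged either because its numerical values are rare (the `far` case) or because its categories are unlikely given those values (the `flipped` case). The single Gaussian is a deliberate simplification for the sketch; multimodal numerical data would call for a more flexible density.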