4 research outputs found

    A Framework for Classifying Web Attacks While Respecting ML Requirements

    No full text
    Injection and Cross-Site Scripting attacks are among the ten most critical security risks to web-based applications. It is difficult to provide a complete signature for firewalls that detect such attacks. Therefore, several proposals rely on Machine Learning (ML) methods capable of detecting various web attacks from evolving, heterogeneous data at large scale, without the need for expert knowledge. Unfortunately, web attack detection has been addressed only from an ML algorithm viewpoint; there is a lack of clarity regarding the quality and amount of the training data, the hyperparameter tuning, and the evaluation method. Poor data quality may compromise the success of even the most powerful ML methods. Additionally, it is easy to build a model that is perfectly adapted to the dataset but unable to generalize to new, unseen data. This paper introduces F2MW, a framework for multi-class classification of web attacks that respects these ML requirements.

    Client churn prediction with call log analysis

    Full text link
    © Springer International Publishing AG, part of Springer Nature 2018. Client churn prediction is a classic business problem of retaining customers. Recently, machine learning algorithms have been applied to predict client churn and have shown promising performance compared to traditional methods. Despite this success, existing machine learning approaches mainly focus on structured data such as demographic and transactional data, while unstructured data, such as emails and phone calls, have been largely overlooked. In this work, we propose to improve existing churn prediction models by analysing customer characteristics and behaviours from unstructured data, particularly audio calls. Specifically, we developed a text mining model combined with gradient boosting trees to predict client churn. We collected 900 thousand audio calls from 200 thousand customers and conducted extensive experiments on them; the results show that our approach significantly improves on the previous model by exploiting the additional unstructured data.