180 research outputs found

    Cost-sensitive ensemble learning: a unifying framework

    Over the years, a plethora of cost-sensitive methods have been proposed for learning on data when different types of misclassification errors incur different costs. Our contribution is a unifying framework that provides a comprehensive and insightful overview of cost-sensitive ensemble methods, pinpointing their differences and similarities via a fine-grained categorization. Our framework contains natural extensions and generalisations of ideas across methods, be it AdaBoost, Bagging or Random Forest, and as a result not only yields all methods known to date but also some not previously considered.

    Can Generative Adversarial Networks Help Us Fight Financial Fraud?

    Transactional fraud datasets exhibit extreme class imbalance. Learners cannot make accurate generalizations without sufficient data. Researchers can account for imbalance at the data level, the algorithmic level or both. This paper focuses on techniques at the data level. We evaluate the evidence for the optimal technique and potential enhancements. Global fraud losses totalled more than 80 % of the UK’s GDP in 2019. The improvement of preprocessing is inherently valuable in fighting these losses. Synthetic minority oversampling technique (SMOTE) and extensions of SMOTE are currently the most common preprocessing strategies. SMOTE oversamples the minority classes by randomly generating a point between a minority instance and its nearest neighbour. Recent papers adopt generative adversarial networks (GANs) for synthetic data creation. Since 2014 there have been several GAN extensions, from improved training mechanisms to frameworks specifically for tabular data. The primary aim of the research is to understand the benefits of GANs built specifically for tabular data for the performance of supervised classifiers. We determine whether this framework will outperform traditional methods and more common GAN frameworks. Secondly, we propose a framework that allows individuals to test the impact of imbalance ratios on classifier performance. Finally, we investigate the use of clustering and determine whether this information can help GANs create better synthetic information. We explore this in the context of commonly used supervised classifiers and ensemble methods.
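
The SMOTE step described in this abstract (a random point on the segment between a minority instance and a nearby minority instance) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `smote_like_oversample` and its parameters `n_new`, `k` and `seed` are hypothetical, and choosing among the `k` nearest neighbours is an assumption (setting `k=1` matches the abstract's wording most literally).

```python
import numpy as np


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by linear interpolation.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other points

    # Pairwise Euclidean distances; a point is never its own neighbour.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a random base point, then one of its k nearest neighbours.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point at a random position on the segment.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])
```

Because each synthetic point is a convex combination of two real minority points, it always lies within the bounding box of the minority class; this is also why plain interpolation cannot create samples outside the region the minority data already covers.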

    A Review on Machine Learning and Deep Learning Techniques Applied to Liquid Biopsy

    For more than a decade, machine learning (ML) and deep learning (DL) techniques have been a mainstay in the toolset for the analysis of large amounts of weakly correlated or high-dimensional data. As new technologies for detecting and measuring biochemical markers from bodily fluid samples (e.g., microfluidics and labs-on-a-chip) revolutionise the industry of diagnostics and precision medicine, the heterogeneity and complexity of the acquired data present a growing challenge to their interpretation and usage. In this chapter, we attempt to review the state of the ML and DL fields as applied to the analysis of liquid biopsy data and summarise the available corpus of techniques and methodologies.