    The interaction of sampling ratio and modelling method in prediction of binary target with rare target class

    In many practical predictive data mining problems with a binary target, one of the target classes is rare. In such a situation it is common practice to decrease the ratio of common to rare class cases in the training set by under-sampling the common class. The relationship between the ratio of common to rare class cases in the training set and model performance was investigated empirically on three artificial and three real-world data sets. The results indicated that a flexible modelling method without regularisation benefits in both mean and variance of performance from a larger ratio when evaluated on a criterion sensitive to overfitting, and benefits in mean but not variance of performance when evaluated on a criterion less sensitive to overfitting. For an inflexible modelling method and a flexible method with regularisation, the effects of a larger ratio were less consistent. In no circumstances, however, was a larger ratio found to be detrimental to model performance, whichever criterion was used.
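    The under-sampling step described in this abstract can be sketched as follows. This is a minimal illustration only, not the authors' procedure: the function name `undersample`, its parameters, and the toy data are all assumptions introduced here. It keeps every rare-class case and draws at most `ratio` common-class cases per rare case at random.

```python
import random

def undersample(rows, labels, ratio, rare_label=1, seed=0):
    """Keep all rare-class cases and at most `ratio` common cases per rare case.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    rare = [i for i, y in enumerate(labels) if y == rare_label]
    common = [i for i, y in enumerate(labels) if y != rare_label]
    # Number of common-class cases to keep, capped by how many exist.
    n_keep = min(len(common), int(ratio * len(rare)))
    kept = sorted(rare + rng.sample(common, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]

# Toy data (assumed): 100 cases, of which 5 belong to the rare class.
labels = [1 if i % 20 == 0 else 0 for i in range(100)]
rows = list(range(100))

# A common-to-rare ratio of 3 keeps 5 rare and 15 common cases.
X, y = undersample(rows, labels, ratio=3)
```

    Raising `ratio` towards the full data set's ratio corresponds to the "larger ratio" condition the abstract reports as never detrimental.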

    Black Lung: Old Disease, New Lessons

    Before 2016, cases of progressive massive fibrosis (PMF) secondary to mining exposure had dwindled, and the disease was considered nearly eradicated. However, over 40 new cases were recently discovered in Kentucky, indicating a resurgence of a previously rare disease. We herein report the case of a 44-year-old male underground coal miner from Appalachia with fifteen years of coal mining dust exposure who presented for an occupational pneumoconiosis evaluation with four years of productive cough, dyspnea on exertion, and wheezing. Since 2016, he has suffered a precipitous decline in lung function consistent with restrictive lung disease and concomitant progression from simple coal workers’ pneumoconiosis to progressive massive fibrosis. In particular, his chest x-ray shows the classic finding of “angel wings” caused by large fibrotic masses in both lungs. This case, together with the several other new cases, calls attention to the resurgence of PMF and requires examination of the factors contributing to its recent rebound.

    Data mining for detecting Bitcoin Ponzi schemes

    Soon after its introduction in 2009, Bitcoin was adopted by cyber-criminals, who rely on its pseudonymity to implement virtually untraceable scams. Among the typical scams that operate on Bitcoin are the so-called Ponzi schemes. These are fraudulent investments which repay users with the funds invested by new users who join the scheme, and which implode when it is no longer possible to find new investments. Despite being illegal in many countries, Ponzi schemes are now proliferating on Bitcoin, and they keep luring new victims, who are plundered of millions of dollars. We apply data mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our starting point is a dataset of features of real-world Ponzi schemes, which we construct by analysing, on the Bitcoin blockchain, the transactions used to perform the scams. We use this dataset to experiment with various machine learning algorithms, and we assess their effectiveness through standard validation protocols and performance metrics. The best of the classifiers we experimented with can identify most of the Ponzi schemes in the dataset, with a low number of false positives.