The interaction of sampling ratio and modelling method in prediction of binary target with rare target class
In many practical predictive data mining problems with a binary target, one of the target
classes is rare. In such a situation it is common practice to decrease the ratio of common to
rare class cases in the training set by under-sampling the common class. The relationship
between the ratio of common to rare class cases in the training set and model performance
was investigated empirically on three artificial and three real-world data sets. The results
indicated that a flexible modelling method without regularisation benefits in both mean and
variance of performance from a larger ratio when evaluated on a criterion sensitive to
overfitting, and benefits in mean but not variance of performance when evaluated on a
criterion less sensitive to overfitting. For an inflexible modelling method and a flexible
method with regularisation, the effects of a larger ratio were less consistent. In no
circumstances was a larger ratio found to be detrimental to model performance,
however measured.
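As a rough illustration of the under-sampling step described in this abstract, the sketch below keeps every rare-class case and draws a random subset of common-class cases to reach a chosen common-to-rare ratio. The function name `undersample_common`, its parameters, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
import random


def undersample_common(rows, labels, ratio, rare_label=1, seed=0):
    """Keep all rare-class cases and about `ratio` common cases per rare case.

    Hypothetical helper: `ratio` is the desired number of common-class
    cases per rare-class case in the returned training set.
    """
    rng = random.Random(seed)
    rare = [i for i, y in enumerate(labels) if y == rare_label]
    common = [i for i, y in enumerate(labels) if y != rare_label]
    # Never request more common cases than are available.
    n_keep = min(len(common), int(round(ratio * len(rare))))
    kept = sorted(rare + rng.sample(common, n_keep))
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Toy data (invented): 5 rare cases among 100.
labels = [1] * 5 + [0] * 95
rows = list(range(100))
sub_rows, sub_labels = undersample_common(rows, labels, ratio=2)
```

With `ratio=2` the returned set holds the 5 rare cases plus 10 randomly chosen common cases. Raising `ratio` moves the training set back toward the original class balance, which is the quantity the study varies.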
Black Lung: Old Disease, New Lessons
Prior to 2016, cases of progressive massive fibrosis (PMF) secondary to mining exposure had dwindled and were considered nearly eradicated. However, over 40 new cases were recently discovered in Kentucky, indicating a resurgence of a previously rare disease. We herein report the case of a 44-year-old male underground coal miner from Appalachia with fifteen years of coal mine dust exposure who presented for an occupational pneumoconiosis evaluation with four years of productive cough, dyspnea upon exertion and wheezing. Since 2016, he has suffered a precipitous decline in lung function consistent with restrictive lung disease and concomitant progression from simple coal workers’ pneumoconiosis to progressive massive fibrosis. In particular, his chest x-ray shows the classic “angel wings” finding caused by large fibrotic masses in both lungs. This case, together with the several other new cases, calls attention to the resurgence of PMF and requires examination of the factors contributing to its recent rebound.
Data mining for detecting Bitcoin Ponzi schemes
Soon after its introduction in 2009, Bitcoin was adopted by
cyber-criminals, who rely on its pseudonymity to implement virtually
untraceable scams. Among the typical scams that operate on Bitcoin are the
so-called Ponzi schemes. These are fraudulent investments which repay users
with the funds invested by new users that join the scheme, and implode when it
is no longer possible to find new investments. Despite being illegal in many
countries, Ponzi schemes are now proliferating on Bitcoin, and they keep
alluring new victims, who are plundered of millions of dollars. We apply data
mining techniques to detect Bitcoin addresses related to Ponzi schemes. Our
starting point is a dataset of features of real-world Ponzi schemes, which we
construct by analysing, on the Bitcoin blockchain, the transactions used to
perform the scams. We use this dataset to experiment with various machine
learning algorithms, and we assess their effectiveness through standard
validation protocols and performance metrics. The best of the classifiers we
have experimented with can identify most of the Ponzi schemes in the dataset, with a
low number of false positives.
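To make "most of the Ponzi schemes ... with a low number of false positives" concrete, the sketch below computes confusion counts, recall and precision from true and predicted labels. The helper `confusion_counts` and the toy labels are invented for illustration; the abstract does not say which metrics or values the authors report.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


# Toy labels (invented): 1 = Ponzi-related address, 0 = other address.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)      # share of true Ponzi addresses that were found
precision = tp / (tp + fp)   # share of flagged addresses that are truly Ponzi
```

When the positive class is rare, recall and the false-positive count say more than overall accuracy, because a classifier that flags nothing can still score high accuracy.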