9,466 research outputs found

    Improving imbalanced classification using near-miss instances

    Class imbalance is a major issue in classification: the sample size of the rare (positive) class is often a performance bottleneck. In real-world situations, however, “near-miss” positive instances, i.e., negative but nearly positive instances, are sometimes plentiful. For example, natural disasters such as floods are rare, whereas near-miss cases, in which no flood occurred but the water level approached the bank height, are relatively plentiful. We show that even when true positive cases are quite limited, as in disaster forecasting, accuracy can be improved by obtaining refined, label-like side-information, “positivity” (e.g., the water level of the river), to distinguish near-miss cases from other negatives. Conventional cost-sensitive classification cannot utilize such side-information, and the small size of the positive sample causes high estimation variance. Our approach is in line with learning using privileged information (LUPI), which exploits side-information for training without predicting the side-information itself. We theoretically prove that our method reduces the estimation variance, provided that near-miss positive instances are plentiful, in exchange for additional bias. Results of extensive experiments demonstrate that our method tends to outperform, or compare favorably with, existing approaches.

    Malnourished and surviving in South Asia, better nourished and dying young in Africa: What can explain this puzzle?

    This paper examines the factors explaining the very different relationship between anthropometric shortfall and child mortality in South Asia and Sub-Saharan Africa. In South Asia, very high rates of anthropometric shortfall coexist with comparatively low child mortality rates; in Sub-Saharan Africa, rates of anthropometric shortfall are much lower, yet under-five mortality is much higher than in South Asia. This puzzle is examined using a panel data set of undernutrition, mortality, and their correlates. The analysis suggests that the unusually high rates of anthropometric shortfall in South Asia are partially due to the use of a US-based reference standard, which appears to generate misleading international comparisons of undernutrition. The very high rates of under-five mortality in Africa seem to be mostly due to very high fertility, high and rising HIV prevalence, and a possible multiplicative interaction of risk factors.