4 research outputs found

    L2P: An Algorithm for Estimating Heavy-tailed Outcomes

    Many real-world prediction tasks have outcome variables with characteristic heavy-tailed distributions. Examples include copies of books sold, auction prices of art pieces, and demand for commodities in warehouses. By learning heavy-tailed distributions, "big and rare" instances (e.g., the best-sellers) receive accurate predictions. Most existing approaches are not dedicated to learning heavy-tailed distributions and thus heavily under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances; based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in accuracy and in its ability to reproduce heavy-tailed outcome distributions. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.
    Comment: 9 pages, 6 figures, 2 tables. Changes from the previous version: 1. Added complexity analysis in Section 2.2. 2. Changed datasets. 3. Added LambdaMART as a baseline method, with a brief discussion of why LambdaMART failed on our problem. 4. Updated figures.
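    The two phases described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the authors' implementation: the comparator here is a crude linear score fitted on random pairs (standing in for a real pairwise classifier), the data are a toy one-dimensional heavy-tailed example, and placement is done by a linear walk over the training instances sorted by outcome.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1-D feature, heavy-tailed outcome (grows sharply as x -> 1).
X = rng.uniform(0, 1, size=(200, 1))
y = (1.0 / (1.0 - 0.99 * X[:, 0])) ** 1.5

# --- Training phase: a pairwise preference classifier ---
# Stand-in comparator: instance A outranks B when w . (x_A - x_B) > 0.
# We fit w from random pairs labeled by the true ordering (a crude
# substitute for, e.g., logistic regression on feature differences).
pairs = rng.integers(0, len(X), size=(1000, 2))
diffs = X[pairs[:, 0]] - X[pairs[:, 1]]
signs = np.sign(y[pairs[:, 0]] - y[pairs[:, 1]])
w = (diffs * signs[:, None]).mean(axis=0)

def prefers(xa, xb):
    """Pairwise classifier: does instance A outrank instance B?"""
    return float((xa - xb) @ w) > 0.0

# --- Placing phase: insert the new instance into the known ordering ---
order = np.argsort(y)  # known instances, low -> high outcome

def predict(x_new):
    # Walk the sorted list until the comparator says x_new is outranked.
    pos = 0
    for idx in order:
        if prefers(X[idx], x_new):
            break
        pos += 1
    # Assign an outcome from the neighbors surrounding the placement.
    lo = y[order[max(pos - 1, 0)]]
    hi = y[order[min(pos, len(order) - 1)]]
    return 0.5 * (lo + hi)

print(predict(np.array([0.5])))
```

    Because the prediction is read off from placed neighbors, the model is interpretable in the sense the abstract describes: each prediction comes with the comparable known instances that produced it.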

    Robust regression with asymmetric heavy-tail noise distributions

    In the presence of a heavy-tail noise distribution, regression becomes much more difficult. Traditional robust regression methods assume that the noise distribution is symmetric and downweight the influence of so-called outliers. When the noise distribution is asymmetric, these methods yield strongly biased regression estimators. Motivated by data-mining problems for the insurance industry, we propose in this paper a new approach to robust regression that is tailored to deal with the case where the noise distribution is asymmetric. The main idea is to learn most of the parameters of the model using conditional quantile estimators (which are biased but robust estimators of the regression), and to learn a few remaining parameters to combine and correct these estimators, minimizing the average squared error. Theoretical analysis and experiments show the clear advantages of the approach. Results are reported on artificial data as well as real insurance data, using both linear and neural-network predictors.
    ∗ This work was done while Takafumi Kanamori was at Université de Montréal, DIRO.
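    The two-stage idea — robust quantile estimators first, a few squared-error-fitted combining parameters second — can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the paper's code: the quantile estimators here are linear models fitted by subgradient descent on the pinball loss, the data are a toy linear signal with one-sided Pareto noise, and the correction step is plain least squares over the quantile predictions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: linear signal plus asymmetric, heavy-tailed (Pareto) noise.
n = 500
x = rng.uniform(-1, 1, size=n)
y = 2.0 * x + 1.0 + rng.pareto(2.0, size=n)

def fit_quantile(x, y, tau, lr=0.05, epochs=2000):
    """Linear conditional-quantile regression via subgradient descent
    on the pinball (check) loss -- robust to heavy-tailed noise."""
    a, b = 0.0, 0.0
    for _ in range(epochs):
        r = y - (a * x + b)
        g = np.where(r > 0, -tau, 1.0 - tau)  # pinball subgradient w.r.t. prediction
        a -= lr * np.mean(g * x)
        b -= lr * np.mean(g)
    return a, b

# Stage 1 (most parameters, robustly): several quantile estimators.
taus = [0.25, 0.5, 0.75]
models = [fit_quantile(x, y, t) for t in taus]
Q = np.column_stack([a * x + b for a, b in models])

# Stage 2 (a few parameters): combine and correct the quantile
# predictions by ordinary least squares to minimize squared error.
A = np.column_stack([Q, np.ones(n)])
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ w

print("MSE:", np.mean((y - pred) ** 2))
```

    The division of labor matters: the pinball loss keeps the bulk of the parameters insensitive to the heavy tail, while the small least-squares correction removes the systematic bias that any single quantile estimator has as a predictor of the conditional mean.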