    Runtime Optimizations for Prediction with Tree-Based Models

    Tree-based models have proven to be an effective solution for web ranking as well as other problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, given an already-trained model. Although conceptually simple, most implementations of tree-based models do not efficiently utilize modern superscalar processor architectures. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures and significantly improve the speed of tree-based models over hard-coded if-else blocks. Our work contributes to the exploration of architecture-conscious runtime implementations of machine learning algorithms.
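
    The contrast between hard-coded if-else blocks and predication can be sketched as follows. This is only an illustration of the logic, not the paper's implementation: the depth-2 tree, its thresholds, and all names are hypothetical, and Python cannot show the hardware-level speedup. The tree is stored in flat arrays with an implicit heap layout (children of node i at 2i+1 and 2i+2), so each step computes the next node arithmetically instead of branching.

```python
# Hypothetical depth-2 regression tree, written two ways.

def predict_if_else(x):
    # Baseline: the tree hard-coded as nested branches.
    if x[0] <= 0.5:
        if x[1] <= 0.3:
            return 1.0
        return 2.0
    if x[1] <= 0.7:
        return 3.0
    return 4.0

# The same tree as flat arrays (implicit heap layout).
DEPTH = 2
N_INTERNAL = 2 ** DEPTH - 1           # internal nodes 0..2, leaves 3..6
FEATURE = [0, 1, 1]                   # feature tested at each internal node
THRESHOLD = [0.5, 0.3, 0.7]           # threshold at each internal node
LEAF_VALUE = [1.0, 2.0, 3.0, 4.0]     # outputs of leaves 3, 4, 5, 6

def predict_predicated(x):
    # The comparison result (0 or 1) is used as data to pick the child,
    # so the loop body contains no data-dependent if-else.
    node = 0
    for _ in range(DEPTH):
        go_right = int(x[FEATURE[node]] > THRESHOLD[node])
        node = 2 * node + 1 + go_right
    return LEAF_VALUE[node - N_INTERNAL]
```

    Both functions return the same value for any input; in a compiled language the second form gives the processor a fixed-length loop without unpredictable branches.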

    Exact boundaries in sequential testing for phase-type distributions

    Consider Wald's sequential probability ratio test for deciding whether a sequence of independent and identically distributed observations comes from a specified phase-type distribution or from an exponentially tilted alternative distribution. Exact decision boundaries for given type-I and type-II errors are derived by establishing a link with ruin theory. Information on the mean sample size of the test can be retrieved as well. The approach relies on the use of matrix-valued scale functions associated with a certain one-sided Markov additive process. By suitable transformations, the results also apply to other types of distributions, including some distributions with regularly varying tails.
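
    For orientation, the classical form of Wald's test can be sketched for the simplest phase-type case, a single exponential phase. Note the assumptions: the boundaries below are Wald's well-known approximations, not the exact boundaries derived in the paper, and the function names and parameters are hypothetical. Tilting an Exp(lam) density by theta < lam gives an Exp(lam - theta) density, so each observation x adds theta * x - log(lam / (lam - theta)) to the log-likelihood ratio.

```python
import math

def wald_boundaries(alpha, beta):
    # Wald's approximate log-scale boundaries for type-I error alpha
    # and type-II error beta (not the exact boundaries of the paper).
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    return lower, upper

def sprt_exponential(samples, lam, theta, alpha, beta):
    # H0: observations are Exp(lam); H1: exponentially tilted, Exp(lam - theta).
    if not theta < lam:
        raise ValueError("tilting parameter must satisfy theta < lam")
    lower, upper = wald_boundaries(alpha, beta)
    log_m = math.log(lam / (lam - theta))  # log moment generating function at theta
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        llr += theta * x - log_m           # per-observation log-likelihood ratio
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n
```

    The second return value is the number of observations used, which is the quantity whose mean the abstract refers to as the mean sample size.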

    On nonparametric estimation of a reliability function

    This article considers the properties of a nonparametric estimator of a reliability function that is used in many reliability problems. Asymptotic unbiasedness and consistency are proven for the estimator, and weak convergence of the estimator to a normal distribution is shown using U-statistics. Finally, numerical examples based on an extensive simulation study are presented to illustrate the theory and to compare the estimator developed in this article with another based directly on the ratio of two empirical distributions, studied in Zardasht and Asadi (2010).

    On flood risk pooling in Europe

    In this paper, we review and discuss some challenges in insuring flood risk in Europe on the national level, including high correlation of damages. Making use of recent advances in extreme value theory, we furthermore model flood risk with heavy-tailed distributions and their truncated counterparts and apply the discussed techniques to an inflation- and building-value-adjusted annual data set of flood losses in Europe. The analysis leads to Value-at-Risk estimates for individual countries and for Europe as a whole, which allows us to quantify the diversification potential for flood risk in Europe. Finally, we identify optimal risk pooling possibilities in case a joint insurance strategy on the European level cannot be realized and quantify the resulting inefficiency in terms of additional necessary solvency capital. Thus, the results also contribute to the ongoing discussion on how public risk transfer mechanisms can supplement missing private insurance coverage.
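
    The idea of quantifying diversification potential through Value-at-Risk can be illustrated with a small sketch. It rests on simplifying assumptions: it uses plain empirical quantiles rather than the fitted heavy-tailed models of the paper, and the function names and any data passed in are hypothetical. The pooling benefit is taken here as the sum of stand-alone Value-at-Risk figures minus the Value-at-Risk of the pooled annual losses.

```python
import math

def empirical_var(losses, level):
    # Empirical Value-at-Risk: the ceil(level * n)-th smallest annual loss.
    ordered = sorted(losses)
    k = max(1, math.ceil(level * len(ordered)))
    return ordered[k - 1]

def pooling_benefit(losses_by_country, level):
    # losses_by_country maps a country label to its annual losses,
    # with all lists aligned on the same years.
    standalone = sum(empirical_var(v, level) for v in losses_by_country.values())
    pooled_losses = [sum(year) for year in zip(*losses_by_country.values())]
    return standalone - empirical_var(pooled_losses, level)
```

    A positive result means the pool needs less capital than the countries would need separately. Because Value-at-Risk is not subadditive in general, the result can also be negative for some data and levels.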

    Runtime Optimizations for Tree-Based Machine Learning Models

    Tree-based models have proven to be an effective solution for web ranking as well as other machine learning problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, specifically using gradient-boosted regression trees for learning to rank. Although conceptually simple, most implementations of tree-based models do not efficiently utilize modern superscalar processors. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures. Experiments on synthetic data and on three standard learning-to-rank datasets show that our approach is significantly faster than standard implementations.
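
    How micro-batching fits into scoring with a boosted ensemble can be sketched as follows. This is a hedged illustration, not the paper's code: the two depth-2 trees, their values, and all names are hypothetical. Each tree is stored in flat arrays with an implicit heap layout, and within a tree the batch of documents is advanced one level at a time, so that the per-level work is a simple loop over independent instances. The final score of a document is the sum of the tree outputs, and documents are ranked by descending score.

```python
DEPTH = 2
N_INTERNAL = 2 ** DEPTH - 1  # internal nodes 0..2, leaves 3..6

# Hypothetical ensemble of two depth-2 regression trees.
ENSEMBLE = [
    {"feature": [0, 1, 1], "threshold": [0.5, 0.3, 0.7],
     "leaf": [0.1, 0.2, 0.3, 0.4]},
    {"feature": [1, 0, 0], "threshold": [0.5, 0.2, 0.8],
     "leaf": [-0.1, 0.0, 0.1, 0.2]},
]

def score_batch(batch):
    # batch is a list of feature vectors, one per document.
    scores = [0.0] * len(batch)
    for tree in ENSEMBLE:
        feature, threshold, leaf = tree["feature"], tree["threshold"], tree["leaf"]
        nodes = [0] * len(batch)           # current node of every document
        for _ in range(DEPTH):             # advance the whole batch level by level
            for k, x in enumerate(batch):
                n = nodes[k]
                nodes[k] = 2 * n + 1 + int(x[feature[n]] > threshold[n])
        for k, n in enumerate(nodes):
            scores[k] += leaf[n - N_INTERNAL]
    return scores

def rank(batch):
    # Indices of the documents, best score first.
    scores = score_batch(batch)
    return sorted(range(len(batch)), key=lambda k: scores[k], reverse=True)
```

    In a compiled implementation the inner loop over the batch is the part that a compiler or programmer can map onto SIMD instructions or overlap across instances to hide memory latency.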