8 research outputs found

    Measuring the Driving Forces of Predictive Performance: Application to Credit Scoring

    In credit scoring, machine learning models are known to outperform standard parametric models. As they condition access to credit, banking supervisors and internal model validation teams need to monitor their predictive performance and to identify the features with the highest impact on performance. To facilitate this, we introduce the XPER methodology to decompose a performance metric (e.g., AUC, R^2) into specific contributions associated with the various features of a classification or regression model. XPER is theoretically grounded on Shapley values and is both model-agnostic and performance metric-agnostic. Furthermore, it can be implemented either at the model level or at the individual level. Using a novel dataset of car loans, we decompose the AUC of a machine learning model trained to forecast the default probability of loan applicants. We show that a small number of features can explain a surprisingly large part of the model performance. Furthermore, we find that the features that contribute the most to the predictive performance of the model may not be the ones that contribute the most to individual forecasts (SHAP). We also show how XPER can be used to deal with heterogeneity issues and significantly boost out-of-sample performance.
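
    To make the decomposition concrete, below is a minimal Python sketch of a Shapley-style split of the AUC across features. It is an illustration only: the simulated data, the logistic model, and the permutation-based way of "switching off" features outside a coalition are assumptions made here, not the XPER estimator itself.

        # Minimal sketch of a Shapley-style decomposition of AUC across features.
        # Illustration only: features outside a coalition are neutralised by random
        # permutation, which is an assumption of this sketch, not the XPER estimator.
        from itertools import combinations
        from math import factorial

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(0)
        X, y = make_classification(n_samples=2000, n_features=4, random_state=0)
        model = LogisticRegression(max_iter=1000).fit(X, y)

        def auc_with_coalition(active):
            """AUC when only the features in `active` keep their observed values;
            the remaining columns are permuted to break their link with the outcome."""
            X_mod = X.copy()
            for j in range(X.shape[1]):
                if j not in active:
                    X_mod[:, j] = rng.permutation(X_mod[:, j])
            return roc_auc_score(y, model.predict_proba(X_mod)[:, 1])

        n_feat = X.shape[1]
        phi = np.zeros(n_feat)
        for j in range(n_feat):
            others = [k for k in range(n_feat) if k != j]
            for size in range(len(others) + 1):
                for S in combinations(others, size):
                    S = set(S)
                    w = factorial(len(S)) * factorial(n_feat - len(S) - 1) / factorial(n_feat)
                    phi[j] += w * (auc_with_coalition(S | {j}) - auc_with_coalition(S))

        phi_0 = auc_with_coalition(set())  # benchmark: no feature carries information (~0.5)
        print("benchmark AUC:", round(phi_0, 3))
        print("feature contributions phi_j:", np.round(phi, 3))
        # Because the neutralisation is random, phi_0 + sum(phi) only approximates
        # the model's actual full-sample AUC.
        print("phi_0 + sum(phi):", round(phi_0 + phi.sum(), 3))
        print("actual AUC:", round(roc_auc_score(y, model.predict_proba(X)[:, 1]), 3))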

    In vivo emergence of HIV-1 highly sensitive to neutralizing antibodies.

    BACKGROUND: The rapid and continual viral escape from neutralizing antibodies is well documented in HIV-1 infection. Here we report in vivo emergence of viruses with heightened sensitivity to neutralizing antibodies, sometimes paralleling the development of neutralization escape. METHODOLOGY/PRINCIPAL FINDINGS: Sequential viral envs were amplified from seven HIV-1 infected men monitored from seroconversion up to 5 years after infection. Env-recombinant infectious molecular clones were generated and tested for coreceptor use, macrophage tropism and neutralization sensitivity to homologous and heterologous serum, soluble CD4 and monoclonal antibodies IgG1b12, 2G12 and 17b. We found that HIV-1 evolves sensitivity to contemporaneous neutralizing antibodies during infection. Neutralization sensitive viruses grow out even when potent autologous neutralizing antibodies are present in patient serum. Increased sensitivity to neutralization was associated with susceptibility of the CD4 binding site or epitopes induced after CD4 binding, and mediated by complex envelope determinants including V3 and V4 residues. The development of neutralization sensitive viruses occurred without clinical progression, coreceptor switch or change in tropism for primary macrophages. CONCLUSIONS: We propose that an interplay of selective forces for greater virus replication efficiency without the need to resist neutralizing antibodies in a compartment protected from immune surveillance may explain the temporal course described here for the in vivo emergence of HIV-1 isolates with high sensitivity to neutralizing antibodies.

    Explainable Performance

    No full text
    We introduce the XPER (eXplainable PERformance) methodology to measure the specific contribution of the input features to the predictive or economic performance of a model. Our methodology offers several advantages. First, it is both model-agnostic and performance metric-agnostic. Second, XPER is theoretically founded as it is based on Shapley values. Third, the interpretation of the benchmark, which is inherent in any Shapley value decomposition, is meaningful in our context. Fourth, XPER is not plagued by model specification error, as it does not require re-estimating the model. Fifth, it can be implemented either at the model level or at the individual level. In an application based on auto loans, we find that performance can be explained by a surprisingly small number of features, that XPER decompositions are rather stable across metrics, and yet that some feature contributions switch sign across metrics. Our analysis also shows that explaining model forecasts and explaining model performance are two distinct tasks.
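
    For reference, a Shapley-value decomposition of a performance metric G over a feature set F takes the form below (written in LaTeX). How G(S) is evaluated for a coalition S, and hence the exact XPER estimator, follows the paper rather than this note; the benchmark mentioned in the abstract corresponds to phi_0 = G of the empty coalition.

        % Shapley split of a performance metric G (e.g. AUC) over the feature set F.
        % G(S): the metric obtained when only the coalition S of features is "active"
        % (how inactive features are neutralised follows the paper, not this note).
        \phi_j \;=\; \sum_{S \subseteq F \setminus \{j\}}
            \frac{|S|!\,\bigl(|F| - |S| - 1\bigr)!}{|F|!}
            \Bigl[\, G\bigl(S \cup \{j\}\bigr) - G(S) \,\Bigr],
        \qquad
        G(F) \;=\; \phi_0 + \sum_{j \in F} \phi_j,
        \qquad
        \phi_0 \;=\; G(\varnothing).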

    Machine Learning or Econometrics for Credit Scoring: Let's Get the Best of Both Worlds

    No full text
    In the context of credit scoring, ensemble methods based on decision trees, such as the random forest method, provide better classification performance than standard logistic regression models. However, logistic regression remains the benchmark in the credit risk industry mainly because the lack of interpretability of ensemble methods is incompatible with the requirements of financial regulators. In this paper, we propose to obtain the best of both worlds by introducing a high-performance and interpretable credit scoring method called penalised logistic tree regression (PLTR), which uses information from decision trees to improve the performance of logistic regression. Formally, rules extracted from various short-depth decision trees built with pairs of predictive variables are used as predictors in a penalised logistic regression model. PLTR allows us to capture non-linear effects that can arise in credit scoring data while preserving the intrinsic interpretability of the logistic regression model. Monte Carlo simulations and empirical applications using four real credit default datasets show that PLTR predicts credit risk significantly more accurately than logistic regression and compares competitively to the random forest method. JEL Classification: G10, C25, C5.
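
    A minimal Python sketch of the PLTR idea follows, assuming a simulated dataset: depth-two trees are grown on pairs of predictors, their leaf-membership indicators are added as binary rule features, and an L1-penalised logistic regression selects among them. Hyper-parameters and the rule-extraction details are illustrative assumptions, not the authors' exact procedure.

        # Sketch of the PLTR idea: rules from short-depth trees built on pairs of
        # predictors become extra binary regressors in a penalised logistic model.
        # Illustration only; tuning choices are assumptions, not the paper's setup.
        from itertools import combinations

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=3000, n_features=6, random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        def fit_pair_trees(X_fit, y_fit):
            """One depth-2 tree per pair of predictors."""
            trees = []
            for i, j in combinations(range(X_fit.shape[1]), 2):
                t = DecisionTreeClassifier(max_depth=2, random_state=0)
                trees.append(((i, j), t.fit(X_fit[:, [i, j]], y_fit)))
            return trees

        def rule_features(trees, X_apply):
            """Leaf-membership indicators of each tree, used as binary rule regressors."""
            cols = []
            for (i, j), t in trees:
                leaves = t.apply(X_apply[:, [i, j]])
                leaf_ids = np.where(t.tree_.children_left == -1)[0]  # ids of leaf nodes
                for leaf in leaf_ids:
                    cols.append((leaves == leaf).astype(float))
            return np.column_stack(cols)

        trees = fit_pair_trees(X_tr, y_tr)
        Z_tr = np.hstack([X_tr, rule_features(trees, X_tr)])
        Z_te = np.hstack([X_te, rule_features(trees, X_te)])

        # The L1 penalty keeps only the most useful rules, so the fitted model
        # stays a sparse, readable logistic regression.
        pltr = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(Z_tr, y_tr)
        print("PLTR test AUC:", round(roc_auc_score(y_te, pltr.predict_proba(Z_te)[:, 1]), 3))

        plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        print("plain logistic test AUC:",
              round(roc_auc_score(y_te, plain.predict_proba(X_te)[:, 1]), 3))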

    Machine Learning for Credit Scoring: Improving Logistic Regression with Non-Linear Decision-Tree Effects

    No full text
    In the context of credit scoring, ensemble methods based on decision trees, such as the random forest method, provide better classification performance than standard logistic regression models. However, logistic regression remains the benchmark in the credit risk industry mainly because the lack of interpretability of ensemble methods is incompatible with the requirements of financial regulators. In this paper, we propose a high-performance and interpretable credit scoring method called penalised logistic tree regression (PLTR), which uses information from decision trees to improve the performance of logistic regression. Formally, rules extracted from various short-depth decision trees built with original predictive variables are used as predictors in a penalised logistic regression model. PLTR allows us to capture non-linear effects that can arise in credit scoring data while preserving the intrinsic interpretability of the logistic regression model. Monte Carlo simulations and empirical applications using four real credit default datasets show that PLTR predicts credit risk significantly more accurately than logistic regression and compares competitively to the random forest method.
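
    The premise that tree ensembles outperform a plain logistic regression when the data contain non-linear effects can be illustrated with the short, self-contained comparison below; the data-generating process is an assumption made here for illustration, not one of the paper's Monte Carlo designs.

        # Illustration of the paper's premise: with threshold and interaction effects
        # in the default process, a random forest beats a plain logistic regression.
        # The data-generating process below is an assumption of this sketch.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n = 5000
        X = rng.normal(size=(n, 4))
        # Default probability driven by a threshold effect and an interaction term.
        logit = 1.5 * (X[:, 0] > 0.5) - 2.0 * X[:, 1] * X[:, 2] + 0.5 * X[:, 3]
        y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

        logreg = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        forest = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

        for name, clf in [("logistic regression", logreg), ("random forest", forest)]:
            auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
            print(f"{name}: test AUC = {auc:.3f}")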

    The evolution of HIV: Inferences using phylogenetics

    No full text

    Associative rings

    No full text