142 research outputs found

    Application of Bayesian Networks to Risk Assessment

    Various approaches are used to estimate and predict risks. One of the most prevalent methods for risk assessment is Cox's proportional hazards (CPH) model (Cox, 1972), a popular statistical technique used in risk estimation and survival analysis. The weaknesses of this approach are: (1) the underlying model can be learned only from data and is not readily amenable to refinement based on expert knowledge, and (2) the CPH model rests on several assumptions that simplify the interactions between the risk factors and the predicted outcome. While these assumptions are reasonable and the CPH model has been used successfully for decades, it is worth questioning them, with a possible benefit in terms of model accuracy. This dissertation focuses on theoretical and practical aspects of risk assessment based on Bayesian networks (Pearl, 1988) as an alternative to the CPH model. The dissertation makes three contributions: (1) I propose a Bayesian network interpretation of the CPH model (BN-Cox), a process for using existing CPH models as data sources for parameter estimation in Bayesian networks when the original data are not available, and discuss methods for making such models computationally tractable; (2) I empirically demonstrate the context-sensitivity of the strength of influence of individual risk factors on the outcome variables in both the Bayesian network model and the CPH model; and finally, (3) I propose and evaluate methods for enhancing the quality of Bayesian network parameters learned from small data sets, by means of priors
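
For orientation, the CPH model named in this abstract is usually written as below. The notation (baseline hazard $h_0$, coefficient vector $\beta$, covariate vectors $x$, $x_a$, $x_b$) is the standard textbook form and is an assumption here, not taken from the dissertation itself.

```latex
% Hazard at time t for a subject with covariate vector x
h(t \mid x) = h_0(t)\,\exp\!\left(\beta^{\top} x\right)

% Hazard ratio between two covariate profiles x_a and x_b:
% the baseline hazard cancels, so the ratio does not depend on t
\frac{h(t \mid x_a)}{h(t \mid x_b)} = \exp\!\left(\beta^{\top}(x_a - x_b)\right)
```

The second line is the proportional hazards property: each risk factor scales the hazard by a constant factor at all times, which is one of the simplifying assumptions the abstract refers to.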

    Credit scoring: Discussion of methods and a case study

    The scenario considered is that of a credit association, a bank or another financial institution which, on the basis of information about a new potential customer and historical data on many other customers, has to decide whether or not to give that customer a certain loan. We discuss three popular techniques: logistic regression, discriminant analysis and neural networks. We shall argue strongly in favour of logistic regression. Discriminant analysis can be used, and for reasons that can be explained mathematically it will often result in approximately the same conclusions as a logistic regression. But the statistical assumptions are not appropriate in most cases, and the results given are not as directly interpretable as those of logistic regression. Neural network techniques, in their simplest form, suffer from the lack of standard statistical methods for verification of the model and tests for removal of covariates. This problem disappears to some extent when the neural networks are reformulated as proper statistical models, based on the type of functions that are considered in neural networks. But this results in a somewhat specialized class of non-linear regression models, which may be useful in situations where local peculiarities of the response function are in focus, but certainly not when the overall (usually monotone) effect of many more or less confounded covariates is the issue. We discuss, within the logistic regression framework, the handling of phenomena such as time trends and corruption of the historical data due to shifts of policy, censoring and/or interventions in high-risk customers' economy. Finally, we illustrate and support the theoretical considerations by a case study concerning mortgage loans in a Danish credit association
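
The interpretability argument made for logistic regression can be illustrated with a minimal sketch. The covariate names, coefficients and applicant values below are hypothetical and chosen only for illustration; they do not come from the Danish case study.

```python
import math

def default_probability(coefs, intercept, applicant):
    """Logistic model: P(default) = 1 / (1 + exp(-(intercept + sum(b_k * x_k))))."""
    z = intercept + sum(coefs[name] * applicant[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratios(coefs):
    """exp(b_k) is the multiplicative change in the odds of default
    for a one-unit increase in covariate k, holding the others fixed."""
    return {name: math.exp(b) for name, b in coefs.items()}

# Hypothetical fitted coefficients (assumed, not estimated from real data).
coefs = {"loan_to_value": 1.2, "prior_arrears": 0.8, "years_at_address": -0.1}
intercept = -3.0

# Hypothetical applicant.
applicant = {"loan_to_value": 0.9, "prior_arrears": 1, "years_at_address": 4}

p = default_probability(coefs, intercept, applicant)
ratios = odds_ratios(coefs)
```

Each coefficient maps directly to an odds ratio, which is the sense in which the results of a logistic regression are directly interpretable.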

    The Pennsylvania reemployment bonus experiments : how a survival model helps in the analysis of the data

    Survival models for life-time data and other time-to-event data are widely used in many fields, including medicine, the environmental sciences and engineering. They have also found recognition in the analysis of economic duration data. This paper provides a reanalysis of the Pennsylvania Reemployment Bonus Experiments, which were conducted in 1988-89 to examine the effect of different types of reemployment bonus offers on the unemployment spell. A Cox proportional hazards survival model is fitted to the data and the results are compared to the results of a linear regression approach and to the results of a quantile regression approach. The Cox proportional hazards model provides a remarkable goodness of fit and yields less effective treatment responses, and therefore lower expectations concerning the overall implications of the Pennsylvania experiment. An influence analysis is proposed for obtaining qualitative information on the influence of the covariates at different quantiles. The results of the quantile regression and of the influence analysis show that both the linear regression and the Cox model still impose stringent restrictions on the way covariates influence the duration distribution; however, due to its flexibility, the Cox proportional hazards model is more appropriate for analysing the data
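
As a rough sketch of what fitting a Cox proportional hazards model to duration data involves, the function below evaluates the partial log-likelihood for a single covariate, using the Breslow convention for tied times. The toy spell lengths and the binary `bonus_offer` indicator are hypothetical and are not the Pennsylvania data.

```python
import math

def cox_partial_loglik(beta, durations, observed, x):
    """Breslow partial log-likelihood for one covariate.

    durations: length of each spell
    observed:  True if the spell ended (event), False if censored
    x:         covariate value for each subject
    """
    loglik = 0.0
    for i, (t_i, event) in enumerate(zip(durations, observed)):
        if not event:
            continue  # censored spells enter only through the risk sets
        # Risk set: everyone whose spell lasted at least until t_i.
        risk = sum(math.exp(beta * x[j])
                   for j, t_j in enumerate(durations) if t_j >= t_i)
        loglik += beta * x[i] - math.log(risk)
    return loglik

# Hypothetical toy data: spell length in weeks, event flag, bonus indicator.
durations = [5, 8, 3, 10]
observed = [True, True, False, True]
bonus_offer = [1, 0, 1, 0]

ll_at_zero = cox_partial_loglik(0.0, durations, observed, bonus_offer)
```

The coefficient estimate is the value of `beta` that maximizes this function; the baseline hazard never needs to be specified, which is the source of the flexibility the abstract mentions.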

    Estimating multiple time-fixed treatment effects using a semi-Bayes semiparametric marginal structural Cox proportional hazards regression model

    Marginal structural models for time-fixed treatments fit using inverse-probability weighted estimating equations are increasingly popular. Nonetheless, the resulting effect estimates are subject to finite-sample bias when data are sparse, as is typical for large-sample procedures. Here we propose a semi-Bayes estimation approach which penalizes or shrinks the estimated model parameters to improve finite-sample performance. This approach uses simple symmetric data-augmentation priors. Limited simulation experiments indicate that the proposed approach reduces finite-sample bias and improves confidence-interval coverage when the true values lie within the central “hill” of the prior distribution. We illustrate the approach with data from a nonexperimental study of HIV treatments

    Customer lifetime value : an integrated data mining approach

    Customer Lifetime Value (CLV), which is a measure of the profit-generating potential, or value, of a customer, is increasingly being considered a touchstone for customer relationship management. As the guide and benchmark for Customer Relationship Management (CRM) applications, CLV analysis has received increasing attention from both marketing practitioners and researchers from different domains. The central challenge in predicting CLV is the precise calculation of a customer's length of service (LOS). There are several statistical approaches to this problem, and several researchers have used these approaches to perform survival analysis in different domains. However, classical survival analysis techniques such as the Kaplan-Meier approach, which offers a fully non-parametric estimate, ignore the covariates completely and assume stationarity of churn behavior over time, which makes them less practical. Further, segments of customers, whose lifetimes and covariate effects can vary widely, are not necessarily easy to detect. As in many other applications, data mining has recently been emerging as a compelling analysis tool for CLV. Data mining methods offer an interesting alternative because they are less limited than the conventional statistical approaches. Customer databases contain histories of vital events such as the acquisition and cancellation of products and services. The historical data is used to build predictive models for customer retention, cross-selling, and other database marketing endeavors. In this research project we discuss and investigate the possibility of combining these statistical approaches with data mining methods to improve performance on the CLV problem in a real business context. Part of the research effort concentrates on the precise prediction of customers' LOS in a real-world business. Using the conventional statistical approaches and data mining methods in tandem, we demonstrate how data mining tools can be apt complements to the classical statistical models, resulting in a CLV prediction model that is both accurate and understandable. We also evaluate the proposed integrated method to extract interesting business domain knowledge within the scope of the CLV problem. In particular, several data mining methods are discussed and evaluated according to their accuracy of prediction and interpretability of results. The research findings will lead us to a data mining method combined with survival analysis approaches as a robust tool for modeling CLV and for assisting management decision-making. A calling plan strategy is designed for the telecommunication industry based on the predicted survival time and calculated CLV. The calling plan strategy further investigates potential business knowledge assisted by the calculated CLV
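
The Kaplan-Meier estimator mentioned in this abstract can be sketched in a few lines, which also makes its limitation visible: the inputs are only durations and churn flags, with no covariates. The customer tenures below are hypothetical toy values.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time where churn was observed.

    durations: time each customer was followed (e.g. months of service)
    observed:  True if churn was seen at that time, False if censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)          # events and censorings together
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the share surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]             # everyone leaving at t drops out
    return curve

# Hypothetical toy data: months of service and whether churn was observed.
durations = [2, 3, 3, 5, 8, 8, 12]
observed = [True, True, False, True, False, True, False]

curve = kaplan_meier(durations, observed)
```

Because the function takes no customer attributes, every customer shares one survival curve, which is why the abstract describes the approach as less practical for segment-level LOS prediction.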