
    Supervised machine learning algorithms for the estimation of the probability of default in corporate credit risk

    This thesis investigates the application of non-linear supervised machine learning algorithms to estimating the Probability of Default (PD) of corporate clients. To this end, the thesis is divided into three experiments: (1) The first experiment investigates a wrapper feature selection method and its application to support vector machines (SVMs) and logistic regression (LR). Logistic regression is the most popular approach for estimating PD in a default-rich portfolio, but other alternatives are available; using the proposed feature selection method, SVMs are compared with the logistic regression model. (2) The second experiment investigates the application of artificial neural networks (ANNs) to estimating the PD of corporate clients. In particular, ANNs are regularized and trained with both classical and Bayesian approaches; different network architectures are explored, and Bayesian estimation and regularization are compared with their classical counterparts. (3) The third experiment investigates the k-Nearest Neighbours algorithm (KNNs), which is trained using both Bayesian and classical methods and could be efficiently applied to estimating PD. In addition, other supervised machine learning algorithms, namely Decision Trees (DTs), Linear Discriminant Analysis (LDA) and Naive Bayes (NB), were applied, and their performance was summarized and compared with that of the SVMs, ANNs, KNNs and logistic regression. The contribution of this thesis is to provide methods for estimating the PD of corporate clients that are both efficient and practically applicable. It contributes to the existing literature in three ways: (1) it proposes an innovative feature selection method for SVMs; (2) it proposes innovative Bayesian estimation methods to regularize ANNs; and (3) it proposes innovative Bayesian approaches to the estimation of KNNs. Overall, the objective of the research is to promote the use of Bayesian non-linear supervised machine learning methods, which are currently not widely applied in industry for the PD estimation of corporate clients.
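The wrapper idea in the first experiment scores each candidate feature subset by the performance of the model itself. The sketch below assumes greedy forward selection and, for brevity, swaps in a plain (non-Bayesian) KNN default-rate estimator as the wrapped model instead of the SVM or logistic regression used in the thesis. The function names, the leave-one-out Brier-score criterion and the toy data are hypothetical illustrations, not the thesis's method.

```python
import math


def knn_pd(train_X, train_y, x, k):
    """Estimate PD for client x as the default rate among its k nearest training clients."""
    # Sort (distance, default_flag) pairs and keep the k closest.
    neighbours = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    return sum(label for _, label in neighbours) / len(neighbours)


def loo_brier(X, y, features, k):
    """Leave-one-out Brier score of the KNN PD model restricted to `features`."""
    proj = [[row[j] for j in features] for row in X]
    total = 0.0
    for i in range(len(proj)):
        # Hold out client i, predict its PD from the remaining clients.
        pd_hat = knn_pd(proj[:i] + proj[i + 1:], y[:i] + y[i + 1:], proj[i], k)
        total += (pd_hat - y[i]) ** 2
    return total / len(proj)


def forward_select(X, y, k=3, max_features=None):
    """Hypothetical greedy wrapper: add the feature that most lowers the LOO Brier score."""
    n_features = len(X[0])
    max_features = min(max_features or n_features, n_features)
    selected, best_score = [], float("inf")
    while len(selected) < max_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scores = {j: loo_brier(X, y, selected + [j], k) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_score:
            break  # no remaining feature improves the wrapper criterion
        selected.append(j_best)
        best_score = scores[j_best]
    return selected, best_score


# Toy data: feature 0 separates defaulters (y=1); feature 1 is noise.
X = [[0.10, 0.5], [0.20, 0.9], [0.15, 0.1], [0.30, 0.4], [0.25, 0.7],
     [0.90, 0.6], [0.80, 0.2], [0.85, 0.8], [0.95, 0.3], [0.70, 0.5]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(forward_select(X, y, k=3))  # → ([0], 0.0)
```

Because feature 0 already gives a zero leave-one-out error on this toy data, adding the noise feature cannot lower the score and the search stops. In a real PD setting the inner model, the criterion (e.g. AUC or a proper scoring rule) and the stopping rule would all be design choices.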

    Efficient Learning and Feature Selection in High Dimensional Regression

    We present a novel algorithm for efficient learning and feature selection in high-dimensional regression problems. We arrive at this algorithm by modifying the standard regression model, which enables us to derive a probabilistic version of the well-known statistical regression technique of backfitting. Using the expectation-maximization algorithm, along with variational approximation methods to overcome intractability, we extend our algorithm to include automatic relevance detection of the input features. This variational Bayesian least squares (VBLS) approach retains its simplicity as a linear model, but offers a novel, statistically robust black-box approach to generalized linear regression with high-dimensional inputs. It can be easily extended to nonlinear regression and classification problems. In particular, we derive the framework of sparse Bayesian learning, the relevance vector machine, with VBLS at its core, offering significant computational and robustness advantages for this class of methods. The iterative nature of VBLS makes it particularly suitable for real-time incremental learning, which is crucial in application domains such as robotics, brain-machine interfaces, and neural prosthetics, where models for control must be learned in real time. We evaluate our algorithm on synthetic and neurophysiological data sets, as well as on standard regression and classification benchmark data sets, comparing it with other competitive statistical approaches and demonstrating its suitability as a drop-in replacement for other generalized linear regression techniques.
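Backfitting, the classical technique that VBLS turns into a probabilistic model, fits a linear model one input at a time: each coefficient is refit by one-dimensional least squares against the partial residual left after removing all other inputs' current contributions. The following is a minimal sketch of plain, non-Bayesian linear backfitting only; it is not VBLS (there is no EM, variational approximation or relevance detection), and the function name and toy data are hypothetical. It assumes no input column is all zeros.

```python
def backfit(X, y, n_sweeps=100):
    """Classical backfitting for a linear model y ≈ sum_m b[m] * x_m."""
    n, d = len(X), len(X[0])
    b = [0.0] * d
    for _ in range(n_sweeps):
        for m in range(d):
            # Partial residual: remove every other input's current contribution.
            r = [y[i] - sum(b[k] * X[i][k] for k in range(d) if k != m)
                 for i in range(n)]
            # Refit input m alone to the partial residual (1-D least squares).
            b[m] = (sum(X[i][m] * r[i] for i in range(n))
                    / sum(X[i][m] ** 2 for i in range(n)))
    return b


X = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [2, -1, 1, 3, -1]  # generated exactly by y = 2*x0 - 1*x1
print([round(c, 6) for c in backfit(X, y)])  # → [2.0, -1.0]
```

For linear models these sweeps are Gauss-Seidel iterations on the normal equations, so they converge to the ordinary least-squares solution without ever forming or inverting the full d-by-d matrix, which is the property that makes backfitting attractive for high-dimensional, incremental settings.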

    Bayesian Learning for Earthquake Engineering Applications and Structural Health Monitoring

    In parallel with significant advances in sensor hardware, sophisticated methods have recently been developed for the quantitative assessment of measured data that explicitly account for all of the uncertainties involved, including inevitable measurement errors. These uncertainties often cause numerical instabilities in inverse problems, making them ill-conditioned. The Bayesian methodology is known to provide an efficient way to alleviate this ill-conditioning by incorporating a prior term that regularizes the inverse problem, and to provide probabilistic results that are meaningful for decision making. In this work, the Bayesian methodology is applied to inverse problems in earthquake engineering, and especially to structural health monitoring. The proposed methodology of Bayesian learning using the automatic relevance determination (ARD) prior, including its kernel version called the Relevance Vector Machine, is presented and applied to earthquake early warning, earthquake ground motion attenuation estimation, and structural health monitoring, using either a Bayesian classification or a Bayesian regression approach. Both classification and regression are performed in three phases: (1) Phase I (feature extraction phase): determine which features from the data to use in a training dataset; (2) Phase II (training phase): identify the unknown parameters defining a model by using a training dataset; and (3) Phase III (prediction phase): predict the results based on the features from new data. This work focuses on the advantages of the probabilistic predictions obtained by Bayesian methods for dealing with all uncertainties, and on the favorable properties of the proposed method, namely computationally efficient training and, especially, prediction, which make it suitable for real-time operation.
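To make the regularizing role of the prior concrete, the sketch below assumes the simplest case: a linear-in-parameters model with Gaussian noise of variance `noise_var` and a zero-mean isotropic Gaussian prior of variance `prior_var` on the weights. Under these assumptions the posterior mean equals a ridge solution with penalty `noise_var / prior_var`, and the prior adds `I / prior_var` to the posterior precision, which bounds its condition number. The `train`/`predict` split mirrors Phases II and III; Phase I is assumed done, and all names and data are hypothetical rather than the thesis's code.

```python
import numpy as np


def train(Phi, t, noise_var, prior_var):
    """Phase II sketch: Gaussian posterior N(mu, S) over linear weights."""
    # The prior contributes I / prior_var to the posterior precision; this term
    # keeps an ill-conditioned Phi.T @ Phi safely invertible.
    precision = Phi.T @ Phi / noise_var + np.eye(Phi.shape[1]) / prior_var
    S = np.linalg.inv(precision)
    mu = S @ Phi.T @ t / noise_var
    return mu, S


def predict(phi_new, mu, S, noise_var):
    """Phase III sketch: predictive mean and variance for one new feature vector."""
    return phi_new @ mu, noise_var + phi_new @ S @ phi_new


# Hypothetical, nearly collinear features: Phi.T @ Phi is close to singular.
Phi = np.array([[1.0, 1.0], [1.0, 1.0001], [1.0, 0.9999]])
t = np.array([2.0, 2.1, 1.9])
mu, S = train(Phi, t, noise_var=0.01, prior_var=1.0)
mean, var = predict(np.array([1.0, 1.0]), mu, S, noise_var=0.01)
print("cond without prior:", np.linalg.cond(Phi.T @ Phi))
print("cond with prior:   ", np.linalg.cond(np.linalg.inv(S)))
print("prediction:", mean, "+/-", np.sqrt(var))
```

The predictive variance is the noise variance plus a term reflecting the remaining parameter uncertainty, which is what makes the output usable as a probability statement for decision making rather than a point estimate alone.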
It is shown that sparseness (using only a small number of basis function terms) is produced in the regression equations and in the classification separating boundary by using the ARD prior together with Bayesian model class selection, which selects the most probable (plausible) model class based on the data. This model class selection procedure automatically produces optimal regularization of the problem at hand, making it well-conditioned. Several applications of the proposed Bayesian learning methodology are presented. First, automatic near-source and far-source classification of incoming ground motion signals is treated, and the Bayesian learning method is used to determine which ground motion features are optimal for this classification. Second, a probabilistic earthquake attenuation model for peak ground acceleration is identified using selected optimal features, with particular attention to a parameter that enters the model non-linearly. It is shown that the Bayesian learning method can estimate not only the linear coefficients but also such a non-linearly involved parameter, providing an estimate of an unknown parameter in the kernel basis functions of the Relevance Vector Machine. Third, the proposed method is extended to the general case of regression problems with vector outputs and applied to structural health monitoring. It is concluded that the proposed vector-output RVM shows promise for estimating damage locations and their severities from changes in modal properties such as natural frequencies and mode shapes.
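The sparseness produced by the ARD prior can be illustrated with the standard re-estimation loop for sparse Bayesian linear regression: every weight gets its own prior precision `alpha[j]`, and basis functions the data do not support are driven to large `alpha[j]`, shrinking their weights toward zero. This is a minimal sketch under stated assumptions (MacKay-style fixed-point updates, a hypothetical cap `alpha_cap` used as the pruning threshold, synthetic data), not the thesis's implementation, and it omits the model class selection and vector-output extensions. Filling `Phi` with kernel basis functions, e.g. `Phi[i, j] = K(x_i, x_j)`, would turn the same loop into an RVM-style regressor; that substitution is also an assumption here.

```python
import numpy as np


def ard_regression(Phi, t, n_iter=300, alpha_cap=1e9):
    """Sparse Bayesian (ARD) linear regression via fixed-point re-estimation."""
    N, M = Phi.shape
    alpha = np.ones(M)       # per-weight prior precisions (ARD hyperparameters)
    beta = 1.0 / np.var(t)   # noise precision, crude initial guess
    for _ in range(n_iter):
        # Gaussian posterior over weights given the current hyperparameters.
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
        mu = beta * Sigma @ Phi.T @ t
        # gamma[j] in [0, 1] measures how well the data determine weight j.
        gamma = 1.0 - alpha * np.diag(Sigma)
        active = alpha < alpha_cap  # weights at the cap are treated as pruned
        alpha[active] = np.minimum(gamma[active] / mu[active] ** 2, alpha_cap)
        resid = t - Phi @ mu
        beta = (N - gamma.sum()) / (resid @ resid)
    return mu, alpha, beta


# Hypothetical data: only features 0 and 2 influence the target.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((100, 5))
t = 3.0 * Phi[:, 0] - 2.0 * Phi[:, 2] + 0.1 * rng.standard_normal(100)
mu, alpha, beta = ard_regression(Phi, t)
print("posterior means:", np.round(mu, 3))
print("prior precisions:", alpha)
```

The relevant weights settle at small precisions (roughly the inverse of their squared values), while the irrelevant ones receive precisions orders of magnitude larger, which is the mechanism behind the "small number of basis function terms" described above.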