75 research outputs found

    Basel II compliant credit risk modelling: model development for imbalanced credit scoring data sets, loss given default (LGD) and exposure at default (EAD)

The purpose of this thesis is to determine, and to better inform industry practitioners about, the most appropriate classification and regression techniques for modelling the three key credit risk components of the Basel II minimum capital requirement: probability of default (PD), loss given default (LGD), and exposure at default (EAD). The Basel II accord regulates risk and capital management requirements to ensure that a bank holds enough capital in proportion to the risk exposure of its lending practices. Under the advanced internal ratings-based (IRB) approach, Basel II allows banks to develop their own empirical models, based on historical data, for each of PD, LGD and EAD. The first part of this thesis identifies the issue of imbalanced credit scoring data sets, a special case of PD modelling in which the number of defaulting observations is much lower than the number of non-defaulting observations, and analyses the suitability of various classification techniques for this setting. In addition to traditional classification techniques, the thesis explores the suitability of gradient boosting, least squares support vector machines and random forests as classifiers. The second part focuses on the prediction of LGD, which measures the economic loss, expressed as a percentage of the exposure, in case of default; various state-of-the-art regression techniques for modelling LGD are considered. The final part investigates models for predicting EAD. For off-balance-sheet items (for example, credit cards), calculating the EAD requires the committed but unused loan amount multiplied by a credit conversion factor (CCF). Ordinary least squares (OLS), logistic and cumulative logistic regression models are analysed, as well as an OLS model with Beta transformation, with the main aim of finding the most robust and comprehensible model for predicting the CCF. A direct estimation of EAD using an OLS model is also analysed. All the models built and presented in this thesis have been applied to real-life data sets from major global banking institutions.
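    A minimal sketch of the CCF-based EAD calculation mentioned in the abstract, assuming the common formulation in which the drawn balance is increased by the CCF times the undrawn commitment; the function name and the figures are illustrative, not taken from the thesis.

```python
# Illustrative CCF-based EAD estimate for an off-balance-sheet exposure
# such as a credit card line. Names and numbers are hypothetical.

def estimate_ead(drawn_amount: float, credit_limit: float, ccf: float) -> float:
    """EAD = drawn amount plus CCF times the committed but unused amount."""
    undrawn = max(credit_limit - drawn_amount, 0.0)
    return drawn_amount + ccf * undrawn

# Example: a 10,000 limit, 4,000 currently drawn, predicted CCF of 0.6.
print(estimate_ead(drawn_amount=4_000, credit_limit=10_000, ccf=0.6))  # 7600.0
```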

    Examining the Transitional Impact of ICD-10 on Healthcare Fraud Detection

On October 1st, 2015, the tenth revision of the International Classification of Diseases (ICD-10) will become mandatory in the United States. Although this medical classification system will allow healthcare professionals to code with greater accuracy, specificity, and detail, these codes will have a significant impact on the flavor of healthcare insurance claims. While the overall benefit of ICD-10 throughout the healthcare industry is unquestionable, some experts believe healthcare fraud detection and prevention could experience an initial drop in performance due to the implementation of ICD-10. We aim to quantitatively test the validity of this concern regarding an adverse transitional impact. This project explores how predictive fraud detection systems developed using ICD-9 claims data will initially react to the introduction of ICD-10. We have developed a basic fraud detection system incorporating both unsupervised and supervised learning methods in order to examine the potential fraudulence of both ICD-9 and ICD-10 claims in a predictive environment. Using this system, we are able to analyze the ability and performance of statistical methods trained using ICD-9 data to properly identify fraudulent ICD-10 claims. This research makes contributions to the domains of medical coding, healthcare informatics, and fraud detection.
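    To make the transition experiment concrete, here is a hypothetical sketch (not the paper's actual system): a supervised detector trained on bag-of-codes features from ICD-9 claims is evaluated on the same conditions expressed in ICD-10, where the unseen vocabulary degrades performance. The codes, labels and choice of scikit-learn components are all assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline

# Toy ICD-9-coded claims (space-separated diagnosis codes) with fraud labels.
train_claims = ["25000 4019", "V5869 78079", "25000 78079", "4019 V5869"]
train_labels = [1, 0, 1, 0]

# Roughly the same conditions expressed in ICD-10: unseen vocabulary at test time.
test_claims = ["E119 I10", "Z7901 R5383", "E119 R5383", "I10 Z7901"]
test_labels = [1, 0, 1, 0]

model = make_pipeline(CountVectorizer(token_pattern=r"\S+"), LogisticRegression())
model.fit(train_claims, train_labels)

# Because the ICD-10 vocabulary was never seen in training, the bag-of-codes
# features collapse and performance degrades -- the transitional effect studied above.
print(roc_auc_score(test_labels, model.predict_proba(test_claims)[:, 1]))
```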

    Realising advanced risk-based port state control inspection using data-driven Bayesian networks

Over the past decades, maritime transportation has not only contributed to economic prosperity but has also posed many threats to the industry, causing severe casualties and losses. As a result, various maritime safety measures have been developed, including Port State Control (PSC) inspections. In this paper, we propose a data-driven Bayesian Network (BN) based approach to analyse the risk factors influencing PSC inspections and to predict the probability of vessel detention. To do so, inspection data on bulk carriers in seven major European countries in the Paris MoU from 2005 to 2008 are collected to identify the relevant risk factors. The network structure is constructed via TAN (Tree-Augmented Naive Bayes) learning and subsequently validated by sensitivity analysis. The results reveal two conclusions. First, the key risk factors influencing PSC inspections include the number of deficiencies, the type of inspection, the Recognised Organisation (RO) and vessel age. Second, the model provides a novel way to predict detention probabilities under different situations, which effectively helps port authorities rationalise their inspection regulations as well as the allocation of their resources. Further effort will be made to conduct a contrastive analysis between the ‘Pre-NIR’ and ‘Post-NIR’ periods to test the impact of the NIR, which started in 2008. © 2018 Elsevier Ltd.
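    As a rough illustration of how inspection records can be turned into detention probabilities, the following pandas sketch computes empirical conditional probabilities over discretised risk factors; it is only a crude stand-in for querying the paper's learned TAN Bayesian network, and the columns, bins and values are assumptions.

```python
import pandas as pd

# Toy inspection records; the columns stand in for the risk factors named above
# (e.g. number of deficiencies and vessel age), with 'detained' as the outcome.
records = pd.DataFrame({
    "deficiencies": ["0-5", "6+", "6+", "0-5", "6+", "0-5"],
    "vessel_age":   ["<15", ">=15", ">=15", "<15", "<15", ">=15"],
    "detained":     [0, 1, 1, 0, 1, 0],
})

# Empirical detention probability conditioned on the discretised risk factors.
cpt = records.groupby(["deficiencies", "vessel_age"])["detained"].mean()
print(cpt)
print("P(detention | 6+ deficiencies, age >= 15):", cpt.loc[("6+", ">=15")])
```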

    Profiling patterns of interhelical associations in membrane proteins.

A novel set of methods has been developed to characterize polytopic membrane proteins at the topological, organellar and functional level, in order to reduce the existing functional gap in the membrane proteome. Firstly, a novel clustering tool named PROCLASS was implemented to facilitate the manual curation of large sets of proteins in readiness for feature extraction. TMLOOP and TMLOOP writer were implemented to refine current topological models by predicting membrane-dipping loops. TMLOOP applies weighted predictive rules in a collective motif method to overcome the inherent limitations of single-motif methods. The approach achieved 92.4% accuracy (sensitivity) and 100% reliability (specificity), and 1,392 topological models described in the Swiss-Prot database were refined. The subcellular location (TMLOCATE) and molecular function (TMFUN) prediction methods rely on the TMDEPTH feature extraction method along with data mining techniques. TMDEPTH uses refined topological models and amino acid sequences to calculate pairs of residues located at a similar depth in the membrane. Evaluation of TMLOCATE showed a normalized accuracy of 75% in discriminating between proteins belonging to the main organelles. At a sequence similarity threshold of 40%, TMFUN predicted the main functional classes with a sensitivity of 64.1-71.4%, and 70% of the olfactory GPCRs were correctly predicted. At a sequence similarity threshold of 90%, the main functional classes were predicted with a sensitivity of 75.6-92.8%, and class A GPCRs were sub-classified with a sensitivity of 84.5-92.9%. These results reflect a direct association between the spatial arrangement of residues in the transmembrane regions and the capacity of polytopic membrane proteins to carry out their functions. The developed methods have for the first time categorically shown that the transmembrane regions hold essential information associated with a wide range of functional properties, such as filtering and gating processes, subcellular location and molecular function.
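    The depth-pairing idea behind TMDEPTH can be illustrated with a small sketch (not the published implementation): residues inside annotated transmembrane segments are assigned a relative depth, and residues from different helices at similar depths are paired. The segment boundaries and the 0.1 depth tolerance are illustrative assumptions.

```python
from itertools import combinations

tm_segments = [(3, 10), (13, 18)]  # toy 1-based (start, end) of two TM helices

def residue_depths(segments):
    """Map residue position -> relative depth in [0, 1] within its TM segment."""
    depths = {}
    for start, end in segments:
        span = end - start
        for pos in range(start, end + 1):
            depths[pos] = (pos - start) / span
    return depths

def same_segment(i, j):
    """True if residues i and j fall within the same TM segment."""
    return any(s <= i <= e and s <= j <= e for s, e in tm_segments)

depths = residue_depths(tm_segments)

# Pair residues from different helices that sit at a similar membrane depth.
pairs = [(i, j) for i, j in combinations(sorted(depths), 2)
         if not same_segment(i, j) and abs(depths[i] - depths[j]) < 0.1]
print(pairs[:5])
```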

    A review of clustering techniques and developments

© 2017 Elsevier B.V. This paper presents a comprehensive study of clustering: existing methods and the developments made in them over time. Clustering is defined as unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid-based, density-based and model-based methods. The approaches used in these methods are discussed along with their respective states of the art and applicability. The measures of similarity, as well as the evaluation criteria, which are the central components of clustering, are also presented in the paper. The applications of clustering in fields such as image segmentation, object and character recognition and data mining are highlighted.
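    The method families surveyed in the review can be contrasted on toy data; the sketch below uses scikit-learn implementations as stand-ins (the review itself is not tied to any library), and the parameter values are assumptions.

```python
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Toy data with three well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

labels = {
    "partitional (k-means)":        KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "hierarchical (agglomerative)": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "density-based (DBSCAN)":       DBSCAN(eps=0.8, min_samples=5).fit_predict(X),
    "model-based (GMM)":            GaussianMixture(n_components=3, random_state=0).fit_predict(X),
}
for name, lab in labels.items():
    # DBSCAN marks noise points with -1, so exclude that label when counting clusters.
    print(name, "->", len(set(lab) - {-1}), "clusters")
```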

    On the Use of Speech and Face Information for Identity Verification

This report first provides a review of important concepts in the field of information fusion, followed by a review of important milestones in audio-visual person identification and verification. Several recent adaptive and non-adaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on speech and face information, are then evaluated in clean and noisy audio conditions on a common database; it is shown that in clean conditions most of the non-adaptive approaches provide similar performance and in noisy conditions most exhibit a severe deterioration in performance; it is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change due to noisy conditions; compared to a previously proposed adaptive approach, the proposed classifiers do not make a direct assumption about the type of noise that causes the mismatch between training and testing conditions. This report is an extended and revised version of IDIAP-RR 02-33.

    Identity Verification Using Speech and Face Information

This article first provides a review of important concepts in the field of information fusion, followed by a review of important milestones in audio–visual person identification and verification. Several recent adaptive and nonadaptive techniques for reaching the verification decision (i.e., to accept or reject the claimant), based on speech and face information, are then evaluated in clean and noisy audio conditions on a common database; it is shown that in clean conditions most of the nonadaptive approaches provide similar performance and in noisy conditions most exhibit a severe deterioration in performance; it is also shown that current adaptive approaches are either inadequate or utilize restrictive assumptions. A new category of classifiers is then introduced, where the decision boundary is fixed but constructed to take into account how the distributions of opinions are likely to change due to noisy conditions; compared to a previously proposed adaptive approach, the proposed classifiers do not make a direct assumption about the type of noise that causes the mismatch between training and testing conditions.
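    A minimal sketch of the kind of score-level fusion discussed in the two preceding abstracts, assuming a simple weighted-sum rule with a fixed decision boundary; the weights, threshold and scores are illustrative and not taken from the article.

```python
# Each expert (speech, face) emits an opinion in [0, 1]; a fixed-boundary fusion
# rule combines them and compares against a threshold to accept or reject the claimant.

def fuse_and_verify(speech_score: float, face_score: float,
                    w_speech: float = 0.5, w_face: float = 0.5,
                    threshold: float = 0.6) -> bool:
    """Weighted-sum fusion with a fixed decision boundary."""
    fused = w_speech * speech_score + w_face * face_score
    return fused >= threshold

# Under clean audio both experts agree; under noisy audio the speech opinion drops,
# which is the mismatch the proposed classifiers are designed to absorb.
print(fuse_and_verify(speech_score=0.9, face_score=0.8))  # True  (accept)
print(fuse_and_verify(speech_score=0.3, face_score=0.8))  # False (reject)
```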

    Context-Based classification of objects in topographic data

Large-scale topographic databases model real-world features as vector data objects. These can be point, line or area features. Each of these map objects is assigned to a descriptive class; for example, an area feature might be classed as a building, a garden or a road. Topographic data is subject to continual updates from cartographic surveys and ongoing quality improvement. One of the most important aspects of this is the assignment and verification of a class description for each area feature. These attributes can be added manually, but, due to the vast volume of data involved, automated techniques are desirable to classify these polygons. Analogy is a key thought process that underpins learning and has been the subject of much research in the field of artificial intelligence (AI). An analogy identifies structural similarity between a well-known source domain and a less familiar target domain. In many cases, information present in the source can then be mapped to the target, yielding a better understanding of the latter. The solution of geometric analogy problems has been a fruitful area of AI research. We observe that there is a correlation between objects in geometric analogy problem domains and map features in topographic data. We describe two topographic area feature classification tools that use descriptions of neighbouring features to identify analogies between polygons: content vector matching (CVM) and context structure matching (CSM). CVM and CSM classify an area feature by matching its neighbourhood context against those of analogous polygons whose class is known. Both classifiers were implemented and then tested on high-quality topographic polygon data supplied by Ordnance Survey (Great Britain). Area features were found to exhibit a high degree of variation in their neighbourhoods. CVM correctly classified 85.38% of the 79.03% of features it attempted to classify. CSM achieved 85.96% accuracy on the 62.96% of features it attempted to classify. Thus, CVM can classify 25.53% more features than CSM, but is slightly less accurate. Both techniques excelled at identifying the feature classes that predominate in suburban data. Our structure-based classification approach may also benefit other types of spatial data, such as topographic line data, small-scale topographic data, raster data, architectural plans and circuit diagrams.
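    A rough sketch of the content-vector-matching idea, assuming a polygon's neighbourhood is summarised by counts of neighbouring feature classes and matched to known polygons by cosine similarity; the classes, contexts and similarity threshold are illustrative and not Ordnance Survey data.

```python
import numpy as np

CLASSES = ["building", "garden", "road"]

def context_vector(neighbour_classes):
    """Count how many neighbours fall into each class."""
    return np.array([neighbour_classes.count(c) for c in CLASSES], dtype=float)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Known (already classified) polygons: neighbourhood context -> class of the polygon.
known = [
    (context_vector(["garden", "garden", "road"]), "building"),
    (context_vector(["building", "building", "road", "road"]), "garden"),
    (context_vector(["building", "building", "building"]), "road"),
]

def classify(neighbour_classes, min_similarity=0.8):
    """Return the class of the best-matching known context, or None if no good match."""
    target = context_vector(neighbour_classes)
    best_class, best_sim = None, 0.0
    for vec, cls in known:
        sim = cosine(target, vec)
        if sim > best_sim:
            best_class, best_sim = cls, sim
    return best_class if best_sim >= min_similarity else None

print(classify(["garden", "road", "garden"]))  # matches the "building" context
```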

    Classifier Ensemble Feature Selection for Automatic Fault Diagnosis

    "An efficient ensemble feature selection scheme applied for fault diagnosis is proposed, based on three hypothesis: a. A fault diagnosis system does not need to be restricted to a single feature extraction model, on the contrary, it should use as many feature models as possible, since the extracted features are potentially discriminative and the feature pooling is subsequently reduced with feature selection; b. The feature selection process can be accelerated, without loss of classification performance, combining feature selection methods, in a way that faster and weaker methods reduce the number of potentially non-discriminative features, sending to slower and stronger methods a filtered smaller feature set; c. The optimal feature set for a multi-class problem might be different for each pair of classes. Therefore, the feature selection should be done using an one versus one scheme, even when multi-class classifiers are used. However, since the number of classifiers grows exponentially to the number of the classes, expensive techniques like Error-Correcting Output Codes (ECOC) might have a prohibitive computational cost for large datasets. Thus, a fast one versus one approach must be used to alleviate such a computational demand. These three hypothesis are corroborated by experiments. The main hypothesis of this work is that using these three approaches together is possible to improve significantly the classification performance of a classifier to identify conditions in industrial processes. Experiments have shown such an improvement for the 1-NN classifier in industrial processes used as case study.