Support Vector Machines for Credit Scoring and discovery of significant features
Assessing the risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are the techniques traditionally used in credit scoring to determine the likelihood of default from consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover the features that are most significant in determining risk of default.
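The idea of using an SVM as the basis for feature selection can be illustrated with a minimal sketch. It assumes a linear SVM trained by batch subgradient descent on the regularised hinge loss, and it ranks features by absolute weight. The abstract does not say which ranking criterion the authors use, so the criterion, the function names, and the toy data (two hypothetical features, the second uninformative) are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularised hinge loss (no bias term)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        # Gradient of the regulariser (lam / 2) * ||w||^2.
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:
                # Subgradient of the hinge loss for a margin violation.
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def rank_features(w):
    """Feature indices ordered from largest to smallest absolute weight."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


# Hypothetical toy data: feature 0 decides the label (+1 = default,
# -1 = no default); feature 1 is balanced across both classes.
X = [[2.0, 1.0], [2.0, -1.0], [-2.0, 1.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]

w = train_linear_svm(X, y)
ranking = rank_features(w)
print("weights:", w)
print("features ranked by |weight|:", ranking)
```

On real application data the features would need to be scaled to comparable ranges before their weights are compared.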
Dissimilarity-based Ensembles for Multiple Instance Learning
In multiple instance learning, objects are sets (bags) of feature vectors
(instances) rather than individual feature vectors. In this paper we address
the problem of how these bags can best be represented. Two standard approaches
are to use (dis)similarities between bags and prototype bags, or between bags
and prototype instances. The first approach results in a relatively
low-dimensional representation determined by the number of training bags, while
the second approach results in a relatively high-dimensional representation,
determined by the total number of instances in the training set. In this paper
a third, intermediate approach is proposed, which links the two approaches and
combines their strengths. Our classifier is inspired by a random subspace
ensemble, and considers subspaces of the dissimilarity space, defined by
subsets of instances, as prototypes. We provide guidelines for using such an
ensemble, and show state-of-the-art performances on a range of multiple
instance learning problems.
Comment: Submitted to IEEE Transactions on Neural Networks and Learning
Systems, Special Issue on Learning in Non-(geo)metric Space
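The three bag representations described in this abstract can be sketched as follows. This is an illustration only. The bag-to-instance dissimilarity (minimum Euclidean distance) and the bag-to-bag dissimilarity (mean of those minima) are assumed choices, and every function name is hypothetical. A full ensemble would also train one classifier per subspace and combine their outputs, which is omitted here.

```python
import math
import random


def bag_to_instance(bag, z):
    """Assumed dissimilarity: distance from z to the closest instance in the bag."""
    return min(math.dist(x, z) for x in bag)


def bag_to_bag(bag, prototype_bag):
    """Assumed dissimilarity: mean of the minimum distances to each prototype instance."""
    return sum(bag_to_instance(bag, z) for z in prototype_bag) / len(prototype_bag)


def bag_representation(bag, train_bags):
    """One feature per training bag (relatively low-dimensional)."""
    return [bag_to_bag(bag, p) for p in train_bags]


def instance_representation(bag, train_bags):
    """One feature per training instance (relatively high-dimensional)."""
    return [bag_to_instance(bag, z) for p in train_bags for z in p]


def random_subspaces(n_instances, n_subspaces, size, seed=0):
    """Random subsets of prototype-instance indices, one per ensemble member."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_instances), size)) for _ in range(n_subspaces)]


# Hypothetical training set: three bags with 2, 3 and 1 instances.
train_bags = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
    [(0.0, 1.0)],
]
query = [(0.5, 0.0), (5.0, 5.5)]

by_bags = bag_representation(query, train_bags)
by_instances = instance_representation(query, train_bags)

# Intermediate approach: each ensemble member sees a subspace of the
# instance-based dissimilarity space.
subspaces = random_subspaces(len(by_instances), n_subspaces=4, size=3)
views = [[by_instances[i] for i in idx] for idx in subspaces]

print(len(by_bags), len(by_instances), [len(v) for v in views])
```

Here the bag-based vector has 3 entries (one per training bag), the instance-based vector has 6 (one per training instance), and each subspace view has 3.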
Towards Visually Explaining Variational Autoencoders
Recent advances in Convolutional Neural Network (CNN) model interpretability
have led to impressive progress in visualizing and understanding model
predictions. In particular, gradient-based visual attention methods have driven
much recent effort in using visual attention maps as a means for visual
explanations. A key problem, however, is that these methods are designed for
classification and categorization tasks, and their extension to explaining
generative models, e.g., variational autoencoders (VAEs), is not trivial. In this
work, we take a step towards bridging this crucial gap, proposing the first
technique to visually explain VAEs by means of gradient-based attention. We
present methods to generate visual attention from the learned latent space, and
also demonstrate that such attention explanations serve more than just explaining
VAE predictions. We show how these attention maps can be used to localize
anomalies in images, demonstrating state-of-the-art performance on the MVTec-AD
dataset. We also show how they can be infused into model training, helping
bootstrap the VAE into learning improved latent space disentanglement,
as demonstrated on the dSprites dataset.
Convex Hull-Based Multi-objective Genetic Programming for Maximizing ROC Performance
ROC analysis is commonly used to assess the performance of classifiers in data mining.
The ROC convex hull (ROCCH) is the least convex majorant (LCM) of the empirical
ROC curve, and covers potential optima for the given set of classifiers.
Generally, ROC performance maximization can be viewed as maximizing the
ROCCH, which in turn means maximizing the true positive rate (tpr) and minimizing
the false positive rate (fpr) for each classifier in the ROC space. However,
tpr and fpr conflict with each other in the ROCCH optimization process.
Though the ROCCH maximization problem resembles a multi-objective optimization
problem (MOP), its special characteristics make it different from a traditional MOP.
In this work, we discuss the difference between them and propose convex
hull-based multi-objective genetic programming (CH-MOGP) to solve ROCCH
maximization problems. Convex hull-based sort is an indicator-based selection
scheme that aims to maximize the area under the convex hull, which serves as a
unary indicator for the performance of a set of points. A selection procedure
is described that can be efficiently implemented and follows design
principles similar to those of classical hypervolume-based optimization algorithms. It is
hypothesized that, by using a tailored indicator-based selection scheme, CH-MOGP
becomes more efficient for ROC convex hull approximation than algorithms that
compute all Pareto optimal points. To test our hypothesis, we compare the new
CH-MOGP to MOGP with classical selection schemes, including NSGA-II, MOEA/D
and SMS-EMOA. CH-MOGP is also compared with traditional machine
learning algorithms such as C4.5, Naive Bayes and Prie. Experimental results
based on 22 well-known UCI data sets show that CH-MOGP significantly
outperforms traditional EMOAs.
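The ROCCH and the area under it, the quantity used as a unary indicator, can be illustrated with a short sketch. It computes the upper convex hull of a set of (fpr, tpr) points with a monotone-chain scan and integrates it with the trapezoid rule. The function names and the four example classifiers are hypothetical, and this is not the selection procedure of CH-MOGP itself.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of (fpr, tpr) points, anchored at (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment
        # from the point before it to the new point p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def area_under_hull(hull):
    """Trapezoid-rule area under the piecewise-linear hull."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(hull, hull[1:])
    )


# Hypothetical classifiers given as (fpr, tpr) pairs.
classifiers = [(0.1, 0.6), (0.3, 0.7), (0.4, 0.9), (0.6, 0.5)]

hull = roc_convex_hull(classifiers)
area = area_under_hull(hull)
print(hull)
print(round(area, 3))
```

In this example the classifiers at (0.3, 0.7) and (0.6, 0.5) fall below the hull, so removing them leaves the indicator unchanged at 0.825. Only points that push the hull outward increase the area.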
Deep Multi-view Learning to Rank
We study the problem of learning to rank from multiple information sources.
Though multi-view learning and learning to rank have been studied extensively
leading to a wide range of applications, multi-view learning to rank as a
synergy of both topics has received little attention. The aim of this paper is
to propose a composite ranking method that simultaneously stays closely
correlated with the individual rankings. We present a generic framework for
multi-view subspace learning to rank (MvSL2R), and two novel solutions are
introduced under the framework. The first solution captures information of
feature mappings from within each view as well as across views using
autoencoder-like networks. Novel feature embedding methods are formulated in
the optimization of multi-view unsupervised and discriminant autoencoders.
Moreover, we introduce an end-to-end solution that learns towards both the
joint ranking objective and the individual rankings. The proposed solution
enhances the joint ranking with minimum view-specific ranking loss, so that it
can achieve the maximum global view agreements in a single optimization
process. The proposed method is evaluated on three different ranking problems,
namely university ranking, multi-view lingual text ranking and image data
ranking, providing superior results compared to related methods.
Comment: Published at IEEE TKD