Support Vector Machines for Credit Scoring and discovery of significant features
The assessment of the risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit scoring to determine the likelihood of default from consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover the features that are most significant in determining risk of default.
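The idea of using a support vector machine as the basis for feature selection can be illustrated with a minimal sketch. The abstract does not say which selection method was used, so this example assumes one common approach: train a linear SVM and rank features by the magnitude of their weights. The function names (`train_linear_svm`, `rank_features`, `make_toy_data`), the synthetic data, and the plain subgradient-descent trainer are all hypothetical and stand in for a real credit database and a production solver.

```python
import random


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for credit data: only feature 0 determines the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(3)]
        X.append(row)
        y.append(1 if row[0] > 0 else -1)  # features 1 and 2 are pure noise
    return X, y


def train_linear_svm(X, y, reg=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # only margin violators contribute to the subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rank_features(w):
    """Feature indices ordered from largest to smallest absolute weight."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


if __name__ == "__main__":
    X, y = make_toy_data()
    w, b = train_linear_svm(X, y)
    print("weights:", w)
    print("ranking:", rank_features(w))
```

On this toy data the informative feature should come first in the ranking. Weight-magnitude ranking is only meaningful when features are on comparable scales, so real application data would need standardising first.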
Markov Chain Analysis of Evolution Strategies on a Linear Constraint Optimization Problem
This paper analyses a (1,λ)-Evolution Strategy, a randomised
comparison-based adaptive search algorithm, on a simple constraint optimisation
problem. The algorithm uses resampling to handle the constraint and optimizes a
linear function with a linear constraint. Two cases are investigated: first the
case where the step-size is constant, and second the case where the step-size
is adapted using path length control. We exhibit for each case a Markov chain
whose stability analysis would allow us to deduce the divergence of the
algorithm depending on its internal parameters. We show divergence at a
constant rate when the step-size is constant. We sketch that with step-size
adaptation geometric divergence takes place. Our results complement previous
studies where stability was assumed.
Comment: Amir Hussain; Zhigang Zeng; Nian Zhang. IEEE Congress on Evolutionary
Computation, Jul 2014, Beijing, China
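The constant step-size case described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's exact setup: it assumes a two-dimensional search space, maximisation of the linear function f(x) = x[0], a feasible half-plane whose boundary normal makes an angle `theta` with the gradient of f, and a comma-selection strategy with `lam` offspring per iteration. The function name and all parameter values are hypothetical. Infeasible offspring are resampled until they satisfy the constraint.

```python
import math
import random


def one_comma_lambda_es(theta=math.pi / 4, lam=10, sigma=1.0,
                        iterations=200, seed=0):
    """Comma-selection ES with constant step-size and constraint resampling.

    Maximises f(x) = x[0] subject to x[0]*cos(theta) + x[1]*sin(theta) <= 0.
    Returns the list of parent points, one per iteration plus the start.
    """
    rng = random.Random(seed)
    normal = (math.cos(theta), math.sin(theta))

    def feasible(p):
        return p[0] * normal[0] + p[1] * normal[1] <= 0.0

    parent = (-1.0, 0.0)  # feasible starting point
    history = [parent]
    for _ in range(iterations):
        offspring = []
        for _ in range(lam):
            # Resampling: draw again until the candidate is feasible.
            while True:
                cand = (parent[0] + sigma * rng.gauss(0.0, 1.0),
                        parent[1] + sigma * rng.gauss(0.0, 1.0))
                if feasible(cand):
                    break
            offspring.append(cand)
        # Comma selection: the best offspring replaces the parent.
        parent = max(offspring, key=lambda p: p[0])
        history.append(parent)
    return history


if __name__ == "__main__":
    points = one_comma_lambda_es()
    print("f at start:", points[0][0])
    print("f at end:  ", points[-1][0])
```

Because f is unbounded along the constraint boundary, a working strategy should let f grow without limit; with a constant step-size that growth is expected to be roughly linear in the iteration count, which is the divergence at a constant rate the abstract refers to.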
Discriminative Topological Features Reveal Biological Network Mechanisms
Recent genomic and bioinformatic advances have motivated the development of
numerous random network models purporting to describe graphs of biological,
technological, and sociological origin. The success of a model has been
evaluated by how well it reproduces a few key features of the real-world data,
such as degree distributions, mean geodesic lengths, and clustering
coefficients. Often pairs of models can reproduce these features with
indistinguishable fidelity despite being generated by vastly different
mechanisms. In such cases, these few target features are insufficient to
distinguish which of the different models best describes real world networks of
interest; moreover, it is not clear a priori that any of the presently-existing
algorithms for network generation offers a predictive description of the
networks inspiring them. To derive discriminative classifiers, we construct a
mapping from the set of all graphs to a high-dimensional (in principle
infinite-dimensional) "word space." This map defines an input space for
classification schemes which allow us for the first time to state unambiguously
which models are most descriptive of the networks they purport to describe. Our
training sets include networks generated from 17 models either drawn from the
literature or introduced in this work, source code for which is freely
available. We anticipate that this new approach to network analysis will be of
broad impact to a number of communities.
Comment: supplemental website:
http://www.columbia.edu/itc/applied/wiggins/netclass
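For reference, the conventional summary features that this abstract describes as insufficient (degree statistics, mean geodesic length, clustering coefficient) can be computed with a short sketch like the one below. It does not implement the paper's word-space mapping, whose construction is not given in the abstract. The function name `summary_features` and the adjacency-dictionary representation of an undirected graph are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


def summary_features(adj):
    """Mean degree, average local clustering, and mean geodesic length.

    `adj` maps each node to the set of its neighbours (undirected graph).
    The geodesic mean is taken over connected ordered pairs only.
    """
    nodes = list(adj)
    mean_degree = sum(len(adj[v]) for v in nodes) / len(nodes)

    coeffs = []
    for v in nodes:
        nbrs = list(adj[v])
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # clustering is taken as 0 for degree < 2
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * links / (k * (k - 1)))
    clustering = sum(coeffs) / len(coeffs)

    total, pairs = 0, 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    mean_geodesic = total / pairs if pairs else 0.0

    return {"mean_degree": mean_degree,
            "clustering": clustering,
            "mean_geodesic": mean_geodesic}


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    print(summary_features(triangle))
```

Two graphs produced by very different generative mechanisms can agree closely on all three numbers, which is the motivation the abstract gives for a richer feature space.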