8 research outputs found
Credit Scoring Based on Hybrid Data Mining Classification
Credit scoring is widely regarded as a critical topic. This study proposes four approaches that pair a neural network (NN) classifier with a feature selection step that retains sufficient information for classification. Using two UCI data sets, the NN classifier is combined with conventional statistical LDA, decision tree, rough set, and F-score approaches as a feature preprocessing step that optimizes the feature space by removing both irrelevant and redundant features. The procedure of the proposed algorithm is described first, and the approaches are then evaluated on their performance. The results obtained in combination with the NN classifier are compared, and a nonparametric Wilcoxon signed-rank test is used to show whether there is any significant difference between the approaches. Our results suggest that hybrid credit scoring models are robust and effective in finding optimal feature subsets, and that the compound procedure is a promising method for the field of data mining.
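
The F-score preprocessing step named in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used two-class F-score (between-class separation of the feature means divided by the sum of the within-class sample variances); the abstract does not state which definition the authors used, and the function names `f_scores` and `select_features` are hypothetical.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Return one F-score per feature column for binary labels y in {0, 1}.

    Assumed definition: squared deviations of the two class means from the
    overall mean, divided by the sum of the two within-class sample variances.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        all_j = [row[j] for row in X]
        pos_j = [row[j] for row in pos]
        neg_j = [row[j] for row in neg]
        overall = mean(all_j)
        between = (mean(pos_j) - overall) ** 2 + (mean(neg_j) - overall) ** 2
        within = variance(pos_j) + variance(neg_j)  # sample variances (n - 1)
        if within == 0:
            # Degenerate column: constant within both classes.
            scores.append(float("inf") if between > 0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def select_features(X, y, k):
    """Return the indices of the k features with the highest F-score."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (illustrative only): feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.0], [3.2, 5.0], [2.8, 3.0]]
y = [0, 0, 0, 1, 1, 1]
kept = select_features(X, y, k=1)
```

The selected columns would then be passed to the classifier; any NN training step is outside the scope of this sketch.
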
A prototype classification method and its use in a hybrid solution for multiclass pattern recognition
In order to combine a fast multiclass classification method with an effective binary classification method, we have developed a prototype learning/matching scheme that can be integrated with support vector machines (SVM) for vector-matching applications. This prototype classification method employs a learning process to determine both the number and the location of prototypes. The learning process decides whether to stop adding prototypes according to a certain termination condition, and also adjusts the location of prototypes using either the K-means (KM) or the fuzzy c-means (FCM) clustering algorithm. When the prototype classification method is applied, the SVM method can be used to post-process the top-rank candidates obtained during the prototype learning or matching process. We apply this hybrid solution to handwriting recognition. Our experimental results show that this solution saves a substantial amount of training and testing time when the number of class types is large, and achieves accuracy rates comparable to those achieved by using SVM alone. In this paper, we compare the convergence behavior and runtime consumption of the prototype construction process, and discuss how to combine our prototype classifier with SVM classifiers to form an effective hybrid classifier.
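
The two-stage idea in the abstract above (fast prototype matching to shortlist classes, then a slower classifier on the shortlist) might be sketched as follows. This is only an illustration under stated assumptions: the prototypes are taken as already learned, Euclidean distance is assumed for matching, and `refine` is a hypothetical callable standing in for the SVM post-processing stage, which is not implemented here.

```python
import math


def top_k_candidates(x, prototypes, k):
    """Rank classes by the distance from x to each class's nearest prototype."""
    nearest = {
        label: min(math.dist(x, p) for p in protos)
        for label, protos in prototypes.items()
    }
    return sorted(nearest, key=nearest.get)[:k]


def hybrid_classify(x, prototypes, k, refine):
    """Shortlist k classes by prototype matching, then let `refine` decide.

    `refine(x, candidates)` is a placeholder for a second-stage classifier
    (for example, binary SVMs restricted to the shortlisted classes).
    """
    candidates = top_k_candidates(x, prototypes, k)
    if len(candidates) == 1:
        return candidates[0]
    return refine(x, candidates)


# Toy prototypes (illustrative only): class label -> list of prototype vectors.
prototypes = {"a": [(0.0, 0.0)], "b": [(1.0, 1.0), (5.0, 5.0)], "c": [(10.0, 10.0)]}
shortlist = top_k_candidates((0.9, 0.9), prototypes, k=2)
```

Because the second stage only sees `k` classes rather than all of them, its cost no longer grows with the total number of classes, which is the saving the abstract reports for large class counts.
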
Advances in Character Recognition
This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of the field. We hope it will serve as a reference source for academic researchers, for professionals working in character recognition, and for anyone interested in the subject.
Contribution à l'intégration des machines à vecteurs de support au sein des systèmes de reconnaisance de formes : application à la lecture automatique de l'écriture manuscrite
In recent years, support vector machines (SVMs) have repeatedly demonstrated their superior generalization ability. The aim of this doctoral thesis was therefore to isolate the main problems involved in integrating SVMs into pattern recognition systems, in particular automatic handwriting reading systems, and to propose answers to them. We thus addressed the solution of multi-class problems, the estimation of posterior class-membership probabilities, the acceleration of decision making, and finally the combination with a model-based classification approach so that both ambiguous data and outliers can be handled effectively.
Pattern recognition using statistical techniques and neural networks: application to handwritten digit classification
Pattern Recognition is the study of how machines can observe their environment, learn from experience to distinguish patterns of interest, and make reasonable decisions about the categories to which those patterns belong. The best pattern recognizer known so far is the human being, although the process by which humans perform this task is not known with certainty. Optical Character Recognition (OCR) is one of the oldest topics in Pattern Recognition and one of its most important and active research areas, and it still poses challenges: recognition accuracy for printed characters in degraded images and for handwritten characters remains insufficient, and recognition errors persist. Handwritten Digit Recognition is a prominent topic within OCR because of its applications, such as automatic processing of bank checks, mail sorting based on reading postal codes, automatic reading of forms and handwritten documents, reading devices for the blind, and handwriting recognition on handheld PDA computers, and because it is a model problem that shares challenges with other topics. For this reason it is taken as a reference for applying and testing new theories and algorithms in Pattern Recognition in general. This doctoral thesis proposes a new Bayesian classifier-combination strategy that detects ambiguities and resolves them, which is the novelty and main contribution of the thesis.
It also proposes a complete two-level pattern recognition system with a modular, parallelizable architecture that uses different features extracted from the input patterns, depending on the problem to be solved, together with the Bayesian strategy mentioned above, which decides the system's response. As components of the recognizer, the first layer or level uses relatively simple classifiers that are well suited to the problem at hand. The elements of the second layer are used to estimate how reliable each individual classifier's response is for a given input pattern, making it possible to decide when a pattern should be considered well defined or ambiguous and, in the latter case, with which classes it may be confused. In addition, classifier selection strategies are proposed and applied in the recognizer construction stage. The pattern recognition system presented was applied to the problem of off-line handwritten digit recognition as a way of testing its performance. To this end, descriptors based on multiresolution features are proposed, using the CDF 9/7 Wavelet Transform and Principal Component Analysis, which considerably reduce the size of the input pattern and improve the quality of the representation. Experiments were carried out on the CENPARMI and MNIST databases, which are widely referenced for this problem. High recognition rates were obtained, reaching 97.40% and 99.32% for the CENPARMI and MNIST databases respectively. These values are comparable to published results considered representative.
Pattern Recognition is the study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound and reasonable decisions about the categories of the patterns. 
The best pattern recognizers in most instances are humans, yet we do not understand how humans recognize patterns. Optical character recognition (OCR) is one of the most traditional topics in the context of Pattern Recognition that includes as a key issue the automatic recognition of handwritten characters. The subject has many interesting applications, such as automatic recognition of postal codes, recognition of amounts in banking checks and automatic processing of application forms. Handwritten numeral classification is a difficult task because of the wide variety of styles, strokes and orientations of digit samples. One of the main difficulties lies in the fact that the intra-class variance is high, due to the different forms associated with the same pattern, because of the particular writing style of each individual. Many models have been proposed to deal with this problem, but none of them has succeeded in obtaining levels of response comparable to human ones. This thesis presents a pattern recognition system that is able to detect ambiguous patterns and explain its answers using a Bayesian strategy, which is the main contribution of this work. The recogniser is composed of two levels. The first one is formed by a collection of independent classifiers, each one specialised in a different feature extracted from the input pattern. The second level consists of an analyzing module in charge of defining and explaining the output of the system. This module comprises the following elements: the table of reliability and two parameters that can be adjusted while the system is running. The system has been applied to the off-line recognition of handwritten digits. Descriptors based on the CDF 9/7 wavelet transform and Principal Component Analysis are proposed in order to reduce the size of the input pattern while increasing the quality of its representation. Strategies for selecting classifiers for the system are also proposed. 
The experiments were carried out on the MNIST and CENPARMI handwritten digit databases, which are generally accepted as standards in most of the literature in the field. Recognition rates obtained are comparable with results from representative work, reaching 97.40 and 99.32 % for CENPARMI and MNIST databases respectively.
Fil: Seijas, Leticia María. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina
Design of Machine Learning Algorithms with Applications to Breast Cancer Detection
Machine learning is concerned with the design and development of algorithms and
techniques that allow computers to 'learn' from experience with respect to some class
of tasks and performance measure. One application of machine learning is to improve
the accuracy and efficiency of computer-aided diagnosis systems to assist physicians,
radiologists, cardiologists, neuroscientists, and health-care technologists. This thesis
focuses on machine learning and the applications to breast cancer detection. Emphasis
is laid on preprocessing of features, pattern classification, and model selection.
Before the classification task, feature selection and feature transformation may be
performed to reduce the dimensionality of the features and to improve the classification
performance. Genetic algorithm (GA) can be employed for feature selection based
on different measures of data separability or the estimated risk of a chosen classifier.
A separate nonlinear transformation can be performed by applying kernel principal
component analysis and kernel partial least squares.
Different classifiers are proposed in this work: The SOM-RBF network combines
self-organizing maps (SOMs) and radial basis function (RBF) networks, with the RBF
centers set as the weight vectors of neurons from the competitive layer of a trained
SOM. The pairwise Rayleigh quotient (PRQ) classifier seeks one discriminating boundary
by maximizing an unconstrained optimization objective, named as the PRQ criterion,
formed with a set of pairwise constraints instead of individual training samples.
The strict 2-surface proximal (S2SP) classifier seeks two proximal planes that are not
necessarily parallel to fit the distribution of the samples in the original feature space or
a kernel-defined feature space, by maximizing two strict optimization objectives with
a 'square of sum' optimization factor. Two variations of the support vector data description
(SVDD) with negative samples (NSVDD) are proposed by involving different
forms of slack vectors, which learn a closed spherically shaped boundary, named as the
supervised compact hypersphere (SCH), around a set of samples in the target class. We
extend the NSVDDs to solve the multi-class classification problems based on distances
between the samples and the centers of the learned SCHs in a kernel-defined feature
space, using a combination of linear discriminant analysis and the nearest-neighbor rule.
The problem of model selection is studied to pick the best values of the hyperparameters
for a parametric classifier. To choose the optimal kernel or regularization
parameters of a classifier, we investigate different criteria, such as the validation error
estimate and the leave-one-out bound, as well as different optimization methods, such
as grid search, gradient descent, and GA. By viewing the tuning problem of the multiple
parameters of a 2-norm support vector machine (SVM) as an identification problem
of a nonlinear dynamic system, we design a tuning system by employing the extended
Kalman filter based on cross validation. Independent kernel optimization based on
different measures of data separability is also investigated for different kernel-based
classifiers.
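
As a rough illustration of the grid-search option mentioned above, the following sketch picks hyperparameters by minimizing a k-fold cross-validation error estimate. It is a generic sketch, not the thesis's tuning system: the helper names are hypothetical, the fold assignment is a simple interleaving, and a 1-D k-nearest-neighbour rule stands in for the kernel classifiers discussed in the text.

```python
from itertools import product


def k_fold_indices(n, folds):
    """Split indices 0..n-1 into `folds` interleaved held-out sets."""
    return [list(range(i, n, folds)) for i in range(folds)]


def cv_error(fit_predict, X, y, params, folds=3):
    """Fraction of samples misclassified when each fold is held out in turn."""
    errors = 0
    for held_out in k_fold_indices(len(X), folds):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        preds = fit_predict(train_X, train_y, [X[i] for i in held_out], **params)
        errors += sum(p != y[i] for p, i in zip(preds, held_out))
    return errors / len(X)


def grid_search(fit_predict, X, y, grid, folds=3):
    """Try every combination in `grid`; return (best_params, best_error)."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = cv_error(fit_predict, X, y, params, folds)
        if best is None or err < best[1]:  # ties keep the earlier combination
            best = (params, err)
    return best


def knn_predict(train_X, train_y, test_X, k):
    """Stand-in classifier: majority vote among the k nearest 1-D neighbours."""
    preds = []
    for x in test_X:
        nearest = sorted(range(len(train_X)), key=lambda i: abs(train_X[i] - x))[:k]
        votes = [train_y[i] for i in nearest]
        preds.append(max(sorted(set(votes)), key=votes.count))
    return preds


# Toy, well-separated 1-D data (illustrative only).
X = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5]
y = [0] * 6 + [1] * 6
best_params, best_err = grid_search(knn_predict, X, y, {"k": [1, 3]})
```

Grid search evaluates every combination, so its cost grows multiplicatively with the number of hyperparameters; that is one reason the text also considers gradient descent and GA.
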
Numerous computer experiments using the benchmark datasets verify the theoretical
results and compare the techniques in terms of classification
accuracy or area under the receiver operating characteristic curve. Computational
requirements, such as the computing time and the number of hyper-parameters, are
also discussed.
All of the presented methods are applied to breast cancer detection from fine-needle
aspiration and in mammograms, as well as screening of knee-joint vibroarthrographic
signals and automatic monitoring of roller bearings with vibration signals. Experimental
results demonstrate the excellence of these methods with improved classification
performance.
For breast cancer detection, instead of only providing a binary diagnostic decision
of 'malignant' or 'benign', we propose methods to assign a measure of confidence
of malignancy to an individual mass, by calculating probabilities of being benign and
malignant with a single classifier or a set of classifiers.