244 research outputs found

    Independent component analysis for naive bayes classification

    Ph.D. (Doctor of Philosophy)

    Class-Conditional Probabilistic Principal Component Analysis: application to gender recognition

    This paper presents a solution to the problem of recognizing the gender of a human face from an image. We adopt a holistic approach, using the cropped and normalized texture of the face as input to a Naïve Bayes classifier. We first introduce the Class-Conditional Probabilistic Principal Component Analysis (CC-PPCA) technique, which reduces the dimensionality of the classification attribute vector and enforces the independence assumption of the classifier. This new approach has the desirable property of yielding a simple parametric model for the marginals. Moreover, this model can be estimated from very little data. In our experiments, CC-PPCA achieves 90% classification accuracy, a result similar to the best reported in the literature. The proposed method is very simple to train and implement.

    MCDM approach to evaluating bank loan default models

    Banks and financial institutions rely on loan default prediction models in credit risk management. An important yet challenging task in developing and applying default classification models is model evaluation and selection. This study proposes an evaluation approach for bank loan default classification models based on multiple criteria decision making (MCDM) methods. A large real-life Chinese bank loan dataset is used to validate the proposed approach. Specifically, a set of performance metrics is used to measure a selection of statistical and machine-learning default models. The technique for order preference by similarity to ideal solution (TOPSIS), an MCDM method, takes the performance of the default classification models on multiple metrics as input and generates a ranking of the models. In addition, feature selection and sampling techniques are applied in the data pre-processing step to handle the high dimensionality and class imbalance of bank loan default data. The results show that the K-Nearest Neighbor algorithm has good potential in bank loan default prediction.
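The TOPSIS ranking step described in this abstract can be illustrated with a minimal sketch. It assumes vector normalisation and Euclidean distances, which is the textbook form of TOPSIS; the abstract does not say which variant, weights, or metrics the study used. The model names, metric values, and weights below are invented toy inputs, not results from the paper.

```python
import math

def topsis(matrix, weights, benefit):
    """Score alternatives (rows) over criteria (columns) with textbook TOPSIS.

    matrix  -- list of rows, one per model; each row holds its metric values
    weights -- one weight per criterion (assumed here to sum to 1)
    benefit -- True where larger is better, False where smaller is better
    Returns one closeness score per row in [0, 1]; higher is closer to the ideal.
    """
    n = len(weights)
    # Vector-normalise each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]

    # Ideal and anti-ideal points, taking the criterion direction into account.
    columns = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(columns, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(columns, benefit)]

    scores = []
    for row in weighted:
        d_pos = math.dist(row, ideal)  # distance to the ideal point
        d_neg = math.dist(row, anti)   # distance to the anti-ideal point
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.5)
    return scores

# Toy inputs (hypothetical): columns are accuracy, AUC, misclassification cost.
models = ["model_a", "model_b", "model_c"]
metrics = [
    [0.90, 0.88, 0.10],
    [0.85, 0.90, 0.15],
    [0.80, 0.75, 0.25],
]
weights = [0.4, 0.4, 0.2]
benefit = [True, True, False]  # cost is a "smaller is better" criterion

scores = topsis(metrics, weights, benefit)
ranking = sorted(zip(models, scores), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy data `model_c` is worst on every criterion, so it coincides with the anti-ideal point and receives a score of 0. Changing the weights shifts the ranking of the remaining models, which is why the choice of weights matters in any MCDM evaluation.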

    Automated Knowledge Discovery from Functional Magnetic Resonance Images using Spatial Coherence

    Functional Magnetic Resonance Imaging (fMRI) has the potential to unlock many of the mysteries of the brain. Although this imaging modality is popular for brain-mapping activities, clinical applications of this technique are relatively rare. For clinical applications, classification models are more useful than the current practice of reporting loci of neural activation associated with particular disorders. Also, since the methods used to account for anatomical variations between subjects are generally imprecise, the conventional voxel-by-voxel analysis limits the types of discoveries that are possible. This work presents a classification-based framework for knowledge discovery from fMRI data. Instead of voxel-centric knowledge discovery, this framework is segment-centric, where functional segments are clumps of voxels that represent a functional unit in the brain. With simulated activation images, it is shown that this segment-based approach can be more successful for knowledge discovery than conventional voxel-based approaches. The spatial coherence principle refers to the homogeneity of behavior of spatially contiguous voxels. Auto-threshold Contrast Enhancing Iterative Clustering (ACEIC), a new algorithm based on the spatial coherence principle, is presented here for functional segmentation. With benchmark data, it is shown that the ACEIC method can achieve higher segmentation accuracy than Probabilistic Independent Component Analysis, a popular method used for fMRI data analysis. The spatial coherence principle can also be exploited for voxel-centric image-classification problems. Spatially Coherent Voxels (SCV) is a new feature selection method that uses the spatial coherence principle to eliminate features that are unlikely to be useful for classification. For a Substance Use Disorder dataset, it is demonstrated that feature selection with SCV can achieve higher classification accuracies than conventional feature selection methods.

    Boosting Principal Component Analysis by Genetic Algorithm

    This paper presents a new method of feature extraction that combines principal component analysis with a genetic algorithm. Using multiple pre-processors in combination with principal component analysis generates alternate feature spaces for data representation. The method fuses these multiple spaces to create higher-dimensional feature vectors. The fused feature vectors are given a chromosome representation by treating feature components as genes. These feature vectors then undergo genetic evolution individually. For the genetic algorithm, the initial population is created by calculating a probability distance matrix and applying a probability distance metric such that all genes lying farther than a defined threshold are set to zero. The genetic evolution of the fused feature vector brings out the most significant feature components (genes) as survivors. A measure of significance is adopted based on the frequency of occurrence of the surviving genes in the current population. Finally, the feature vector is obtained by weighting the original feature components in proportion to their significance. The algorithm is validated in combination with a neural network classifier based on the error backpropagation algorithm, by analysing a number of openly available benchmark datasets. Defence Science Journal, 2010, 60(4), pp. 392-398, DOI: http://dx.doi.org/10.14429/dsj.60.49
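The last two steps of this abstract, scoring genes by how often they survive and then weighting the original features, can be sketched as follows. This is an illustration only: the abstract does not give the exact significance measure, so the sketch assumes the simplest reading, namely the fraction of chromosomes in the current population in which a gene is non-zero. The function names and the toy population are hypothetical.

```python
def gene_significance(population):
    """Fraction of chromosomes in which each gene survives (is non-zero).

    population -- list of chromosomes, each a list of feature components (genes)
    """
    n_genes = len(population[0])
    counts = [0] * n_genes
    for chromosome in population:
        for j, gene in enumerate(chromosome):
            if gene != 0:
                counts[j] += 1
    return [count / len(population) for count in counts]

def weight_features(features, significance):
    """Scale each original feature component by its significance."""
    return [f * s for f, s in zip(features, significance)]

# Toy population after evolution (hypothetical values); zeros mark eliminated genes.
population = [
    [1.0, 0.0, 2.0],
    [0.5, 0.0, 0.0],
    [1.2, 0.3, 1.1],
    [0.9, 0.0, 0.7],
]
original_features = [2.0, 4.0, 8.0]

significance = gene_significance(population)
weighted = weight_features(original_features, significance)
print(significance)  # [1.0, 0.25, 0.75]
print(weighted)      # [2.0, 1.0, 6.0]
```

A gene that survives in every chromosome keeps its full value, while one that survives rarely is scaled towards zero, so the weighting acts as a soft form of feature selection.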

    A COMPARISON OF MACHINE LEARNING TECHNIQUES: E-MAIL SPAM FILTERING FROM COMBINED SWAHILI AND ENGLISH EMAIL MESSAGES

    Technology now changes faster than it did ten to fifteen years ago. It changes the way people live and forces them to adopt the latest devices to keep pace. Electronic mail (e-mail) has become unavoidable for people who want to communicate with friends, companies, or universities. This makes e-mail a prime target for spammers, hackers, and others who seek to profit by sending spam. Reports indicate that more than 10 billion e-mails can be sent over the internet in a day, of which 45% are spam. This figure is not constant and is sometimes higher. It shows the magnitude of the problem and the need for more effort to reduce the volume of spam and minimize its effects. Various measures have been taken against the problem. People once relied on social methods, that is, legislative means of control; they now use technological methods, which are more effective and timely in catching spam because they analyze message content. In this paper we compare the performance of machine learning algorithms experimentally on an English-language dataset, a Swahili-language dataset, and a combined dataset formed from the two, and we compare the results on the combined dataset with the Gmail classifier. The classifiers used are Naïve Bayes (NB), Sequential Minimal Optimization (SMO), and k-Nearest Neighbour (k-NN). On the combined dataset, SMO leads with 98.60% accuracy, followed by k-NN with 97.20% and Naïve Bayes with 92.89%. From this result the researcher concludes that SMO works better on a dataset that combines English and Swahili. On the English dataset, SMO again leads with 97.51% accuracy, followed by k-NN with an average accuracy of 93.52% and Naïve Bayes with 87.78%. On the Swahili dataset, Naïve Bayes leads with 99.12% accuracy, followed by SMO with 98.69% and k-NN with 98.47%.

    Supervised classification of Hyperspectral Images

    Hyperspectral imaging technology has developed greatly over the last decades and has found applications in agriculture, medicine, food processing, mineralogy, physics and astronomy. Furthermore, it is one of the most significant breakthroughs in the field of remote sensing, which is the field of application considered in the present thesis. In this field, the relevant hyperspectral imaging systems capture hyperspectral images that (generally) depict large areas of the earth's surface. One of the main processing problems for this kind of data is the classification of image pixels into specific classes. In this diploma dissertation, a comparative study of well-established supervised classification methods is carried out in terms of their performance on hyperspectral data. In addition, a new classification method, tailored to the hyperspectral image classification problem, is proposed; it combines both spectral and spatial information to classify an image pixel.