
    Gene selection algorithms for microarray data based on least squares support vector machine

    BACKGROUND: In discriminant analysis of microarray data, a small number of samples is typically described by a large number of genes. Conducting the discriminant analysis with all the genes is not only difficult but also unnecessary. Hence, gene selection is usually performed to pick out the important genes. RESULTS: A gene selection method searches for an optimal or near-optimal subset of genes with respect to a given evaluation criterion. In this paper, we propose a new evaluation criterion, named the leave-one-out calculation (LOOC; a list of abbreviations appears just above the list of references) measure. A gene selection method, named the leave-one-out calculation sequential forward selection (LOOCSFS) algorithm, is then presented by combining the LOOC measure with the sequential forward selection scheme. Further, a novel gene selection algorithm, the gradient-based leave-one-out gene selection (GLGS) algorithm, is also proposed. Both gene selection algorithms originate from an efficient and exact calculation of the leave-one-out cross-validation error of the least squares support vector machine (LS-SVM). The proposed approaches are applied to two microarray datasets and compared to other well-known gene selection methods using code available from the second author. CONCLUSION: The proposed gene selection approaches can provide gene subsets that lead to more accurate classification results, while their computational complexity is comparable to that of the existing methods. The GLGS algorithm also scales better to datasets with a very large number of genes.
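The sequential forward selection scheme described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' LOOCSFS code: the criterion used here is the leave-one-out error of a simple nearest-class-mean classifier, swapped in as an assumption in place of the LS-SVM-based LOOC measure, and the function names and toy data are hypothetical.

```python
def loo_error_nearest_mean(X, y, features):
    """Leave-one-out error rate of a nearest-class-mean classifier
    restricted to the given feature indices (stand-in criterion)."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Accumulate per-class sums and counts with sample i held out.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            s = sums.setdefault(y[j], [0.0] * len(features))
            for k, f in enumerate(features):
                s[k] += X[j][f]
            counts[y[j]] = counts.get(y[j], 0) + 1
        # Assign the held-out sample to the class with the closest mean.
        best_label, best_dist = None, float("inf")
        for label, s in sums.items():
            dist = sum((X[i][f] - s[k] / counts[label]) ** 2
                       for k, f in enumerate(features))
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_label != y[i]:
            errors += 1
    return errors / n


def sequential_forward_selection(X, y, n_select, criterion=loo_error_nearest_mean):
    """Greedily add the feature that minimises the criterion at each step."""
    selected = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < n_select:
        best = min(remaining, key=lambda f: criterion(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): 6 samples, 3 "genes"; only gene 1 separates the classes.
X = [[0.5, 0.1, 0.9], [0.4, 0.2, 0.1], [0.6, 0.0, 0.5],
     [0.5, 1.0, 0.8], [0.4, 1.1, 0.2], [0.6, 0.9, 0.5]]
y = [0, 0, 0, 1, 1, 1]
first_gene = sequential_forward_selection(X, y, n_select=1)
```

Any criterion with the same signature can be passed in, which is the point of the design: the search scheme and the evaluation measure are independent pieces.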

    Discriminant feature analysis for pattern recognition

    Ph.D., Doctor of Philosophy

    Computer assisted eye fungal infection diagnosis

    This thesis attempts to assist the diagnosis of fungal keratitis, a fungal infection of the corneal layers of the eye, by identifying the region of infection in corneal images using fractal-based features. Three features related to the fractal dimension of the image surface, obtained by representing the image in 3D with pixel intensity as the third axis, are used to identify these regions. To reduce the computational complexity, the Fisher linear discriminant (FLD) is used to reduce the 3D raw feature to a 1D feature while preserving feature values. The probability density distribution of the two-class fractal features is estimated using the adaptive mixtures (AM) method. A training corneal image is used to build the two-class probability density distribution. A Bayesian classifier, a standard statistical pattern classification technique, then classifies the pixels of corneal images using this two-class probability density distribution. The classifier outputs an image mask that highlights the fungal-infected region of the corneal image. The whole system is implemented in MATLAB.
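The FLD step in this abstract (projecting a 3D feature vector onto a single axis) can be illustrated with the textbook two-class Fisher direction, w = S_w^{-1}(m_1 - m_0). The sketch below is a pure-Python illustration under that assumption; the thesis itself reports a MATLAB implementation, and the helper names and feature values here are hypothetical.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def within_class_scatter(classes):
    """Sum over classes of the scatter of each sample about its class mean."""
    d = len(classes[0][0])
    S = [[0.0] * d for _ in range(d)]
    for samples in classes:
        m = mean(samples)
        for x in samples:
            diff = [x[k] - m[k] for k in range(d)]
            for a in range(d):
                for b in range(d):
                    S[a][b] += diff[a] * diff[b]
    return S


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fisher_direction(class0, class1):
    """Two-class Fisher direction w = S_w^{-1} (m1 - m0)."""
    Sw = within_class_scatter([class0, class1])
    m0, m1 = mean(class0), mean(class1)
    return solve(Sw, [m1[k] - m0[k] for k in range(len(m0))])


def project(w, x):
    """1D feature: the dot product of the sample with the Fisher direction."""
    return sum(wk * xk for wk, xk in zip(w, x))


# Hypothetical 3D feature vectors for two pixel classes.
normal = [[1.0, 2.0, 0.5], [1.5, 1.8, 0.7], [0.8, 2.2, 0.4], [1.2, 2.1, 0.9]]
infected = [[3.0, 0.5, 1.5], [3.4, 0.7, 1.2], [2.9, 0.4, 1.8], [3.2, 0.9, 1.6]]
w = fisher_direction(normal, infected)
```

Each pixel's three features then collapse to the scalar `project(w, x)`, on which a 1D density model can be fitted per class.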

    Robust Face Representation and Recognition Under Low Resolution and Difficult Lighting Conditions

    This dissertation focuses on different aspects of face image analysis for accurate face recognition under low resolution and poor lighting conditions. A novel resolution enhancement technique is proposed for converting a low-resolution face image into a high-resolution image for better visualization and improved feature extraction, especially in a video surveillance environment. This method performs kernel regression and component feature learning in a local neighborhood of the face images. It uses a directional Fourier phase feature component to adaptively learn the regression kernel based on local covariance in order to estimate the high-resolution image. For each patch in the neighborhood, four directional variances are estimated to adapt the interpolated pixels. A Modified Local Binary Pattern (MLBP) methodology for feature extraction is proposed to obtain robust face recognition under varying lighting conditions. The original LBP operator compares the pixels in a local neighborhood with the center pixel and converts the resulting binary string to an 8-bit integer value. It is therefore less effective under difficult lighting conditions, where the variation between pixels is negligible. The proposed MLBP uses a two-stage encoding procedure that is more robust in detecting this variation in a local patch. A novel dimensionality reduction technique called Marginality Preserving Embedding (MPE) is also proposed for enhancing face recognition accuracy. Unlike Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), which project data in a global sense, MPE seeks a local structure in the manifold. This is similar to other subspace learning techniques, but MPE differs from other manifold learning methods in that it preserves marginality in local reconstruction. Hence it provides a better representation in the low-dimensional space and achieves lower error rates in face recognition. Two new concepts for robust face recognition are also presented in this dissertation. In the first approach, a neural network is used to train the system, with input vectors created by measuring the distance from each input to its class mean. The second approach uses half-face symmetry: recognizing that face images may contain various expressions such as open or closed eyes and an open or closed mouth, it classifies the top half and the bottom half of the face separately and then fuses the two results. In experiments on several standard face datasets, improved results were observed for all of the newly proposed methodologies. Research is progressing on a unified approach for extracting features suitable for accurate face recognition in long-range video sequences in complex environments.
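The original LBP operator mentioned in this abstract can be sketched for a single 3x3 patch as below. This shows only the basic operator, not the proposed MLBP; the clockwise bit order starting at the top-left neighbor and the `>=` comparison are assumptions, since conventions vary between implementations.

```python
def lbp_code(patch):
    """Basic local binary pattern of a 3x3 patch (list of three rows).

    Each of the 8 neighbors contributes a 1 bit if it is at least as
    bright as the center pixel, and the bits form an 8-bit integer.
    """
    center = patch[1][1]
    # Neighbor positions, clockwise from the top-left corner (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for r, c in offsets:
        code = (code << 1) | (1 if patch[r][c] >= center else 0)
    return code


# Bright corners, dark edges: the bits alternate, giving 0b10101010.
textured = lbp_code([[9, 1, 9], [1, 5, 1], [9, 1, 9]])
# A perfectly flat patch: every neighbor ties with the center.
flat = lbp_code([[5, 5, 5], [5, 5, 5], [5, 5, 5]])
```

The flat patch shows the weakness the abstract points to: when neighboring pixels barely differ from the center, the code is decided by ties and tiny fluctuations rather than by real structure.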

    Kernel-Based Data Mining Approach with Variable Selection for Nonlinear High-Dimensional Data

    In statistical data mining research, datasets are often nonlinear and high-dimensional, and it has become difficult to analyze them comprehensively using traditional statistical methodologies. Kernel-based data mining is one of the most effective statistical methodologies for investigating a variety of problems in areas including pattern recognition, machine learning, bioinformatics, chemometrics, and statistics. In particular, the analysis of high-dimensional data requires statistically sophisticated procedures that emphasize the reliability of results and computational efficiency. In this dissertation, first, a novel wrapper method called SVM-ICOMP-RFE is introduced. It hybridizes the support vector machine (SVM) and recursive feature elimination (RFE) with the information-theoretic measure of complexity (ICOMP) in order to classify high-dimensional datasets and to carry out subset selection of the variables in the original data space, finding the best subset for discriminating between groups. RFE ranks the variables based on the ICOMP criterion. Second, a dual variables functional support vector machine approach is proposed. The proposed approach uses both the first and second derivatives of the degradation profiles. A modified floating search algorithm for repeated variable selection, with newly added degradation path points, is presented to find a few good variables while reducing the computation time for on-line implementation. Third, a two-stage scheme for the classification of near infrared (NIR) spectral data is proposed. In the first stage, the proposed multi-scale vertical energy thresholding (MSVET) procedure is used to reduce the dimension of the high-dimensional spectral data. In the second stage, a few important wavelet coefficients are selected using the proposed SVM gradient-recursive feature elimination (RFE). Fourth, a novel methodology for discriminant analysis, called PDCM and based on a human decision-making process, is proposed. The methodology consists of three basic steps that emulate the thinking process: perception, decision, and cognition. In these steps, two concepts, support vector machines for classification and information complexity, are integrated to evaluate learning models.

    A Study of Spam E-mail classification using Feature Selection package

    Feature selection (FS) is the technique of selecting a subset of relevant features for building learning models. FS algorithms typically fall into two categories: feature ranking and subset selection. Feature ranking ranks the features by a metric and eliminates all features that do not achieve an adequate score. Subset selection searches the set of possible features for the optimal subset. Many FS algorithms have been proposed. This paper presents a new FS technique guided by the FSelector package. The FSelector package implements a novel FS algorithm devoted to feature ranking and feature subset selection for high-dimensional data. The package provides functions for selecting attributes from a given dataset. Attribute subset selection is the process of identifying and removing as much of the irrelevant and redundant information as possible. The R package provides a convenient interface to the algorithm. This paper investigates the effectiveness of twelve commonly used FS methods on a spam dataset. One basic and popular approach is the filter, which selects the feature subset as a preprocessing step, independent of the chosen classifier (here, a support vector machine classifier). The algorithm is designed as a wrapper around five classification algorithms. A short description of the algorithm and its classification performance on the spam dataset is presented.
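The feature ranking category described in this abstract (score every feature with a metric, drop those below a cutoff) can be sketched as follows. The paper works in R with the FSelector package; this Python sketch does not reproduce that package's interface, and the choice of information gain as the metric, the function names, the threshold, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names, threshold):
    """Score each feature, sort by score, and keep those at or above threshold."""
    scores = {
        name: information_gain([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]


# Hypothetical binary word-presence features for four e-mails.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = ["spam", "spam", "ham", "ham"]
kept = rank_features(rows, labels, ["has_free", "has_hello"], threshold=0.5)
```

Because the scores are computed without training any classifier, this is a filter in the sense used in the abstract; a wrapper would instead evaluate candidate subsets by the classifier's own performance.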

    Discriminant analysis based feature extraction for pattern recognition

    Fisher's linear discriminant analysis (FLDA) has been widely used in pattern recognition applications. However, this method cannot be applied to pattern recognition problems in which the within-class scatter matrix is singular, a condition that occurs when the number of samples is small relative to their dimension. This is commonly known as the small sample size (SSS) problem, and many of the FLDA variants proposed in the past to deal with it either suffer from excessive computational load because of the high dimensionality of the patterns or lose some useful discriminant information. This study is concerned with developing efficient techniques for the discriminant analysis of patterns while at the same time overcoming the small sample size problem. With this objective in mind, the research is divided into two parts. In part 1, a technique for linear discriminant analysis (LDA) is developed that solves the generalized singular value decomposition (GSVD) problem through eigen-decomposition. The resulting algorithm, referred to as the modified GSVD-LDA (MGSVD-LDA) algorithm, is thus free of the scatter matrix singularity problem of the traditional LDA methods. A theorem stating certain properties of the discriminant subspace derived by the proposed GSVD-based algorithms is established. It is shown that if the samples of a dataset are linearly independent, then the samples belonging to different classes are linearly separable in the derived discriminant subspace; thus, the proposed MGSVD-LDA algorithm effectively captures the class structure of datasets with linearly independent samples. Inspired by this theorem, which essentially establishes the class separability of linearly independent samples in a specific discriminant subspace, part 2 develops a new systematic framework for the pattern recognition of linearly independent samples. Within this framework, a discriminant model is shown to exist in which the samples of the individual classes of the dataset lie on parallel hyperplanes and project to single distinct points of a discriminant subspace of the underlying input space. Based on this model, a number of algorithms that are free of the SSS problem are developed to obtain this discriminant subspace for datasets with linearly independent samples. For the discriminant analysis of datasets whose samples are not linearly independent, some of the linear algorithms developed in this thesis are also kernelized. Extensive experiments are conducted throughout the investigation to demonstrate the validity and effectiveness of the ideas developed in this study. Simulation results show that the linear and nonlinear algorithms for discriminant analysis developed in this thesis provide superior performance in terms of recognition accuracy and computational complexity.
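For reference, the small sample size problem named in this abstract can be stated with the standard FLDA definitions. The symbols below follow common textbook notation (N samples of dimension d in C classes, class means m_c, overall mean m, class sizes n_c) and are not taken from the thesis itself.

```latex
\begin{align}
J(\mathbf{w}) &= \frac{\mathbf{w}^{\top} S_b\, \mathbf{w}}{\mathbf{w}^{\top} S_w\, \mathbf{w}}, \\
S_w &= \sum_{c=1}^{C} \sum_{\mathbf{x}_i \in \text{class } c}
       (\mathbf{x}_i - \mathbf{m}_c)(\mathbf{x}_i - \mathbf{m}_c)^{\top}, \\
S_b &= \sum_{c=1}^{C} n_c\, (\mathbf{m}_c - \mathbf{m})(\mathbf{m}_c - \mathbf{m})^{\top}, \\
\operatorname{rank}(S_w) &\le N - C .
\end{align}
```

Since S_w is a d x d matrix, it is necessarily singular whenever N - C < d, so the classical solution through the eigenvectors of S_w^{-1} S_b is undefined in exactly the high-dimension, few-samples regime the abstract describes.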