65,798 research outputs found

    Linear Dimensionality Reduction for Margin-Based Classification: High-Dimensional Data and Sensor Networks

    Get PDF
    Low-dimensional statistics of measurements play an important role in detection problems, including those encountered in sensor networks. In this work, we focus on learning low-dimensional linear statistics of high-dimensional measurement data along with decision rules defined in the low-dimensional space in the case when the probability density of the measurements and class labels is not given, but a training set of samples from this distribution is given. We pose a joint optimization problem for linear dimensionality reduction and margin-based classification, and develop a coordinate descent algorithm on the Stiefel manifold for its solution. Although the coordinate descent is not guaranteed to find the globally optimal solution, crucially, its alternating structure enables us to extend it for sensor networks with a message-passing approach requiring little communication. Linear dimensionality reduction prevents overfitting when learning from finite training data. In the sensor network setting, dimensionality reduction not only prevents overfitting, but also reduces power consumption due to communication. The learned reduced-dimensional space and decision rule is shown to be consistent and its Rademacher complexity is characterized. Experimental results are presented for a variety of datasets, including those from existing sensor networks, demonstrating the potential of our methodology in comparison with other dimensionality reduction approaches.National Science Foundation (U.S.). Graduate Research Fellowship ProgramUnited States. Army Research Office (MURI funded through ARO Grant W911NF-06-1-0076)United States. Air Force Office of Scientific Research (Award FA9550-06-1-0324)Shell International Exploration and Production B.V

    Evolutionary algorithms and weighting strategies for feature selection in predictive data mining

    Get PDF
    The improvements in Deoxyribonucleic Acid (DNA) microarray technology mean that thousands of genes can be profiled simultaneously in a quick and efficient manner. DNA microarrays are increasingly being used for prediction and early diagnosis in cancer treatment. Feature selection and classification play a pivotal role in this process. The correct identification of an informative subset of genes may directly lead to putative drug targets. These genes can also be used as an early diagnosis or predictive tool. However, the large number of features (many thousands) present in a typical dataset present a formidable barrier to feature selection efforts. Many approaches have been presented in literature for feature selection in such datasets. Most of them use classical statistical approaches (e.g. correlation). Classical statistical approaches, although fast, are incapable of detecting non-linear interactions between features of interest. By default, Evolutionary Algorithms (EAs) are capable of taking non-linear interactions into account. Therefore, EAs are very promising for feature selection in such datasets. It has been shown that dimensionality reduction increases the efficiency of feature selection in large and noisy datasets such as DNA microarray data. The two-phase Evolutionary Algorithm/k-Nearest Neighbours (EA/k-NN) algorithm is a promising approach that carries out initial dimensionality reduction as well as feature selection and classification. This thesis further investigates the two-phase EA/k-NN algorithm and also introduces an adaptive weights scheme for the k-Nearest Neighbours (k-NN) classifier. It also introduces a novel weighted centroid classification technique and a correlation guided mutation approach. Results show that the weighted centroid approach is capable of out-performing the EA/k-NN algorithm across five large biomedical datasets. It also identifies promising new areas of research that would complement the techniques introduced and investigated

    Reduced Deep Convolutional Activation Features (R-DeCAF) in Histopathology Images to Improve the Classification Performance for Breast Cancer Diagnosis

    Full text link
    Breast cancer is the second most common cancer among women worldwide. Diagnosis of breast cancer by the pathologists is a time-consuming procedure and subjective. Computer aided diagnosis frameworks are utilized to relieve pathologist workload by classifying the data automatically, in which deep convolutional neural networks (CNNs) are effective solutions. The features extracted from activation layer of pre-trained CNNs are called deep convolutional activation features (DeCAF). In this paper, we have analyzed that all DeCAF features are not necessarily led to a higher accuracy in the classification task and dimension reduction plays an important role. Therefore, different dimension reduction methods are applied to achieve an effective combination of features by capturing the essence of DeCAF features. To this purpose, we have proposed reduced deep convolutional activation features (R-DeCAF). In this framework, pre-trained CNNs such as AlexNet, VGG-16 and VGG-19 are utilized in transfer learning mode as feature extractors. DeCAF features are extracted from the first fully connected layer of the mentioned CNNs and support vector machine has been used for binary classification. Among linear and nonlinear dimensionality reduction algorithms, linear approaches such as principal component analysis (PCA) represent a better combination among deep features and lead to a higher accuracy in the classification task using small number of features considering specific amount of cumulative explained variance (CEV) of features. The proposed method is validated using experimental BreakHis dataset. Comprehensive results show improvement in the classification accuracy up to 4.3% with less computational time. Best achieved accuracy is 91.13% for 400x data with feature vector size (FVS) of 23 and CEV equals to 0.15 using pre-trained AlexNet as feature extractor and PCA as feature reduction algorithm

    Geometric Distribution Weight Information Modeled Using Radial Basis Function with Fractional Order for Linear Discriminant Analysis Method

    Get PDF
    Fisher linear discriminant analysis (FLDA) is a classic linear feature extraction and dimensionality reduction approach for face recognition. It is known that geometric distribution weight information of image data plays an important role in machine learning approaches. However, FLDA does not employ the geometric distribution weight information of facial images in the training stage. Hence, its recognition accuracy will be affected. In order to enhance the classification power of FLDA method, this paper utilizes radial basis function (RBF) with fractional order to model the geometric distribution weight information of the training samples and proposes a novel geometric distribution weight information based Fisher discriminant criterion. Subsequently, a geometric distribution weight information based LDA (GLDA) algorithm is developed and successfully applied to face recognition. Two publicly available face databases, namely, ORL and FERET databases, are selected for evaluation. Compared with some LDA-based algorithms, experimental results exhibit that our GLDA approach gives superior performance

    KCRC-LCD: Discriminative Kernel Collaborative Representation with Locality Constrained Dictionary for Visual Categorization

    Full text link
    We consider the image classification problem via kernel collaborative representation classification with locality constrained dictionary (KCRC-LCD). Specifically, we propose a kernel collaborative representation classification (KCRC) approach in which kernel method is used to improve the discrimination ability of collaborative representation classification (CRC). We then measure the similarities between the query and atoms in the global dictionary in order to construct a locality constrained dictionary (LCD) for KCRC. In addition, we discuss several similarity measure approaches in LCD and further present a simple yet effective unified similarity measure whose superiority is validated in experiments. There are several appealing aspects associated with LCD. First, LCD can be nicely incorporated under the framework of KCRC. The LCD similarity measure can be kernelized under KCRC, which theoretically links CRC and LCD under the kernel method. Second, KCRC-LCD becomes more scalable to both the training set size and the feature dimension. Example shows that KCRC is able to perfectly classify data with certain distribution, while conventional CRC fails completely. Comprehensive experiments on many public datasets also show that KCRC-LCD is a robust discriminative classifier with both excellent performance and good scalability, being comparable or outperforming many other state-of-the-art approaches
    • …
    corecore