    Finding Fuzzy-rough Reducts with Fuzzy Entropy

    Abstract—Dataset dimensionality is arguably the single most significant obstacle to applying effective computational intelligence techniques to a problem domain. To address this problem, a dimensionality-reduction technique is employed prior to any classification learning. Such feature selection (FS) techniques attempt to select a subset of the original features of a dataset that is rich in the most useful information. The benefits can include improved data visualisation and transparency, reduced training and utilisation times and, potentially, improved prediction performance. Methods based on fuzzy-rough set theory have demonstrated this with much success. Such methods have employed the dependency function, which is based on the information contained in the lower approximation, as an evaluation step in the FS process. This paper presents three novel feature selection techniques that employ fuzzy entropy to locate fuzzy-rough reducts. The approach is compared with two other fuzzy-rough feature selection approaches which utilise other measures for subset selection.
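The abstract does not reproduce the fuzzy-entropy measure itself. As a rough illustration of entropy-guided subset search, the sketch below performs greedy forward selection on discrete features by minimising the conditional (crisp Shannon, not fuzzy) entropy of the class given the selected subset; the function names and the crisp-entropy substitution are assumptions for illustration, not the paper's formulation.

```python
import math
from collections import Counter

def conditional_entropy(feature, labels):
    """H(labels | feature) for a discrete feature: the entropy of the
    label distribution inside each feature value, weighted by the
    value's frequency."""
    n = len(labels)
    h = 0.0
    for v in set(feature):
        idx = [i for i, f in enumerate(feature) if f == v]
        counts = Counter(labels[i] for i in idx)
        h_v = -sum((c / len(idx)) * math.log2(c / len(idx))
                   for c in counts.values())
        h += (len(idx) / n) * h_v
    return h

def greedy_entropy_fs(X, y, k):
    """Greedy forward selection: in each round, add the feature whose
    joint values with the already-selected subset minimise H(y | subset)."""
    selected = []
    for _ in range(k):
        best, best_h = None, float("inf")
        for j in range(len(X[0])):
            if j in selected:
                continue
            joint = [tuple(row[i] for i in selected + [j]) for row in X]
            h = conditional_entropy(joint, y)
            if h < best_h:
                best, best_h = j, h
        selected.append(best)
    return selected
```

A fuzzy-rough variant would replace the crisp partition over feature values with fuzzy similarity classes, but the greedy search loop has the same shape.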

    Recognizing Surgically Altered Face Images and 3D Facial Expression Recognition

    Abstract—Altering facial appearance through surgical procedures is common nowadays, but it raises challenges for face recognition algorithms. Plastic surgery introduces non-linear variations that are difficult for existing face recognition systems to model. This work presents a multi-objective evolutionary granular algorithm that operates on several granules extracted from a face image at multiple levels of granularity. This granular information is unified in an evolutionary manner using a multi-objective genetic approach. Facial expressions are then identified from the face images, for which 3D facial shapes are considered. A novel automatic feature selection method is proposed, based on maximising the average relative entropy of marginalised class-conditional feature distributions, and is applied to a complete pool of candidate features composed of normalised Euclidean distances between 83 facial feature points in 3D space. A regularised multi-class AdaBoost classification algorithm is used to obtain the highest average recognition rate.
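The selection criterion named here, average relative entropy between class-conditional feature distributions, can be sketched with histogrammed features and pairwise KL divergence. This is a minimal proxy under assumed binning, not the paper's exact marginalisation, and all names are illustrative.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions (smoothed to avoid log 0)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def avg_relative_entropy_scores(X, y, n_bins=8):
    """Score each feature by the mean pairwise KL divergence between its
    histogrammed class-conditional distributions; a higher score means the
    classes separate better along that feature."""
    classes = np.unique(y)
    scores = []
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=n_bins)
        hists = [np.histogram(X[y == c, j], bins=edges)[0] for c in classes]
        pairs = [(a, b) for i, a in enumerate(hists) for b in hists[i + 1:]]
        scores.append(np.mean([kl_divergence(a, b) for a, b in pairs]))
    return scores
```

For the paper's pool of 83-point Euclidean distances, `X` would hold one distance feature per column and the top-scoring columns would feed the AdaBoost stage.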

    Risk Assessment of Gastric Cancer Caused by Helicobacter pylori Using CagA Sequence Markers

    As a marker of Helicobacter pylori, the cytotoxin-associated gene A (cagA) has been revealed to be the major virulence factor causing gastroduodenal diseases. However, the molecular mechanisms that underlie the development of different gastroduodenal diseases caused by cagA-positive H. pylori infection remain unknown. Current studies are limited to evaluating the correlation between diseases and the number of Glu-Pro-Ile-Tyr-Ala (EPIYA) motifs in the CagA strain. To further understand the relationship between the CagA sequence and its virulence in gastric cancer, we proposed a systematic entropy-based approach to identify the cancer-related residues in the intervening regions of CagA and employed a supervised machine learning method to classify cancer and non-cancer cases. An entropy-based calculation was used to detect key residues of CagA intervening sequences as gastric cancer biomarkers. For each residue, both the combinatorial entropy and the background entropy were calculated, and the entropy difference was used as the criterion for feature residue selection. The feature values were then fed into Support Vector Machines (SVM) with the Radial Basis Function (RBF) kernel, and two parameters were tuned via grid search to obtain the optimal F value. Two other popular sequence classification methods, BLAST and HMMER, were also applied to the same data for comparison. Our method achieved 76% and 71% classification accuracy for the Western and East Asian subtypes, respectively, performing significantly better than BLAST and HMMER. This research indicates that small variations in the amino acids at those important residues might lead to virulence variance among CagA strains, resulting in different gastroduodenal diseases. This study provides not only a useful tool to predict the correlation between a novel CagA strain and disease, but also a general new framework for detecting biological-sequence biomarkers in population studies.
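The per-residue entropy difference described above can be sketched as follows: score each alignment column by the background entropy over all sequences minus the within-class entropy over one label group, so that residues conserved inside a class but variable overall score high. The exact combinatorial-entropy definition is the paper's; this crisp column-entropy difference is an assumed stand-in.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy of the amino-acid distribution in one alignment column."""
    counts = Counter(column)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def residue_scores(aligned_seqs, labels, target):
    """For each column: background entropy (all sequences) minus within-class
    entropy (target class only). Larger scores mark residues whose variation
    tracks the class split; the top columns become the SVM feature residues."""
    cols = list(zip(*aligned_seqs))
    target_seqs = [s for s, l in zip(aligned_seqs, labels) if l == target]
    tcols = list(zip(*target_seqs))
    return [column_entropy(c) - column_entropy(t) for c, t in zip(cols, tcols)]
```

With equal-length aligned sequences and binary cancer / non-cancer labels, the scores rank candidate biomarker positions before the SVM-RBF classification step.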

    Network Uncertainty Informed Semantic Feature Selection for Visual SLAM

    In order to facilitate long-term localization using a visual simultaneous localization and mapping (SLAM) algorithm, careful feature selection can help ensure that reference points persist over long durations and that the runtime and storage complexity of the algorithm remain consistent. We present SIVO (Semantically Informed Visual Odometry and Mapping), a novel information-theoretic feature selection method for visual SLAM which incorporates semantic segmentation and neural network uncertainty into the feature selection pipeline. Our algorithm selects points which provide the highest reduction in Shannon entropy: the difference between the entropy of the current state and the joint entropy of the state given the addition of the new feature, combined with the classification entropy of the feature from a Bayesian neural network. Each selected feature significantly reduces the uncertainty of the vehicle state and has repeatedly been detected, with high confidence, as a static object (building, traffic sign, etc.). This selection strategy generates a sparse map which can facilitate long-term localization. The KITTI odometry dataset is used to evaluate our method, and we also compare our results against ORB_SLAM2. Overall, SIVO performs comparably to the baseline method while reducing the map size by almost 70%. Comment: Published in 2019 16th Conference on Computer and Robot Vision (CRV).
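The information gain that this kind of selection ranks features by can be illustrated with a Gaussian state estimate: the entropy drop from fusing one candidate measurement via a Kalman update. This is a generic sketch of the entropy-reduction idea, not SIVO's implementation, and the measurement model below is assumed.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy of a multivariate Gaussian with covariance cov."""
    n = cov.shape[0]
    return 0.5 * (n * np.log(2 * np.pi * np.e) + np.log(np.linalg.det(cov)))

def entropy_reduction(cov, H, R):
    """Entropy decrease of the state estimate after fusing one measurement
    with Jacobian H and noise covariance R (standard Kalman update)."""
    S = H @ cov @ H.T + R                          # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)               # Kalman gain
    cov_post = (np.eye(cov.shape[0]) - K @ H) @ cov
    return gaussian_entropy(cov) - gaussian_entropy(cov_post)
```

A selection rule in this spirit would keep only candidate features whose entropy reduction exceeds a threshold and whose semantic class (from the segmentation network) is static with low classification entropy.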

    Approaches to Feature Identification and Feature Selection for Binary and Multi-Class Classification

    University of Minnesota Ph.D. dissertation. 2007. Major: Electrical Engineering. Advisor: Keshab Parhi. 1 computer file (PDF); 182 pages. In this dissertation, we address issues of (a) feature identification and extraction, and (b) feature selection. Datasets are growing ever larger, especially with the growth of internet data and bioinformatics, so applying feature extraction and selection to reduce the dimensionality of the data is crucial to data mining. Our first objective is to identify discriminative patterns in time-series datasets. Using auto-regressive modeling, we show that, if two frequency bands are selected appropriately, then the ratio of band power is amplified for one of the two states. We introduce a novel frequency-domain power ratio (FDPR) test to determine how these two bands should be selected. The FDPR computes the ratio of the two model filter transfer functions, where the model filters are estimated using different parts of the time series that correspond to two different states. The ratio implicitly cancels the effect of the changing variance of the white noise that is input to the model. Thus, even in a highly non-stationary environment, the ratio feature is able to correctly identify a change of state. Synthesized data and application examples from seizure prediction are used to validate the proposed approach. We also illustrate that combining the spectral power ratio features with absolute and relative spectral powers as a feature set, and then carefully selecting a small number of features from a few electrodes, can achieve good detection and prediction performance on short-term datasets and long-term fragmented datasets collected from subjects with epilepsy. Our second objective is to develop efficient feature selection methods for binary classification (MUSE) and multi-class classification (M3U) that effectively select important features to achieve good classification performance.
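A transfer-function ratio of the kind the FDPR test computes can be sketched by fitting an AR model to each state's segment and evaluating the magnitude ratio at a frequency of interest. The least-squares AR fit and single-frequency evaluation here are assumptions for illustration; the dissertation's exact estimator and band selection are not reproduced.

```python
import numpy as np

def ar_fit(x, order):
    """Least-squares AR(order) fit: x[t] ~ sum_k a[k] * x[t-k-1]."""
    X = np.column_stack([x[order - k - 1: len(x) - k - 1] for k in range(order)])
    a, *_ = np.linalg.lstsq(X, x[order:], rcond=None)
    return a

def fdpr(x_state1, x_state2, f, order=4, fs=1.0):
    """Ratio of the two AR model transfer-function magnitudes |H1(f)/H2(f)|,
    one model per state segment. Because each H is normalised by its own
    driving-noise filter, a change in noise variance cancels in the ratio."""
    def h_mag(a):
        w = 2 * np.pi * f / fs
        denom = 1 - sum(ak * np.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        return 1.0 / abs(denom)
    return h_mag(ar_fit(x_state1, order)) / h_mag(ar_fit(x_state2, order))
```

Evaluated at a band where one state concentrates its power, the ratio rises well above 1, which is the amplification effect the abstract describes.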
    We propose a novel incremental feature selection method based on minimum uncertainty and feature sample elimination (referred to as MUSE) for binary classification. The proposed approach differs from the prior mRMR approach in how the redundancy of the current feature with previously selected features is reduced. In the proposed approach, the feature samples are divided into a pre-specified number of bins; this step is referred to as feature quantization. A novel uncertainty score for each feature is computed by summing the conditional entropies of the bins, and the feature with the lowest uncertainty score is selected. For each bin, its impurity is computed by taking the minimum of the probabilities of Class 1 and Class 2. The feature samples corresponding to bins with impurities below a threshold are discarded and are not used for the selection of subsequent features. The significance of the MUSE feature selection method is demonstrated using two datasets, arrhythmia and handwritten digit recognition (Gisette), and seizure prediction datasets collected from five dogs and two humans. It is shown that the proposed method outperforms the prior mRMR feature selection method in most cases. We further extend the MUSE algorithm to multi-class classification problems. We propose a novel multiclass feature selection algorithm based on weighted conditional entropy, also referred to as uncertainty. The goal of the proposed algorithm is to select a feature subset such that, for each feature sample, there exists a feature in the selected subset with a low uncertainty score. Features are first quantized into different bins. The proposed feature selection method first computes an uncertainty vector from the weighted conditional entropy. The lower the uncertainty score for a class, the better the separability of the samples in that class.
    Next, an iterative feature selection method selects a feature in each iteration by (1) computing the minimum uncertainty score for each feature sample over all possible feature subset candidates, (2) computing the average of these minimum uncertainty scores across all feature samples, and (3) selecting the feature that minimises this average. The experimental results show that the proposed algorithm outperforms mRMR and achieves lower misclassification rates on various publicly available datasets. In most cases, the number of features necessary for a specified misclassification error is smaller than that required by traditional methods.
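The MUSE scoring step described above (quantize a feature into bins, sum the bins' weighted conditional entropies, and mark samples in near-pure bins for elimination) can be sketched for the binary case as follows. The bin count and impurity threshold are illustrative choices, not the dissertation's tuned values.

```python
import numpy as np

def bin_uncertainty(feature, labels, n_bins=4, impurity_thresh=0.1):
    """MUSE-style score for one feature: quantize into equal-width bins and
    sum each bin's weighted binary conditional entropy. Also flags samples
    in bins whose impurity min(p1, p2) falls below the threshold; those
    samples would be discarded before selecting the next feature."""
    edges = np.linspace(feature.min(), feature.max(), n_bins + 1)
    bins = np.clip(np.digitize(feature, edges[1:-1]), 0, n_bins - 1)
    score = 0.0
    eliminate = np.zeros(len(feature), dtype=bool)
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue
        p1 = labels[mask].mean()            # fraction of Class 1 in this bin
        if min(p1, 1 - p1) < impurity_thresh:
            eliminate |= mask               # bin is nearly pure: samples explained
        for p in (p1, 1 - p1):              # weighted binary entropy of the bin
            if p > 0:
                score -= mask.mean() * p * np.log2(p)
    return score, eliminate
```

Selecting the minimum-score feature and dropping the flagged samples, then repeating, yields the incremental MUSE loop; the M3U extension replaces the binary entropy with a per-class weighted conditional entropy vector.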