
    Supervised Feature Space Reduction for Multi-Label Nearest Neighbors

    With the ability to address many real-world problems, multi-label classification has received considerable attention in recent years, and the instance-based ML-kNN classifier is today considered one of the most effective. However, ML-kNN is sensitive to noisy and redundant features, and its performance degrades as data dimensionality increases. Dimensionality reduction is a natural remedy, but current methods optimize reduction objectives that ignore the impact on the ML-kNN classification. We propose ML-ARP, a novel dimensionality reduction algorithm that uses a variable neighborhood search meta-heuristic to learn a linear projection of the feature space that specifically minimizes the ML-kNN classification loss. Numerical comparisons confirm that ML-ARP outperforms both ML-kNN without data preprocessing and four standard multi-label dimensionality reduction algorithms.
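The idea of learning a projection that directly targets the ML-kNN loss can be sketched as follows. This is a minimal illustration only, not the authors' ML-ARP implementation: plain random local search stands in for variable neighborhood search, and the training loss is evaluated with self-neighbors rather than leave-one-out.

```python
import numpy as np

def ml_knn_predict(X_train, Y_train, X_test, P, k=3):
    """Multi-label prediction by majority vote among the k nearest
    neighbours in the linearly projected space Z = X @ P."""
    Z_train, Z_test = X_train @ P, X_test @ P
    preds = []
    for z in Z_test:
        nn = np.argsort(np.linalg.norm(Z_train - z, axis=1))[:k]
        preds.append((Y_train[nn].mean(axis=0) >= 0.5).astype(int))
    return np.array(preds)

def hamming_loss(Y, Y_hat):
    """Fraction of label assignments that disagree."""
    return (Y != Y_hat).mean()

def learn_projection(X, Y, d_out, iters=100, step=0.1, seed=0):
    """Hill-climb a projection P that lowers the ML-kNN Hamming loss
    (random perturbations stand in for the paper's VNS moves)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(X.shape[1], d_out))
    best = hamming_loss(Y, ml_knn_predict(X, Y, X, P))
    for _ in range(iters):
        Q = P + step * rng.normal(size=P.shape)
        loss = hamming_loss(Y, ml_knn_predict(X, Y, X, Q))
        if loss < best:  # keep the perturbation only if it helps
            P, best = Q, loss
    return P
```

The key point the abstract makes is visible here: the projection is scored by the downstream classifier's own loss, not by a generic reconstruction or variance criterion.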

    Latent Fisher Discriminant Analysis

    Linear Discriminant Analysis (LDA) is a well-known method for dimensionality reduction and classification. Previous studies have also extended the binary-class case to multiple classes. However, many applications, such as object detection and keyframe extraction, cannot provide consistent instance-label pairs, while LDA requires labels at the instance level for training; it therefore cannot be directly applied to semi-supervised classification problems. In this paper, we overcome this limitation and propose a latent-variable Fisher discriminant analysis model. We relax instance-level labeling to bag-level labeling, a form of semi-supervision (only video-level labels of the event type are required for semantic frame extraction), and incorporate a data-driven prior over the latent variables. Our method thus combines latent variable inference and dimension reduction in a unified Bayesian framework. We test our method on the MUSK and Corel data sets and obtain competitive results compared to the baseline approach. We also demonstrate its capacity on the challenging TRECVID MED11 dataset for semantic keyframe extraction and conduct a human-factors, ranking-based experimental evaluation, which clearly shows that our proposed method consistently extracts more semantically meaningful keyframes than strong baselines.
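At the core of any Fisher discriminant method, latent or not, is the classical two-class Fisher direction. A minimal sketch of that standard, fully supervised estimator (not the paper's latent-variable model) is:

```python
import numpy as np

def fisher_direction(X0, X1):
    """Fisher discriminant direction w = Sw^{-1} (m1 - m0) for two
    classes; a small ridge keeps the solve numerically stable."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter = sum of per-class scatter matrices
    Sw = np.cov(X0.T, bias=True) * len(X0) + np.cov(X1.T, bias=True) * len(X1)
    return np.linalg.solve(Sw + 1e-6 * np.eye(X0.shape[1]), m1 - m0)
```

Projecting onto `w` maximizes between-class separation relative to within-class spread; the paper's contribution is to infer which instances inside each bag drive this direction when only bag-level labels exist.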

    Visual assessment of multi-photon interference

    Classical machine learning algorithms can provide insights into high-dimensional processes that are hardly accessible with conventional approaches. As a notable example, t-distributed Stochastic Neighbor Embedding (t-SNE) represents the state of the art for visualizing data sets of large dimensionality. An interesting question is then whether this algorithm can also provide useful information in quantum experiments with very large Hilbert spaces. Leveraging these considerations, in this work we apply t-SNE to probe the spatial distribution of n-photon events in m-dimensional Hilbert spaces, showing that its findings can be beneficial for validating genuine quantum interference in boson sampling experiments. In particular, we find that nonlinear dimensionality reduction is capable of capturing distinctive features in the spatial distribution of data related to multi-photon states with different evolutions. We envisage that this approach will inspire further theoretical investigations, for instance toward a reliable assessment of quantum computational advantage.
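The mechanism behind the embedding can be illustrated with a deliberately stripped-down version of the t-SNE objective. This sketch is not the production algorithm: it uses a fixed Gaussian bandwidth instead of per-point perplexity calibration, and plain gradient descent without momentum or early exaggeration.

```python
import numpy as np

def tiny_tsne(X, d=2, sigma=1.0, iters=300, lr=1.0, seed=0):
    """Toy t-SNE: match Gaussian affinities P in the input space with
    Student-t affinities Q in the low-dimensional embedding Y."""
    n = len(X)
    D = ((X[:, None] - X[None]) ** 2).sum(-1)
    P = np.exp(-D / (2 * sigma ** 2))
    np.fill_diagonal(P, 0)
    P /= P.sum()
    P = (P + P.T) / 2                      # symmetrised affinities
    rng = np.random.default_rng(seed)
    Y = 1e-2 * rng.normal(size=(n, d))     # small random initial layout
    for _ in range(iters):
        d2 = ((Y[:, None] - Y[None]) ** 2).sum(-1)
        num = 1.0 / (1.0 + d2)             # Student-t kernel
        np.fill_diagonal(num, 0)
        Q = num / num.sum()
        # standard t-SNE gradient of KL(P || Q) with respect to Y
        G = 4 * ((P - Q) * num)[..., None] * (Y[:, None] - Y[None])
        Y -= lr * G.sum(axis=1)
    return Y
```

The heavy-tailed Student-t kernel in the embedding is what lets well-separated clusters of events, such as distinguishable versus indistinguishable photon statistics, spread apart visually.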

    ECG biometric authentication based on non-fiducial approach using kernel methods

    Identity recognition faces several challenges, especially in extracting an individual's unique features from biometric modalities and in pattern classification. Electrocardiogram (ECG) waveforms, for instance, have identity properties unique to each human, and their signals are not periodic. At present, to generate a significant ECG feature set, non-fiducial methodologies based on autocorrelation (AC) in conjunction with linear dimension reduction methods are used. This paper proposes a new non-fiducial framework for ECG biometric verification that uses kernel methods to reduce the dimensionality of the high-dimensional autocorrelation vectors, after denoising the signals of 52 subjects with the Discrete Wavelet Transform (DWT). The effects of different dimensionality reduction techniques used for feature extraction were investigated by evaluating the verification performance of a multi-class Support Vector Machine (SVM) with the One-Against-All (OAA) approach. The experimental results demonstrated higher test recognition rates for Gaussian OAA SVMs on random unknown ECG data sets when using Kernel Principal Component Analysis (KPCA) than when using Linear Discriminant Analysis (LDA) or Principal Component Analysis (PCA).
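Kernel PCA, the dimensionality reducer that performs best in these results, can be sketched in a few lines of NumPy: build an RBF kernel matrix, centre it in feature space, and eigendecompose. This is an illustrative sketch of the generic technique, not the paper's pipeline or parameter choices.

```python
import numpy as np

def rbf_kpca(X, n_components=2, gamma=1.0):
    """Kernel PCA with an RBF kernel: project n samples onto the top
    principal components of the centred kernel matrix."""
    D = ((X[:, None] - X[None]) ** 2).sum(-1)
    K = np.exp(-gamma * D)                      # RBF kernel matrix
    n = len(X)
    one = np.ones((n, n)) / n
    Kc = K - one @ K - K @ one + one @ K @ one  # centring in feature space
    vals, vecs = np.linalg.eigh(Kc)             # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # scale eigenvectors so columns are the projected coordinates
    return vecs * np.sqrt(np.clip(vals, 0, None))
```

Because the kernel is nonlinear, the leading components can capture curved structure in the autocorrelation vectors that linear PCA or LDA would miss, which is consistent with the reported recognition gains.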

    Clustered multidimensional scaling with Rulkov neurons

    Copyright ©2016 IEICE. When dealing with high-dimensional measurements that often show non-linear characteristics at multiple scales, a need for unbiased and robust classification and interpretation techniques has emerged. Here, we present a method for mapping high-dimensional data onto low-dimensional spaces, allowing for a fast visual interpretation of the data. Classical approaches to dimensionality reduction attempt to preserve the geometry of the data. They often fail to correctly grasp cluster structures, for instance in high-dimensional situations where distances between data points tend to become more similar. To cope with this clustering problem, we propose combining classical multi-dimensional scaling with data clustering based on self-organization processes in neural networks, where the goal is to amplify rather than preserve local cluster structures. We find that applying dimensionality reduction techniques to the output of neural-network-based clustering not only allows for convenient visual inspection but also leads to further insights into the intra- and inter-cluster connectivity. We report on an implementation of the method with Rulkov-Hebbian-learning clustering and illustrate its suitability in comparison to traditional methods by means of an artificial dataset and a real-world example.
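The multidimensional-scaling half of this pipeline can be sketched with classical MDS (Torgerson scaling), which recovers coordinates from a matrix of squared pairwise distances; the Rulkov-neuron clustering stage that precedes it in the paper is omitted here.

```python
import numpy as np

def classical_mds(D, d=2):
    """Classical MDS: given squared pairwise distances D, recover a
    d-dimensional configuration via double centring + eigendecomposition."""
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D @ J                    # Gram matrix of centred points
    vals, vecs = np.linalg.eigh(B)          # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
```

For exactly Euclidean distance data the reconstruction is exact up to rotation and translation, which is why MDS is a natural choice for visual inspection of a cluster layout.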

    GA-based feature subset selection in a spam/non-spam detection system

    Spam has created a significant security problem for computer users everywhere. Spammers exploit tricks to disguise the parts of a message that could be used to identify it as spam. Sending junk mail, for instance, costs a spammer very little in money or bandwidth, even for more than one hundred emails. From the feature selection perspective, one specific problem that decreases the accuracy of spam/non-spam email classification is high data dimensionality; dimensionality reduction therefore amounts to removing irrelevant features. In this paper, a genetic algorithm (GA) is applied to feature selection in an effort to decrease the number of useless features in a collection of high-dimensional email bodies and subjects. Next, a Multi-Layer Perceptron (MLP) is employed to classify the features selected by the GA. Using the LingSpam benchmark corpus as the dataset, the experimental results showed that the GA feature selector with the MLP classifier not only decreases the data dimensionality but also increases the spam detection rate compared with other classifiers such as SVM and Naïve Bayes.
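A GA-based feature selector of the kind described can be sketched as follows. For brevity a nearest-centroid classifier stands in for the paper's MLP wrapper, and all function names and parameters here are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def fitness(mask, X, y):
    """Wrapper fitness: classification accuracy on the selected features
    (nearest-centroid classifier), minus a small penalty per feature."""
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask.astype(bool)]
    c0, c1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Xs - c1, axis=1)
            < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean() - 0.01 * mask.mean()

def ga_select(X, y, pop=20, gens=30, seed=0):
    """Evolve binary feature masks: elitism, one-point crossover
    between parents from the top half, and bit-flip mutation."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    P = rng.integers(0, 2, size=(pop, n_feat))
    for _ in range(gens):
        f = np.array([fitness(m, X, y) for m in P])
        P = P[np.argsort(f)[::-1]]                 # sort by fitness, best first
        kids = []
        for _ in range(pop // 2):
            a, b = P[rng.integers(0, pop // 2, 2)] # parents from top half
            cut = rng.integers(1, n_feat)          # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.random(n_feat) < 0.05] ^= 1  # bit-flip mutation
            kids.append(child)
        P = np.vstack([P[: pop - len(kids)], kids])
    f = np.array([fitness(m, X, y) for m in P])
    return P[np.argmax(f)]
```

The essential design choice mirrors the paper: fitness is measured by a downstream classifier (a wrapper method), so the GA discards features that do not help separate spam from non-spam rather than features that merely look redundant in isolation.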