Supervised Feature Space Reduction for Multi-Label Nearest Neighbors
With the ability to handle many real-world problems, multi-label classification has received considerable attention in recent years, and the instance-based ML-kNN classifier is today considered one of the most efficient. However, it is sensitive to noisy and redundant features, and its performance degrades as data dimensionality increases. Dimensionality reduction is an alternative, but current methods optimize reduction objectives that ignore the impact on ML-kNN classification. We propose ML-ARP, a novel dimensionality reduction algorithm that uses a variable neighborhood search meta-heuristic to learn a linear projection of the feature space that specifically minimizes the ML-kNN classification loss. Numerical comparisons confirm that ML-ARP outperforms ML-kNN without preprocessing as well as four standard multi-label dimensionality reduction algorithms.
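The generic idea of applying a linear projection before a neighbor-based multi-label classifier can be sketched as follows. This is an illustrative toy, not the paper's ML-ARP: `ml_knn_predict`, the majority-vote rule, and the synthetic data are assumptions for demonstration, and the projection `P` is simply supplied rather than learned by variable neighborhood search.

```python
import numpy as np

def ml_knn_predict(X_train, Y_train, X_test, k=3, P=None):
    """Multi-label k-NN by neighbor label voting; P is an optional
    linear projection (d x r) applied before distances are computed.
    Hypothetical sketch of the general technique, not ML-ARP itself."""
    if P is not None:
        X_train, X_test = X_train @ P, X_test @ P
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nn = np.argsort(dists)[:k]
        # assign every label carried by at least half of the k neighbors
        preds.append((Y_train[nn].mean(axis=0) >= 0.5).astype(int))
    return np.array(preds)

# two toy clusters, each with its own label set
X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
out = ml_knn_predict(X, Y, np.array([[0., 0.5], [5., 5.5]]), k=2, P=np.eye(2))
print(out)  # [[1 0], [0 1]]
```

A learned projection would replace `np.eye(2)` with a matrix chosen to minimize the multi-label classification loss on training data.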
Latent Fisher Discriminant Analysis
Linear Discriminant Analysis (LDA) is a well-known method for dimensionality reduction and classification, and previous studies have extended the binary-class case to multiple classes. However, many applications, such as object detection and keyframe extraction, cannot provide consistent instance-label pairs, while LDA requires instance-level labels for training; it therefore cannot be applied directly to semi-supervised classification problems. In this paper, we overcome this limitation and propose a latent variable Fisher discriminant analysis model. We relax instance-level labeling to bag-level labeling, a form of semi-supervision (video-level labels of event type are required for semantic frame extraction), and incorporate a data-driven prior over the latent variables. Our method thus combines latent variable inference and dimensionality reduction in a unified Bayesian framework. We test our method on the MUSK and Corel data sets and obtain competitive results compared to the baseline approach. We also demonstrate its capacity on the challenging TRECVID MED11 dataset for semantic keyframe extraction and conduct a human-factors, ranking-based experimental evaluation, which clearly shows that our proposed method consistently extracts more semantically meaningful keyframes than strong baselines.
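The fully supervised LDA that this paper extends can be illustrated in a few lines with scikit-learn; this sketch shows only the standard instance-labeled starting point, not the latent variable or bag-level extension the abstract proposes.

```python
# Minimal illustration of standard, fully supervised LDA used for
# dimensionality reduction (the baseline setting the paper relaxes).
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)
# With C classes, LDA can project to at most C - 1 dimensions (here 2).
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)
print(Z.shape)  # (150, 2)
```

Every sample needs an instance-level label `y` here, which is exactly the requirement that bag-level (e.g. video-level) supervision cannot satisfy.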
Visual assessment of multi-photon interference
Classical machine learning algorithms can provide insights into high-dimensional processes that are hardly accessible with conventional approaches. As a notable example, t-distributed Stochastic Neighbor Embedding (t-SNE) represents the state of the art for visualizing data sets of large dimensionality. An interesting question is whether this algorithm can also provide useful information in quantum experiments with very large Hilbert spaces. Motivated by these considerations, in this work we apply t-SNE to probe the spatial distribution of n-photon events in m-dimensional Hilbert spaces, showing that its findings can be beneficial for validating genuine quantum interference in boson sampling experiments. In particular, we find that nonlinear dimensionality reduction is capable of capturing distinctive features in the spatial distribution of data related to multi-photon states with different evolutions. We envisage that this approach will inspire further theoretical investigations, for instance toward a reliable assessment of quantum computational advantage.
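A basic t-SNE embedding of high-dimensional data looks like the following; the digits dataset and all parameter values are stand-ins for illustration, not the photonic event data used in the paper.

```python
# Sketch of t-SNE projecting 64-dimensional samples to 2-D for
# visualization; dataset and hyperparameters are illustrative only.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X = X[:500]  # subsample for speed
emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(X)
print(emb.shape)  # (500, 2)
```

In the paper's setting, each sample would instead encode the spatial pattern of an n-photon detection event, and structure visible in the 2-D embedding serves as evidence of genuine multi-photon interference.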
ECG biometric authentication based on non-fiducial approach using kernel methods
Identity recognition faces several challenges, especially in extracting an individual's unique features from biometric modalities and in pattern classification. Electrocardiogram (ECG) waveforms, for instance, have identity properties unique to each person, and their signals are not periodic. At present, to generate a significant ECG feature set, non-fiducial methodologies based on autocorrelation (AC) in conjunction with linear dimensionality reduction methods are used. This paper proposes a new non-fiducial framework for ECG biometric verification that uses kernel methods to reduce the dimensionality of high-dimensional autocorrelation vectors, building a recognition system after denoising the signals of 52 subjects with the Discrete Wavelet Transform (DWT). The effects of different dimensionality reduction techniques for feature extraction were investigated to evaluate the verification performance of a multi-class Support Vector Machine (SVM) with the One-Against-All (OAA) approach. The experimental results demonstrated higher test recognition rates for Gaussian OAA SVMs on random unknown ECG data sets when using Kernel Principal Component Analysis (KPCA) than when using Linear Discriminant Analysis (LDA) or Principal Component Analysis (PCA).
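The KPCA-then-one-against-all-SVM pipeline described above can be sketched generically with scikit-learn. The digits dataset stands in for the autocorrelation feature vectors, and the `n_components` and `gamma` values are illustrative assumptions, not the paper's tuned settings.

```python
# Sketch: kernel PCA for nonlinear dimensionality reduction, followed
# by a one-against-all (one-vs-rest) Gaussian SVM classifier.
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = make_pipeline(
    KernelPCA(n_components=30, kernel="rbf", gamma=1e-3),  # illustrative values
    OneVsRestClassifier(SVC(kernel="rbf")),
)
clf.fit(Xtr, ytr)
print(clf.score(Xte, yte))
```

Swapping `KernelPCA` for `PCA` or `LinearDiscriminantAnalysis` in the pipeline reproduces the kind of comparison the paper reports.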
Clustered multidimensional scaling with Rulkov neurons
Copyright ©2016 IEICE. When dealing with high-dimensional measurements that often show non-linear characteristics at multiple scales, a need for unbiased and robust classification and interpretation techniques has emerged. Here, we present a method for mapping high-dimensional data onto low-dimensional spaces, allowing for a fast visual interpretation of the data. Classical approaches to dimensionality reduction attempt to preserve the geometry of the data. They often fail to correctly capture cluster structures, for instance in high-dimensional situations where distances between data points tend to become more similar. To cope with this clustering problem, we propose to combine classical multi-dimensional scaling with data clustering based on self-organization processes in neural networks, where the goal is to amplify rather than preserve local cluster structures. We find that applying dimensionality reduction techniques to the output of neural-network-based clustering not only allows for convenient visual inspection but also leads to further insights into the intra- and inter-cluster connectivity. We report on an implementation of the method with Rulkov-Hebbian-learning clustering and illustrate its suitability in comparison to traditional methods on an artificial dataset and a real-world example.
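The classical multi-dimensional scaling component of the method can be sketched as follows. The two-cluster synthetic data is an illustrative assumption; the Rulkov-neuron clustering stage that the paper adds before the embedding is not reproduced here.

```python
# Sketch: classical MDS maps 10-dimensional points to 2-D while
# approximately preserving pairwise distances, so well-separated
# clusters remain separated in the embedding.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
A = rng.normal(0.0, 0.5, (30, 10))   # cluster around the origin
B = rng.normal(5.0, 0.5, (30, 10))   # cluster around (5, ..., 5)
X = np.vstack([A, B])
emb = MDS(n_components=2, random_state=0).fit_transform(X)
# inter-cluster distance should dominate intra-cluster spread
sep = np.linalg.norm(emb[:30].mean(0) - emb[30:].mean(0))
print(sep > emb[:30].std())
```

The paper's point is that when clusters are *not* this well separated in the original space, distance-preserving MDS alone blurs them, which motivates amplifying cluster structure with a neural-network clustering stage first.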
GA-based feature subset selection in a spam/non-spam detection system
Spam has created a significant security problem for computer users everywhere. Spammers exploit deceptive tricks to conceal the parts of messages that could be used to identify spam. For instance, a spammer incurs little cost or bandwidth when sending junk mail, even more than one hundred emails at a time. From the feature selection perspective, one of the specific problems that decreases the accuracy of spam/non-spam classification is high data dimensionality; reducing dimensionality therefore amounts to removing irrelevant features. In this paper, a genetic algorithm (GA) is applied for feature selection in an effort to decrease the number of useless features in a collection of high-dimensional email bodies and subjects. A Multi-Layer Perceptron (MLP) is then employed to classify the features selected by the GA. Using the LingSpam benchmark corpus as the dataset, the experimental results showed that the GA feature selector with the MLP classifier not only decreases the data dimensionality but also increases the spam detection rate compared with other classifiers such as SVM and Naïve Bayes.
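A GA-based feature subset selector of the kind described above can be sketched with a bit-mask encoding, one-point crossover, and bit-flip mutation. Everything here is an illustrative assumption: the synthetic data, the nearest-centroid fitness (standing in for the paper's MLP classifier), and all GA hyperparameters.

```python
# Sketch of GA feature selection: each individual is a 0/1 mask over
# features; fitness rewards classification accuracy on the selected
# subset and lightly penalizes subset size.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
y = rng.integers(0, 2, n)
X = rng.normal(0.0, 1.0, (n, d))
X[:, :3] += y[:, None] * 2.0          # only first 3 features are informative

def fitness(mask):
    """Nearest-centroid training accuracy on the selected features
    (a simple stand-in for the paper's MLP classifier)."""
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask.astype(bool)]
    c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    pred = (np.linalg.norm(Xs - c1, axis=1)
            < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean() - 0.005 * mask.sum()  # size penalty

pop = rng.integers(0, 2, (30, d))                   # random initial population
for gen in range(40):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]         # truncation selection
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(0, 10, 2)]
        cut = rng.integers(1, d)
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        flip = rng.random(d) < 0.02                 # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.array(children)

best = pop[np.argmax([fitness(m) for m in pop])]
print(best.sum(), "features selected")
```

In the paper's setup, the fitness evaluation would train and score an MLP on the candidate subset instead of the nearest-centroid rule used here.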