3,291 research outputs found
Recommended from our members
Learning distance to subspace for the nearest subspace methods in high-dimensional data classification
The nearest subspace methods (NSM) are a category of classification methods widely applied to classify high-dimensional data. In this paper, we propose to improve the classification performance of NSM through learning tailored distance metrics from samples to class subspaces. The learned distance metric is termed as ‘learned distance to subspace’ (LD2S). Using LD2S in the classification rule of NSM can make the samples closer to their correct class subspaces while farther away from their wrong class subspaces. In this way, the classification task becomes easier and the classification performance of NSM can be improved. The superior classification performance of using LD2S for NSM is demonstrated on three real-world high-dimensional spectral datasets
Fast DD-classification of functional data
A fast nonparametric procedure for classifying functional data is introduced.
It consists of a two-step transformation of the original data plus a classifier
operating on a low-dimensional hypercube. The functional data are first mapped
into a finite-dimensional location-slope space and then transformed by a
multivariate depth function into the -plot, which is a subset of the unit
hypercube. This transformation yields a new notion of depth for functional
data. Three alternative depth functions are employed for this, as well as two
rules for the final classification on . The resulting classifier has
to be cross-validated over a small range of parameters only, which is
restricted by a Vapnik-Cervonenkis bound. The entire methodology does not
involve smoothing techniques, is completely nonparametric and allows to achieve
Bayes optimality under standard distributional settings. It is robust,
efficiently computable, and has been implemented in an R environment.
Applicability of the new approach is demonstrated by simulations as well as a
benchmark study
Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data
The recent development of more sophisticated spectroscopic methods allows
acqui- sition of high dimensional datasets from which valuable information may
be extracted using multivariate statistical analyses, such as dimensionality
reduction and automatic classification (supervised and unsupervised). In this
work, a supervised classification through a partial least squares discriminant
analysis (PLS-DA) is performed on the hy- perspectral data. The obtained
results are compared with those obtained by the most commonly used
classification approaches
Partial least squares discriminant analysis: A dimensionality reduction method to classify hyperspectral data
The recent development of more sophisticated spectroscopic methods allows acquisition of high dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction and automatic classification (supervised and unsupervised). In this work, a supervised classification through a partial least squares discriminant analysis (PLS-DA) is performed on the hy- perspectral data. The obtained results are compared with those obtained by the most commonly used classification approaches
Comparing the writing style of real and artificial papers
Recent years have witnessed the increase of competition in science. While
promoting the quality of research in many cases, an intense competition among
scientists can also trigger unethical scientific behaviors. To increase the
total number of published papers, some authors even resort to software tools
that are able to produce grammatical, but meaningless scientific manuscripts.
Because automatically generated papers can be misunderstood as real papers, it
becomes of paramount importance to develop means to identify these scientific
frauds. In this paper, I devise a methodology to distinguish real manuscripts
from those generated with SCIGen, an automatic paper generator. Upon modeling
texts as complex networks (CN), it was possible to discriminate real from fake
papers with at least 89\% of accuracy. A systematic analysis of features
relevance revealed that the accessibility and betweenness were useful in
particular cases, even though the relevance depended upon the dataset. The
successful application of the methods described here show, as a proof of
principle, that network features can be used to identify scientific gibberish
papers. In addition, the CN-based approach can be combined in a straightforward
fashion with traditional statistical language processing methods to improve the
performance in identifying artificially generated papers.Comment: To appear in Scientometrics (2015
Sketch-based subspace clustering of hyperspectral images
Sparse subspace clustering (SSC) techniques provide the state-of-the-art in clustering of hyperspectral images (HSIs). However, their computational complexity hinders their applicability to large-scale HSIs. In this paper, we propose a large-scale SSC-based method, which can effectively process large HSIs while also achieving improved clustering accuracy compared to the current SSC methods. We build our approach based on an emerging concept of sketched subspace clustering, which was to our knowledge not explored at all in hyperspectral imaging yet. Moreover, there are only scarce results on any large-scale SSC approaches for HSI. We show that a direct application of sketched SSC does not provide a satisfactory performance on HSIs but it does provide an excellent basis for an effective and elegant method that we build by extending this approach with a spatial prior and deriving the corresponding solver. In particular, a random matrix constructed by the Johnson-Lindenstrauss transform is first used to sketch the self-representation dictionary as a compact dictionary, which significantly reduces the number of sparse coefficients to be solved, thereby reducing the overall complexity. In order to alleviate the effect of noise and within-class spectral variations of HSIs, we employ a total variation constraint on the coefficient matrix, which accounts for the spatial dependencies among the neighbouring pixels. We derive an efficient solver for the resulting optimization problem, and we theoretically prove its convergence property under mild conditions. The experimental results on real HSIs show a notable improvement in comparison with the traditional SSC-based methods and the state-of-the-art methods for clustering of large-scale images
A Detailed Investigation into Low-Level Feature Detection in Spectrogram Images
Being the first stage of analysis within an image, low-level feature detection is a crucial step in the image analysis process and, as such, deserves suitable attention. This paper presents a systematic investigation into low-level feature detection in spectrogram images. The result of which is the identification of frequency tracks. Analysis of the literature identifies different strategies for accomplishing low-level feature detection. Nevertheless, the advantages and disadvantages of each are not explicitly investigated. Three model-based detection strategies are outlined, each extracting an increasing amount of information from the spectrogram, and, through ROC analysis, it is shown that at increasing levels of extraction the detection rates increase. Nevertheless, further investigation suggests that model-based detection has a limitation—it is not computationally feasible to fully evaluate the model of even a simple sinusoidal track. Therefore, alternative approaches, such as dimensionality reduction, are investigated to reduce the complex search space. It is shown that, if carefully selected, these techniques can approach the detection rates of model-based strategies that perform the same level of information extraction. The implementations used to derive the results presented within this paper are available online from http://stdetect.googlecode.com
- …