24,135 research outputs found
Web news classification using neural networks based on PCA
In this paper, we propose a news web page classification method (WPCM). The WPCM uses a neural network with inputs obtained by both the principal components and class profile-based features (CPBF). The fixed number of regular words from each class will be used as a feature vectors with the reduced features from the PCA. These feature vectors are then used as the input to the neural networks for classification. The experimental evaluation demonstrates that the WPCM provides acceptable classification accuracy with the sports news datasets
A study of the classification of low-dimensional data with supervised manifold learning
Supervised manifold learning methods learn data representations by preserving
the geometric structure of data while enhancing the separation between data
samples from different classes. In this work, we propose a theoretical study of
supervised manifold learning for classification. We consider nonlinear
dimensionality reduction algorithms that yield linearly separable embeddings of
training data and present generalization bounds for this type of algorithms. A
necessary condition for satisfactory generalization performance is that the
embedding allow the construction of a sufficiently regular interpolation
function in relation with the separation margin of the embedding. We show that
for supervised embeddings satisfying this condition, the classification error
decays at an exponential rate with the number of training samples. Finally, we
examine the separability of supervised nonlinear embeddings that aim to
preserve the low-dimensional geometric structure of data based on graph
representations. The proposed analysis is supported by experiments on several
real data sets
Non-Standard Words as Features for Text Categorization
This paper presents categorization of Croatian texts using Non-Standard Words
(NSW) as features. Non-Standard Words are: numbers, dates, acronyms,
abbreviations, currency, etc. NSWs in Croatian language are determined
according to Croatian NSW taxonomy. For the purpose of this research, 390 text
documents were collected and formed the SKIPEZ collection with 6 classes:
official, literary, informative, popular, educational and scientific. Text
categorization experiment was conducted on three different representations of
the SKIPEZ collection: in the first representation, the frequencies of NSWs are
used as features; in the second representation, the statistic measures of NSWs
(variance, coefficient of variation, standard deviation, etc.) are used as
features; while the third representation combines the first two feature sets.
Naive Bayes, CN2, C4.5, kNN, Classification Trees and Random Forest algorithms
were used in text categorization experiments. The best categorization results
are achieved using the first feature set (NSW frequencies) with the
categorization accuracy of 87%. This suggests that the NSWs should be
considered as features in highly inflectional languages, such as Croatian. NSW
based features reduce the dimensionality of the feature space without standard
lemmatization procedures, and therefore the bag-of-NSWs should be considered
for further Croatian texts categorization experiments.Comment: IEEE 37th International Convention on Information and Communication
Technology, Electronics and Microelectronics (MIPRO 2014), pp. 1415-1419,
201
Vibration-based adaptive novelty detection method for monitoring faults in a kinematic chain
Postprint (published version
- …