An Adaptive Firefly Optimization (AFO) with Multi-Kernel SVM (MKSVM) Classification for Big Data Dimensionality Reduction
The dimensionality of data has risen sharply over the last several decades. The "Dimensionality Curse" (DC) is a problem for conventional learning techniques when dealing with "Big Data (BD)" of high dimensionality: a learning model's performance degrades when a large number of features is present. "Dimensionality Reduction (DR)" approaches are used to address the DC issue, and they are a significant topic in "Machine Learning (ML)" research. "Feature Selection (FS)" is a prominent way to reduce dimensions. It selects an optimal subset of the original features according to a relevant assessment criterion, and it typically improves learning effectiveness: higher classification precision, lower processing cost, and better model comprehensibility. This research develops an "Adaptive Firefly Optimization (AFO)" technique based on the "Map Reduce (MR)" platform. In the initial phase (mapping stage), the whole large "DataSet (DS)" is first subdivided into blocks of contexts. The AFO technique is then used to choose features from the large DS. In the final phase (reduction stage), the partial results are combined into a single feature vector. The "Multi Kernel Support Vector Machine (MKSVM)" classifier then assigns the data to the appropriate class using the optimal features obtained from AFO for DR purposes. We found that the proposed combination of AFO and MKSVM (AFO-MKSVM) scales very well to high-dimensional DSs and outperforms the existing "Linear Discriminant Analysis-Support Vector Machine (LDA-SVM)" approach. Evaluation metrics such as Information-Ratio for Dimension-Reduction, Accuracy, and Recall indicate that the AFO-MKSVM method achieves better results than the LDA-SVM method.
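The map and reduce stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the firefly search is replaced by a simple stand-in score (the absolute difference of class means per feature), and the names `split_blocks`, `map_select`, `reduce_merge`, `top_k` and `min_votes` are hypothetical. It assumes binary labels (0/1) and only shows how per-block selections could be merged into one feature list.

```python
from collections import Counter

def split_blocks(rows, n_blocks):
    """Mapping stage, step 1: cut the dataset into contiguous blocks."""
    size = (len(rows) + n_blocks - 1) // n_blocks
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def score_features(block):
    """Stand-in for the AFO search: |mean(class 1) - mean(class 0)| per feature."""
    n_features = len(block[0][0])
    scores = []
    for j in range(n_features):
        pos = [x[j] for x, y in block if y == 1]
        neg = [x[j] for x, y in block if y == 0]
        if not pos or not neg:
            scores.append(0.0)  # block lacks one class, no evidence either way
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores

def map_select(block, top_k):
    """Mapping stage, step 2: pick the top_k features inside one block."""
    scores = score_features(block)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:top_k])

def reduce_merge(selections, min_votes):
    """Reduction stage: keep features chosen by at least min_votes blocks."""
    votes = Counter(j for s in selections for j in s)
    return sorted(j for j, v in votes.items() if v >= min_votes)

# Toy data (made up for illustration): feature 0 tracks the label,
# features 1 and 2 carry no class information.
rows = [
    ((10.0, 0.5, 0.1), 1), ((0.0, 0.5, 0.2), 0),
    ((9.0, 0.5, 0.2), 1), ((1.0, 0.5, 0.1), 0),
    ((11.0, 0.5, 0.3), 1), ((0.5, 0.5, 0.3), 0),
    ((10.5, 0.5, 0.1), 1), ((0.2, 0.5, 0.2), 0),
]
blocks = split_blocks(rows, n_blocks=2)
selected = reduce_merge([map_select(b, top_k=1) for b in blocks], min_votes=2)
print(selected)
```

Voting across blocks is only one possible merge rule; the abstract does not say how the partial results are combined.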
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because issues of poor run-time
performance are not such a problem these days with the computational power that
is available. This paper presents an overview of techniques for Nearest
Neighbour classification focusing on: mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
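The basic idea summarised in this abstract (find the nearest neighbours of a query, then let them vote on its class) can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code from the paper's appendix; the function names, the choice of Euclidean distance and the toy data are assumptions.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority class among the k training points nearest to query."""
    # Brute-force search: rank every training index by distance to the query.
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): two well-separated clusters.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

pred_near_a = knn_predict(train_X, train_y, (1.1, 1.0))
pred_near_b = knn_predict(train_X, train_y, (5.0, 5.1))
print(pred_near_a, pred_near_b)
```

The brute-force ranking costs one distance computation per training point for every query, which is the run-time concern the abstract refers to.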
Training Process Reduction Based On Potential Weights Linear Analysis To Accelarate Back Propagation Network
Learning is the key property of a Back Propagation Network (BPN): suitable
weights and thresholds must be found during training in order to reduce
training time and achieve high accuracy. Currently, data pre-processing, such
as dimension reduction of input values, and pre-training are the contributing
factors in developing efficient techniques that reduce training time while
keeping accuracy high. Weight initialization is an important issue: it is
random, creates a paradox, and leads to low accuracy with high training time.
One good data preprocessing technique for accelerating BPN classification is
dimension reduction, but it has the problem of missing data. In this
paper, we study current pre-training techniques and a new preprocessing
technique called Potential Weight Linear Analysis (PWLA), which combines
normalization, dimension reduction of input values and pre-training. In PWLA,
data preprocessing is first performed to generate normalized input values,
which are then passed to a pre-training technique in order to obtain the
potential weights. After these phases, the dimension of the input values matrix
is reduced by using the real potential weights. For the experimental results,
the XOR problem and three datasets, SPECT Heart, SPECTF Heart and Liver
disorders (BUPA), are evaluated. Our results show that the new PWLA technique
changes BPN into a new Supervised Multi Layer Feed Forward Neural Network
(SMFFNN) model with high accuracy in one epoch without a training cycle. PWLA
also has the power of non-linear supervised and unsupervised dimension
reduction, which can be applied to other supervised multi layer feed forward
neural network models in future work.
Comment: 11 pages IEEE format, International Journal of Computer Science and
Information Security, IJCSIS 2009, ISSN 1947 5500, Impact factor 0.42
Taming Wild High Dimensional Text Data with a Fuzzy Lash
The bag of words (BOW) represents a corpus in a matrix whose elements are the
frequency of words. However, each row in the matrix is a very high-dimensional
sparse vector. Dimension reduction (DR) is a popular method to address sparsity
and high-dimensionality issues. Among the different strategies for developing
DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy that
maps all words onto a new basis to represent BOW. The recent increase of text
data and its challenges imply that the DR area still needs new perspectives.
Although a wide range of methods based on the UFT strategy has been developed,
the fuzzy approach has not been considered for DR based on this strategy. This
research investigates the application of fuzzy clustering as a DR method based
on the UFT strategy to collapse the BOW matrix and provide a lower-dimensional
representation of documents instead of the words in a corpus. The quantitative
evaluation shows that fuzzy clustering produces superior performance and
features to Principal Components Analysis (PCA) and Singular Value
Decomposition (SVD), two popular DR methods based on the UFT strategy.
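To make the idea in this abstract concrete, the sketch below builds a small BOW matrix and then replaces each document row by its fuzzy membership in a handful of cluster centres, so the row length drops from the vocabulary size to the number of clusters. It is only a sketch under stated assumptions: the membership formula is the standard fuzzy c-means one, the centres are fixed by hand rather than learned, and `bag_of_words`, `fuzzy_memberships` and the toy documents are hypothetical, not taken from the paper.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Return (vocab, matrix) where matrix[i][j] counts word vocab[j] in docs[i]."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    matrix = []
    for d in docs:
        counts = Counter(d.lower().split())
        matrix.append([counts[w] for w in vocab])
    return vocab, matrix

def fuzzy_memberships(row, centres, m=2.0):
    """Fuzzy c-means membership of one row in each centre (fuzzifier m > 1)."""
    dists = [math.dist(row, c) for c in centres]
    if any(d == 0.0 for d in dists):
        # The row coincides with a centre: give it full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]

# Toy corpus (made up for illustration): two pet documents, two finance documents.
docs = ["cat sat mat", "cat cat dog", "stock market price", "market price rise"]
vocab, bow = bag_of_words(docs)

# Assumption: centres are picked by hand; a full fuzzy c-means would learn them.
centres = [bow[0], bow[2]]
reduced = [fuzzy_memberships(row, centres) for row in bow]

for doc, memberships in zip(docs, reduced):
    print(doc, [round(u, 3) for u in memberships])
```

Each document ends up as a two-element membership vector instead of an eight-element sparse count vector, which is the sense in which clustering acts as a dimension reduction step here.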
Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data
In recent years, the use of motion tracking systems for the acquisition of functional biomechanical gait data has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, and this makes the dimensionality of the collected data far higher than the number of available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons use robust resampling techniques such as Bootstrap and k-fold cross-validation. Results from experiments with lesion subjects suffering from pathological plantar hyperkeratosis show that the proposed method can lead to correct classification rates with less than 10% of the original features.