
    An Adaptive Firefly Optimization (AFO) with Multi-Kernel SVM (MKSVM) Classification for Big Data Dimensionality Reduction

    The dimensionality of data has risen sharply over the last several decades. The "Dimensionality Curse" (DC) is a problem for conventional learning techniques when dealing with "Big Data (BD)" of high dimensionality: a learning model's performance degrades when a large number of features is present. "Dimensionality Reduction (DR)" approaches are used to address the DC issue, and they are a significant area of "Machine Learning (ML)" research. "Feature Selection (FS)" is a prominent DR procedure. It selects an optimal subset of the original features based on relevant assessment criteria, and it typically yields improved learning effectiveness: greater classification precision, lower processing costs, and improved model comprehensibility. This research develops an "Adaptive Firefly Optimization (AFO)" technique based on the "Map Reduce (MR)" platform. During the initial phase (mapping stage), the whole large "DataSet (DS)" is first subdivided into blocks of contexts. The AFO technique is then used to choose features from the large DS. In the final phase (reduction stage), all of the partial results are combined into a single feature vector. The "Multi Kernel Support Vector Machine (MKSVM)" classifier is then used to assign the data to the appropriate class, using the optimal features that AFO selected for DR. We found that the suggested algorithm, AFO combined with MKSVM (AFO-MKSVM), scales very well to high-dimensional DSs and outperforms the existing approach, "Linear Discriminant Analysis-Support Vector Machine (LDA-SVM)". Evaluation metrics such as Information-Ratio for Dimension-Reduction, Accuracy, and Recall indicate that the AFO-MKSVM method achieved better outcomes than the LDA-SVM method.
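
The map and reduce stages described in this abstract can be sketched roughly as follows. This is only an illustration of the split, select, and merge structure. The per-block selector here is a simple variance ranking standing in for AFO (it is not firefly optimization), and the names `split_into_blocks`, `select_features`, `reduce_selections`, and the `min_votes` merge rule are assumptions, not taken from the paper.

```python
from collections import Counter


def split_into_blocks(rows, n_blocks):
    """Map-stage partitioning: deal rows round-robin into n_blocks blocks."""
    return [rows[i::n_blocks] for i in range(n_blocks)]


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def select_features(block, top_k):
    """Stand-in selector (NOT firefly optimization): keep the top_k
    highest-variance columns of one block."""
    n_features = len(block[0])
    scores = [variance([row[j] for row in block]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:top_k])


def reduce_selections(selections, n_features, min_votes):
    """Reduce stage (assumed merge rule): a feature is kept when at least
    min_votes blocks selected it. Returns one 0/1 feature vector."""
    votes = Counter(j for selected in selections for j in selected)
    return [1 if votes[j] >= min_votes else 0 for j in range(n_features)]


# Toy data set: column 0 varies most within every block.
rows = [
    (0.0, 5.0, 1.0), (10.0, 5.1, 1.0), (0.5, 5.0, 1.1),
    (9.5, 4.9, 1.0), (1.0, 5.0, 0.9), (9.0, 5.1, 1.0),
]
blocks = split_into_blocks(rows, n_blocks=2)
selections = [select_features(block, top_k=1) for block in blocks]
mask = reduce_selections(selections, n_features=3, min_votes=2)
```

In this toy run both blocks pick column 0, so `mask` marks only that feature. A real pipeline would replace `select_features` with the optimizer and pass the masked columns on to the classifier.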

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor run-time performance is not such a problem these days, given the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
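
The nearest-neighbour idea summarised above fits in a few lines. The sketch below is a minimal brute-force version assuming Euclidean distance and an unweighted majority vote. It is not the code from the paper's appendix, and `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest neighbours.

    train is a list of (features, label) pairs. Every training example is
    scanned, so one query costs O(n) distance computations.
    """
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
near_a = knn_classify(train, (1.1, 1.0))
near_b = knn_classify(train, (5.0, 5.1), k=1)
```

The `distance` parameter is left pluggable because the choice of similarity measure is one of the main design decisions the paper surveys.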

    Training Process Reduction Based On Potential Weights Linear Analysis To Accelarate Back Propagation Network

    Learning is an important property of the Back Propagation Network (BPN), and finding suitable weights and thresholds during training is needed to improve training time and achieve high accuracy. Currently, data pre-processing, such as dimension reduction of input values, and pre-training are contributing factors in developing efficient techniques that reduce training time while keeping accuracy high. Weight initialization is an important issue: it is random, creates a paradox, and leads to low accuracy with high training time. Dimension reduction is a good data preprocessing technique for accelerating BPN classification, but it suffers from the problem of missing data. In this paper, we study current pre-training techniques and a new preprocessing technique called Potential Weight Linear Analysis (PWLA), which combines normalization, dimension reduction of input values, and pre-training. In PWLA, data preprocessing is performed first to generate normalized input values, which are then passed to a pre-training technique to obtain the potential weights. After these phases, the dimension of the input values matrix is reduced using the real potential weights. For the experimental results, the XOR problem and three datasets, SPECT Heart, SPECTF Heart and Liver disorders (BUPA), are evaluated. Our results show that the new PWLA technique changes BPN into a new Supervised Multi Layer Feed Forward Neural Network (SMFFNN) model with high accuracy in one epoch without a training cycle. PWLA will also be able to provide nonlinear supervised and unsupervised dimension reduction for use by other supervised multi layer feed forward neural network models in future work. Comment: 11 pages IEEE format, International Journal of Computer Science and Information Security, IJCSIS 2009, ISSN 1947 5500, Impact factor 0.42

    Taming Wild High Dimensional Text Data with a Fuzzy Lash

    The bag of words (BOW) represents a corpus as a matrix whose elements are word frequencies. However, each row in the matrix is a very high-dimensional sparse vector. Dimension reduction (DR) is a popular way to address the sparsity and high-dimensionality issues. Among the different strategies for developing DR methods, Unsupervised Feature Transformation (UFT) is a popular strategy that maps all words onto a new basis to represent the BOW. The recent increase in text data and its challenges imply that the DR area still needs new perspectives. Although a wide range of methods based on the UFT strategy has been developed, the fuzzy approach has not been considered for DR based on this strategy. This research investigates the application of fuzzy clustering as a DR method based on the UFT strategy to collapse the BOW matrix and provide a lower-dimensional representation of documents instead of the words in a corpus. The quantitative evaluation shows that fuzzy clustering produces superior performance and features compared with Principal Components Analysis (PCA) and Singular Value Decomposition (SVD), two popular DR methods based on the UFT strategy.
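
One way to read "fuzzy clustering as a DR method" is that each document's membership degrees across the clusters become its new, shorter feature vector. The sketch below uses the standard fuzzy c-means membership formula under that reading. The cluster centers are fixed by hand for illustration (a full fuzzy c-means run would update them iteratively), and `fuzzy_memberships`, the fuzzifier value `m=2.0`, and the toy BOW rows are assumptions rather than details from the paper.

```python
import math


def fuzzy_memberships(doc, centers, m=2.0):
    """Fuzzy c-means membership of one document vector in each cluster.

    Uses u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)), where d_i is the
    distance from the document to center i. The memberships sum to 1.
    """
    dists = [math.dist(doc, center) for center in centers]
    # A document lying exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in dists)
        for d_i in dists
    ]


# Toy BOW matrix: 3 documents over a 5-word vocabulary.
bow = [(3, 2, 0, 0, 0), (2, 3, 1, 0, 0), (0, 0, 0, 4, 3)]
# Hand-picked centers (assumption): 2 clusters in the 5-word space.
centers = [(2.5, 2.5, 0.5, 0.0, 0.0), (0.0, 0.0, 0.0, 4.0, 3.0)]

# Each 5-dimensional row collapses to a 2-dimensional membership vector.
reduced = [fuzzy_memberships(doc, centers) for doc in bow]
```

Here the reduced dimensionality equals the number of clusters, so the cluster count plays the role that the number of retained components plays in PCA or SVD.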

    Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data

    In recent years, the use of motion tracking systems for the acquisition of functional biomechanical gait data has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, and this makes the dimensionality of the collected data far higher than the number of available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons use robust resampling techniques such as Bootstrap 632+ and k-fold cross-validation. Results from experiments with lesion subjects suffering from pathological plantar hyperkeratosis show that the proposed method can lead to ~96% correct classification rates with less than 10% of the original features.
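
Of the two resampling schemes named in this abstract, k-fold cross-validation is the simpler to illustrate. The sketch below only generates the train/test index splits. It assumes a shuffled, unstratified partition, and `k_fold_indices` and its `seed` parameter are hypothetical names, not the authors' procedure.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once with a fixed seed, then dealt round-robin
    into k folds, so every sample is in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [
            j for fold_no, fold in enumerate(folds) if fold_no != i
            for j in fold
        ]
        yield train_idx, test_idx


splits = list(k_fold_indices(n_samples=10, k=5))
```

With few subjects, as in the study described, each test fold is small, which is one reason such comparisons are often paired with a bootstrap estimate as well.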