11 research outputs found

    Optimization of Support Vector Machine Method Using Feature Selection to Improve Classification Results

    Get PDF
    The performance of the organizations or companiesare based on the qualities possessed by their employee. Both of good or bad employee performance will have an impact on productivity and the impact of profits obtained by the company. Support Vector Machine (SVM) is a machine learning method based on statistical learning theory and can solve high non-linearity, regression, etc. In machine learning, the optimization model is a part for improving the accuracy of the model for data learning. Several techniques are used, one of which is feature selection, namely reducing data dimensions so that it can reduce computation in data modeling. This study aims to apply the method of machine learning to the employee data of the Bank Rakyat Indonesia (BRI) company. The method used is SVM method by increasing the accuracy of learning data by using a feature selection technique using a wrapper algorithm. From the results of the classification test, the average accuracy obtained is 72 percent with a precision value of 71 and the recall value is rounded off to 72 percent, with a combination of SVM and cross-validation. Data obtained from Kaggle data, which consists of training data and testing data. each consisting of 30 columns and 22005 rows in the training data and testing data consisting of 29 col-umns and 6000 rows. The results of this study get a classification score of 82 percent. The precision value obtained is rounded off to 82 percent, a recall of 86 percent and an f1-score of 81 percent

    A Hybrid Multi-Filter Wrapper Feature Selection Method for Software Defect Predictors

    Get PDF
    Software Defect Prediction (SDP) is an approach used for identifying defect-prone software modules or components. It helps software engineer to optimally, allocate limited resources to defective software modules or components in the testing or maintenance phases of software development life cycle (SDLC). Nonetheless, the predictive performance of SDP models reckons largely on the quality of dataset utilized for training the predictive models. The high dimensionality of software metric features has been noted as a data quality problem which negatively affects the predictive performance of SDP models. Feature Selection (FS) is a well-known method for solving high dimensionality problem and can be divided into filter-based and wrapper-based methods. Filter-based FS has low computational cost, but the predictive performance of its classification algorithm on the filtered data cannot be guaranteed. On the contrary, wrapper-based FS have good predictive performance but with high computational cost and lack of generalizability. Therefore, this study proposes a hybrid multi-filter wrapper method for feature selection of relevant and irredundant features in software defect prediction. The proposed hybrid feature selection will be developed to take advantage of filter-filter and filter-wrapper relationships to give optimal feature subsets, reduce its evaluation cycle and subsequently improve SDP models overall predictive performance in terms of Accuracy, Precision and Recall values

    Feature Selection for Text and Image Data Using Differential Evolution with SVM and Naïve Bayes Classifiers

    Get PDF
    Classification problems are increasing in various important applications such as text categorization, images, medical imaging diagnosis and bimolecular analysis etc. due to large amount of attribute set. Feature extraction methods in case of large dataset play an important role to reduce the irrelevant feature and thereby increases the performance of classifier algorithm. There exist various methods based on machine learning for text and image classification. These approaches are utilized for dimensionality reduction which aims to filter less informative and outlier data. Therefore, these approaches provide compact representation and computationally better tractable accuracy. At the same time, these methods can be challenging if the search space is doubled multiple time. To optimize such challenges, a hybrid approach is suggested in this paper. The proposed approach uses differential evolution (DE) for feature selection with naïve bayes (NB) and support vector machine (SVM) classifiers to enhance the performance of selected classifier. The results are verified using text and image data which reflects improved accuracy compared with other conventional techniques. A 25 benchmark datasets (UCI) from different domains are considered to test the proposed algorithms.  A comparative study between proposed hybrid classification algorithms are presented in this work. Finally, the experimental result shows that the differential evolution with NB classifier outperforms and produces better estimation of probability terms. The proposed technique in terms of computational time is also feasible

    Improved relative discriminative criterion using rare and informative terms and ringed seal search-support vector machine techniques for text classification

    Get PDF
    Classification has become an important task for automatically classifying the documents to their respective categories. For text classification, feature selection techniques are normally used to identify important features and to remove irrelevant, and noisy features for minimizing the dimensionality of feature space. These techniques are expected particularly to improve efficiency, accuracy, and comprehensibility of the classification models in text labeling problems. Most of the feature selection techniques utilize document and term frequencies to rank a term. Existing feature selection techniques (e.g. RDC, NRDC) consider frequently occurring terms and ignore rarely occurring terms count in a class. However, this study proposes the Improved Relative Discriminative Criterion (IRDC) technique which considers rarely occurring terms count. It is argued that rarely occurring terms count are also meaningful and important as frequently occurring terms in a class. The proposed IRDC is compared to the most recent feature selection techniques RDC and NRDC. The results reveal significant improvement by the proposed IRDC technique for feature selection in terms of precision 27%, recall 30%, macro-average 35% and micro- average 30%. Additionally, this study also proposes a hybrid algorithm named: Ringed Seal Search-Support Vector Machine (RSS-SVM) to improve the generalization and learning capability of the SVM. The proposed RSS-SVM optimizes kernel and penalty parameter with the help of RSS algorithm. The proposed RSS-SVM is compared to the most recent techniques GA-SVM and CS-SVM. The results show significant improvement by the proposed RSS-SVM for classification in terms of accuracy 18.8%, recall 15.68%, precision 15.62% and specificity 13.69%. In conclusion, the proposed IRDC has shown better performance as compare to existing techniques because its capability in considering rare and informative terms. Additionally, the proposed RSS- SVM has shown better performance as compare to existing techniques because it has capability to improve balance between exploration and exploitation

    Multinomial logistic regression probability ratio-based feature vectors for Malay vowel recognition

    Get PDF
    Vowel Recognition is a part of automatic speech recognition (ASR) systems that classifies speech signals into groups of vowels. The performance of Malay vowel recognition (MVR) like any multiclass classification problem depends largely on Feature Vectors (FVs). FVs such as Mel-frequency Cepstral Coefficients (MFCC) have produced high error rates due to poor phoneme information. Classifier transformed probabilistic features have proved a better alternative in conveying phoneme information. However, the high dimensionality of the probabilistic features introduces additional complexity that deteriorates ASR performance. This study aims to improve MVR performance by proposing an algorithm that transforms MFCC FVs into a new set of features using Multinomial Logistic Regression (MLR) to reduce the dimensionality of the probabilistic features. This study was carried out in four phases which are pre-processing and feature extraction, best regression coefficients generation, feature transformation, and performance evaluation. The speech corpus consists of 1953 samples of five Malay vowels of /a/, /e/, /i/, /o/ and /u/ recorded from students of two public universities in Malaysia. Two sets of algorithms were developed which are DBRCs and FELT. DBRCs algorithm determines the best regression coefficients (DBRCs) to obtain the best set of regression coefficients (RCs) from the extracted 39-MFCC FVs through resampling and data swapping approach. FELT algorithm transforms 39-MFCC FVs using logistic transformation method into FELT FVs. Vowel recognition rates of FELT and 39-MFCC FVs were compared using four different classification techniques of Artificial Neural Network, MLR, Linear Discriminant Analysis, and k-Nearest Neighbour. Classification results showed that FELT FVs surpass the performance of 39-MFCC FVs in MVR. Depending on the classifiers used, the improved performance of 1.48% - 11.70% was attained by FELT over MFCC. Furthermore, FELT significantly improved the recognition accuracy of vowels /o/ and /u/ by 5.13% and 8.04% respectively. This study contributes two algorithms for determining the best set of RCs and generating FELT FVs from MFCC. The FELT FVs eliminate the need for dimensionality reduction with comparable performances. Furthermore, FELT FVs improved MVR for all the five vowels especially /o/ and /u/. The improved MVR performance will spur the development of Malay speech-based systems, especially for the Malaysian community

    Cooperative co-evolution for feature selection in big data with random feature grouping

    Get PDF
    © 2020, The Author(s). A massive amount of data is generated with the evolution of modern technologies. This high-throughput data generation results in Big Data, which consist of many features (attributes). However, irrelevant features may degrade the classification performance of machine learning (ML) algorithms. Feature selection (FS) is a technique used to select a subset of relevant features that represent the dataset. Evolutionary algorithms (EAs) are widely used search strategies in this domain. A variant of EAs, called cooperative co-evolution (CC), which uses a divide-and-conquer approach, is a good choice for optimization problems. The existing solutions have poor performance because of some limitations, such as not considering feature interactions, dealing with only an even number of features, and decomposing the dataset statically. In this paper, a novel random feature grouping (RFG) has been introduced with its three variants to dynamically decompose Big Data datasets and to ensure the probability of grouping interacting features into the same subcomponent. RFG can be used in CC-based FS processes, hence called Cooperative Co-Evolutionary-Based Feature Selection with Random Feature Grouping (CCFSRFG). Experiment analysis was performed using six widely used ML classifiers on seven different datasets from the UCI ML repository and Princeton University Genomics repository with and without FS. The experimental results indicate that in most cases [i.e., with naïve Bayes (NB), support vector machine (SVM), k-Nearest Neighbor (k-NN), J48, and random forest (RF)] the proposed CCFSRFG-1 outperforms an existing solution (a CC-based FS, called CCEAFS) and CCFSRFG-2, and also when using all features in terms of accuracy, sensitivity, and specificity

    Differential evolution for filter feature selection based on information theory and feature ranking

    No full text
    © 2017 Elsevier B.V. Feature selection is an essential step in various tasks, where filter feature selection algorithms are increasingly attractive due to their simplicity and fast speed. A common filter is to use mutual information to estimate the relationships between each feature and the class labels (mutual relevancy), and between each pair of features (mutual redundancy). This strategy has gained popularity resulting a variety of criteria based on mutual information. Other well-known strategies are to order each feature based on the nearest neighbor distance as in ReliefF, and based on the between-class variance and the within-class variance as in Fisher Score. However, each strategy comes with its own advantages and disadvantages. This paper proposes a new filter criterion inspired by the concepts of mutual information, ReliefF and Fisher Score. Instead of using mutual redundancy, the proposed criterion tries to choose the highest ranked features determined by ReliefF and Fisher Score while providing the mutual relevance between features and the class labels. Based on the proposed criterion, two new differential evolution (DE) based filter approaches are developed. While the former uses the proposed criterion as a single objective problem in a weighted manner, the latter considers the proposed criterion in a multi-objective design. Moreover, a well known mutual information feature selection approach (MIFS) based on maximum-relevance and minimum-redundancy is also adopted in single-objective and multi-objective DE algorithms for feature selection. The results show that the proposed criterion outperforms MIFS in both single objective and multi-objective DE frameworks. The results also indicate that considering feature selection as a multi-objective problem can generally provide better performance in terms of the feature subset size and the classification accuracy. © This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0

    Differential evolution for filter feature selection based on information theory and feature ranking

    No full text
    © 2017 Elsevier B.V. Feature selection is an essential step in various tasks, where filter feature selection algorithms are increasingly attractive due to their simplicity and fast speed. A common filter is to use mutual information to estimate the relationships between each feature and the class labels (mutual relevancy), and between each pair of features (mutual redundancy). This strategy has gained popularity resulting a variety of criteria based on mutual information. Other well-known strategies are to order each feature based on the nearest neighbor distance as in ReliefF, and based on the between-class variance and the within-class variance as in Fisher Score. However, each strategy comes with its own advantages and disadvantages. This paper proposes a new filter criterion inspired by the concepts of mutual information, ReliefF and Fisher Score. Instead of using mutual redundancy, the proposed criterion tries to choose the highest ranked features determined by ReliefF and Fisher Score while providing the mutual relevance between features and the class labels. Based on the proposed criterion, two new differential evolution (DE) based filter approaches are developed. While the former uses the proposed criterion as a single objective problem in a weighted manner, the latter considers the proposed criterion in a multi-objective design. Moreover, a well known mutual information feature selection approach (MIFS) based on maximum-relevance and minimum-redundancy is also adopted in single-objective and multi-objective DE algorithms for feature selection. The results show that the proposed criterion outperforms MIFS in both single objective and multi-objective DE frameworks. The results also indicate that considering feature selection as a multi-objective problem can generally provide better performance in terms of the feature subset size and the classification accuracy. © This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0
    corecore