
    PCOS DISEASE CLASSIFICATION USING FEATURE SELECTION RFECV AND EDA WITH KNN ALGORITHM METHOD

    Polycystic ovary syndrome (PCOS) is an endocrine disorder of the ovaries that causes hormonal disturbances in women of reproductive age; androgen secretion in the ovaries of women with PCOS is excessive compared with that of women without the condition. It usually occurs in women with obesity and is characterized by irregular menstrual cycles, chronic anovulation, hyperandrogenism, and even infertility. Treatments include hormone therapy, laparoscopic ovarian drilling, and in-vitro fertilization. However, these three therapies focus on symptoms and are less effective in treating PCOS-related infertility. Detecting PCOS early is necessary so that prevention and treatment can begin immediately. Therefore, this study builds a classifier that detects PCOS with a high degree of accuracy. The classifier is K-Nearest Neighbors (KNN), preceded by a two-stage feature selection process: Exploratory Data Analysis (EDA), which examines the data to find the most accurate approach, and Recursive Feature Elimination with Cross-Validation (RFECV), which ranks the features by their importance to the prediction. The data are then classified with the KNN algorithm. The EDA stage yields 10 attributes, RFECV reduces these to the 7 most important attributes, and the KNN classifier achieves an accuracy of 93%, a precision of 82%, a recall of 100%, and an F1 score of 90%.
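
The reported precision, recall, and F1 score can be checked against each other, because F1 is the harmonic mean of precision and recall. The sketch below is only an illustration of that relationship; the function names are hypothetical, and the paper's confusion-matrix counts are not given in the abstract, so only the reported percentages are used.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from confusion-matrix counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 82%, recall 100%.
reported_f1 = f1_score(0.82, 1.00)
print(f"F1 from reported precision and recall: {reported_f1:.2f}")
```

With a precision of 0.82 and a recall of 1.00, the harmonic mean rounds to 0.90, which agrees with the F1 score stated in the abstract.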

    Feature Selection of Network Intrusion Data using Genetic Algorithm and Particle Swarm Optimization

    This paper describes the advantages of using Evolutionary Algorithms (EA) for feature selection on network intrusion datasets. Most current Network Intrusion Detection Systems (NIDS) are unable to detect intrusions in real time because of the high-dimensional data produced during daily operation. Extracting knowledge from large data such as intrusion data requires a new approach. The more complex the datasets, the higher the computation time and the harder they are to interpret and analyze. This paper investigates the performance of feature selection algorithms on network intrusion data. We used Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms. When applied to network intrusion datasets, both GA and PSO significantly reduce the number of features. Our experiments show that GA reduces the number of attributes from 41 to 15, while PSO reduces it from 41 to 9. Using k-Nearest Neighbour (k-NN) as the classifier, the GA-reduced dataset, which consists of 37% of the original attributes, improves accuracy from 99.28% to 99.70%, and its execution time is 4.8 times faster than that of the original dataset. Using the same classifier, the PSO-reduced dataset, which consists of 22% of the original attributes, has the fastest execution time (7.2 times faster than that of the original dataset). However, its accuracy is reduced slightly, by 0.02 percentage points, from 99.28% to 99.26%. Overall, both GA and PSO are good feature selection techniques because they reduce the number of features significantly while maintaining, and sometimes improving, classification accuracy and reducing computation time.
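
In a wrapper approach like the one described above, a search algorithm such as GA or PSO proposes a binary feature mask, and the classifier's accuracy on the masked data serves as the fitness value. The sketch below shows only that fitness evaluation with a small k-NN written in plain Python; it is not the authors' implementation, the search itself is omitted, and the toy data and the names `knn_predict` and `mask_accuracy` are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, mask, k=3):
    """Majority vote among the k nearest training rows, using only masked-in features."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi, m in zip(a, b, mask) if m))

    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def mask_accuracy(train_X, train_y, test_X, test_y, mask, k=3):
    """Fitness of a feature mask: fraction of test rows classified correctly."""
    correct = sum(
        knn_predict(train_X, train_y, x, mask, k) == y
        for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
train_X = [[0.0, 5.0], [0.1, -4.0], [0.2, 9.0], [1.0, -5.0], [1.1, 4.0], [0.9, -9.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[0.05, -6.0], [1.05, 7.0]]
test_y = [0, 1]

all_features = mask_accuracy(train_X, train_y, test_X, test_y, [True, True])
informative_only = mask_accuracy(train_X, train_y, test_X, test_y, [True, False])
print(all_features, informative_only)
```

On this toy data the noisy feature dominates the distance, so masking it out raises accuracy. That is the effect a feature selection search tries to exploit, alongside the lower cost of computing distances over fewer attributes.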

    CLASSIFICATION OF KIDNEY DISEASE USING GENETIC MODIFIED KNN AND ARTIFICIAL BEE COLONY ALGORITHM

    The health care system is currently improving with the development of artificial intelligence systems for detecting diseases. Early detection of kidney disease by recognizing its symptoms is essential to prevent more severe damage. This study introduces a classification system for kidney disease using the Artificial Bee Colony (ABC) algorithm and a genetically modified K-Nearest Neighbor (KNN) classifier. The ABC algorithm is used for feature selection, to determine the symptoms relevant to kidney disease, and the genetically modified KNN is used for classification. The research consists of three stages: pre-processing, feature selection, and classification. However, it focuses on the pre-processing stage of chronic kidney disease data, using 400 records with 24 attributes for feature selection and classification. The data are classified into two classes: chronic kidney disease and not chronic kidney disease. The performance of the proposed method is also compared with other methods. The results show an accuracy of 98.27%, obtained by dividing the dataset into 280 training and 120 test records.

    Face Recognition using the LCS algorithm

    Today, human identification based on physical characteristics is a necessity in various fields. As a biometric system, a facial recognition system is fundamentally a pattern recognition system that identifies a person based on specific physiological or behavioral feature vectors. The feature vector is typically stored in a database upon extraction. The main objective of this research is to study and assess the effect of selecting suitable image attributes using the Cuckoo search algorithm. Given the high dimensionality of the feature vector, selecting an optimal subset is essential for speeding up the facial recognition algorithm. First, image characteristics are extracted from an existing database, and the Cuckoo algorithm selects an optimal binary subset of facial features. This subset is then evaluated with nearest-neighbor and neural-network classifiers. The resulting classification accuracy shows that the proposed method, which selects significant features with the Cuckoo algorithm, is more accurate than previous facial recognition methods.

    Gait Recognition By Walking and Running: A Model-Based Approach

    Gait is an emerging biometric for which some techniques, mainly holistic, have been developed to recognise people by their walking patterns. However, the possibility of recognising people by the way they run remains largely unexplored. The new analytical model presented in this paper is based on the biomechanics of walking and running, and will serve as the foundation of an automatic person recognition system that is invariant to these distinct gaits. A bilateral and dynamically coupled oscillator is the key concept underlying this work. Analysis shows that this new model can be used to automatically describe walking and running subjects without parameter selection. Temporal template matching that takes into account the whole sequence of a gait cycle is applied to extract the angles of thigh and lower leg rotation. The phase-weighted magnitudes of the lower-order Fourier components of these rotations form the gait signature. Classification of walking and running subjects is performed using the k-nearest-neighbour classifier. Recognition rates are similar to those achieved by other techniques with a similarly sized database. Future work will investigate feature set selection to improve the recognition rate and will determine the inter- and intra-class invariance attributes of both walking and running.

    Feature Selection of Distributed Denial of Service (DDos) IoT Bot Attack Detection Using Machine Learning Techniques

    Distributed Denial of Service (DDoS) attacks can be launched through numerous media and have become one of the biggest threats to computer security. One of the most effective countermeasures is to develop a detection algorithm using Machine Learning (ML). However, DDoS detection suffers from low accuracy, caused by the feature selection used with the classifier, and from time-consuming detection. This research focuses on feature selection for DDoS IoT bot attack detection using ML techniques. Two NetFlow datasets, NF_ToN_IoT and NF_BoT_IoT, are processed with two attribute selection methods, Information Gain and Gain Ratio, and the attributes are ranked using the Ranker algorithm. The datasets are then tested with four algorithms: Naïve Bayes (NB), K-Nearest Neighbor (KNN), Decision Table (DT), and Random Forest (RF). The results are compared using confusion-matrix measures: accuracy, true positives, true negatives, precision, and recall. For each dataset, results are reported for the Top 4, Top 8, and Top 12 selected features. The best overall classifier is Naïve Bayes, with accuracies of 97.506% and 90.67% on NF_ToN_IoT and NF_BoT_IoT, respectively.
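
Information Gain and Gain Ratio both score an attribute by how much knowing its value reduces the entropy of the class label; Gain Ratio additionally divides by the entropy of the attribute itself. The sketch below is a minimal plain-Python version for discrete attributes, followed by a simple ranking step. It is not the tooling used in the study, and the attribute names and rows are hypothetical toy values, not taken from NF_ToN_IoT or NF_BoT_IoT.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0
    return information_gain(feature_values, labels) / split_info


# Hypothetical toy flows: 'protocol' predicts the label, 'parity' does not.
labels = ["attack", "attack", "benign", "benign"]
features = {
    "protocol": ["udp", "udp", "tcp", "tcp"],
    "parity": ["even", "odd", "even", "odd"],
}

# Rank attributes from most to least informative.
ranking = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
print(ranking)
```

Keeping the first 4, 8, or 12 names from such a ranking corresponds to the Top 4, Top 8, and Top 12 selections mentioned in the abstract.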

    Data Cleansing Meets Feature Selection: A Supervised Machine Learning Approach

    This paper presents a novel procedure that applies, in sequence, two data preparation techniques of different natures: data cleansing and feature selection. For the former, we experimented with a partial removal of outliers via the inter-quartile range; for the latter, we chose relevant attributes with two widespread feature subset selectors, CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection), which are founded on correlation and consistency measures, respectively. Empirical results are outlined on seven difficult binary and multi-class data sets, that is, data sets with a test error rate of at least 10% in terms of accuracy when using C4.5 or 1-nearest neighbour classifiers without any prior data pre-processing. Non-parametric statistical tests show that combining the two data preparation strategies, using a correlation measure for feature selection with the C4.5 algorithm, is significantly better in terms of the ROC measure than applying data cleansing alone. Last but not least, a weak learner like PART achieved promising results with the new proposal based on a consistency measure and is able to compete with the best configuration of C4.5. To sum up, under the new approach and for the ROC measure, the PART classifier with a consistency metric behaves slightly better than C4.5 with a correlation measure. Funding: MICYT TIN2007-68084-C02-02; MICYT TIN2011-28956-C02-02; Junta de Andalucía P11-TIC-752.
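
Outlier removal via the inter-quartile range (IQR) can be sketched in a few lines. The version below flags every value outside the fences Q1 - 1.5·IQR and Q3 + 1.5·IQR and drops it. The factor 1.5 is the conventional choice and an assumption here, as are the function names and the toy values; the paper's partial removal scheme is not described in the abstract and is not reproduced.

```python
import statistics


def iqr_bounds(values, factor=1.5):
    """Lower and upper fences based on the inter-quartile range."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def remove_outliers(values, factor=1.5):
    """Keep only the values that lie within the IQR fences."""
    low, high = iqr_bounds(values, factor)
    return [v for v in values if low <= v <= high]


# Hypothetical attribute column with one obvious outlier.
column = [10, 12, 11, 13, 12, 11, 95]
print(remove_outliers(column))
```

In a pipeline like the one described, a step of this kind would run on the training data before the feature subset selector is applied.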

    Similarity Check Result: Classification of Stunting Status in Toddlers Using K-Nearest Neighbor with Backward Elimination Feature Selection

    The main nutrition problem faced by Indonesia is stunting: in 2017 Indonesia ranked fifth in the world for stunting prevalence, at 29.6% of all Indonesian children. A stunted child is a child under five years of age with a z-score of less than -3 standard deviations (SD). Stunting has a negative impact in that it can disrupt the physical and intellectual development of toddlers in the future. At present, medical personnel still examine stunting status manually, which takes a long time and is prone to inaccuracies. This study aims to classify stunting status in toddlers by applying the K-Nearest Neighbor method with Backward Elimination feature selection to obtain fast and accurate results. The average accuracy of the K-Nearest Neighbor algorithm at k=5 is 91.90% with 9 attributes, and the average accuracy with the addition of Backward Elimination is 92.20% with 8 attributes. These results indicate that Backward Elimination can increase the accuracy of the K-Nearest Neighbor algorithm while also performing attribute selection.
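
Backward elimination starts from the full attribute set and repeatedly drops the attribute whose removal hurts the evaluation score least, stopping when any further removal would lower the score. The sketch below is a generic greedy version under stated assumptions: the stopping rule, the attribute names, and the `toy_score` function are hypothetical stand-ins. In a study like the one above, the score would be the cross-validated accuracy of K-Nearest Neighbor at k=5.

```python
def backward_elimination(features, score):
    """Greedily drop one feature at a time while the score does not decrease."""
    selected = list(features)
    best = score(selected)
    while len(selected) > 1:
        # Score every subset obtained by removing exactly one feature.
        candidates = [
            (score([f for f in selected if f != drop]), drop) for drop in selected
        ]
        candidate_score, drop = max(candidates, key=lambda pair: pair[0])
        if candidate_score < best:
            break  # every possible removal makes the model worse
        selected.remove(drop)
        best = candidate_score
    return selected, best


# Hypothetical scoring function: two attributes help, the others add noise.
USEFUL = {"age_months", "height_cm"}


def toy_score(subset):
    hits = sum(1 for f in subset if f in USEFUL)
    noise = len(subset) - hits
    return hits * 0.4 - noise * 0.05


selected, best = backward_elimination(
    ["age_months", "height_cm", "record_id", "visit_weekday"], toy_score
)
print(selected, round(best, 2))
```

Here the two noise attributes are removed one at a time, and the loop stops once only the useful attributes remain. This mirrors the abstract's result, where dropping one of nine attributes raised accuracy slightly.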