
    Improved feature selection using a hybrid side-blotched lizard algorithm and genetic algorithm approach

    Feature selection entails choosing the significant features among a wide collection of original features that are essential for predicting test data with a classifier. Feature selection is commonly used in applications such as bioinformatics, data mining, and the analysis of written texts, where the dataset contains tens or hundreds of thousands of features, making such a large feature set difficult to analyze. Removing irrelevant features improves the predictor performance, making it more accurate and cost-effective. In this research, a novel hybrid technique for feature selection is presented that aims to enhance classification accuracy. A hybrid binary version of the side-blotched lizard algorithm (SBLA) and the genetic algorithm (GA), named SBLAGA, is proposed, combining the strengths of both algorithms. We use a sigmoid function to map the continuous variable values to binary ones, and evaluate our proposed algorithm on twenty-three standard benchmark datasets. Average classification accuracy, average number of selected features, and average fitness value were the evaluation criteria. According to the experimental results, SBLAGA demonstrated superior performance compared to SBLA and GA with regard to these criteria. We further compare SBLAGA with four wrapper feature selection methods that are widely used in the literature and find it to be more efficient.
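    The abstract mentions a sigmoid transfer step but does not spell out the binarization or the wrapper fitness, so the following is a minimal sketch of that step only; the k-NN wrapper classifier and the accuracy-versus-feature-count weighting (alpha) are illustrative assumptions, not details from the paper.

```python
# Sketch: sigmoid-based binarization of a continuous candidate solution for
# wrapper feature selection. The classifier and alpha weighting are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binarize(position, rng):
    # Sigmoid transfer: each continuous dimension becomes the probability
    # that the corresponding feature is selected.
    probs = sigmoid(position)
    mask = (rng.random(position.shape) < probs).astype(int)
    if mask.sum() == 0:                     # keep at least one feature
        mask[rng.integers(mask.size)] = 1
    return mask

def fitness(mask, X, y, alpha=0.99):
    # Wrapper fitness: weighted sum of classification error and the fraction
    # of selected features (smaller is better).
    clf = KNeighborsClassifier(n_neighbors=5)
    acc = cross_val_score(clf, X[:, mask == 1], y, cv=5).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * mask.sum() / mask.size

X, y = make_classification(n_samples=200, n_features=30, random_state=0)
rng = np.random.default_rng(0)
position = rng.normal(size=30)              # one candidate continuous solution
mask = binarize(position, rng)
print("selected features:", mask.sum(), "fitness:", fitness(mask, X, y))
```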

    Vocational High School (SMK) Student Major Placement Using a Combination of Genetic Algorithm and ID3

    A person's expertise differs from that of others according to their talents, interests, and skill level. In education, major placement is needed to further develop a student's abilities and talents. At SMK PANDAWA BUDI LUHUR, there are two majors for students: hospitality (perhotelan) and tour and travel. This final project implements a decision-making method for student major placement based on a genetic algorithm and ID3. The genetic algorithm is combined with W-KNN (weighted KNN) to obtain the optimal features, i.e., the subjects considered most informative for major placement at the school. ID3 then builds a decision tree from the optimal features output by the genetic algorithm. Genetic algorithms perform well on optimization problems such as feature selection. The k-nearest neighbor algorithm (k-NN or KNN) classifies an object based on the training data closest to it, and W-KNN extends KNN by adding a weight to each feature; the neighborhood (distance) matrix is computed as in KNN, with the training data projected into a high-dimensional space in which each dimension represents one feature of the data. The research shows that combining a genetic algorithm for feature selection with ID3 yields better accuracy than ID3 without genetic-algorithm-based feature selection. Keywords: vocational major placement, genetic algorithm, W-KNN, feature selection, ID3.
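    As a rough illustration of the pipeline this abstract describes (GA-evolved feature weights evaluated by weighted KNN, with a decision tree built on the surviving subjects), the sketch below evaluates a single random chromosome. The weight threshold, the dataset, and the use of scikit-learn's CART tree with the entropy criterion as a stand-in for ID3 are assumptions.

```python
# Sketch: W-KNN fitness for one GA chromosome, then a decision tree on the
# selected features. CART with the entropy criterion stands in for ID3.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

def wknn_fitness(weights, X, y):
    # Weighted KNN: scale each feature by its evolved weight so that the usual
    # Euclidean neighbourhood computation becomes a weighted distance.
    Xw = X * weights
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, Xw, y, cv=5).mean()

rng = np.random.default_rng(1)
# A GA would evolve a population of such chromosomes and keep the fittest;
# one random chromosome is enough to illustrate the evaluation step.
weights = rng.random(X.shape[1])
print("W-KNN fitness:", wknn_fitness(weights, X, y))

# Keep only features weighted above a (hypothetical) threshold, then build the
# decision tree on the reduced feature set.
selected = weights > 0.5
if not selected.any():
    selected[np.argmax(weights)] = True
tree = DecisionTreeClassifier(criterion="entropy").fit(X[:, selected], y)
print("tree depth:", tree.get_depth())
```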

    A generic optimising feature extraction method using multiobjective genetic programming

    In this paper, we present a generic, optimising feature extraction method using multiobjective genetic programming. We re-examine the feature extraction problem and show that effective feature extraction can significantly enhance the performance of pattern recognition systems with simple classifiers. A framework is presented to evolve optimised feature extractors that transform an input pattern space into a decision space in which maximal class separability is obtained. We have applied this method to real-world datasets from the UCI Machine Learning and StatLog databases to verify our approach and compare our proposed method with other reported results. We conclude that our algorithm is able to produce classifiers of superior (or equivalent) performance to the conventional classifiers examined, suggesting that the need to exhaustively evaluate a large family of conventional classifiers on any new problem can be removed.
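    The abstract does not give the GP representation or the multiobjective machinery, so the sketch below only illustrates one plausible objective: a Fisher-style class-separability score computed in the evolved decision space. The hand-written stand-in extractor and the dataset are illustrative assumptions.

```python
# Sketch: scoring a candidate feature extractor by class separability in the
# transformed (decision) space. A real run would evolve the extractor with GP.
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def separability(Z, y):
    # Ratio of between-class scatter to within-class scatter in the evolved
    # decision space Z; higher means better class separation.
    overall = Z.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        between += len(Zc) * np.sum((Zc.mean(axis=0) - overall) ** 2)
        within += np.sum((Zc - Zc.mean(axis=0)) ** 2)
    return between / within

def candidate_extractor(X):
    # Hand-written stand-in for one evolved individual: two arithmetic
    # expressions over the original features, as GP would normally discover.
    return np.column_stack([X[:, 2] * X[:, 3], X[:, 0] - X[:, 1]])

print("separability:", separability(candidate_extractor(X), y))
# A multiobjective GP run would trade this score off against, e.g., tree size.
```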

    An incremental approach to genetic algorithms based classification

    Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than with evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as the basic learning algorithms for incremental learning within one or more classifier agents in a multi-agent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an “integration” operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed.
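    The exact chromosome encoding and the “integration” operator are not described in this abstract, so the sketch below assumes a real-valued chromosome per individual and shows one way old solutions could be extended with genes for newly arrived attributes and then mutated with a bias toward the new genes. The initialization scheme and mutation rates are illustrative only.

```python
# Sketch: extending existing GA individuals when new attributes arrive, then
# applying mutation biased toward the newly added genes.
import numpy as np

rng = np.random.default_rng(42)

def integrate(population, n_new):
    # Append randomly initialised genes for the n_new attributes to every
    # existing individual, preserving the already-evolved genes.
    new_genes = rng.random((population.shape[0], n_new))
    return np.hstack([population, new_genes])

def biased_mutation(population, n_old, rate_old=0.01, rate_new=0.2):
    # Mutate genes for the original attributes rarely and the newly added
    # ones aggressively, so search effort focuses on the unexplored part.
    mask_old = rng.random((population.shape[0], n_old)) < rate_old
    mask_new = rng.random((population.shape[0], population.shape[1] - n_old)) < rate_new
    mask = np.hstack([mask_old, mask_new])
    mutated = population.copy()
    mutated[mask] = rng.random(mask.sum())
    return mutated

old_pop = rng.random((20, 10))          # 20 individuals, 10 original attributes
new_pop = biased_mutation(integrate(old_pop, 3), n_old=10)
print(new_pop.shape)                    # (20, 13): three new attributes added
```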

    Automating biomedical data science through tree-based pipeline optimization

    Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design. (Comment: 16 pages, 5 figures, to appear in the EvoBIO 2016 proceedings.)
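    TPOT is distributed as a Python package with a scikit-learn-style interface; a minimal usage sketch follows. The dataset and parameter values are illustrative, and the API shown is that of released TPOT versions, which may differ from the prototype described in the paper.

```python
# Sketch: typical TPOT usage. Small generation/population budgets keep the
# example quick; real runs would use larger budgets.
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Evolve ML pipelines (preprocessors, feature constructors, classifiers)
# with genetic programming.
tpot = TPOTClassifier(generations=5, population_size=20,
                      random_state=42, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("tpot_digits_pipeline.py")   # write the best pipeline as Python code
```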