
    Improved feature selection using a hybrid side-blotched lizard algorithm and genetic algorithm approach

    Feature selection entails choosing the significant features among a wide collection of original features that are essential for predicting test data with a classifier. Feature selection is commonly used in applications such as bioinformatics, data mining, and the analysis of written texts, where the dataset contains tens or hundreds of thousands of features, making such a large feature set difficult to analyze. Removing irrelevant features improves the predictor performance, making it more accurate and cost-effective. In this research, a novel hybrid technique for feature selection is presented that aims to enhance classification accuracy. A hybrid binary version of the side-blotched lizard algorithm (SBLA) and the genetic algorithm (GA), named SBLAGA, is proposed, combining the strengths of both algorithms. We use a sigmoid function to map the continuous variable values to binary ones, and evaluate our proposed algorithm on twenty-three standard benchmark datasets. Average classification accuracy, average number of selected features, and average fitness value were the evaluation criteria. According to the experimental results, SBLAGA demonstrated superior performance compared to SBLA and GA with regard to these criteria. We further compare SBLAGA with four wrapper feature selection methods that are widely used in the literature and find it to be more efficient.
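    The abstract mentions a sigmoid transfer step but does not spell out the binarization or the wrapper fitness, so the following is a minimal sketch of that step only; the k-NN wrapper classifier and the accuracy-versus-feature-count weighting (alpha) are illustrative assumptions, not details from the paper.

```python
# Sketch: sigmoid-based binarization of a continuous candidate solution for
# wrapper feature selection. The classifier and alpha weighting are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binarize(position, rng):
    # Sigmoid transfer: each continuous dimension becomes the probability
    # that the corresponding feature is selected.
    probs = sigmoid(position)
    mask = (rng.random(position.shape) < probs).astype(int)
    if mask.sum() == 0:                     # keep at least one feature
        mask[rng.integers(mask.size)] = 1
    return mask

def fitness(mask, X, y, alpha=0.99):
    # Wrapper fitness: weighted sum of classification error and the fraction
    # of selected features (smaller is better).
    clf = KNeighborsClassifier(n_neighbors=5)
    acc = cross_val_score(clf, X[:, mask == 1], y, cv=5).mean()
    return alpha * (1.0 - acc) + (1.0 - alpha) * mask.sum() / mask.size

X, y = make_classification(n_samples=200, n_features=30, random_state=0)
rng = np.random.default_rng(0)
position = rng.normal(size=30)              # one candidate continuous solution
mask = binarize(position, rng)
print("selected features:", mask.sum(), "fitness:", fitness(mask, X, y))
```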

    Vocational High School (SMK) Student Major Placement Using a Combination of Genetic Algorithm and ID3

    A person's expertise differs from that of others according to their talents, interests, and skill level. In education, major placement is needed to further develop a student's abilities and talents. At SMK PANDAWA BUDI LUHUR, there are two majors for students: hospitality (perhotelan) and tour and travel. This final project implements a decision-making method for student major placement based on a genetic algorithm and ID3. The genetic algorithm is combined with W-KNN (weighted KNN) to obtain the optimal features, i.e., the subjects considered most informative for major placement at the school. ID3 then builds a decision tree from the optimal features output by the genetic algorithm. Genetic algorithms perform well on optimization problems such as feature selection. The k-nearest neighbor algorithm (k-NN or KNN) classifies an object based on the training data closest to it, and W-KNN extends KNN by adding a weight to each feature; the neighborhood (distance) matrix is computed as in KNN, with the training data projected into a high-dimensional space in which each dimension represents one feature of the data. The research shows that combining a genetic algorithm for feature selection with ID3 yields better accuracy than ID3 without genetic-algorithm-based feature selection. Keywords: vocational major placement, genetic algorithm, W-KNN, feature selection, ID3.
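    As a rough illustration of the pipeline this abstract describes (GA-evolved feature weights evaluated by weighted KNN, with a decision tree built on the surviving subjects), the sketch below evaluates a single random chromosome. The weight threshold, the dataset, and the use of scikit-learn's CART tree with the entropy criterion as a stand-in for ID3 are assumptions.

```python
# Sketch: W-KNN fitness for one GA chromosome, then a decision tree on the
# selected features. CART with the entropy criterion stands in for ID3.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

def wknn_fitness(weights, X, y):
    # Weighted KNN: scale each feature by its evolved weight so that the usual
    # Euclidean neighbourhood computation becomes a weighted distance.
    Xw = X * weights
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, Xw, y, cv=5).mean()

rng = np.random.default_rng(1)
# A GA would evolve a population of such chromosomes and keep the fittest;
# one random chromosome is enough to illustrate the evaluation step.
weights = rng.random(X.shape[1])
print("W-KNN fitness:", wknn_fitness(weights, X, y))

# Keep only features weighted above a (hypothetical) threshold, then build the
# decision tree on the reduced feature set.
selected = weights > 0.5
if not selected.any():
    selected[np.argmax(weights)] = True
tree = DecisionTreeClassifier(criterion="entropy").fit(X[:, selected], y)
print("tree depth:", tree.get_depth())
```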

    A generic optimising feature extraction method using multiobjective genetic programming

    In this paper, we present a generic, optimising feature extraction method using multiobjective genetic programming. We re-examine the feature extraction problem and show that effective feature extraction can significantly enhance the performance of pattern recognition systems with simple classifiers. A framework is presented to evolve optimised feature extractors that transform an input pattern space into a decision space in which maximal class separability is obtained. We have applied this method to real-world datasets from the UCI Machine Learning and StatLog databases to verify our approach and compare our proposed method with other reported results. We conclude that our algorithm is able to produce classifiers of superior (or equivalent) performance to the conventional classifiers examined, suggesting that the need to exhaustively evaluate a large family of conventional classifiers on any new problem can be removed.
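    The abstract does not give the GP representation or the multiobjective machinery, so the sketch below only illustrates one plausible objective: a Fisher-style class-separability score computed in the evolved decision space. The hand-written stand-in extractor and the dataset are illustrative assumptions.

```python
# Sketch: scoring a candidate feature extractor by class separability in the
# transformed (decision) space. A real run would evolve the extractor with GP.
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

def separability(Z, y):
    # Ratio of between-class scatter to within-class scatter in the evolved
    # decision space Z; higher means better class separation.
    overall = Z.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(y):
        Zc = Z[y == c]
        between += len(Zc) * np.sum((Zc.mean(axis=0) - overall) ** 2)
        within += np.sum((Zc - Zc.mean(axis=0)) ** 2)
    return between / within

def candidate_extractor(X):
    # Hand-written stand-in for one evolved individual: two arithmetic
    # expressions over the original features, as GP would normally discover.
    return np.column_stack([X[:, 2] * X[:, 3], X[:, 0] - X[:, 1]])

print("separability:", separability(candidate_extractor(X), y))
# A multiobjective GP run would trade this score off against, e.g., tree size.
```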

    An incremental approach to genetic algorithms based classification

    Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than with evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as the basic learning algorithms for incremental learning within one or more classifier agents in a multi-agent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an “integration” operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed.
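    The exact chromosome encoding and the “integration” operator are not described in this abstract, so the sketch below assumes a real-valued chromosome per individual and shows one way old solutions could be extended with genes for newly arrived attributes and then mutated with a bias toward the new genes. The initialization scheme and mutation rates are illustrative only.

```python
# Sketch: extending existing GA individuals when new attributes arrive, then
# applying mutation biased toward the newly added genes.
import numpy as np

rng = np.random.default_rng(42)

def integrate(population, n_new):
    # Append randomly initialised genes for the n_new attributes to every
    # existing individual, preserving the already-evolved genes.
    new_genes = rng.random((population.shape[0], n_new))
    return np.hstack([population, new_genes])

def biased_mutation(population, n_old, rate_old=0.01, rate_new=0.2):
    # Mutate genes for the original attributes rarely and the newly added
    # ones aggressively, so search effort focuses on the unexplored part.
    mask_old = rng.random((population.shape[0], n_old)) < rate_old
    mask_new = rng.random((population.shape[0], population.shape[1] - n_old)) < rate_new
    mask = np.hstack([mask_old, mask_new])
    mutated = population.copy()
    mutated[mask] = rng.random(mask.sum())
    return mutated

old_pop = rng.random((20, 10))          # 20 individuals, 10 original attributes
new_pop = biased_mutation(integrate(old_pop, 3), n_old=10)
print(new_pop.shape)                    # (20, 13): three new attributes added
```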

    Automating biomedical data science through tree-based pipeline optimization

    Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design. (Comment: 16 pages, 5 figures, to appear in the EvoBIO 2016 proceedings.)
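    TPOT is distributed as a Python package with a scikit-learn-style interface; a minimal usage sketch follows. The dataset and parameter values are illustrative, and the API shown is that of released TPOT versions, which may differ from the prototype described in the paper.

```python
# Sketch: typical TPOT usage. Small generation/population budgets keep the
# example quick; real runs would use larger budgets.
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Evolve ML pipelines (preprocessors, feature constructors, classifiers)
# with genetic programming.
tpot = TPOTClassifier(generations=5, population_size=20,
                      random_state=42, verbosity=2)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))
tpot.export("tpot_digits_pipeline.py")   # write the best pipeline as Python code
```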