
    Full model selection in the space of data mining operators

    We propose a framework and a novel algorithm for the full model selection (FMS) problem. The proposed algorithm, named GPS (which stands for GAPSO-FMS), combines genetic algorithms (GA) and particle swarm optimization (PSO): a GA searches for the optimal structure of a data mining solution, while PSO searches for the optimal parameter set of a particular structure instance. Given a classification or regression problem, GPS outputs an FMS solution as a directed acyclic graph of data mining operators applicable to the problem, including data cleansing, data sampling, feature transformation/selection, and algorithm operators. The solution can also be rendered graphically in a human-readable form. Experimental results demonstrate the benefit of the algorithm.
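
    The abstract stops short of implementation detail, so the following Python sketch only illustrates the two-level search it describes: an outer GA over discrete operator choices and an inner PSO tuning a continuous parameter vector for each candidate structure. The operator vocabulary, the toy fitness function, and every constant here are illustrative assumptions, not the paper's actual GPS design.

```python
import random

random.seed(0)

# Hypothetical operator vocabulary -- the paper's actual operator set
# (cleansing, sampling, feature transformation/selection, algorithms) is richer.
STRUCTURE_CHOICES = {
    "cleansing": ["none", "drop_missing", "impute_mean"],
    "sampling": ["none", "undersample", "oversample"],
    "features": ["none", "pca", "select_k_best"],
    "algorithm": ["tree", "svm", "knn"],
}

def random_structure():
    return {stage: random.choice(opts) for stage, opts in STRUCTURE_CHOICES.items()}

def crossover(a, b):
    # Stage-wise uniform crossover over two operator chains.
    return {stage: random.choice([a[stage], b[stage]]) for stage in a}

def mutate(s, rate=0.2):
    return {stage: random.choice(STRUCTURE_CHOICES[stage]) if random.random() < rate else op
            for stage, op in s.items()}

def toy_evaluate(structure, params):
    # Stand-in for cross-validated accuracy of the assembled pipeline.
    bonus = 0.1 if structure["algorithm"] == "svm" else 0.0
    return bonus + sum(params) / len(params) + random.gauss(0, 0.01)

def pso_tune(structure, evaluate, n_particles=8, iters=15, dim=3):
    # Inner PSO: tune a continuous parameter vector for one fixed structure.
    pos = [[random.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [evaluate(structure, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.4 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.4 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = evaluate(structure, pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def gps_search(generations=10, pop_size=12):
    pop = [random_structure() for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        # PSO scores each structure by its best-found parameter set...
        scored = sorted(((pso_tune(s, toy_evaluate)[1], s) for s in pop),
                        key=lambda t: t[0], reverse=True)
        if best is None or scored[0][0] > best[0]:
            best = scored[0]
        # ...and the GA recombines the better structures.
        elite = [s for _, s in scored[: pop_size // 2]]
        pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                       for _ in range(pop_size - len(elite))]
    return best

print(gps_search())
```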

    Instance Selection Using Advanced Data Mining Techniques

    ABSTRACT Genetic algorithms (GA) are optimization techniques inspired by natural evolutionary processes. They handle a population of individuals that evolves with the help of information-exchange procedures. In this paper we propose a genetic algorithm (GA) approach to optimizing the connection weights and instance selection of artificial neural networks (ANNs) to predict a stock price index. ANNs have a preeminent learning ability but often exhibit inconsistent and unpredictable performance on noisy data. Here the GA is employed not only to improve the learning algorithm but also to reduce the complexity of the feature space: it simultaneously optimizes the connection weights between layers and the selection of relevant instances. This study applies the proposed model to analysis of the India Cements Stock Price Index (ICSPI). Experimental results show that the GA approach is a promising method for selecting instances and optimizing the connection weights between layers.
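
    As a rough illustration of the joint encoding the abstract describes, the sketch below evolves a single chromosome holding both hidden-layer weights and instance-selection bits; output weights are then fitted by least squares on the selected instances and fitness is measured on a held-out split. The network shape, the encoding, and the GA settings are assumptions for illustration, and synthetic data stands in for the ICSPI series.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data standing in for the ICSPI features used in the paper.
X = rng.normal(size=(300, 5))
y = np.sign(X @ rng.normal(size=5) + 0.5 * rng.normal(size=300))
X_tr, y_tr, X_va, y_va = X[:200], y[:200], X[200:], y[200:]

N_HID = 6
N_W1 = 5 * N_HID  # evolved input->hidden weights

def fitness(chrom):
    w1 = chrom[:N_W1].reshape(5, N_HID)
    mask = chrom[N_W1:] > 0  # instance-selection bits (thresholded reals)
    if mask.sum() < 20:      # require a minimum training subset
        return 0.0
    h = np.tanh(X_tr[mask] @ w1)
    # Fit output weights on the *selected* instances only, then score on a
    # held-out split, so removing noisy instances can actually pay off.
    w2, *_ = np.linalg.lstsq(h, y_tr[mask], rcond=None)
    pred = np.sign(np.tanh(X_va @ w1) @ w2)
    return (pred == y_va).mean()

# Plain generational GA: truncation selection plus Gaussian mutation.
pop = rng.normal(size=(40, N_W1 + len(X_tr)))
for gen in range(60):
    fit = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(fit)[::-1][:20]]
    children = parents[rng.integers(0, 20, size=20)] \
        + rng.normal(scale=0.1, size=(20, pop.shape[1]))
    pop = np.vstack([parents, children])

print("best validation accuracy:", max(fitness(c) for c in pop))
```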

    Analysis and Implementation of Genetic Algorithm-Sequential Ensemble Feature Selection (GA-SEFS) for Ensemble Feature Selection

    ABSTRACT: A dataset may contain many features, and not every feature of an object instance carries information relevant to a data mining system. Feature selection is the process of choosing a subset of relevant features/attributes according to certain criteria; it reduces the number of irrelevant features, eliminates data redundancy, and improves learning accuracy. Classification, one of the stages in data mining, predicts the class membership of a data item. Several studies indicate that an ensemble (a set) of classifiers is generally more accurate than a single classifier. One way to build an ensemble is to choose several different feature subsets from the original dataset and train a classifier on each; this approach is known as ensemble feature selection. Here, the author implements a genetic algorithm to optimize feature selection during ensemble construction: Genetic Algorithm-Sequential Ensemble Feature Selection (GA-SEFS). Conventional feature selection algorithms aim to find the single best feature subset, whereas ensemble feature selection aims to find the best set of feature subsets, one that improves classification accuracy.

    GA-SEFS has six important parameters. Population size, number of generations, and number of offspring do not directly affect the accuracy of the resulting ensemble classification. Ensemble size can help improve accuracy, because voting over diverse feature subsets raises it. The alpha parameter can further raise the accuracy obtained by the combination of the four parameters above (ensemble size, population size, number of generations, and number of offspring). For the beta parameter, the experiments in this final project on three different datasets yielded higher accuracy at negative beta values.

    Keywords: feature subset selection, ensemble, genetic search
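
    A minimal sketch of sequential ensemble feature selection in the spirit the abstract describes: members are added one at a time, and a GA searches for each new member's feature subset under a fitness that mixes validation accuracy with an alpha-weighted disagreement (diversity) term. How the thesis defines the beta term is not recoverable from the abstract, so the sketch omits it; the classifier, the synthetic dataset, and all constants are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=30, n_informative=8, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

ALPHA = 0.5  # diversity weight, one of the six parameters the thesis studies

def member_predictions(mask):
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[:, mask], y_tr)
    return clf.predict(X_va[:, mask])

def fitness(mask, ensemble_preds):
    if mask.sum() == 0:
        return -1.0
    preds = member_predictions(mask)
    acc = (preds == y_va).mean()
    if not ensemble_preds:
        return acc
    # Diversity: mean disagreement with members already in the ensemble.
    div = np.mean([(preds != p).mean() for p in ensemble_preds])
    return acc + ALPHA * div

def ga_search(ensemble_preds, pop_size=30, gens=15, n_feat=30):
    pop = rng.random((pop_size, n_feat)) < 0.3  # feature-subset bitmasks
    for _ in range(gens):
        fits = np.array([fitness(m, ensemble_preds) for m in pop])
        parents = pop[np.argsort(fits)[::-1][:pop_size // 2]]
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(n_feat) < 0.5, a, b)  # uniform crossover
            child ^= rng.random(n_feat) < 0.02                # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    fits = np.array([fitness(m, ensemble_preds) for m in pop])
    return pop[int(np.argmax(fits))]

# Sequential construction: each GA run targets the current ensemble's gaps.
ensemble_preds = []
for _ in range(5):  # ensemble size, another of the six parameters
    mask = ga_search(ensemble_preds)
    ensemble_preds.append(member_predictions(mask))

vote = (np.mean(ensemble_preds, axis=0) > 0.5).astype(int)  # majority vote
print("ensemble accuracy on validation split:", (vote == y_va).mean())
```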

    Addressing Optimisation Challenges for Datasets with Many Variables, Using Genetic Algorithms to Implement Feature Selection

    This article presents an optimisation method that uses a genetic algorithm to apply feature selection to large datasets and improve accuracy. This is achieved through improved classification and a reduced number of features, and it also aids in interpreting the model. A clinical dataset on heart failure is used to illustrate the nature of the problem and to show the effectiveness of the techniques developed. Clinical datasets are often characterised by many variables. For instance, blood biochemistry data has more than 60 variables, which complicates developing outcome predictions with machine learning and other algorithms; hence, techniques to make such datasets more tractable are required. Genetic algorithms can provide an efficient method, of low computational complexity, for effectively selecting features. In this paper, a way to estimate the number of required variables is presented, and a genetic algorithm is used in "wrapper" form to select features for a case study of heart failure data. Additionally, different initial populations and termination conditions are used to arrive at a set of optimal features, which are then compared with the features obtained using traditional methodologies. The paper provides a framework for estimating the number of variables and generations required for a suitable solution.
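
    To make the "wrapper" idea concrete, here is a small sketch rather than the paper's implementation: a GA evolves feature bitmasks, each mask is scored by the cross-validated accuracy of a model trained on just those variables, and a mild per-feature penalty nudges the search toward compact subsets. The 60-variable synthetic data, the classifier, and the penalty weight are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Stand-in for the clinical data: ~60 blood-biochemistry-style variables.
X, y = make_classification(n_samples=300, n_features=60, n_informative=10, random_state=1)

def wrapper_fitness(mask):
    # Wrapper fitness: cross-validated accuracy of the induced model,
    # minus a small per-feature penalty so compact variable sets win ties.
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(LogisticRegression(max_iter=1000), X[:, mask], y, cv=3).mean()
    return acc - 0.002 * mask.sum()

pop = rng.random((24, 60)) < 0.2  # initial population biased toward small subsets
for gen in range(15):             # termination: a fixed generation budget
    fits = np.array([wrapper_fitness(m) for m in pop])
    parents = pop[np.argsort(fits)[::-1][:12]]
    idx = rng.integers(12, size=(12, 2))
    children = np.where(rng.random((12, 60)) < 0.5, parents[idx[:, 0]], parents[idx[:, 1]])
    children ^= rng.random((12, 60)) < 0.01  # bit-flip mutation
    pop = np.vstack([parents, children])

best = pop[int(np.argmax([wrapper_fitness(m) for m in pop]))]
print(f"selected {best.sum()} of 60 variables")
```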

    Local Rule-Based Explanations of Black Box Decision Systems

    Recent years have witnessed the rise of accurate but obscure decision systems that hide the logic of their internal decision processes from users. The lack of explanations for the decisions of black box systems is a key ethical issue and a limitation to the adoption of machine learning components in socially sensitive and safety-critical contexts. In this paper we focus on the problem of black box outcome explanation, i.e., explaining the reasons for the decision taken on a specific instance. We propose LORE, an agnostic method able to provide interpretable and faithful explanations. LORE first learns a local interpretable predictor on a synthetic neighborhood generated by a genetic algorithm. It then derives, from the logic of the local interpretable predictor, a meaningful explanation consisting of: a decision rule, which explains the reasons for the decision; and a set of counterfactual rules, suggesting the changes to the instance's features that would lead to a different outcome. Extensive experiments show that LORE outperforms existing methods and baselines both in the quality of explanations and in the accuracy of mimicking the black box.
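
    LORE's published pipeline is more elaborate, but the following sketch conveys the two key steps the abstract names: a genetic algorithm grows a synthetic neighborhood around the instance (balanced between the black box's two outcomes), and a shallow decision tree fitted on that neighborhood yields the decision rule as a root-to-leaf conjunction. Counterfactual-rule extraction is omitted here, and the data, black box, and constants are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X, y)  # the opaque model
x = X[0]                                                      # instance to explain
x_label = int(black_box.predict(x.reshape(1, -1))[0])

def genetic_neighbourhood(target_label, pop_size=200, gens=20):
    # Evolve synthetic points that stay close to x while the black box
    # assigns them target_label; the label-match term dominates the fitness.
    pop = x + rng.normal(scale=0.5, size=(pop_size, len(x)))
    for _ in range(gens):
        fit = -np.linalg.norm(pop - x, axis=1) \
              + 10.0 * (black_box.predict(pop) == target_label)
        parents = pop[np.argsort(fit)[::-1][:pop_size // 2]]
        children = parents + rng.normal(scale=0.2, size=parents.shape)  # mutation
        pop = np.vstack([parents, children])
    return pop

# Balanced neighbourhood: points with x's outcome and with the other outcome.
Z = np.vstack([genetic_neighbourhood(x_label), genetic_neighbourhood(1 - x_label)])
Zy = black_box.predict(Z)

# Local interpretable surrogate: a shallow decision tree on the neighbourhood.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Z, Zy)

def decision_rule(tree, instance):
    # Read the root-to-leaf path taken by `instance` as a conjunctive rule.
    t = tree.tree_
    node, conds = 0, []
    while t.children_left[node] != -1:  # -1 marks a leaf in sklearn trees
        f, thr = t.feature[node], t.threshold[node]
        if instance[f] <= thr:
            conds.append(f"x[{f}] <= {thr:.2f}")
            node = t.children_left[node]
        else:
            conds.append(f"x[{f}] > {thr:.2f}")
            node = t.children_right[node]
    return " AND ".join(conds)

print("decision rule:", decision_rule(surrogate, x))
```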

    Ensemble Learning for Free with Evolutionary Algorithms?

    Evolutionary learning proceeds by evolving a population of classifiers, from which it generally returns (with some notable exceptions) the single best-of-run classifier as the final result. Meanwhile, ensemble learning, one of the most effective approaches in supervised machine learning over the last decade, proceeds by building a population of diverse classifiers. Ensemble learning with evolutionary computation is thus receiving increasing attention. The Evolutionary Ensemble Learning (EEL) approach presented in this paper features two contributions. First, a new fitness function, inspired by co-evolution and enforcing classifier diversity, is presented. Second, a new selection criterion based on the classification margin is proposed. This criterion is used to extract the classifier ensemble either from the final population only (off-line) or incrementally along evolution (on-line). Experiments on a set of benchmark problems show that the off-line variant outperforms single-hypothesis evolutionary learning and state-of-the-art boosting, and generates smaller classifier ensembles.
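
    The abstract does not spell out the margin-based extraction, so the sketch below is one plausible reading of the off-line variant: given the predictions of a final evolved population on a validation set, an ensemble is grown greedily as long as adding a member improves the mean classification margin of the majority vote. The synthetic voter population and the greedy scheme are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the final evolved population: each row holds one classifier's
# 0/1 predictions on a validation set; y holds the true labels.
n_clf, n_val = 30, 100
y = rng.integers(0, 2, n_val)
correct = rng.random((n_clf, n_val)) < 0.65  # each voter ~65% accurate
preds = np.where(correct, y[None, :], 1 - y[None, :])

def mean_margin(members):
    # Mean classification margin of the majority vote:
    # (votes for the true class - votes against) / ensemble size.
    votes = preds[members]
    right = (votes == y).sum(axis=0)
    k = len(members)
    return ((2 * right - k) / k).mean()

# Off-line extraction: seed with the most accurate member, then greedily add
# whichever classifier improves the margin criterion most, until none does.
ensemble = [int(np.argmax((preds == y).mean(axis=1)))]
while len(ensemble) < n_clf:
    gains = {j: mean_margin(ensemble + [j]) - mean_margin(ensemble)
             for j in range(n_clf) if j not in ensemble}
    best_j = max(gains, key=gains.get)
    if gains[best_j] <= 0:
        break
    ensemble.append(best_j)

print("selected members:", sorted(ensemble),
      "mean margin:", round(mean_margin(ensemble), 3))
```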