
    A Framework for Genetic Algorithms Based on Hadoop

    Genetic Algorithms (GAs) are powerful metaheuristic techniques used in many real-world applications. Executing GAs sequentially demands considerable computational power in both time and resources. Nevertheless, GAs are naturally parallel, and access to a parallel platform such as the Cloud is easy and cheap. Apache Hadoop is one of the common services that can be used for parallel applications. However, using Hadoop to develop a parallel version of GAs is not simple without dealing with its inner workings. Even though some sequential frameworks for GAs already exist, there is no framework supporting the development of GA applications that can be executed in parallel. This paper describes a framework for parallel GAs on the Hadoop platform, following the MapReduce paradigm. The main purpose of this framework is to allow the user to focus on the aspects of the GA that are specific to the problem to be addressed, while being confident that the task will be executed correctly on the Cloud with good performance. The framework has also been used to develop an application for the Feature Subset Selection problem. A preliminary analysis of the performance of the developed GA application was carried out on three datasets and showed very promising performance.
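As a rough illustration of how a GA can be split along MapReduce lines, the sketch below runs everything in one process rather than on Hadoop. Fitness evaluation acts as the "map" step and selection plus breeding as the "reduce" step. This is not the framework's API: the function names, the toy fitness function, and the `relevant` mask that stands in for a real Feature Subset Selection objective are all assumptions made for illustration.

```python
import random

def fitness(mask, relevant):
    # Hypothetical toy objective: +1 for each selected relevant feature,
    # -1 for each selected irrelevant feature.
    return sum((1 if r else -1) for bit, r in zip(mask, relevant) if bit)

def map_phase(population, relevant):
    # "Map": each individual is scored independently, so this step
    # could be distributed across workers.
    return [(fitness(ind, relevant), ind) for ind in population]

def reduce_phase(scored, rng, mutation_rate=0.05):
    # "Reduce": rank individuals, keep the best one unchanged (elitism),
    # and fill the rest of the population by crossover and mutation.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    parents = [ind for _, ind in ranked[: len(ranked) // 2]]
    children = [parents[0][:]]
    while len(children) < len(ranked):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))  # single-point crossover
        child = a[:cut] + b[cut:]
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children

def run_ga(relevant, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(relevant)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scored = map_phase(population, relevant)
        history.append(max(score for score, _ in scored))
        population = reduce_phase(scored, rng)
    best = max(map_phase(population, relevant), key=lambda pair: pair[0])
    return best, history

if __name__ == "__main__":
    (best_score, best_mask), history = run_ga([1, 0, 1, 1, 0, 0, 1, 0])
    print(best_score, best_mask)
```

Because the best individual is copied unchanged into each new generation, the best fitness per generation never decreases in this sketch. In a real Hadoop job the map and reduce steps would run as distributed tasks.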

    A survey on computational intelligence approaches for predictive modeling in prostate cancer

    Predictive modeling in medicine involves the development of computational models which are capable of analysing large amounts of data in order to predict healthcare outcomes for individual patients. Computational intelligence approaches are suitable when the data to be modelled are too complex for conventional statistical techniques to process quickly and efficiently. These advanced approaches are based on mathematical models that have been especially developed for dealing with the uncertainty and imprecision typically found in clinical and biological datasets. This paper provides a survey of recent work on computational intelligence approaches that have been applied to prostate cancer predictive modeling, and considers the challenges which need to be addressed. In particular, the paper considers a broad definition of computational intelligence which includes evolutionary algorithms (also known as metaheuristic optimisation or nature-inspired optimisation algorithms), Artificial Neural Networks, Deep Learning, fuzzy-based approaches, and hybrids of these, as well as Bayesian-based approaches and Markov models. Metaheuristic optimisation approaches, such as Ant Colony Optimisation, Particle Swarm Optimisation, and Artificial Immune Network, have been utilised for optimising the performance of prostate cancer predictive models, and the suitability of these approaches is discussed.

    Wrapper and Hybrid Feature Selection Methods Using Metaheuristic Algorithm for Chest X-Ray Images Classification: COVID-19 as a Case Study

    The COVID-19 virus has led to a pandemic in more than 200 countries across the globe, with severe impacts on the lives and health of a large number of people. The emergence of Omicron, a highly mutated variant of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has again caused social restrictions around the world because of its infectious and vaccine-escape mutations. One of the most significant steps in the fight against COVID-19 is to identify those who are infected with the virus as early as possible, to start their treatment and to minimize the risk of transmission. Detection of this disease from radiographic and radiological images is perhaps one of the quickest and most accessible methods of diagnosing patients. In this study, a computer-aided system based on deep learning is proposed for rapid diagnosis of COVID-19 from chest X-ray images. First, a dataset of 5380 chest X-ray images was collected from publicly available datasets. In the first step, the deep features of the images in the dataset are extracted using a pre-trained convolutional neural network (CNN) model. In the second step, Differential Evolution (DE), Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) algorithms are used for feature selection, in order to find which of these deep features are effective for classification. Finally, using the features obtained in these two stages, Decision Tree (DT), Naive Bayes (NB), Support Vector Machine (SVM), k-Nearest Neighbours (k-NN) and Neural Network (NN) classifiers are used for binary, triple and quadruple classification. In order to measure the success of the models objectively, 10-fold cross-validation was used. As a result, 1000 features were extracted with the SqueezeNet CNN model. In the binary, triple and quadruple classification process using these features, the SVM method was found to be the best classifier. The classification accuracies of the SVM model are 96.02%, 86.84% and 79.87%, respectively. By selecting features, the proposed method reached the results obtained from classification with deep feature extraction in less time and with fewer features. While the performance achieved is very good, further analysis is required on a larger set of COVID-19 images to obtain higher estimates of accuracy.

    Ant colony optimization approach for stacking configurations

    In data mining, classifiers are generated to predict the class labels of instances. An ensemble is a decision-making system which applies certain strategies to combine the predictions of different classifiers and generate a collective decision. Previous research has empirically and theoretically demonstrated that an ensemble classifier can be more accurate and stable than its component classifiers in most cases. Stacking is a well-known ensemble which adopts a two-level structure: the base-level classifiers generate predictions and the meta-level classifier makes collective decisions. A consequential problem is: what learning algorithms should be used to generate the base-level and meta-level classifiers in the Stacking configuration? It is not easy to find a suitable configuration for a specific dataset. In some early works, the selection of a meta classifier and its training data was the major concern. Recently, researchers have tried to apply metaheuristic methods to optimize the configuration of the base classifiers and the meta classifier. Ant Colony Optimization (ACO), which is inspired by the foraging behaviors of real ant colonies, is one of the most popular approaches among the metaheuristics. In this work, we propose a novel ACO-Stacking approach that uses ACO to tackle the Stacking configuration problem. This work is the first to apply ACO to the Stacking configuration problem. Different implementations of the ACO-Stacking approach are developed. The first version identifies the appropriate learning algorithms for generating the base-level classifiers while using a specific algorithm to create the meta-level classifier. The second version simultaneously finds suitable learning algorithms to create the base-level classifiers and the meta-level classifier. Moreover, we study how different kinds of local information about the classifiers affect the classification results. Several pieces of local information collected from the initial phase of ACO-Stacking are considered, such as the precision and f-measure of each classifier and the correlative differences of paired classifiers. A series of experiments is performed to compare the ACO-Stacking approach with other ensembles on a number of datasets of different domains and sizes. The experiments show that the new approach can achieve promising results and gain advantages over other ensembles. The correlative differences of the classifiers could be the best local information in this approach. Under the agile ACO-Stacking framework, an application to deal with a direct marketing problem is explored. A real-world database from a US-based catalog company, containing more than 100,000 customer marketing records, is used in the experiments. The results indicate that our approach can gain more cumulative response lifts and cumulative profit lifts in the top deciles. In conclusion, it is competitive with some well-known conventional and ensemble data mining methods.
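The two-level Stacking structure described in the abstract can be sketched in a few lines. This is a deliberately simplified illustration, not the ACO-Stacking method: the base-level classifiers are fixed threshold rules rather than trained models, the meta-level learner is a majority-label lookup table, and the meta level is fitted on the same data the base level sees (practical Stacking normally uses held-out predictions). All names are hypothetical. In the approach the abstract describes, ACO would search over which learning algorithms fill the base-level and meta-level roles.

```python
from collections import Counter, defaultdict

def make_threshold_classifier(feature_index, threshold):
    # Hypothetical base-level classifier: predicts 1 when one feature
    # exceeds a fixed threshold, otherwise 0.
    return lambda x: 1 if x[feature_index] > threshold else 0

def base_predictions(base_classifiers, x):
    # Level 0: collect every base classifier's prediction for one instance.
    return tuple(clf(x) for clf in base_classifiers)

def train_meta(base_classifiers, X, y):
    # Level 1: for each pattern of base predictions, remember the most
    # frequent true label; unseen patterns fall back to the majority class.
    votes = defaultdict(Counter)
    for x, label in zip(X, y):
        votes[base_predictions(base_classifiers, x)][label] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}
    default = Counter(y).most_common(1)[0][0]
    return lambda pattern: table.get(pattern, default)

def stacking_predict(base_classifiers, meta, x):
    # The meta-level classifier makes the collective decision.
    return meta(base_predictions(base_classifiers, x))

if __name__ == "__main__":
    # Toy data: the label is 1 only when both features are large, so
    # neither base classifier is sufficient on its own.
    X = [(0.1, 0.9), (0.2, 0.8), (0.8, 0.2), (0.9, 0.1),
         (0.7, 0.9), (0.8, 0.8), (0.2, 0.1)]
    y = [0, 0, 0, 0, 1, 1, 0]
    bases = [make_threshold_classifier(0, 0.5), make_threshold_classifier(1, 0.5)]
    meta = train_meta(bases, X, y)
    print(stacking_predict(bases, meta, (0.6, 0.7)))
```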