Search CORE

3,166 research outputs found

A Genetic Programming Framework for Two Data Mining Tasks: Classification and Generalized Rule Induction

Author: Freitas Alex A.
Publication venue: Morgan Kaufmann
Publication date: 01/01/1997
Field of study

This paper proposes a genetic programming (GP) framework for two major data mining tasks, namely classification and generalized rule induction. The framework emphasizes the integration between a GP algorithm and relational database systems. In particular, the fitness of individuals is computed by submitting SQL queries to a (parallel) database server. Some advantages of this integration from a data mining viewpoint are scalability, data-privacy control and automatic parallelization

CiteSeerX

Kent Academic Repository

Automating biomedical data science through tree-based pipeline optimization

Author: Andrews Peter C.
Kidd La Creis
Lavender Nicole A.
Moore Jason H.
Olson Randal S.
Urbanowicz Ryan J.
Publication venue
Publication date: 27/01/2016
Field of study

Over the past decade, data science and machine learning has grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning---pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators---such as synthetic feature constructors---that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design.Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceeding

arXiv.org e-Print Archive

Scipedia

A hybrid genetic algorithm for solving a layout problem in the fashion industry.

Author: Martens J
Put Ferdinand
Van Brecht J
Viaene Stijn
Publication venue
Publication date
Field of study

As of this writing, many success stories exist yet of powerful genetic algorithms (GAs) in the field of constraint optimisation. In this paper, a hybrid, intelligent genetic algorithm will be developed for solving a cutting layout problem in the Belgian fashion industry. In an initial section, an existing LP formulation of the cutting problem is briefly summarised and is used in further paragraphs as the core design of our GA. Through an initial attempt of rendering the algorithm as universal as possible, it was conceived a threefold genetic enhancement had to be carried out that reduces the size of the active solution space. The GA is therefore rebuilt using intelligent genetic operators, carrying out a local optimisation and applying a heuristic feasibility operator. Powerful computational results are achieved for a variety of problem cases that outperform any existing LP model yet developed.Fashion; Industry;

Research Papers in Economics

PonyGE2: Grammatical Evolution in Python

Author: Fagan David
Fenton Michael
Forstenlechner Stefan
Hemberg Erik
McDermott James
O'Neill Michael
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/04/2017
Field of study

Grammatical Evolution (GE) is a population-based evolutionary algorithm, where a formal grammar is used in the genotype to phenotype mapping process. PonyGE2 is an open source implementation of GE in Python, developed at UCD's Natural Computing Research and Applications group. It is intended as an advertisement and a starting-point for those new to GE, a reference for students and researchers, a rapid-prototyping medium for our own experiments, and a Python workout. As well as providing the characteristic genotype to phenotype mapping of GE, a search algorithm engine is also provided. A number of sample problems and tutorials on how to use and adapt PonyGE2 have been developed.Comment: 8 pages, 4 figures, submitted to the 2017 GECCO Workshop on Evolutionary Computation Software Systems (EvoSoft

arXiv.org e-Print Archive

Crossref

Assessment and improvement of automated program repair mechanisms and components

Author: Assiri Fatmah Yousef
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2015
Field of study

2015 Spring.Includes bibliographical references.Automated program repair (APR) refers to techniques that locate and fix software faults automatically. An APR technique locates potentially faulty locations, then it searches the space of possible changes to select a program modification operator (PMO). The selected PMO is applied to a potentially faulty location thereby creating a new version of the faulty program, called a variant. The variant is validated by executing it against a set of test cases, called repair tests, which is used to identify a repair. When all of the repair tests are successful, the variant is considered a potential repair. Potential repairs that have passed a set of regression tests in addition to those included in the repair tests are deemed to be validated repairs. Different mechanisms and components can be applied to repair faults. APR mechanisms and components have a major impact on APR effectiveness, repair quality, and performance. APR effectiveness is the ability to and potential repairs. Repair quality is defined in terms of repair correctness and maintainability, where repair correctness indicates how well a potential repaired program retains required functionality, and repair maintainability indicates how easy it is to understand and maintain the generated potential repair. APR performance is the time and steps required to find a potential repair. Existing APR techniques can successfully fix faults, but the changes inserted to fix faults can have negative consequences on the quality of potential repairs. When a potential repair is executed against tests that were not included in the repair tests, the "repair" can fail. Such failures indicate that the generated repair is not a validated repair due to the introduction of other faults or the generated potential repair does not actually fix the real fault. In addition, some existing techniques add extraneous changes to the code that obfuscate the program logic and thus reduce its maintainability. APR effectiveness and performance can be dramatically degraded when an APR technique applies many PMOs, uses a large number of repair tests, locates many statements as potentially faulty locations, or applies a random search algorithm. This dissertation develops improved APR techniques and tool set to help optimize APR effectiveness, the quality of generated potential repairs, and APR performance based on a comprehensive evaluation of APR mechanisms and components. The evaluation involves the following: (1) the PMOs used to produce repairs, (2) the properties of repair tests used in the APR, (3) the fault localization techniques employed to identify potentially faulty statements, and (4) the search algorithms involved in the repair process. We also propose a set of guided search algorithms that guide the APR technique to select PMO that fix faults, which thereby improve APR effectiveness, repair quality, and performance. We performed a set of evaluations to investigate potential improvements in APR effectiveness, repair quality, and performance. APR effectiveness of different program modification operators is measured by the percent of fixed faults and the success rate. Success rate is the percentage of trials that result in potential repairs. One trial is equivalent to one execution of the search algorithm. APR effectiveness of different fault localization techniques is measured by the ability of a technique to identify actual faulty statements, and APR effectiveness of various repair test suites and search algorithms is also measured by the success rate. Repair correctness is measured by the percent of failed potential repairs for 100 trials for a faulty program, and the average percent of failed regression tests for N potential repairs for a faulty program; N is the number of potential repairs generated for 100 trials. Repair maintainability is measured by the average size of a potential repair, and the distribution of modifications throughout a potential repaired program. APR performance is measured by the average number of generated variants and the average total time required to find potential repairs. We built an evaluation framework creating a configurable mutation-based APR (MUT-APR) tool. MUT-APR allows us to vary the APR mechanisms and components. Our key findings are the following: (1) simple PMOs successfully fix faulty expression operators and improve the quality of potential repairs compared to other APR techniques that use existing code to repair faults, (2) branch coverage repair test suites improve APR effectiveness and repair quality significantly compared to repair test suites that satisfy statement coverage or random testing; however, they lowered APR performance, (3) small branch coverage repair test suites improved APR effectiveness, repair quality, and performance significantly compared to large branch coverage repair tests, (4) the Ochiai fault localization technique always identifies seeded faulty statements with an acceptable performance, and (5) guided random search algorithm improves APR effectiveness, repair quality, and performance compared to all other search algorithms; however, the exhaustive search algorithms is guaranteed a potential repair that failed fewer regression tests with a significant performance degradation as the program size increases. These improvements are incorporated into the MUT-APR tool for use in program repairs

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Evolutionary improvement of programs

Author: Arcuri A.
Clark J.A.
White D.R.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2011
Field of study

Most applications of genetic programming (GP) involve the creation of an entirely new function, program or expression to solve a specific problem. In this paper, we propose a new approach that applies GP to improve existing software by optimizing its non-functional properties such as execution time, memory usage, or power consumption. In general, satisfying non-functional requirements is a difficult task and often achieved in part by optimizing compilers. However, modern compilers are in general not always able to produce semantically equivalent alternatives that optimize non-functional properties, even if such alternatives are known to exist: this is usually due to the limited local nature of such optimizations. In this paper, we discuss how best to combine and extend the existing evolutionary methods of GP, multiobjective optimization, and coevolution in order to improve existing software. Given as input the implementation of a function, we attempt to evolve a semantically equivalent version, in this case optimized to reduce execution time subject to a given probability distribution of inputs. We demonstrate that our framework is able to produce non-obvious optimizations that compilers are not yet able to generate on eight example functions. We employ a coevolved population of test cases to encourage the preservation of the function's semantics. We exploit the original program both through seeding of the population in order to focus the search, and as an oracle for testing purposes. As well as discussing the issues that arise when attempting to improve software, we employ rigorous experimental method to provide interesting and practical insights to suggest how to address these issues

Enlighten

Algorithms for the minimum sum coloring problem: a review

Author: Hamiez Jean-Philippe
Hao Jin-Kao
Jin Yan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016
Field of study

The Minimum Sum Coloring Problem (MSCP) is a variant of the well-known vertex coloring problem which has a number of AI related applications. Due to its theoretical and practical relevance, MSCP attracts increasing attention. The only existing review on the problem dates back to 2004 and mainly covers the history of MSCP and theoretical developments on specific graphs. In recent years, the field has witnessed significant progresses on approximation algorithms and practical solution algorithms. The purpose of this review is to provide a comprehensive inspection of the most recent and representative MSCP algorithms. To be informative, we identify the general framework followed by practical solution algorithms and the key ingredients that make them successful. By classifying the main search strategies and putting forward the critical elements of the reviewed methods, we wish to encourage future development of more powerful methods and motivate new applications

arXiv.org e-Print Archive

Okina

A Survey on Software Testing Techniques using Genetic Algorithm

Author: Sabharwal Sangeeta
Sharma Chayanika
Sibal Ritu
Publication venue
Publication date: 05/11/2014
Field of study

The overall aim of the software industry is to ensure delivery of high quality software to the end user. To ensure high quality software, it is required to test software. Testing ensures that software meets user specifications and requirements. However, the field of software testing has a number of underlying issues like effective generation of test cases, prioritisation of test cases etc which need to be tackled. These issues demand on effort, time and cost of the testing. Different techniques and methodologies have been proposed for taking care of these issues. Use of evolutionary algorithms for automatic test generation has been an area of interest for many researchers. Genetic Algorithm (GA) is one such form of evolutionary algorithms. In this research paper, we present a survey of GA approach for addressing the various issues encountered during software testing.Comment: 13 Page

arXiv.org e-Print Archive

CiteSeerX