423 research outputs found
A Multi-Gene Genetic Programming Application for Predicting Students Failure at School
Predicting the student failure rate (SFR) at school accurately
remains a core problem in the educational sector despite several efforts. The
procedures for forecasting SFR are rigid and often require data
scaling or conversion into binary form, as in the logistic
model, which may lead to loss of information and effect size attenuation. Also,
the high number of factors, incomplete and unbalanced datasets, and the black-box
nature of Artificial Neural Networks and fuzzy logic systems expose the
need for more efficient tools. Genetic Programming
(GP) currently holds great promise and has produced positive results in
different sectors. In this regard, this study developed GPSFARPS, a software
application that provides a robust solution to the prediction of SFR using an
evolutionary algorithm known as multi-gene genetic programming. The approach is
validated by feeding a testing data set to the evolved GP models. Results
obtained from GPSFARPS simulations show its ability to evolve a suitable
failure rate expression, converging quickly at 30 generations out of a
maximum specified 500 generations. The multi-gene system was also able to
minimize the evolved model expression and accurately predict student failure
rate using a subset of the original expression.
Comment: 14 pages, 9 figures, Journal paper. arXiv admin note: text overlap
with arXiv:1403.0623 by other author
The use of data-mining for the automatic formation of tactics
This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques
A survey of genetic algorithms for multi-label classification
In recent years, multi-label classification (MLC) has become an emerging research topic in big data analytics and machine learning. In this problem, each object of a dataset may belong to multiple class labels and the goal is to learn a classification model that can infer the correct labels of new, previously unseen, objects. This paper presents a survey of genetic algorithms (GAs) designed for MLC tasks. The study is organized in three parts. First, we propose a new taxonomy focused on GAs for MLC. In the second part, we provide an up-to-date overview of the work in this area, categorizing the approaches identified in the literature with respect to the taxonomy. In the third and last part, we discuss some new ideas for combining GAs with MLC
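To make the multi-label setting described above concrete, the following is a minimal sketch in which each object carries a set of labels and a prediction is scored with the Hamming loss, a commonly used MLC metric. The label names, the toy data, and the function name `hamming_loss` are assumptions made for illustration; they do not come from the surveyed paper.

```python
from typing import List, Set


def hamming_loss(truth: List[Set[str]], predicted: List[Set[str]],
                 all_labels: Set[str]) -> float:
    """Fraction of (object, label) pairs where prediction and truth disagree."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    if not truth or not all_labels:
        raise ValueError("need at least one object and one label")
    mismatches = 0
    for true_set, predicted_set in zip(truth, predicted):
        # Symmetric difference: labels wrongly added plus labels wrongly omitted.
        mismatches += len(true_set ^ predicted_set)
    return mismatches / (len(truth) * len(all_labels))


# Hypothetical label space and toy data (illustration only).
LABELS = {"sports", "politics", "economy"}
truth = [{"sports"}, {"politics", "economy"}]
predicted = [{"sports", "economy"}, {"politics"}]

loss = hamming_loss(truth, predicted, LABELS)
print(f"Hamming loss: {loss:.3f}")
```

A genetic algorithm for MLC could use a score of this kind as part of its fitness function, although the survey does not prescribe any particular metric.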
Optimization algorithms for decision tree induction
Decision trees are among the most commonly used machine learning models for solving
classification and regression tasks due to their major advantage of being easy to interpret.
However, their predictions are often not as accurate as those of other models.
The most widely used approach for learning decision trees is to build them in a top-down manner by introducing splits on a single variable that minimize a certain splitting
criterion. One possibility of improving this strategy to induce smaller and more accurate
decision trees is to allow different types of splits which, for example, consider multiple
features simultaneously. However, finding such splits is usually much more complex and
effective optimization methods are needed to determine optimal solutions.
An alternative to univariate splits for numerical features are oblique splits which
employ affine hyperplanes to divide the feature space. Unfortunately, the problem of
determining such a split optimally is known to be NP-hard in general. Inspired by the
underlying problem structure, two new heuristics are developed for finding near-optimal
oblique splits. The first one is a cross-entropy optimization method which iteratively
samples points from the von Mises-Fisher distribution and updates its parameters based
on the best performing samples. The second one is a simulated annealing algorithm that
uses a pivoting strategy to explore the solution space.
As general oblique splits employ all of the numerical features simultaneously, they are
hard to interpret. As an alternative, in this thesis, the usage of bivariate oblique splits
is proposed. These splits correspond to lines in the subspace spanned by two features.
They are capable of dividing the feature space much more efficiently than univariate
splits while also being fairly interpretable due to the restriction to two features only.
A branch and bound method is presented to determine these bivariate oblique splits
optimally.
Furthermore, a branch and bound method to determine optimal cross-splits is presented. These splits can be viewed as combinations of two standard univariate splits
on numeric attributes and they are useful in situations where the data points cannot
be separated well linearly. The cross-splits can either be introduced directly to induce
quaternary decision trees or, which is usually better, they can be used to provide a
certain degree of foresight, in which case only the better of the two respective univariate
splits is introduced.
The developed lower bounds for impurity based splitting criteria also motivate a
simple but effective branch and bound algorithm for splits on nominal features. Due to
the complexity of determining such splits optimally when the number of possible values
for the feature is large, one previously had to use encoding schemes to transform the
nominal features into numerical ones or rely on heuristics to find near-optimal nominal
splits. The proposed branch and bound method may be a viable alternative for many
practical applications.
Lastly, a genetic algorithm is proposed as an alternative to the top-down induction
strategy
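The contrast between univariate and bivariate oblique splits can be illustrated with a small sketch. It assumes the Gini impurity as the splitting criterion and uses a hand-picked line on toy data. It does not implement the thesis's branch and bound search, and all names and data below are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(points: List[Point], labels: List[int],
                   goes_left: Callable[[Point], bool]) -> float:
    """Weighted Gini impurity of the two children produced by a split."""
    left = [y for p, y in zip(points, labels) if goes_left(p)]
    right = [y for p, y in zip(points, labels) if not goes_left(p)]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_univariate(points: List[Point], labels: List[int]) -> float:
    """Lowest impurity over all axis-parallel splits x[f] <= t."""
    best = gini(labels)
    for feature in (0, 1):
        for p in points:
            t = p[feature]
            score = split_impurity(points, labels,
                                   lambda q, f=feature, t=t: q[f] <= t)
            best = min(best, score)
    return best


# Toy data: class 1 lies above the diagonal x1 = x0, class 0 below it.
points = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (2.0, 1.0), (3.0, 2.0), (4.0, 3.0)]
labels = [1, 1, 1, 0, 0, 0]

univariate = best_univariate(points, labels)
# Bivariate oblique split: the line -x0 + x1 <= 0 in the two-feature subspace.
oblique = split_impurity(points, labels, lambda q: q[1] - q[0] <= 0.0)
print(univariate, oblique)
```

On this data no single axis-parallel threshold separates the classes, while one line over the two features does, which is the kind of efficiency gain the abstract attributes to bivariate oblique splits.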
Automatic synthesis of fuzzy systems: An evolutionary overview with a genetic programming perspective
Studies in Evolutionary Fuzzy Systems (EFSs) began in the 90s and have experienced a fast development since then, with applications to areas such as pattern recognition, curve‐fitting and regression, forecasting and control. An EFS results from the combination of a Fuzzy Inference System (FIS) with an Evolutionary Algorithm (EA). This relationship can be established for multiple purposes: fine‐tuning of FIS's parameters, selection of fuzzy rules, learning a rule base or membership functions from scratch, and so forth. Each facet of this relationship creates a strand in the literature, such as membership function fine‐tuning or fuzzy rule‐based learning, and the purpose here is to outline some of what has been done in each. Special focus is given to Genetic Programming‐based EFSs by providing a taxonomy of the main architectures available, as well as by pointing out the gaps that still prevail in the literature. The concluding remarks address some further topics of current research and trends, such as interpretability analysis, multiobjective optimization, and synthesis of a FIS through Evolving methods
Methodological review of multicriteria optimization techniques: applications in water resources
Multi-criteria decision analysis (MCDA) is an umbrella approach that has been applied to a wide range of natural resource management situations. This report has two purposes. First, it aims to provide an overview of advanced multicriteria approaches, methods and tools. The review seeks to lay out the nature of the models and their inherent strengths and limitations. Analysis of their applicability in supporting real-life decision-making processes is provided in relation to requirements imposed by organizationally decentralized and economically specific spatial and temporal frameworks. Models are categorized based on different classification schemes and are reviewed by describing their general characteristics, approaches, and fundamental properties. The need for careful structuring of decision problems is discussed regarding planning, staging and control aspects within the broader agricultural context, and in water management in particular. Special emphasis is given to the importance of manipulating decision elements by means of hierarching and clustering. The review goes beyond traditional MCDA techniques; it describes new modelling approaches. The second purpose is to describe new MCDA paradigms aimed at addressing the inherent complexity of managing water ecosystems, particularly with respect to multiple criteria integrated with biophysical models, multiple stakeholders, and lack of information. Comments about, and critical analysis of, the limitations of traditional models are made to point out the need for, and propose a call to, a new way of thinking about MCDA as it is applied to water and natural resources management planning. These new perspectives do not undermine the value of traditional methods; rather they point to a shift in emphasis from methods for problem solving to methods for problem structuring.
The literature review shows successful integrations of watershed management optimization models that efficiently screen a broad range of technical, economic, and policy management options within a watershed system framework and select the optimal combination of management strategies and associated water allocations for designing a sustainable watershed management plan at least cost. The papers show applications of watershed management models that integrate both natural and human elements of a watershed system, including the management of ground and surface water sources, water treatment and distribution systems, human demands, wastewater treatment and collection systems, water reuse facilities, nonpotable water distribution infrastructure, aquifer storage and recharge facilities, storm water, and land use
A Field Guide to Genetic Programming
xiv, 233 p. : ill. ; 23 cm. Electronic book.
A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authors
Introduction --
Representation, initialisation and operators in Tree-based GP --
Getting ready to run genetic programming --
Example genetic programming run --
Alternative initialisations and operators in Tree-based GP --
Modular, grammatical and developmental Tree-based GP --
Linear and graph genetic programming --
Probabilistic genetic programming --
Multi-objective genetic programming --
Fast and distributed genetic programming --
GP theory and its applications --
Applications --
Troubleshooting GP --
Conclusions.
Contents
1 Introduction
1.1 Genetic Programming in a Nutshell
1.2 Getting Started
1.3 Prerequisites
1.4 Overview of this Field Guide
Part I: Basics
2 Representation, Initialisation and Operators in Tree-based GP
2.1 Representation
2.2 Initialising the Population
2.3 Selection
2.4 Recombination and Mutation
3 Getting Ready to Run Genetic Programming
3.1 Step 1: Terminal Set
3.2 Step 2: Function Set
3.2.1 Closure
3.2.2 Sufficiency
3.2.3 Evolving Structures other than Programs
3.3 Step 3: Fitness Function
3.4 Step 4: GP Parameters
3.5 Step 5: Termination and solution designation
4 Example Genetic Programming Run
4.1 Preparatory Steps
4.2 Step-by-Step Sample Run
4.2.1 Initialisation
4.2.2 Fitness Evaluation
4.2.3 Selection, Crossover and Mutation
4.2.4 Termination and Solution Designation
Part II: Advanced Genetic Programming
5 Alternative Initialisations and Operators in Tree-based GP
5.1 Constructing the Initial Population
5.1.1 Uniform Initialisation
5.1.2 Initialisation may Affect Bloat
5.1.3 Seeding
5.2 GP Mutation
5.2.1 Is Mutation Necessary?
5.2.2 Mutation Cookbook
5.3 GP Crossover
5.4 Other Techniques
6 Modular, Grammatical and Developmental Tree-based GP
6.1 Evolving Modular and Hierarchical Structures
6.1.1 Automatically Defined Functions
6.1.2 Program Architecture and Architecture-Altering
6.2 Constraining Structures
6.2.1 Enforcing Particular Structures
6.2.2 Strongly Typed GP
6.2.3 Grammar-based Constraints
6.2.4 Constraints and Bias
6.3 Developmental Genetic Programming
6.4 Strongly Typed Autoconstructive GP with PushGP
7 Linear and Graph Genetic Programming
7.1 Linear Genetic Programming
7.1.1 Motivations
7.1.2 Linear GP Representations
7.1.3 Linear GP Operators
7.2 Graph-Based Genetic Programming
7.2.1 Parallel Distributed GP (PDGP)
7.2.2 PADO
7.2.3 Cartesian GP
7.2.4 Evolving Parallel Programs using Indirect Encodings
8 Probabilistic Genetic Programming
8.1 Estimation of Distribution Algorithms
8.2 Pure EDA GP
8.3 Mixing Grammars and Probabilities
9 Multi-objective Genetic Programming
9.1 Combining Multiple Objectives into a Scalar Fitness Function
9.2 Keeping the Objectives Separate
9.2.1 Multi-objective Bloat and Complexity Control
9.2.2 Other Objectives
9.2.3 Non-Pareto Criteria
9.3 Multiple Objectives via Dynamic and Staged Fitness Functions
9.4 Multi-objective Optimisation via Operator Bias
10 Fast and Distributed Genetic Programming
10.1 Reducing Fitness Evaluations/Increasing their Effectiveness
10.2 Reducing Cost of Fitness with Caches
10.3 Parallel and Distributed GP are Not Equivalent
10.4 Running GP on Parallel Hardware
10.4.1 Master–slave GP
10.4.2 GP Running on GPUs
10.4.3 GP on FPGAs
10.4.4 Sub-machine-code GP
10.5 Geographically Distributed GP
11 GP Theory and its Applications
11.1 Mathematical Models
11.2 Search Spaces
11.3 Bloat
11.3.1 Bloat in Theory
11.3.2 Bloat Control in Practice
Part III: Practical Genetic Programming
12 Applications
12.1 Where GP has Done Well
12.2 Curve Fitting, Data Modelling and Symbolic Regression
12.3 Human Competitive Results – the Humies
12.4 Image and Signal Processing
12.5 Financial Trading, Time Series, and Economic Modelling
12.6 Industrial Process Control
12.7 Medicine, Biology and Bioinformatics
12.8 GP to Create Searchers and Solvers – Hyper-heuristics
12.9 Entertainment and Computer Games
12.10 The Arts
12.11 Compression
13 Troubleshooting GP
13.1 Is there a Bug in the Code?
13.2 Can you Trust your Results?
13.3 There are No Silver Bullets
13.4 Small Changes can have Big Effects
13.5 Big Changes can have No Effect
13.6 Study your Populations
13.7 Encourage Diversity
13.8 Embrace Approximation
13.9 Control Bloat
13.10 Checkpoint Results
13.11 Report Well
13.12 Convince your Customers
14 Conclusions
Part IV: Tricks of the Trade
A Resources
A.1 Key Books
A.2 Key Journals
A.3 Key International Meetings
A.4 GP Implementations
A.5 On-Line Resources
B TinyGP
B.1 Overview of TinyGP
B.2 Input Data Files for TinyGP
B.3 Source Code
B.4 Compiling and Running TinyGP
Bibliography
Inde
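The evolutionary loop summarised in the book description (random programs refined by mutation and recombination) can be sketched roughly as follows. This is an illustrative toy, not the book's TinyGP: the function set, the terminal set, truncation selection, the depth limit `MAX_DEPTH`, the operator probabilities, and the target function x*x + 1 are all assumptions chosen to keep the example short.

```python
import operator
import random

FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = ["x", 1.0, 2.0]
MAX_DEPTH = 6  # assumed limit, used here as a crude form of bloat control


def random_tree(rng, depth):
    """Grow a random expression tree as nested tuples (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def tree_depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(tree[1]), tree_depth(tree[2]))


def evaluate(tree, x):
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree  # variable or numeric constant


def random_subtree(rng, tree):
    """Walk down from the root, stopping at a random node."""
    while isinstance(tree, tuple) and rng.random() < 0.7:
        tree = rng.choice(tree[1:])
    return tree


def replace_random_subtree(rng, tree, donor):
    """Return a copy of tree with one randomly chosen subtree replaced by donor."""
    if not isinstance(tree, tuple) or rng.random() < 0.3:
        return donor
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, replace_random_subtree(rng, left, donor), right)
    return (op, left, replace_random_subtree(rng, right, donor))


def fitness(tree, cases):
    """Sum of absolute errors over the fitness cases (lower is better)."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)


def make_child(rng, parents):
    mother = rng.choice(parents)
    if rng.random() < 0.8:  # subtree crossover with a second parent
        donor = random_subtree(rng, rng.choice(parents))
    else:  # subtree mutation with a fresh random subtree
        donor = random_tree(rng, 2)
    child = replace_random_subtree(rng, mother, donor)
    return child if tree_depth(child) <= MAX_DEPTH else mother


def evolve(cases, generations=30, pop_size=60, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng, 3) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, cases))
        parents = population[: pop_size // 2]  # truncation selection, elitist
        children = [make_child(rng, parents)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return min(population, key=lambda t: fitness(t, cases))


CASES = [(float(x), x * x + 1.0) for x in range(-3, 4)]
best = evolve(CASES)
print(best, fitness(best, CASES))
```

Because the better half of each generation is carried over unchanged, the best error never gets worse from one generation to the next. Whether the run finds the exact target depends on the seed and the parameters.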
Research Trends and Outlooks in Assembly Line Balancing Problems
This paper presents the findings from a survey of articles published on assembly line balancing problems (ALBPs) during 2014-2018. Before proceeding to a comprehensive literature review, the ineffectiveness of the previous ALBP classification structures is discussed and a new classification scheme based on the layout configurations of assembly lines is subsequently proposed. The research trend in each assembly line layout is highlighted through graphical presentations. The challenges in ALBPs are also pinpointed as a technical guideline for future research