An Overview of Schema Theory
The purpose of this paper is to give an introduction to the field of Schema
Theory written by a mathematician and for mathematicians. In particular, we
endeavor to highlight areas of the field which might be of interest to a
mathematician, to point out some related open problems, and to suggest some
large-scale projects. Schema theory seeks to give a theoretical justification
for the efficacy of the field of genetic algorithms, so readers who have
studied genetic algorithms stand to gain the most from this paper. However,
nothing beyond basic probability theory is assumed of the reader, and for this
reason we write in a fairly informal style.
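For readers new to the area, the central result the field is organised around is Holland's schema theorem, which in its classical form (binary strings, fitness-proportional selection, one-point crossover with probability p_c, bitwise mutation with probability p_m) reads:

```latex
\mathbb{E}\big[m(H,\,t+1)\big] \;\ge\; m(H,t)\,\frac{f(H,t)}{\bar f(t)}
\left[1 - p_c\,\frac{\delta(H)}{\ell-1} - o(H)\,p_m\right]
```

where m(H,t) is the number of individuals matching schema H at generation t, f(H,t) their average fitness, \bar f(t) the population's average fitness, \delta(H) the defining length of H, o(H) its order, and \ell the string length.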
Because the mathematics behind the theorems in schema theory is relatively
elementary, we focus more on the motivation and philosophy. Many of these
results have been proven elsewhere, so this paper is designed to serve a
primarily expository role. We attempt to cast known results in a new light,
which makes the suggested future directions natural. This involves devoting a
substantial amount of time to the history of the field.
We hope that this exposition will entice some mathematicians to do research
in this area, that it will serve as a road map for researchers new to the
field, and that it will help explain how schema theory developed. Furthermore,
we hope that the results collected in this document will serve as a useful
reference. Finally, as far as the author knows, the questions raised in the
final section are new.
Comment: 27 pages. Originally written in 2009 and hosted on my website, I've decided to put it on the arXiv as a more permanent home. The paper is primarily expository, so I don't really know where to submit it, but perhaps one day I will find an appropriate journal.
Real-coded genetic algorithm particle filters for high-dimensional state spaces
This thesis addresses the issues faced by particle filters in high-dimensional state spaces by comparing them with genetic algorithms and then using genetic algorithm theory to resolve those issues. Sequential Monte Carlo methods are a class of online posterior density estimation algorithms that are suitable for non-Gaussian and nonlinear environments; however, they are known to suffer from particle degeneracy, where the sample of particles becomes too sparse to approximate the posterior accurately. Various techniques have been proposed to address this issue, but they fail in high dimensions. In this thesis, after a careful comparison between genetic algorithms and particle filters, we posit that genetic-algorithm-theoretic arguments can be used to explain the working of particle filters. Analysing the working of a particle filter, we note that it is designed like a genetic algorithm but does not include recombination. We argue, based on the building-block hypothesis, that the addition of a recombination operator would address the sample impoverishment phenomenon in higher dimensions. We propose a novel real-coded genetic algorithm particle filter (RGAPF) based on these observations and test our hypothesis on the stochastic volatility estimation of financial stocks. The RGAPF successfully scales to higher dimensions. To further test whether building-block-hypothesis-like effects are due to the recombination operator, we compare the RGAPF with a mutation-only particle filter whose adjustable mutation rate is set to equal the population-to-population variance of the RGAPF. The RGAPF significantly and consistently performs better, indicating that recombination has a subtle but significant effect that may be theoretically explained by genetic algorithm theory.
After these two successful validations of our hypothesis, we compare the performance of the RGAPF using different real-recombination operators. Observing the behaviour of the RGAPF under these recombination operators, we propose a mean-centric recombination operator specifically for high-dimensional particle filtering. This recombination operator is successfully tested and compared with benchmark particle filters and a hybrid CMA-ES particle filter, using simulated data and finally real end-of-day data for the securities making up the FTSE-100 index. Each experiment is discussed in detail, and we conclude with a brief description of future directions of research.
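The mechanism the thesis argues for can be sketched in a few lines: a particle update that replaces plain resampling with fitness-proportional selection of parent pairs, real-coded (arithmetic) crossover, and a small mutation. This is a minimal illustration, not the RGAPF itself; the blend operator, mutation scale and toy Gaussian likelihood are assumptions of the sketch.

```python
import numpy as np

def rga_pf_step(particles, log_weights, rng, mut_std=0.05):
    """One GA-style particle filter update (illustrative sketch):
    weight-proportional selection, arithmetic crossover, Gaussian mutation."""
    n, _ = particles.shape
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()                                  # normalised weights act as fitness
    pa = particles[rng.choice(n, size=n, p=w)]    # parent A per offspring
    pb = particles[rng.choice(n, size=n, p=w)]    # parent B per offspring
    u = rng.uniform(size=(n, 1))
    children = u * pa + (1.0 - u) * pb            # real-coded recombination
    children += rng.normal(scale=mut_std, size=children.shape)  # mutation
    return children

rng = np.random.default_rng(0)
parts = rng.normal(size=(500, 10))                # 500 particles, 10-dim state
logw = -0.5 * (parts ** 2).sum(axis=1)            # toy Gaussian log-likelihood
new_parts = rga_pf_step(parts, logw, rng)
print(new_parts.shape)  # (500, 10)
```

Because each child is a convex combination of two high-weight parents, the recombined cloud contracts toward high-likelihood regions instead of merely duplicating a few surviving particles, which is the degeneracy the thesis targets.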
Adaptive scaling of evolvable systems
Neo-Darwinian evolution is an established natural inspiration for computational optimisation with a diverse range of forms. A particular feature of models such as Genetic Algorithms (GA) [18, 12] is the incremental combination of partial solutions distributed within a population of solutions. This mechanism in principle allows certain problems to be solved which would not be amenable to a simple local search. Such problems require these partial solutions, generally known as building-blocks, to be handled without disruption. The traditional means for this is a combination of a suitable chromosome ordering with a sympathetic recombination operator. More advanced algorithms attempt to adapt to accommodate these dependencies during the search. The recent approach of Estimation of Distribution Algorithms (EDA) aims to directly infer a probabilistic model of a promising population distribution from a sample of fitter solutions [23]. This model is then sampled to generate a new solution set. A symbiotic view of evolution is behind the recent development of the Compositional Search Evolutionary Algorithms (CSEA) [49, 19, 8] which build up an incremental model of variable dependencies conditional on a series of tests. Building-blocks are retained as explicit genetic structures and conditionally joined to form higher-order structures. These have been shown to be effective on special classes of hierarchical problems but are unproven on less tightly-structured problems. We propose that there exists a simple yet powerful combination of the above approaches: the persistent, adapting dependency model of a compositional pool with the expressive and compact variable weighting of probabilistic models. We review and deconstruct some of the key methods above for the purpose of determining their individual drawbacks and their common principles. By this reasoned approach we aim to arrive at a unifying framework that can adaptively scale to span a range of problem structure classes. 
This is implemented in a novel algorithm called the Transitional Evolutionary Algorithm (TEA), which is empirically validated in an incremental manner, verifying the various facets of the TEA and comparing it with related algorithms on an increasingly structured series of benchmark problems. This prompts some refinements, resulting in a simple and general algorithm that is nevertheless competitive with state-of-the-art methods.
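The EDA loop described above — fit a probabilistic model to the fitter solutions, then sample the next population from it — can be sketched with the simplest such model, a univariate marginal distribution (UMDA). The OneMax fitness and all parameter values here are illustrative assumptions, not the TEA or any algorithm from the thesis.

```python
import numpy as np

def umda_onemax(n_bits=40, pop=200, elite=50, gens=60, seed=1):
    """Minimal UMDA: estimate per-bit marginals from the fittest solutions,
    then sample the next population from that model."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)                        # initial bitwise model
    best_fit = 0
    for _ in range(gens):
        x = (rng.uniform(size=(pop, n_bits)) < p).astype(int)
        fit = x.sum(axis=1)                          # OneMax: count of 1-bits
        best_fit = max(best_fit, int(fit.max()))
        sel = x[np.argsort(fit)[-elite:]]            # truncation selection
        p = sel.mean(axis=0).clip(0.05, 0.95)        # refit marginals, keep margins
        if best_fit == n_bits:
            break
    return best_fit

print(umda_onemax())  # usually finds the optimum, 40
```

The contrast with compositional methods is visible even in this toy: UMDA's model is compact and weighted but assumes independent variables, whereas CSEA-style approaches retain explicit building-block structure — the combination the TEA aims for.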
The influence of population size in geometric semantic GP
In this work, we study the influence of the population size on the learning ability of Geometric Semantic Genetic Programming for the task of symbolic regression. A large set of experiments, considering different population size values on different regression problems, has been performed. Results show that, on real-life problems, small populations result in better training fitness than large populations after the same number of fitness evaluations. However, performance on the test instances varies among the different problems: in datasets with a high number of features, models obtained with large populations present better performance on unseen data, while in datasets characterized by a relatively small number of variables, better generalization is achieved with small population sizes. When synthetic problems are taken into account, large population sizes represent the best option for achieving good quality solutions on both training and test instances.
A Field Guide to Genetic Programming
xiv, 233 p. : il. ; 23 cm. Electronic book. A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs, and progressively refines them through processes of mutation and sexual recombination, until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions.
Contents
1 Introduction
1.1 Genetic Programming in a Nutshell
1.2 Getting Started
1.3 Prerequisites
1.4 Overview of this Field Guide
I Basics
2 Representation, Initialisation and Operators in Tree-based GP
2.1 Representation
2.2 Initialising the Population
2.3 Selection
2.4 Recombination and Mutation
3 Getting Ready to Run Genetic Programming
3.1 Step 1: Terminal Set
3.2 Step 2: Function Set
3.2.1 Closure
3.2.2 Sufficiency
3.2.3 Evolving Structures other than Programs
3.3 Step 3: Fitness Function
3.4 Step 4: GP Parameters
3.5 Step 5: Termination and Solution Designation
4 Example Genetic Programming Run
4.1 Preparatory Steps
4.2 Step-by-Step Sample Run
4.2.1 Initialisation
4.2.2 Fitness Evaluation
4.2.3 Selection, Crossover and Mutation
4.2.4 Termination and Solution Designation
II Advanced Genetic Programming
5 Alternative Initialisations and Operators in Tree-based GP
5.1 Constructing the Initial Population
5.1.1 Uniform Initialisation
5.1.2 Initialisation may Affect Bloat
5.1.3 Seeding
5.2 GP Mutation
5.2.1 Is Mutation Necessary?
5.2.2 Mutation Cookbook
5.3 GP Crossover
5.4 Other Techniques
6 Modular, Grammatical and Developmental Tree-based GP
6.1 Evolving Modular and Hierarchical Structures
6.1.1 Automatically Defined Functions
6.1.2 Program Architecture and Architecture-Altering
6.2 Constraining Structures
6.2.1 Enforcing Particular Structures
6.2.2 Strongly Typed GP
6.2.3 Grammar-based Constraints
6.2.4 Constraints and Bias
6.3 Developmental Genetic Programming
6.4 Strongly Typed Autoconstructive GP with PushGP
7 Linear and Graph Genetic Programming
7.1 Linear Genetic Programming
7.1.1 Motivations
7.1.2 Linear GP Representations
7.1.3 Linear GP Operators
7.2 Graph-Based Genetic Programming
7.2.1 Parallel Distributed GP (PDGP)
7.2.2 PADO
7.2.3 Cartesian GP
7.2.4 Evolving Parallel Programs using Indirect Encodings
8 Probabilistic Genetic Programming
8.1 Estimation of Distribution Algorithms
8.2 Pure EDA GP
8.3 Mixing Grammars and Probabilities
9 Multi-objective Genetic Programming
9.1 Combining Multiple Objectives into a Scalar Fitness Function
9.2 Keeping the Objectives Separate
9.2.1 Multi-objective Bloat and Complexity Control
9.2.2 Other Objectives
9.2.3 Non-Pareto Criteria
9.3 Multiple Objectives via Dynamic and Staged Fitness Functions
9.4 Multi-objective Optimisation via Operator Bias
10 Fast and Distributed Genetic Programming
10.1 Reducing Fitness Evaluations/Increasing their Effectiveness
10.2 Reducing Cost of Fitness with Caches
10.3 Parallel and Distributed GP are Not Equivalent
10.4 Running GP on Parallel Hardware
10.4.1 Master-slave GP
10.4.2 GP Running on GPUs
10.4.3 GP on FPGAs
10.4.4 Sub-machine-code GP
10.5 Geographically Distributed GP
11 GP Theory and its Applications
11.1 Mathematical Models
11.2 Search Spaces
11.3 Bloat
11.3.1 Bloat in Theory
11.3.2 Bloat Control in Practice
III Practical Genetic Programming
12 Applications
12.1 Where GP has Done Well
12.2 Curve Fitting, Data Modelling and Symbolic Regression
12.3 Human Competitive Results – the Humies
12.4 Image and Signal Processing
12.5 Financial Trading, Time Series, and Economic Modelling
12.6 Industrial Process Control
12.7 Medicine, Biology and Bioinformatics
12.8 GP to Create Searchers and Solvers – Hyper-heuristics
12.9 Entertainment and Computer Games
12.10 The Arts
12.11 Compression
13 Troubleshooting GP
13.1 Is there a Bug in the Code?
13.2 Can you Trust your Results?
13.3 There are No Silver Bullets
13.4 Small Changes can have Big Effects
13.5 Big Changes can have No Effect
13.6 Study your Populations
13.7 Encourage Diversity
13.8 Embrace Approximation
13.9 Control Bloat
13.10 Checkpoint Results
13.11 Report Well
13.12 Convince your Customers
14 Conclusions
IV Tricks of the Trade
A Resources
A.1 Key Books
A.2 Key Journals
A.3 Key International Meetings
A.4 GP Implementations
A.5 On-Line Resources
B TinyGP
B.1 Overview of TinyGP
B.2 Input Data Files for TinyGP
B.3 Source Code
B.4 Compiling and Running TinyGP
Bibliography
Index
Evolutionary Computation
This book presents several recent advances in Evolutionary Computation, especially evolution-based optimization methods and hybrid algorithms, for several applications ranging from optimization and learning to pattern recognition and bioinformatics. It also presents new algorithms based on several analogies and metaphors, one of which draws on philosophy, specifically the philosophy of praxis and dialectics. The book also covers interesting applications in bioinformatics, especially the use of particle swarms to discover gene expression patterns in DNA microarrays. It therefore features representative work in the field of evolutionary computation and applied sciences. The intended audience is graduate and undergraduate students, researchers, and anyone who wishes to become familiar with the latest research in this field.
Evolutionary computation for trading systems
2007/2008
Evolutionary computations, also called evolutionary algorithms, consist of
several heuristics, which are able to solve optimization tasks by imitating
some aspects of natural evolution. They may use different levels of abstraction, but they are always working on populations of possible solutions for a
given task. The basic idea is that if only those individuals of a population
which meet a certain selection criteria reproduce, while the remaining individuals die, the population will converge to those individuals that best meet
the selection criteria. If imperfect reproduction is added, the population can
begin to explore the search space and will move to individuals that have an
increased selection probability and that hand down this property to their
descendants. These population dynamics follow the basic rule of Darwinian evolution theory, which can be described in short as the "survival of the fittest".
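This selection-plus-imperfect-reproduction loop can be written down directly. The sketch below is a toy illustration of the mechanism only; the bit-string encoding, truncation selection and the simple count-the-ones fitness are assumptions of the sketch, not part of the thesis.

```python
import random

def evolve(fitness, length=20, pop_size=100, gens=100, p_mut=0.02, seed=3):
    """Selection plus imperfect reproduction: only the fitter half of the
    population reproduces; each copy mutates bitwise with probability p_mut."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # the selection criterion
        children = []
        for parent in survivors:                  # imperfect reproduction
            children.append([b ^ (rng.random() < p_mut) for b in parent])
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve(fitness=sum)                        # fitness = number of 1-bits
print(sum(best))
```

Without mutation the population would only converge on the best bit-strings already present; the imperfect copies are what let it explore and improve beyond the initial sample, exactly as described above.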
Although evolutionary computations belong to a relatively new research area, from a computational perspective they have already shown some promising features:
• evolutionary methods reveal a remarkable balance between efficiency and efficacy;
• evolutionary computations are well suited for parameter optimisation;
• this type of algorithm allows a wide variety of extensions and constraints that traditional methods cannot accommodate;
• evolutionary methods are easily combined with other optimization techniques and can also be extended to multi-objective optimization.
From an economic perspective, these methods appear to be particularly well suited for a wide range of financial applications. In this thesis I study evolutionary algorithms
• for time series prediction;
• to generate trading rules;
• for portfolio selection.
It is commonly believed that asset prices are not random, but are permeated by complex interrelations that often translate into asset mispricing and may give rise to potentially profitable opportunities. Classical financial approaches, such as dividend discount models or even capital asset pricing theories, are not able to capture these market complexities. Thus, in the
last decades, researchers have employed intensive econometric and statistical
modeling that examine the effects of a multitude of variables, such as price-
earnings ratios, dividend yields, interest rate spreads and changes in foreign
exchange rates, on a broad and variegated range of stocks at the same time.
However, these models often result in complex functional forms difficult to
manage or interpret and, in the worst case, are solely able to fit a given time
series but are useless for prediction. In parallel with quantitative approaches, other researchers have focused on the impact of investor psychology (in particular, herding and overreaction) and on the consequences of considering informed signals from management and analysts, such as share repurchases and analyst recommendations. These theories are guided by intuition and experience, and are thus difficult to translate into a mathematical framework.
Hence there is a pressing need to combine these points of view in order to develop models that examine hundreds of variables simultaneously, including qualitative information, and that have user-friendly representations. To this end, the thesis focuses on the study of methodologies that satisfy these requirements by integrating economic insights, derived from academic and professional knowledge, with evolutionary computations.
The main task of this work is to provide efficient algorithms based on the
evolutionary paradigm of biological systems in order to compute optimal
trading strategies for various profit objectives under economic and statistical constraints. The motivations for constructing such optimal strategies
are:
i) the necessity to overcome data-snooping and survivorship bias in order to learn to predict good trading opportunities by using market and/or technical indicators as features on which to base the forecasting;
ii) the feasibility of using these rules as benchmarks for real trading systems;
iii) the capability of ranking various markets quantitatively with respect to their profitability according to a given criterion, thus making portfolio allocations possible.
More precisely, I present two algorithms that use artificial expert trading
systems to predict financial time series, and a procedure to generate integrated neutral strategies for active portfolio management.
The first algorithm is an automated procedure that simultaneously selects variables and detects outliers in a dynamic linear model, using information criteria as objective functions and diagnostic tests as constraints on the distributional properties of the errors. The novelties are the automatic implementation of econometric conditions in the model selection step, making possible a better exploration of the solution space on the one hand, and the use of evolutionary computations to efficiently carry out a reduction procedure over a very large number of independent variables on the other.
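The flavour of such a procedure can be sketched as a genetic search over variable-inclusion bitmasks scored by an information criterion. This is a toy illustration under stated assumptions, not the thesis's algorithm: the AIC-only objective (no diagnostic-test constraints, no outlier step), the synthetic data and all parameter values are my own.

```python
import numpy as np

def aic(y, X, mask):
    """AIC of an OLS fit using only the columns selected by the boolean mask."""
    n = len(y)
    k = int(mask.sum())
    if k == 0:
        rss = float(((y - y.mean()) ** 2).sum())
    else:
        beta, *_ = np.linalg.lstsq(X[:, mask], y, rcond=None)
        rss = float(((y - X[:, mask] @ beta) ** 2).sum())
    return n * np.log(rss / n) + 2 * (k + 1)

def ga_select(y, X, pop=60, gens=40, seed=7):
    """Toy GA over inclusion bitmasks, minimising AIC."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    P = rng.uniform(size=(pop, d)) < 0.5              # random initial masks
    for _ in range(gens):
        scores = np.array([aic(y, X, m) for m in P])
        elite = P[np.argsort(scores)[: pop // 2]]      # truncation selection
        pa = elite[rng.integers(len(elite), size=pop)]
        pb = elite[rng.integers(len(elite), size=pop)]
        cut = rng.uniform(size=(pop, d)) < 0.5         # uniform crossover
        P = np.where(cut, pa, pb) ^ (rng.uniform(size=(pop, d)) < 0.02)
    scores = np.array([aic(y, X, m) for m in P])
    return P[np.argmin(scores)]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                        # 10 candidate regressors
y = X[:, 0] + 2 * X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=200)
mask = ga_select(y, X)
print(mask[:3])   # the three true predictors should be selected
```

In the thesis's setting the penalty terms (significance levels, Durbin-Watson) would enter this score as constraints; here only the information criterion drives the search.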
In the second algorithm, the novelty is given by the definition of evolutionary
learning in financial terms and its use in a multi-objective genetic algorithm
in order to generate technical trading systems.
The last tool is based on a trading strategy on six assets, where future
movements of each variable are obtained by an evolutionary procedure that
integrates various types of financial variables. The contribution is given by the introduction of a genetic algorithm to optimize trading signal parameters and by the way in which different sources of information are represented and collected.
In order to compare the contribution of this work to "classical" techniques
and theories, the thesis is divided into three parts. The first part, titled
Background, collects Chapters 2 and 3. Its purpose is to provide an introduction to search/optimization evolutionary techniques on one hand, and to
the theories that relate the predictability in financial markets with the concept of efficiency proposed over time by scholars on the other hand. More
precisely, Chapter 2 introduces the basic concepts and major areas of evolutionary computation. It presents a brief history of three major types of evolutionary algorithms, i.e. evolution strategies, evolutionary programming
and genetic algorithms, and points out similarities and differences among
them. Moreover it gives an overview of genetic algorithms and describes
classical and genetic multi-objective optimization techniques. Chapter 3
first presents an overview of the literature on the predictability of financial
time series. In particular, the extent to which the efficiency paradigm is
affected by the introduction of new theories, such as behavioral finance, is
described in order to justify the market forecasting methodologies developed
by practitioners and academics in the last decades. Then, a description of
the econometric and financial techniques that will be used in conjunction
with evolutionary algorithms in the successive chapters is provided. Special
attention is paid to economic implications, in order to highlight merits and
shortcomings from a practitioner perspective.
The second part of the thesis, titled Trading Systems, is devoted to the description of two procedures I have developed in order to generate artificial
trading strategies on the basis of evolutionary algorithms, and it groups
Chapters 4 and 5. In particular, chapter 4 presents a genetic algorithm for
variable selection by minimizing the error in a multiple regression model.
Measures of errors such as ME, RMSE, MAE, Theil's inequality coefficient and CDC are analyzed, choosing models based on AIC, BIC, ICOMP and similar criteria. Two components of penalty functions are taken into account: the level of significance and the Durbin-Watson statistic. Asymptotic properties of functions are tested on several financial variables including stocks, bonds, returns, and composite price indices from the US and EU economies. Variables with outliers that distort the efficiency and consistency of estimators
are removed to solve masking and smearing problems that they may cause in
estimations. Two examples complete the chapter. In both cases, models are
designed to produce short-term forecasts for the excess returns of the MSCI
Europe Energy sector over the MSCI Europe index, and a recursive estimation window is used to shed light on their predictive performance. In the first
application the data-set is obtained by a reduction procedure from a very
large number of leading macro indicators and financial variables stacked
at various lags, while in the second the complete set of 1-month lagged
variables is considered. Results show a promising capability to predict excess sector returns through the selection, using the proposed methodology,
of most valuable predictors. In Chapter 5 the paradigm of evolutionary
learning is defined and applied in the context of technical trading rules for
stock timing. A new genetic algorithm is developed by integrating statistical learning methods and the bootstrap into a multi-objective non-dominated sorting algorithm with variable string length, making it possible to evaluate statistical and economic criteria at the same time. Subsequently, the chapter discusses a practical case, represented by a simple trading strategy where total funds are invested in either the S&P 500 Composite Index or in 3-month Treasury Bills. In this application, the most informative technical indicators are selected by the algorithm from a set of almost 5000 signals. These signals are then combined into a unique trading signal by a learning method. I test the expert weighting solutions obtained by the plurality voting committee, Bayesian model averaging and Boosting procedures with data from the S&P 500 Composite Index, in three market phases (up-trend, down-trend and sideways movements) covering the period 2000-2006.
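The plurality-voting combination step can be sketched in a few lines. This is an illustrative toy, not the thesis's committee: the -1/0/+1 signal encoding and the tie-breaking toward the first option are my assumptions.

```python
import numpy as np

def plurality_vote(signals):
    """Combine expert signals (rows: days, cols: experts; values -1/0/+1)
    into one trading signal per day by plurality voting.
    Ties resolve toward the first option in (short, flat, long) order."""
    signals = np.asarray(signals)
    votes_short = (signals == -1).sum(axis=1)
    votes_flat = (signals == 0).sum(axis=1)
    votes_long = (signals == 1).sum(axis=1)
    stacked = np.stack([votes_short, votes_flat, votes_long], axis=1)
    return stacked.argmax(axis=1) - 1          # map column index back to -1/0/+1

S = [[1, 1, -1],     # two experts long, one short  -> long (+1)
     [0, -1, -1],    # plurality short              -> short (-1)
     [0, 0, 1]]      # plurality flat               -> flat (0)
print(plurality_vote(S))  # [ 1 -1  0]
```

Bayesian model averaging and boosting replace the equal votes here with data-driven expert weights, which is exactly the comparison the chapter runs.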
In the third part, titled Portfolio Selection, I explain how portfolio optimization models may be constructed on the basis of evolutionary algorithms and
on the signals produced by artificial trading systems. First, market neutral
strategies from an economic point of view are introduced, highlighting their
risks and benefits and focusing on their quantitative formulation. Then, a
description of the GA-Integrated Neutral tool, a MATLAB set of functions
based on genetic algorithms for active portfolio management, is given. The
algorithm specializes in the parameter optimization of trading signals for
an integrated market neutral strategy. The chapter concludes showing an
application of the tool as a support to decisions in the Absolute Return
Interest Rate Strategies sub-fund of Generali Investments.Gli âalgoritmi evolutiviâ, noti anche come âevolutionary computationsâ
comprendono varie tecniche di ottimizzazione per la risoluzione di problemi,
mediante alcuni aspetti suggeriti dallâevoluzione naturale. Tali metodologie
sono accomunate dal fatto che non considerano unâunica soluzione alla
volta, bens`Äą trattano intere popolazioni di possibili soluzioni per un dato
problema. Lâidea sottostante `e che, se un algoritmo fa evolvere solamente
gli individui di una data popolazione che soddisfano a un certo criterio di
selezione, e lascia morire i restanti, la popolazione converger`a agli individui
che meglio soddisfano il criterio di selezione. Con una selezione non ottimale,
cio`e una che ammette pure soluzioni sub-ottimali, la popolazione rappresenter`
a meglio lâintero spazio di ricerca e sar`a in grado di individuare in modo
pi`u consistente gli individui migliori da far evolvere. Queste dinamiche interne
alle popolazioni seguono i principi Darwiniani dellâevoluzione, che si
possono sinteticamente riassumere nella dicitura âla sopravvivenza del piĂš
adattoâ.
Sebbene gli algoritmi evolutivi siano unâarea di ricerca relativamente nuova,
dal punto di vista computazionale hanno dimostrato alcune caratteristiche
interessanti fra cui le seguenti:
⢠permettono un notevole equilibrio tra efficienza ed efficacia;
⢠sono particolarmente indicati per la configurazione dei parametri in
problemi di ottimizzazione;
⢠consentono una flessibilit`a nella definizione matematica dei problemi
e dei vincoli che non si trova nei metodi tradizionali;
⢠possono facilmente essere integrati con altre tecniche di ottimizzazione
ed essere essere modificati per risolvere problemi multi-obiettivo.
Dal un punto di vista economico, lâapplicazione di queste procedure pu`o
risultare utile specialmente in campo finanziario. In particolare, nella mia
tesi ho studiato degli algoritmi evolutivi per
⢠la previsione di serie storiche finanziarie;
⢠la costruzione di regole di trading;
⢠la selezione di portafogli.
Da un punto di vista pi`u ampio, lo scopo di questa ricerca `e dunque lâanalisi
dellâevoluzione e della complessit`a dei mercati finanziari. In tal senso, dal
momento che i prezzi non seguono andamenti puramente casuali, ma sono
governati da un insieme molto articolato di eventi correlati, i modelli e le
teorie classiche, come i dividend discount model e le varie capital asset pricing
theories, non sono pi`u sufficienti per determinare potenziali opportunit`a di
profitto. A tal fine, negli ultimi decenni, alcuni ricercatori hanno sviluppato
una vasta gamma di modelli econometrici e statistici in grado di esaminare
contemporaneamente le relazioni e gli effetti di centinaia di variabili, come
ad esempio, price-earnings ratios, dividendi, differenziali fra tassi di interesse
e variazioni dei tassi di cambio, per una vasta gamma di assets. Comunque,
questo approccio, che fa largo impiego di strumenti di calcolo, spesso porta
a dei modelli troppo complicati per essere gestiti o interpretati, e, nel peggiore
dei casi, pur essendo ottimi per descrivere situazioni passate, risultano
inutili per fare previsioni. Parallelamente a questi approcci quantitativi, si
`e manifestato un grande interesse sulla psicologia degli investitori e sulle
conseguenze derivanti dalle opinioni di esperti e analisti nelle dinamiche del
mercato. Questi studi sono difficilmente traducibili in modelli matematici
e si basano principalmente sullâintuizione e sullâesperienza. Da qui la necessit`
a di combinare insieme questi due punti di vista, al fine di sviluppare
modelli che siano in grado da una parte di trattare contemporaneamente
un elevato numero di variabili in modo efficiente e, dallâaltra, di incorporare
informazioni e opinioni qualitative. La tesi affronta queste tematiche integrando
le conoscenze economiche, sia accademiche che professionali, con gli
algoritmi evolutivi. Pi`u pecisamente, il principale obiettivo di questo lavoro
`e lo sviluppo di algoritmi efficienti basati sul paradigma dellâevoluzione dei
sistemi biologici al fine di determinare strategie di trading ottimali in termini
di profitto e di vincoli economici e statistici. Le ragioni che motivano
lo studio di tali strategie ottimali sono:
i) la necessit`a di risolvere i problemi di data-snooping e supervivorship
bias al fine di ottenere regole di investimento vantaggiose utilizzando
indicatori di mercato e/o tecnici per la previsione;
ii) la possibilitĂ di impiegare queste regole come benchmark per sistemi
di trading reali;
iii) la capacit`a di individuare gli asset pi`u vantaggiosi in termini di profitto,
o di altri criteri, rendendo possibile una migliore allocazione di
risorse nei portafogli.
In particolare, nella tesi descrivo due algoritmi che impiegano sistemi di trading
artificiali per predire serie storiche finanziarie e una procedura di calcolo
per strategie integrate neutral market per la gestione attiva di portafogli.
Il primo algoritmo `e una procedura automatica che seleziona le variabili
e simultaneamente determina gli outlier in un modello dinamico lineare
utilizzando criteri informazionali come funzioni obiettivo e test diagnostici
come vincoli per le caratteristiche delle distribuzioni degli errori. Le novit`a
del metodo sono da una parte lâimplementazione automatica di condizioni
econometriche nella fase di selezione, consentendo una migliore analisi dello
EVOLUTIONARY COMPUTATIONS FOR TRADING SYSTEMS 3
spazio delle soluzioni, e dallâaltra parte, lâintroduzione di una procedura di
riduzione evolutiva capace di riconoscere in modo efficiente le variabili pi`u
informative.
Nel secondo algoritmo, le novitĂ sono costituite dalla definizione dellâapprendimento
evolutivo in termini finanziari e dallâapplicazione di un algoritmo
genetico multi-obiettivo per la costruzione di sistemi di trading basati
su indicatori tecnici.
Lâultimo metodo proposto si basa su una strategia di trading su sei assets,
in cui le dinamiche future di ciascuna variabile sono ottenute impiegando
una procedura evolutiva che integra diverse tipologie di variabili finanziarie.
Il contributo è dato dallâimpiego di un algoritmo genetico per ottimizzare i
parametri negli indicatori tecnici e dal modo in cui le differenti informazioni
sono presentate e collegate.
The thesis is organized in three parts. The first part, entitled Background,
comprises Chapters 2 and 3 and is intended to provide an introduction to
evolutionary search and optimization techniques on the one hand, and to the
theories dealing with the efficiency and predictability of financial markets on
the other. More precisely, Chapter 2 introduces the basic concepts and the main
fields of study of evolutionary computation. To this end, it gives a brief
historical presentation of three of the main types of evolutionary algorithms,
namely evolution strategies, evolutionary programming, and genetic algorithms,
highlighting their common features and differences. The chapter closes with an
overview of genetic algorithms and of classical and genetic techniques for
multi-objective optimization. Chapter 3 addresses in detail the problem of the
predictability of financial time series, highlighting in particular how much
the efficiency paradigm is influenced by the most recent financial theories,
such as behavioral finance. The aim is to give a theoretical justification for
the forecasting methodologies developed in the thesis. There follows a
description of the econometric and technical-analysis methods that will be
employed together with the evolutionary algorithms in the subsequent chapters.
Particular attention is given to their economic implications, in order to
highlight their merits and shortcomings from a practical point of view.
The second part, entitled Trading Systems, groups Chapters 4 and 5 and is
devoted to the description of two procedures that I developed to generate
artificial trading systems on the basis of evolutionary algorithms. In
particular, Chapter 4 presents a genetic algorithm for variable selection
through the minimization of the error in a multiple regression model. Error
measures such as the ME, the RMSE, the MAE, Theil's coefficient, and the CDC
are analyzed for models selected on the basis of information criteria such as
AIC, BIC, and ICOMP. As diagnostic constraints, I considered a two-component
penalty function and the Durbin-Watson statistic. The program employs financial
variables of various kinds, such as stock returns, bonds, and composite index
prices drawn from the United States and European economies.
MASSIMILIANO KAUCIC
Whenever the time series under consideration contain outliers that distort the
efficiency and consistency of the estimators, the algorithm is able to identify
them and remove them from the series, solving the masking-and-smearing problem.
The chapter concludes with two applications, in which the models are designed
to produce short-term forecasts of the excess return of the MSCI Europe Energy
sector over the MSCI Europe index, and a recursive estimation-window procedure
is used to assess their forecasting performance. In the first example, the data
set is obtained by extracting the variables of interest from a considerable
number of macro indicators and from financial variables lagged with respect to
the dependent variable. In the second example I instead considered the whole
set of variables lagged by 1 month. The results show a remarkable forecasting
ability for the excess return, identifying the most informative indicators. In
Chapter 5, the concept of evolutionary learning is defined and applied to the
construction of trading rules on technical indicators for stock timing. To this
end, I developed an algorithm that integrates statistical learning and
bootstrap methods with a particular multi-objective algorithm. The resulting
procedure is able to evaluate economic and statistical criteria simultaneously.
To describe its operation, I considered a simple trading example in which all
the capital is invested either in an index (in the case at hand, the S&P 500
Composite index) or in a low-risk security (in the example, 3-month Treasury
Bills). The final trading signal is the result of selecting the most
informative technical indicators from a set of about 5000 indicators and of
their subsequent integration by a learning method (a plurality voting
committee, Bayesian model averaging, or boosting).
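Of the three combination schemes mentioned, the plurality voting committee is the simplest to illustrate. The toy function below is invented for illustration (it is not from the thesis): it combines individual indicator signals into one trading decision, resolving ties in favor of the low-risk asset.

```python
from collections import Counter

def plurality_vote(signals):
    """Combine indicator signals (+1 = equity index, -1 = low-risk asset)
    into one decision by plurality voting; ties default to the low-risk asset."""
    top = Counter(signals).most_common()
    if len(top) > 1 and top[0][1] == top[1][1]:
        return -1          # tie-break: stay in the low-risk asset
    return top[0][0]

print(plurality_vote([+1, +1, -1]))   # → 1
print(plurality_vote([+1, -1]))       # → -1  (tie → low-risk asset)
```

In practice each committee member would itself be a selected technical indicator, and weighting schemes such as Bayesian model averaging or boosting would replace the equal-weight vote.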
The analysis was conducted over the period from 2000 to 2006, divided
into three sub-periods: the first represents the index in a
Empirical Analysis of Schemata in Genetic Programming
Schemata and building blocks have been used in Genetic Programming
(GP) in several contexts, including subroutines, theoretical analysis, and
empirical analysis. Of these three, the least explored is empirical analysis.
This thesis presents a powerful empirical analysis technique for GP that
analyzes all schemata of a given form occurring in any program of a
given population, at scales not previously possible for this kind of global
analysis.
There are many competing forms of schema in GP and, rather than choosing
one for analysis, the thesis defines the match-tree meta-form of schema as
a general language for expressing forms of schema for use by the analysis
system. This language can express most forms of schema previously used in
tree-based GP.
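As a toy illustration of what a form of schema looks like in tree-based GP, the sketch below matches programs (nested tuples) against patterns containing a node wildcard '=' and a subtree wildcard '#', conventions similar to Poli's hyperschemata. The match-tree meta-form defined in the thesis is a general language subsuming such forms; this code is only an invented, minimal instance.

```python
# A GP program is a nested tuple: ('+', ('x',), ('*', ('x',), ('1',))).
# A schema has the same shape but may contain the wildcards
#   '=' matching exactly one node label (arity is fixed), and
#   '#' matching any whole subtree.

def matches(schema, tree):
    """Does `tree` instantiate `schema`?"""
    if schema[0] == '#':               # subtree wildcard: matches anything
        return True
    if schema[0] != '=' and schema[0] != tree[0]:
        return False                   # labels must agree unless '=' is used
    if len(schema) != len(tree):
        return False                   # arities must agree
    return all(matches(s, t) for s, t in zip(schema[1:], tree[1:]))

prog = ('+', ('x',), ('*', ('x',), ('1',)))
print(matches(('+', ('#',), ('*', ('=',), ('#',))), prog))   # → True
print(matches(('+', ('#',), ('-', ('#',), ('#',))), prog))   # → False
```

An empirical analysis in the spirit of the thesis would run such a matcher over every schema of a given form in every program of a population, which is exactly the combinatorial explosion the maximal and representative structures below are designed to tame.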
The new method can perform wide-ranging analyses on the prohibitively
large set of all schemata in the programs by introducing the concepts of
maximal schema, maximal program subset, representative set of schemata, and
representative program subset. These structures are used to optimize the
analysis, shrinking its complexity to a manageable size without sacrificing
the result.
Characterization experiments analyze GP populations of up to 501 60-
node programs, using 11 forms of schema including rooted hyperschemata
and non-rooted fragments. The new method has close to quadratic complexity
in population size and quartic complexity in program size. Efficacy
experiments present example analyses using the new method. The
experiments offer interesting insights into the dynamics of GP runs including
fine-grained analysis of convergence and the visualization of schemata
during a GP evolution.
Future work will apply the many possible extensions of this new method
to understanding how GP operates, including studies of convergence, building
blocks and schema fitness. This method provides a much finer-resolution
microscope into the inner workings of GP and will be used to provide accessible
visualizations of the evolutionary process.