Fluctuation scaling in complex systems: Taylor's law and beyond
Complex systems consist of many interacting elements which participate in
some dynamical process. The activity of the various elements often differs, and
the fluctuations in the activity of an element grow monotonically with its
average activity. This relationship is often of the form $\sigma \sim \langle f \rangle^{\alpha}$, where the exponent $\alpha$ is predominantly in
the range $1/2 \leq \alpha \leq 1$. This power law has been observed in a very wide range of
disciplines, ranging from population dynamics through the Internet to the stock
market, and it is often treated under the names \emph{Taylor's law} or
\emph{fluctuation scaling}. This review attempts to show how general the above
scaling relationship is by surveying the literature, as well as by reporting
some new empirical data and model calculations. We also show some basic
principles that can underlie the generality of the phenomenon. This is followed
by a mean-field framework based on sums of random variables. In this context
the emergence of fluctuation scaling is equivalent to some corresponding limit
theorems. In certain physical systems fluctuation scaling can be related to
finite-size scaling.
Comment: 33 pages, 20 figures, 2 tables, submitted to Advances in Physics
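As an illustration of how the exponent of such a fluctuation-scaling law can be estimated in practice (this sketch is not taken from the review; the synthetic data and the chosen exponent 0.75 are assumptions), one can regress the log standard deviation of each element's activity on the log of its mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic activities of 50 elements whose fluctuations follow
# sigma ~ <f>^alpha with alpha = 0.75 (an assumed value for illustration).
alpha_true = 0.75
means = np.logspace(0, 4, 50)                       # average activity per element
samples = np.array([rng.normal(m, m**alpha_true, size=1000) for m in means])

# Fluctuation scaling: fit log(std) against log(mean) across elements.
log_mean = np.log(samples.mean(axis=1))
log_std = np.log(samples.std(axis=1))
alpha_hat, intercept = np.polyfit(log_mean, log_std, 1)

print(f"estimated alpha = {alpha_hat:.2f}")         # close to 0.75
```

The slope of the log-log fit recovers the scaling exponent; on real data one would compute the per-element means and variances over a time window instead of sampling them synthetically.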
Strategic and Selfless Interactions: a study of human behaviour
Human beings are unique animals, cooperating on a scale unmatched by any other species. We build societies composed of unrelated individuals, and empirical results have shown that people have social preferences and may be willing to take costly actions that benefit others. On the other hand, humans also compete with one another, which sometimes carries negative consequences such as the overexploitation of natural resources. At the same time, competition between economic agents underlies the proper functioning of markets, and its destabilisation -- such as an unbalanced distribution of market power -- can harm trading efficiency. Analysing how people cooperate and compete is therefore of primary importance for understanding human behaviour, especially in view of the looming challenges that threaten the future well-being of our societies.
This thesis presents work analysing people's behaviour in social dilemmas -- situations in which selfish decisions are at odds with the social optimum -- and in other strategic scenarios. Using the framework of game theory, their interactions take place in games that abstract these situations. Specifically, we conducted behavioural experiments in which people participated in adapted common-pool resource games, public goods games, and other purpose-built games. In addition, with the aim of understanding the existence of cooperation in humans, we propose a theoretical approach to model its evolution through a heuristic-selection dynamic.
We begin by presenting the theoretical and empirical foundations on which this thesis builds, namely game theory, experimental economics, network science, and the evolution of cooperation. We then illustrate the practical aspects of running experiments through software implementations.
To understand people's behaviour in collective-action problems -- such as climate change mitigation, which requires a global level of coordination and cooperation -- we ran public goods and common-pool resource games with Chinese and Spanish participants. The results provide some insights into the variations and universalities of people's responses in these scenarios.
Along these lines, in recent years people and institutions have become increasingly concerned with social and environmental issues. However, contributions in these scenarios require a substantial level of altruism from the agents, who must make costly decisions. We conducted two experiments to understand the factors driving such decisions in two situations of contemporary relevance: charitable donations and socially responsible investments. The results indicate that framing and other sociodemographic characteristics are significantly associated with prosocial and altruistic decisions.
We also analysed people's behaviour in a complex, competitive scenario in which subjects participated as intermediaries in price-formation experiments. We did so through an experiment implementing a generalisation of the bargaining game on complex networks. Our findings indicate significant effects of network topology both on the experimental results and on theoretical models based on the observed behaviour.
Finally, we present theoretical work that seeks to understand the emergence of cooperation through a novel approach to studying the evolution of strategies in structured populations. This is achieved by modelling agents' decisions as the outcomes of heuristics, with these heuristics selected through a process inspired by evolutionary algorithms. Our analyses show that, when these agents have memory of their previous interactions, cooperative strategies thrive. However, those strategies operate according to different heuristics depending on the information they take into account.
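The heuristic-selection dynamic can be sketched minimally. This toy model is not the thesis's actual framework: the three memory-based heuristics, the prisoner's dilemma payoffs, and payoff-proportional reproduction are all illustrative assumptions.

```python
import random

# Toy heuristic-selection dynamic (NOT the thesis's model): heuristics for a
# repeated prisoner's dilemma reproduce in proportion to accumulated payoff.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

HEURISTICS = {
    "always_cooperate": lambda last: "C",
    "always_defect":    lambda last: "D",
    "tit_for_tat":      lambda last: last or "C",  # uses memory of opponent's last move
}

def play(h1, h2, rounds=50):
    """Repeated prisoner's dilemma between two heuristics; returns total payoffs."""
    s1 = s2 = 0
    last1 = last2 = None
    for _ in range(rounds):
        a1, a2 = HEURISTICS[h1](last2), HEURISTICS[h2](last1)
        p1, p2 = PAYOFF[(a1, a2)]
        s1, s2 = s1 + p1, s2 + p2
        last1, last2 = a1, a2
    return s1, s2

random.seed(1)
pop = [random.choice(list(HEURISTICS)) for _ in range(30)]
for _ in range(20):                                 # generations
    score = {h: 1e-9 for h in HEURISTICS}           # tiny floor keeps weights positive
    for i, h1 in enumerate(pop):                    # round-robin tournament
        for h2 in pop[i + 1:]:
            s1, s2 = play(h1, h2)
            score[h1] += s1
            score[h2] += s2
    names = list(HEURISTICS)
    pop = random.choices(names, weights=[score[h] for h in names], k=len(pop))

print({h: pop.count(h) for h in HEURISTICS})        # composition after selection
```

With memory available (tit-for-tat), cooperative play can persist once unconditional cooperators, the defectors' easiest prey, have been driven out; that qualitative pattern is the point of the sketch, not the particular numbers.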
A Field Guide to Genetic Programming
xiv, 233 p. : il. ; 23 cm. Electronic book. A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically, starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs and progressively refines them through processes of mutation and sexual recombination until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions.
Introduction --
Representation, initialisation and operators in Tree-based GP --
Getting ready to run genetic programming --
Example genetic programming run --
Alternative initialisations and operators in Tree-based GP --
Modular, grammatical and developmental Tree-based GP --
Linear and graph genetic programming --
Probabilistic genetic programming --
Multi-objective genetic programming --
Fast and distributed genetic programming --
GP theory and its applications --
Applications --
Troubleshooting GP --
Conclusions.
Contents
1 Introduction
1.1 Genetic Programming in a Nutshell
1.2 Getting Started
1.3 Prerequisites
1.4 Overview of this Field Guide
I Basics
2 Representation, Initialisation and Operators in Tree-based GP
2.1 Representation
2.2 Initialising the Population
2.3 Selection
2.4 Recombination and Mutation
3 Getting Ready to Run Genetic Programming
3.1 Step 1: Terminal Set
3.2 Step 2: Function Set
3.2.1 Closure
3.2.2 Sufficiency
3.2.3 Evolving Structures other than Programs
3.3 Step 3: Fitness Function
3.4 Step 4: GP Parameters
3.5 Step 5: Termination and Solution Designation
4 Example Genetic Programming Run
4.1 Preparatory Steps
4.2 Step-by-Step Sample Run
4.2.1 Initialisation
4.2.2 Fitness Evaluation
4.2.3 Selection, Crossover and Mutation
4.2.4 Termination and Solution Designation
II Advanced Genetic Programming
5 Alternative Initialisations and Operators in Tree-based GP
5.1 Constructing the Initial Population
5.1.1 Uniform Initialisation
5.1.2 Initialisation may Affect Bloat
5.1.3 Seeding
5.2 GP Mutation
5.2.1 Is Mutation Necessary?
5.2.2 Mutation Cookbook
5.3 GP Crossover
5.4 Other Techniques
6 Modular, Grammatical and Developmental Tree-based GP
6.1 Evolving Modular and Hierarchical Structures
6.1.1 Automatically Defined Functions
6.1.2 Program Architecture and Architecture-Altering
6.2 Constraining Structures
6.2.1 Enforcing Particular Structures
6.2.2 Strongly Typed GP
6.2.3 Grammar-based Constraints
6.2.4 Constraints and Bias
6.3 Developmental Genetic Programming
6.4 Strongly Typed Autoconstructive GP with PushGP
7 Linear and Graph Genetic Programming
7.1 Linear Genetic Programming
7.1.1 Motivations
7.1.2 Linear GP Representations
7.1.3 Linear GP Operators
7.2 Graph-Based Genetic Programming
7.2.1 Parallel Distributed GP (PDGP)
7.2.2 PADO
7.2.3 Cartesian GP
7.2.4 Evolving Parallel Programs using Indirect Encodings
8 Probabilistic Genetic Programming
8.1 Estimation of Distribution Algorithms
8.2 Pure EDA GP
8.3 Mixing Grammars and Probabilities
9 Multi-objective Genetic Programming
9.1 Combining Multiple Objectives into a Scalar Fitness Function
9.2 Keeping the Objectives Separate
9.2.1 Multi-objective Bloat and Complexity Control
9.2.2 Other Objectives
9.2.3 Non-Pareto Criteria
9.3 Multiple Objectives via Dynamic and Staged Fitness Functions
9.4 Multi-objective Optimisation via Operator Bias
10 Fast and Distributed Genetic Programming
10.1 Reducing Fitness Evaluations/Increasing their Effectiveness
10.2 Reducing Cost of Fitness with Caches
10.3 Parallel and Distributed GP are Not Equivalent
10.4 Running GP on Parallel Hardware
10.4.1 Master–slave GP
10.4.2 GP Running on GPUs
10.4.3 GP on FPGAs
10.4.4 Sub-machine-code GP
10.5 Geographically Distributed GP
11 GP Theory and its Applications
11.1 Mathematical Models
11.2 Search Spaces
11.3 Bloat
11.3.1 Bloat in Theory
11.3.2 Bloat Control in Practice
III Practical Genetic Programming
12 Applications
12.1 Where GP has Done Well
12.2 Curve Fitting, Data Modelling and Symbolic Regression
12.3 Human Competitive Results – the Humies
12.4 Image and Signal Processing
12.5 Financial Trading, Time Series, and Economic Modelling
12.6 Industrial Process Control
12.7 Medicine, Biology and Bioinformatics
12.8 GP to Create Searchers and Solvers – Hyper-heuristics
12.9 Entertainment and Computer Games
12.10 The Arts
12.11 Compression
13 Troubleshooting GP
13.1 Is there a Bug in the Code?
13.2 Can you Trust your Results?
13.3 There are No Silver Bullets
13.4 Small Changes can have Big Effects
13.5 Big Changes can have No Effect
13.6 Study your Populations
13.7 Encourage Diversity
13.8 Embrace Approximation
13.9 Control Bloat
13.10 Checkpoint Results
13.11 Report Well
13.12 Convince your Customers
14 Conclusions
IV Tricks of the Trade
A Resources
A.1 Key Books
A.2 Key Journals
A.3 Key International Meetings
A.4 GP Implementations
A.5 On-Line Resources
B TinyGP
B.1 Overview of TinyGP
B.2 Input Data Files for TinyGP
B.3 Source Code
B.4 Compiling and Running TinyGP
Bibliography
Index
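The "ooze of random programs" loop the guide describes can be sketched in miniature. This is a toy tree-based symbolic-regression GP in the spirit of, but far smaller than, the book's TinyGP appendix; the function set, truncation selection, and the target x² + x are illustrative choices, not the book's code.

```python
import math
import operator
import random

# Toy tree-based GP (illustrative only, not the book's TinyGP). A tree is
# either a terminal ("x" or a constant) or ((func, symbol), left, right).
FUNCS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]
TERMS = ["x", 1.0]
XS = [i / 10 for i in range(-10, 11)]               # fitness cases in [-1, 1]

def target(x):
    return x * x + x                                # assumed target function

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMS)
    return (random.choice(FUNCS), random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    (func, _), left, right = tree
    return func(evaluate(left, x), evaluate(right, x))

def fitness(tree):
    err = 0.0
    for x in XS:
        v = evaluate(tree, x)
        err += abs(v - target(x)) if math.isfinite(v) else 1e9
    return -err                                     # higher is better, 0 is perfect

def mutate(tree):
    if not isinstance(tree, tuple) or random.random() < 0.2:
        return random_tree(2)                       # replace node with a fresh subtree
    func, left, right = tree
    return (func, mutate(left), right)              # toy: only walks the left branch

def random_subtree(tree):
    while isinstance(tree, tuple) and random.random() < 0.7:
        tree = random.choice(tree[1:])
    return tree

def crossover(a, b):
    if not isinstance(a, tuple):
        return random_subtree(b)
    parts = list(a)
    parts[random.choice([1, 2])] = random_subtree(b)  # splice a subtree of b into a
    return tuple(parts)

random.seed(0)
pop = [random_tree() for _ in range(100)]
for _ in range(20):                                 # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:20]                              # truncation selection with elitism
    pop = (parents
           + [crossover(random.choice(parents), random.choice(parents)) for _ in range(60)]
           + [mutate(random.choice(parents)) for _ in range(20)])

best = max(pop, key=fitness)
print(fitness(best))
```

Everything the book then elaborates, such as subtree-crossover variants, bloat control, and typed or grammar-based constraints, replaces one of these toy components with something more principled.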
The impact of news narrative on the economy and financial markets
This thesis investigates the impact of news narrative on socio-economic systems across four experiments. Recent years have witnessed a rise in the use of so-called alternative data sources to model and predict dynamics in socio-economic systems. Notably, sources such as newspaper text allow researchers to quantify the elusive concept of narrative, to incorporate text-based features into forecasting frameworks and thus to evaluate the impact of narrative on economic events.
The first experiment proposes a new method of incorporating a wide array of sentiment scores from global newspaper articles into macroeconomic forecasts, attempting to forecast industrial production and consumer prices leveraging narrative and sentiment from global newspapers. I model industrial production and consumer prices across a diverse range of economies using an autoregressive framework.
The second experiment uses narrative from global newspapers to construct theme-based knowledge graphs about world events, demonstrating that features extracted from such graphs improve forecasts of industrial production in three large economies.
The third experiment proposes a novel method of including news themes and their associated sentiment into predictions of changes in breakeven inflation rates (BEIR) for eight diverse economies with mature fixed income markets. I utilise five types of machine learning algorithms incorporating narrative-based features for each economy.
In the above experiments, models incorporating narrative-based features generally outperform their benchmarks that do not contain such variables, demonstrating the predictive power of features derived from news narrative.
The fourth experiment utilises GDELT data and the filtering methodology introduced in the first experiment to create a profitable systematic trading strategy based on the average tone scores for 15 diverse economies.
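The general shape of the forecasting setup in the first experiment, an autoregressive model augmented with lagged news-sentiment features, can be sketched on synthetic data. This is not the thesis's actual pipeline; the series, coefficients, and single-lag structure are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch (synthetic data, not the thesis's pipeline): an AR(1)
# model of an economic series augmented with a lagged news-sentiment
# regressor, fitted by ordinary least squares.
T = 300
sentiment = rng.normal(size=T)                      # stand-in for a news tone score
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + 0.4 * sentiment[t - 1] + 0.1 * rng.normal()

# Design matrix [1, y_{t-1}, sentiment_{t-1}] -> y_t
X = np.column_stack([np.ones(T - 1), y[:-1], sentiment[:-1]])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(beta)                                         # roughly [0, 0.6, 0.4]
```

The recovered sentiment coefficient is the narrative "signal"; comparing this model's out-of-sample error against the same regression without the sentiment column is the simplest version of the benchmark comparison the experiments report.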
An analysis of spending behaviour under liquidity constraints with an application to financial hedging
Integrating host population contact structure and pathogen whole-genome sequence data to understand the epidemiology of infectious diseases : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy, Massey University, Manawatū, New Zealand
With advances in high-throughput sequencing technologies, computational biology, and evolutionary modelling, pathogen sequence data is increasingly being used to inform infectious disease outbreak investigations, supporting inferences on the timing and directionality of transmission as well as providing insights into pathogen evolutionary dynamics and the development of antimicrobial resistance. This thesis focuses on the application of pathogen whole-genome sequence data in conjunction with social network analysis to investigate the transmission dynamics of two important pathogens: Campylobacter jejuni and Staphylococcus aureus.
The first four studies centre around the recent emergence of an antimicrobial-resistant C. jejuni strain that was found to have rapidly spread throughout the New Zealand commercial poultry industry. All four studies build on the results of an industry survey, which were used not only to determine the basic farm demographics and biosecurity practices of all poultry producers, but also to construct five contact networks representing the on- and off-farm movement patterns of goods and services. The contact networks were used in study one to investigate the relationship between farm-level contact risk pathways and the reported level of biosecurity. However, despite many farms having numerous contact risk pathways, no relationship was found, owing to the high variability in biosecurity practices between producers.
In study two, the contact risk between commercial poultry, backyard poultry, and wild birds was investigated by examining the spatial overlap between the commercial contact networks and (i) all poultry transactions made through the online auction website TradeMe® and (ii) all wild bird observations made through eBird, an online citizen-science bird-monitoring project. The results suggest that the greatest risk comes from the growing number of online trades made over increasingly long distances and shorter timespans.
Study three further uses the commercial contact networks to investigate the role of multiple transmission pathways on the genetic relatedness of 167 C. jejuni isolates sampled from across 30 commercial poultry farms. Permutational multivariate analysis of variance and distance-based linear models were used to explore the relative importance of network distances as potential determinants of the pairwise genetic relatedness between the C. jejuni isolates, with study results highlighting the importance of transporting feed vehicles in addition to the geographical proximity of farms and the parent company in the spread of disease.
In the last of the four C. jejuni studies, a compartmental disease transmission model was developed to simulate both disease spread and sequence mutation across an outbreak within the commercial poultry industry. The simulated sequences were used in an analysis mirroring the methods of study three in order to validate the approaches examining the contribution of local contacts and network contacts to disease transmission. An additional analysis was also performed in which the simulated sequence data were used to infer a transmission tree and explore the use of pathogen phylogenies in determining who infected whom across different model systems.
A further study, motivated by the application of whole-genome sequence data to infer transmission, investigated the spread of S. aureus within the New Zealand dairy industry. This study demonstrated how whole-genome sequence data can be used to investigate pathogen population and evolutionary dynamics at multiple scales: from local to national and international. For this study, the genetic relatedness between 57 bovine-derived S. aureus isolates sampled from 17 New Zealand dairy herds was compared with that of 59 S. aureus isolates previously sampled and characterised from humans and domestic pets across New Zealand, and of 103 S. aureus isolates extracted from GenBank comprising both human and livestock isolates sampled across 19 countries. The results not only support evidence that the movement of live animals is an important risk factor for the spread of S. aureus, but also show that cattle-tracing data alone may not be enough to fully capture the between-farm transmission dynamics of S. aureus.
Overall, using these two pathogens as examples, this thesis demonstrates the potential of pathogen whole-genome sequence data used alongside contact network data in epidemiological investigations, while highlighting the limitations and future challenges that must be addressed in order to develop robust methods for reliably inferring transmission and evolutionary dynamics across a range of infectious diseases.
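The kind of question asked in study three — does distance on a contact network predict genetic relatedness between isolates? — is commonly probed with matrix-correlation methods. The thesis uses PERMANOVA and distance-based linear models; as a hedged illustration, the sketch below substitutes the simplest relative of those approaches, a Mantel permutation test, on synthetic farm data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative sketch (synthetic data): does pairwise "network" distance
# between farms predict pairwise genetic distance between their isolates?
n = 30                                                        # farms
pos = rng.normal(size=(n, 2))
net = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)    # network distance
gen = net + rng.normal(scale=0.3, size=(n, n))                # correlated genetic distance
gen = (gen + gen.T) / 2                                       # symmetrise
np.fill_diagonal(gen, 0)

iu = np.triu_indices(n, k=1)                                  # unique pairs only

def mantel_r(a, b):
    """Pearson correlation between the upper triangles of two distance matrices."""
    return np.corrcoef(a[iu], b[iu])[0, 1]

observed = mantel_r(net, gen)
perms = []
for _ in range(999):
    p = rng.permutation(n)                                    # relabel farms jointly
    perms.append(mantel_r(net, gen[np.ix_(p, p)]))
p_value = (1 + sum(r >= observed for r in perms)) / 1000

print(f"Mantel r = {observed:.2f}, p = {p_value:.3f}")
```

Permuting farm labels, rather than matrix entries, respects the non-independence of pairwise distances, which is the same concern that motivates permutational methods like PERMANOVA in the thesis.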
Supervised Machine Learning Under Test-Time Resource Constraints: A Trade-off Between Accuracy and Cost
The past decade has witnessed how the field of machine learning has established itself as a necessary component in several multi-billion-dollar industries. The real-world industrial setting introduces an interesting new problem to machine learning research: computational resources must be budgeted and cost must be strictly accounted for at test time. A typical question is: if an application consumes x additional units of cost at test time but improves accuracy by y percent, should the additional x resources be allocated? The core of this problem is a trade-off between accuracy and cost. In this thesis, we examine components of test-time cost, and develop different strategies to manage this trade-off.
We first investigate test-time cost and discover that it typically consists of two parts: feature extraction cost and classifier evaluation cost. The former reflects the computational effort of transforming data instances into feature vectors, and can be highly variable when features are heterogeneous. The latter reflects the effort of evaluating a classifier, which can be substantial, particularly for nonparametric algorithms. We then propose three strategies to explicitly trade off accuracy against the two components of test-time cost during classifier training.
To budget the feature extraction cost, we first introduce two algorithms: GreedyMiser and Anytime Representation Learning (AFR). GreedyMiser employs a strategy that incorporates the extraction cost information during classifier training to explicitly minimize the test-time cost. AFR extends GreedyMiser to learn a cost-sensitive feature representation rather than a classifier, and turns traditional Support Vector Machines (SVM) into test-time cost-sensitive anytime classifiers. GreedyMiser and AFR are evaluated on two real-world data sets from two different application domains, and both achieve record performance.
We then introduce Cost Sensitive Tree of Classifiers (CSTC) and Cost Sensitive Cascade of Classifiers (CSCC), which share a common strategy that trades off accuracy and the amortized test-time cost. CSTC introduces a tree structure and directs test inputs along different tree traversal paths, each of which is optimized for a specific sub-partition of the input space, extracting different, specialized subsets of features. CSCC extends CSTC and builds a linear cascade, instead of a tree, to cope with class-imbalanced binary classification tasks. Since both CSTC and CSCC extract different features for different inputs, the amortized test-time cost is greatly reduced while maintaining high accuracy. Both approaches outperform the current state-of-the-art on real-world data sets.
To trade off accuracy against the high classifier evaluation cost of nonparametric classifiers, we propose a model compression strategy and develop Compressed Vector Machines (CVM). CVM focuses on nonparametric kernel Support Vector Machines (SVM), whose test-time evaluation cost is typically substantial when learned from large training sets. CVM is a post-processing algorithm which compresses the learned SVM model by reducing and optimizing support vectors. On several benchmark data sets, CVM maintains high test accuracy while reducing the test-time evaluation cost by several orders of magnitude.
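A heavily simplified sketch of the feature-extraction side of this trade-off (not GreedyMiser itself; the synthetic data, per-feature costs, and the greedy rule are illustrative assumptions): acquire the feature with the best accuracy gain per unit of extraction cost until a test-time budget is exhausted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Far simpler than GreedyMiser, but shows the same accuracy/cost trade-off:
# greedily buy the feature with the best accuracy gain per unit of
# extraction cost until the test-time budget is spent. All data synthetic.
n, d = 500, 6
X = rng.normal(size=(n, d))
w_true = np.array([2.0, 1.5, 1.0, 0.5, 0.0, 0.0])   # informative -> useless
y = (X @ w_true + 0.5 * rng.normal(size=n)) > 0
cost = np.array([4.0, 1.0, 1.0, 2.0, 1.0, 3.0])     # per-feature extraction cost

def accuracy(feats):
    """Training accuracy of a least-squares linear classifier on a feature subset."""
    if not feats:
        return 0.5                                   # no features: coin-flip baseline
    Xs = X[:, feats]
    w, *_ = np.linalg.lstsq(Xs, 2.0 * y - 1.0, rcond=None)
    return np.mean((Xs @ w > 0) == y)

budget, spent, chosen = 5.0, 0.0, []
while True:
    base = accuracy(chosen)
    best, best_gain = None, 0.0
    for j in range(d):
        if j in chosen or spent + cost[j] > budget:
            continue
        gain = (accuracy(chosen + [j]) - base) / cost[j]   # gain per unit cost
        if gain > best_gain:
            best, best_gain = j, gain
    if best is None:                                 # no affordable positive gain left
        break
    chosen.append(best)
    spent += cost[best]

print(chosen, spent, accuracy(chosen))
```

Note how the cheap, moderately informative features win out over the most informative but expensive one; the thesis's methods embed this kind of cost-awareness directly into training rather than applying it as a wrapper.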