
    Fluctuation scaling in complex systems: Taylor's law and beyond

    Complex systems consist of many interacting elements which participate in some dynamical process. The activity of the various elements often differs, and the fluctuation in the activity of an element grows monotonically with its average activity. This relationship is often of the form $\text{fluctuations} \approx \text{const.} \times \text{average}^{\alpha}$, where the exponent $\alpha$ is predominantly in the range $[1/2, 1]$. This power law has been observed in a very wide range of disciplines, ranging from population dynamics through the Internet to the stock market, and it is often treated under the names \emph{Taylor's law} or \emph{fluctuation scaling}. This review attempts to show how general the above scaling relationship is by surveying the literature, as well as by reporting some new empirical data and model calculations. We also show some basic principles that can underlie the generality of the phenomenon. This is followed by a mean-field framework based on sums of random variables. In this context the emergence of fluctuation scaling is equivalent to some corresponding limit theorems. In certain physical systems fluctuation scaling can be related to finite-size scaling. Comment: 33 pages, 20 figures, 2 tables, submitted to Advances in Physics
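    As a concrete illustration of how the scaling exponent is measured in practice (a minimal sketch of my own, not taken from the review): group repeated observations by element, compute each element's mean activity and fluctuation (standard deviation), and fit a straight line on log-log scales. For independent Poisson-like events the fitted exponent should come out near 1/2.

    import numpy as np

    # Synthetic activities: each of 200 elements emits events at its own rate,
    # observed 1000 times; Poisson statistics should give alpha close to 1/2.
    rng = np.random.default_rng(0)
    rates = rng.lognormal(mean=2.0, sigma=1.0, size=200)
    samples = rng.poisson(lam=rates[:, None], size=(200, 1000))

    means = samples.mean(axis=1)
    stds = samples.std(axis=1)

    # fluctuations ~ const * average^alpha  =>  log(std) = alpha * log(mean) + c
    alpha, log_const = np.polyfit(np.log(means), np.log(stds), deg=1)
    print(f"estimated alpha = {alpha:.3f}")  # ~0.5 for independent Poisson events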

    Strategic and Selfless Interactions: a study of human behaviour

    Human beings are unique animals, cooperating on a scale unmatched by any other species. We build societies composed of unrelated individuals, and empirical results have shown that people hold social preferences and may be willing to take costly actions that benefit others. On the other hand, human beings also compete with one another, which sometimes carries negative consequences such as the over-exploitation of natural resources. Nevertheless, competition between economic agents underlies the proper functioning of markets, and its destabilisation -- as in an unbalanced distribution of market power -- can harm market efficiency. Analysing how people cooperate and compete is therefore of paramount importance for understanding human behaviour, especially in view of the imminent challenges that threaten the future well-being of our societies.

    This thesis presents work analysing people's behaviour in social dilemmas -- situations in which selfish decisions diverge from the social optimum -- and in other strategic scenarios. Using the framework of game theory, their interactions take place in games that abstract these situations. Specifically, we conducted behavioural experiments in which people participated in adapted common-pool resource games, public goods games, and other purpose-built games. In addition, with the aim of understanding the existence of cooperation in humans, we propose a theoretical approach that models its evolution through a dynamics of heuristic selection.

    We begin by presenting the theoretical and empirical foundations on which this thesis rests, namely game theory, experimental economics, network science, and the evolution of cooperation. We then illustrate the practical aspects of running experiments through software implementations.

    To understand people's behaviour in collective action problems -- such as climate change mitigation, which requires a global level of coordination and cooperation -- we ran public goods and common-pool resource games with Chinese and Spanish participants. The results provide insight into the variations and universalities of people's responses in these scenarios.

    Along these lines, in recent years people and institutions have become increasingly concerned with social and environmental issues. Contributions in these scenarios, however, require a substantial level of altruism from agents who must make costly decisions. We conducted two experiments to understand the factors that drive such decisions in two situations of contemporary relevance: charitable donations and socially responsible investments. The results indicate that framing and other sociodemographic characteristics are significantly associated with prosocial and altruistic decisions.

    Furthermore, we analysed people's behaviour in a complex, competitive scenario in which subjects acted as intermediaries in price-formation experiments. We did so through an experiment that implements a generalisation of the bargaining game on complex networks. Our findings indicate significant effects of network topology both on the experimental results and in theoretical models based on the observed behaviour.

    Finally, we present theoretical work that seeks to understand the emergence of cooperation through a novel approach to studying the evolution of strategies in structured populations. This is achieved by modelling agents' decisions as the outputs of heuristics, with these heuristics selected through a process inspired by evolutionary algorithms. Our analyses show that, when agents retain memory of their previous interactions, cooperative strategies thrive; however, those strategies operate according to different heuristics depending on the information they take into account.
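    The closing modelling approach can be illustrated with a toy version (a sketch of my own, not the thesis's actual model): agents in a repeated public goods game each follow one of a few fixed decision heuristics, and poorly earning agents imitate better earners, so heuristics are selected over generations. The heuristic set, group size, multiplier R, and imitation rule below are all illustrative assumptions.

    import random
    from collections import Counter

    GROUP, ROUNDS, GENERATIONS, R = 4, 10, 200, 1.6   # illustrative parameters

    def always_cooperate(history):
        return 1

    def always_defect(history):
        return 0

    def conditional(history):
        # cooperate if at least half the group contributed in the previous round
        return 1 if not history or sum(history[-1]) >= GROUP / 2 else 0

    HEURISTICS = [always_cooperate, always_defect, conditional]
    population = [random.choice(HEURISTICS) for _ in range(100)]

    for _ in range(GENERATIONS):
        payoff = [0.0] * len(population)
        order = list(range(len(population)))
        random.shuffle(order)
        for g in range(0, len(order), GROUP):          # random groups each generation
            group = order[g:g + GROUP]
            history = []
            for _ in range(ROUNDS):
                moves = [population[i](history) for i in group]
                pot = R * sum(moves)                   # contributions are multiplied...
                for i, m in zip(group, moves):
                    payoff[i] += pot / GROUP - m       # ...and shared equally
                history.append(moves)
        # selection: each agent imitates a randomly chosen better earner
        new_population = []
        for i in range(len(population)):
            j = random.randrange(len(population))
            new_population.append(population[j] if payoff[j] > payoff[i] else population[i])
        population = new_population

    print(Counter(h.__name__ for h in population))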

    A Field Guide to Genetic Programming

    xiv, 233 p. : il. ; 23 cm. Electronic book. A Field Guide to Genetic Programming (ISBN 978-1-4092-0073-4) is an introduction to genetic programming (GP). GP is a systematic, domain-independent method for getting computers to solve problems automatically, starting from a high-level statement of what needs to be done. Using ideas from natural evolution, GP starts from an ooze of random computer programs and progressively refines them through processes of mutation and sexual recombination until solutions emerge. All this without the user having to know or specify the form or structure of solutions in advance. GP has generated a plethora of human-competitive results and applications, including novel scientific discoveries and patentable inventions. The authors are Riccardo Poli, William B. Langdon, and Nicholas F. McPhee, with contributions by John R. Koza. Contents: Introduction -- Representation, initialisation and operators in Tree-based GP -- Getting ready to run genetic programming -- Example genetic programming run -- Alternative initialisations and operators in Tree-based GP -- Modular, grammatical and developmental Tree-based GP -- Linear and graph genetic programming -- Probabilistic genetic programming -- Multi-objective genetic programming -- Fast and distributed genetic programming -- GP theory and its applications -- Applications -- Troubleshooting GP -- Conclusions.
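    To make the core GP loop concrete, here is a minimal tree-based symbolic-regression GP in the spirit of the book's TinyGP appendix, though it is my own sketch rather than the book's code: trees are nested tuples, fitness is the absolute error against a known target, and evolution uses truncation selection with subtree crossover and mutation. The target function and all parameters are arbitrary choices.

    import operator
    import random

    FUNCS = [(operator.add, 2), (operator.sub, 2), (operator.mul, 2)]
    TERMS = ['x'] + [float(c) for c in range(-2, 3)]

    def rand_tree(depth=3):
        if depth == 0 or random.random() < 0.3:
            return random.choice(TERMS)
        f, arity = random.choice(FUNCS)
        return (f, [rand_tree(depth - 1) for _ in range(arity)])

    def evaluate(tree, x):
        if tree == 'x':
            return x
        if isinstance(tree, float):
            return tree
        f, kids = tree
        return f(*(evaluate(k, x) for k in kids))

    def fitness(tree):  # total error against the target x**2 + x + 1; lower is better
        return sum(abs(evaluate(tree, x) - (x * x + x + 1)) for x in range(-5, 6))

    def nodes(tree, path=()):  # enumerate (path, subtree) pairs
        yield path, tree
        if isinstance(tree, tuple):
            for i, kid in enumerate(tree[1]):
                yield from nodes(kid, path + (i,))

    def replace(tree, path, new):  # return a copy with the subtree at path replaced
        if not path:
            return new
        f, kids = tree
        return (f, [replace(k, path[1:], new) if i == path[0] else k
                    for i, k in enumerate(kids)])

    def crossover(a, b):  # graft a random subtree of b onto a random point of a
        pa, _ = random.choice(list(nodes(a)))
        _, sub = random.choice(list(nodes(b)))
        return replace(a, pa, sub)

    def mutate(tree):     # replace a random subtree with a fresh random one
        p, _ = random.choice(list(nodes(tree)))
        return replace(tree, p, rand_tree(2))

    pop = [rand_tree() for _ in range(200)]
    for _ in range(30):
        pop.sort(key=fitness)
        if fitness(pop[0]) < 1e-9:
            break
        elite = pop[:50]   # truncation selection: keep the best quarter
        pop = elite + [mutate(crossover(random.choice(elite), random.choice(elite)))
                       for _ in range(150)]
    print('best error:', fitness(pop[0]))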

    The impact of news narrative on the economy and financial markets

    This thesis investigates the impact of news narrative on socio-economic systems across four experiments. Recent years have witnessed a rise in the use of so-called alternative data sources to model and predict dynamics in socio-economic systems. Notably, sources such as newspaper text allow researchers to quantify the elusive concept of narrative, to incorporate text-based features into forecasting frameworks, and thus to evaluate the impact of narrative on economic events. The first experiment proposes a new method of incorporating a wide array of sentiment scores from global newspaper articles into macroeconomic forecasts, using narrative and sentiment from global newspapers to forecast industrial production and consumer prices. I model industrial production and consumer prices across a diverse range of economies using an autoregressive framework. The second experiment uses narrative from global newspapers to construct theme-based knowledge graphs about world events, demonstrating that features extracted from such graphs improve forecasts of industrial production in three large economies. The third experiment proposes a novel method of including news themes and their associated sentiment in predictions of changes in breakeven inflation rates (BEIR) for eight diverse economies with mature fixed-income markets. I utilise five types of machine learning algorithms incorporating narrative-based features for each economy. Across these experiments, models incorporating narrative-based features generally outperform benchmarks that do not contain such variables, demonstrating the predictive power of features derived from news narrative. The fourth experiment utilises GDELT data and the filtering methodology introduced in the first experiment to create a profitable systematic trading strategy based on the average tone scores for 15 diverse economies.
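    The general recipe of the first experiment -- augmenting an autoregressive forecast with lagged news-sentiment features and comparing against a lags-only benchmark -- might look like the following sketch. The CSV file and column names are hypothetical placeholders, and the model here is ordinary least squares rather than the thesis's exact specification.

    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("macro_with_sentiment.csv", parse_dates=["date"])  # hypothetical file
    df["y_lag1"] = df["industrial_production"].shift(1)
    df["y_lag2"] = df["industrial_production"].shift(2)
    df["sent_lag1"] = df["news_sentiment"].shift(1)   # sentiment must precede the target
    df = df.dropna()

    train, test = df.iloc[:-24], df.iloc[-24:]        # hold out the last 24 months

    benchmark = LinearRegression().fit(train[["y_lag1", "y_lag2"]],
                                       train["industrial_production"])
    augmented = LinearRegression().fit(train[["y_lag1", "y_lag2", "sent_lag1"]],
                                       train["industrial_production"])

    for name, model, cols in [("AR benchmark", benchmark, ["y_lag1", "y_lag2"]),
                              ("AR + sentiment", augmented, ["y_lag1", "y_lag2", "sent_lag1"])]:
        err = np.mean((model.predict(test[cols]) - test["industrial_production"]) ** 2)
        print(f"{name}: test MSE = {err:.4f}")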

    Integrating host population contact structure and pathogen whole-genome sequence data to understand the epidemiology of infectious diseases : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy, Massey University, Manawatū, New Zealand

    With advances in high-throughput sequencing technologies, computational biology, and evolutionary modelling, pathogen sequence data is increasingly being used to inform infectious disease outbreak investigations; supporting inferences on the timing and directionality of transmission as well as providing insights into pathogen evolutionary dynamics and the development of antimicrobial resistance. This thesis focuses on the application of pathogen whole-genome sequence data in conjunction with social network analysis to investigate the transmission dynamics of two important pathogens: Campylobacter jejuni and Staphylococcus aureus. The first four studies centre on the recent emergence of an antimicrobial-resistant C. jejuni strain that was found to have rapidly spread throughout the New Zealand commercial poultry industry. All four studies build on the results of an industry survey, which were used not only to determine the basic farm demographics and biosecurity practices of all poultry producers, but also to construct five contact networks representing the on- and off-farm movement patterns of goods and services. Contact networks were used in study one to investigate the relationship between farm-level contact risk pathways and the reported level of biosecurity. However, despite many farms having a number of contact risk pathways, no relationship was found, owing to the high variability in biosecurity practices between producers. In study two the contact risk between commercial poultry, backyard poultry, and wild birds was investigated by examining the spatial overlap between the commercial contact networks and (i) all poultry transactions made through the online auction website TradeMe® and (ii) all wild bird observations made through the online citizen science bird monitoring project, eBird, with study results suggesting that the greatest risk is due to the growing number of online trades made over increasingly long distances and shorter timespans. Study three used the commercial contact networks to investigate the role of multiple transmission pathways on the genetic relatedness of 167 C. jejuni isolates sampled from across 30 commercial poultry farms. Permutational multivariate analysis of variance and distance-based linear models were used to explore the relative importance of network distances as potential determinants of the pairwise genetic relatedness between the C. jejuni isolates, with study results highlighting the importance of feed transport vehicles, in addition to the geographical proximity of farms and the parent company, in the spread of disease. In the last of the four C. jejuni studies, a compartmental disease transmission model was developed to simulate both disease spread and sequence mutation across an outbreak within the commercial poultry industry. Simulated sequences were used in an analysis mirroring the methods used in study three in order to validate the approaches examining the contribution of local contacts and network contacts towards disease transmission. An additional analysis was also performed in which the simulated sequence data were used to infer a transmission tree and to explore the use of pathogen phylogenies in determining who infected whom across different model systems. A further study, motivated by the application of whole-genome sequence data to infer transmission, investigated the spread of S. aureus within the New Zealand dairy industry.
    This study demonstrated how whole-genome sequence data can be used to investigate pathogen population and evolutionary dynamics at multiple scales: from local to national and international. For this study, the genetic relatedness between 57 bovine-derived S. aureus isolates sampled from across 17 New Zealand dairy herds was compared with that of 59 S. aureus isolates previously sampled and characterised from humans and domestic pets across New Zealand, and 103 S. aureus isolates extracted from GenBank that included both human and livestock isolates sampled from across 19 countries. Results from this study not only support evidence showing that the movement of live animals is an important risk factor for the spread of S. aureus, but also show that using cattle-tracing data alone may not be enough to fully capture the between-farm transmission dynamics of S. aureus. Overall, by using these two pathogen examples, this thesis demonstrates the potential use of pathogen whole-genome sequence data alongside contact network data in an epidemiological investigation, whilst highlighting the limitations and future challenges that must be considered in order to continue to develop robust methods that can be used to reliably infer the transmission and evolutionary dynamics across a range of infectious diseases.
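    The distance-based analyses in study three can be illustrated with a simplified stand-in (the thesis used permutational multivariate analysis of variance and distance-based linear models; this sketch substitutes a Mantel-style permutation test): correlate a farm-to-farm network distance matrix with a pairwise genetic distance matrix and assess significance by permuting farm labels. Both matrices below are random placeholders for real data.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 30                                        # e.g. 30 farms

    def random_distance_matrix(n):
        m = rng.random((n, n))
        m = (m + m.T) / 2                         # symmetric, zero diagonal
        np.fill_diagonal(m, 0)
        return m

    network_dist = random_distance_matrix(n)      # e.g. shortest paths in a contact network
    genetic_dist = random_distance_matrix(n)      # e.g. SNP distances between isolates

    iu = np.triu_indices(n, k=1)                  # unique farm pairs

    def mantel_r(a, b):
        return np.corrcoef(a[iu], b[iu])[0, 1]

    observed = mantel_r(network_dist, genetic_dist)
    perms = []
    for _ in range(999):                          # permute farm labels, keep structure
        p = rng.permutation(n)
        perms.append(mantel_r(network_dist[np.ix_(p, p)], genetic_dist))
    p_value = (1 + sum(r >= observed for r in perms)) / 1000
    print(f"Mantel r = {observed:.3f}, p = {p_value:.3f}")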

    Supervised Machine Learning Under Test-Time Resource Constraints: A Trade-off Between Accuracy and Cost

    The past decade has witnessed the field of machine learning establish itself as a necessary component in several multi-billion-dollar industries. The real-world industrial setting introduces an interesting new problem to machine learning research: computational resources must be budgeted and cost must be strictly accounted for at test time. A typical question is: if an application consumes x additional units of cost at test time but improves accuracy by y percent, should the additional x resources be allocated? The core of this problem is a trade-off between accuracy and cost. In this thesis, we examine the components of test-time cost and develop different strategies to manage this trade-off. We first investigate test-time cost and find that it typically consists of two parts: feature extraction cost and classifier evaluation cost. The former reflects the computational effort of transforming data instances into feature vectors, and can be highly variable when features are heterogeneous. The latter reflects the effort of evaluating a classifier, which can be substantial, in particular for nonparametric algorithms. We then propose three strategies to explicitly trade off accuracy against the two components of test-time cost during classifier training. To budget the feature extraction cost, we first introduce two algorithms: GreedyMiser and Anytime Representation Learning (AFR). GreedyMiser employs a strategy that incorporates the extraction cost information during classifier training to explicitly minimize the test-time cost. AFR extends GreedyMiser to learn a cost-sensitive feature representation rather than a classifier, and turns traditional Support Vector Machines (SVM) into test-time cost-sensitive anytime classifiers. GreedyMiser and AFR are evaluated on two real-world data sets from two different application domains, and both achieve record performance. We then introduce Cost Sensitive Tree of Classifiers (CSTC) and Cost Sensitive Cascade of Classifiers (CSCC), which share a common strategy that trades off accuracy against the amortized test-time cost. CSTC introduces a tree structure and directs test inputs along different tree traversal paths, each of which is optimized for a specific sub-partition of the input space, extracting different, specialized subsets of features. CSCC extends CSTC and builds a linear cascade, instead of a tree, to cope with class-imbalanced binary classification tasks. Since both CSTC and CSCC extract different features for different inputs, the amortized test-time cost is greatly reduced while high accuracy is maintained. Both approaches outperform the current state of the art on real-world data sets. To trade off accuracy against the high evaluation cost of nonparametric classifiers, we propose a model compression strategy and develop Compressed Vector Machines (CVM). CVM focuses on nonparametric kernel Support Vector Machines (SVM), whose test-time evaluation cost is typically substantial when learned from large training sets. CVM is a post-processing algorithm which compresses the learned SVM model by reducing and optimizing support vectors. On several benchmark data sets, CVM maintains high test accuracy while reducing the test-time evaluation cost by several orders of magnitude.
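    The accuracy/cost trade-off at the heart of this work can be sketched with a deliberately simple stand-in (GreedyMiser itself is a stage-wise boosting method with cost-aware regularisation; this is not its algorithm): greedily add the feature with the best validation-accuracy gain per unit of extraction cost until a budget is exhausted. The data, per-feature costs, and budget below are synthetic placeholders.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=600, n_features=12, n_informative=5,
                               random_state=0)
    costs = np.random.default_rng(0).uniform(1, 10, size=12)  # per-feature extraction cost
    Xtr, Xva, ytr, yva = train_test_split(X, y, random_state=0)

    def accuracy(features):
        if not features:
            return 0.5                                        # majority-guess baseline
        clf = LogisticRegression(max_iter=1000).fit(Xtr[:, features], ytr)
        return clf.score(Xva[:, features], yva)

    budget, chosen, spent = 15.0, [], 0.0
    while True:
        best, best_ratio = None, 0.0
        base = accuracy(chosen)
        for f in range(12):
            if f in chosen or spent + costs[f] > budget:
                continue
            gain = accuracy(chosen + [f]) - base              # accuracy gain per unit cost
            if gain / costs[f] > best_ratio:
                best, best_ratio = f, gain / costs[f]
        if best is None:                                      # nothing affordable helps
            break
        chosen.append(best)
        spent += costs[best]
    print(f"chosen features {chosen}, cost {spent:.1f}, accuracy {accuracy(chosen):.3f}")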