7 research outputs found

    A scalable system for factored learning in the cloud

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 79-81). This work presents FlexGP, a new system designed for scalable machine learning in the cloud. FlexGP presents a learner-agnostic, data-parallel approach to cloud-based distributed learning using existing single-machine algorithms, without any dependence on distributed file systems or shared memory between instances. We design and implement asynchronous and decentralized launch and peer discovery protocols to start and configure a distributed network of learners. Through a unique process of factoring the data and parameters across the learners, FlexGP ensures this network consists of heterogeneous learners producing diverse models. These models are then filtered and fused to produce a meta-model for prediction. Using a thoughtfully designed test framework, FlexGP is run on a real-world regression problem from a large database. The results demonstrate the reliability and robustness of the system, even when learning from very little training data and multiple factorings, and demonstrate FlexGP as a vital tool to effectively leverage the cloud for machine learning tasks. By Owen C. Derby. M. Eng.
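
    The factor, filter, and fuse idea in the abstract above can be illustrated with a minimal sketch. This is not FlexGP's code: the function names, the least-squares line fit standing in for the single-machine learner, the sampling fraction, and the averaging fusion rule are all assumptions made for illustration. Only data factoring is shown; the parameter factoring the abstract mentions is left out.

```python
import random
from statistics import mean

def factor_data(rows, n_learners, sample_frac, rng):
    """Give each (hypothetical) learner its own random subset of the rows."""
    k = max(2, int(len(rows) * sample_frac))
    return [rng.sample(rows, k) for _ in range(n_learners)]

def fit_line(rows):
    """Least-squares fit of y = a*x + b; a stand-in for any single-machine learner."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in rows) / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, rows):
    """Mean squared error of a model on (x, y) rows."""
    return mean((model(x) - y) ** 2 for x, y in rows)

def fuse(models, validation, keep):
    """Filter to the `keep` best models on validation data, then average them."""
    best = sorted(models, key=lambda m: mse(m, validation))[:keep]
    return lambda x: mean(m(x) for m in best)

# Synthetic regression data: y = 2x + 1 plus a little Gaussian noise.
rng = random.Random(0)
data = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in range(100)]
train, validation = data[:80], data[80:]

subsets = factor_data(train, n_learners=8, sample_frac=0.25, rng=rng)
models = [fit_line(s) for s in subsets]          # one model per learner
meta_model = fuse(models, validation, keep=4)    # filtered, fused predictor
```

    Each learner sees only a quarter of the training rows, so the individual models differ slightly; averaging the best few is one simple way to fuse them, chosen here only because it is easy to follow.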

    Multiple levels of parallelism in distributed machine learning via genetic programming

    Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 105-107). This thesis presents FlexGP 2.0, a distributed cloud-backed machine learning system. FlexGP 2.0 features multiple levels of parallelism which provide a significant improvement in accuracy vs. elapsed time. The amount of computational resources in FlexGP 2.0 can be scaled along several dimensions to support large, complex data. FlexGP 2.0's core genetic programming (GP) learner includes multithreaded C++ model evaluation and a multi-objective optimization algorithm which is extensible to pursue any number of objectives simultaneously in parallel. FlexGP 2.0 parallelizes the entire learner to obtain a large distributed population size and leverages communication between learners to increase performance via transferral of search progress between learners. FlexGP 2.0 factors training data to boost performance and enable support for increased data size and complexity. Several experiments are performed which verify the efficacy of FlexGP 2.0's multilevel parallelism. The experiments are run on a large dataset from a real-world regression problem. The results demonstrate both less time to achieve the same accuracy and overall increased accuracy, and illustrate the value of FlexGP 2.0 as a platform for machine learning. By Dylan J. Sherry. M. Eng.

    The Seamless Peer and Cloud Evolution Framework

    Evolutionary algorithms are increasingly being applied to problems that are too computationally expensive to run on a single personal computer due to costly fitness function evaluations and/or large numbers of fitness evaluations. Here, we introduce the Seamless Peer And Cloud Evolution (SPACE) framework, which leverages bleeding-edge web technologies to allow the computational resources necessary for running large-scale evolutionary experiments to be made available to amateur and professional researchers alike, in a scalable and cost-effective manner, directly from their web browsers. The SPACE framework accomplishes this by distributing fitness evaluations across a heterogeneous pool of cloud compute nodes and peer computers. As a proof of concept, this framework has been attached to the RoboGen open-source platform for the co-evolution of robot bodies and brains, but importantly the framework has been built in a modular fashion such that it can be easily coupled with other evolutionary computation systems.
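
    As a rough illustration of distributing fitness evaluations across a pool of workers, the sketch below farms out one evaluation per genome. It is not the SPACE implementation, which targets web browsers and cloud nodes; here a local thread pool stands in for the worker pool, and the bit-counting `fitness` function and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(genome):
    """Toy fitness: number of 1-bits (a stand-in for a costly evaluation)."""
    return sum(genome)

def evaluate_population(population, n_workers=4):
    """Send one fitness evaluation per genome to a pool of workers.

    Executor.map returns results in the same order as the input genomes.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fitness, population))

population = [[0, 1, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0]]
scores = evaluate_population(population)
```

    Because the evolutionary loop only needs the list of scores back, the pool behind `evaluate_population` could in principle be swapped for remote workers without changing the caller.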

    Machine Learning Technologies and Their Applications for Science and Engineering Domains Workshop -- Summary Report

    The fields of machine learning and big data analytics have made significant advances in recent years, which has created an environment where cross-fertilization of methods and collaborations can achieve previously unattainable outcomes. The Comprehensive Digital Transformation (CDT) Machine Learning and Big Data Analytics team planned a workshop at NASA Langley in August 2016 to unite leading experts in the field of machine learning and NASA scientists and engineers. The primary goal for this workshop was to assess the state of the art in this field, introduce these leading experts to the aerospace and science subject matter experts, and develop opportunities for collaboration. The workshop was held over a three-day period with lectures from 15 leading experts followed by significant interactive discussions. This report provides an overview of the 15 invited lectures and a summary of the key discussion topics that arose during both formal and informal discussion sections. Four key workshop themes were identified after the closure of the workshop and are also highlighted in the report. Furthermore, several workshop attendees provided their feedback on how they are already utilizing machine learning algorithms to advance their research, new methods they learned about during the workshop, and collaboration opportunities they identified during the workshop.

    Parallel genetic algorithms in the cloud

    2015 - 2016. Genetic Algorithms (GAs) are a metaheuristic search technique belonging to the class of Evolutionary Algorithms (EAs). They have proven effective in addressing problems in many fields, but they also suffer from scalability issues that may prevent them from finding a valid application to real-world problems. Thus, the aim of providing highly scalable GA-based solutions, together with the reduced costs of parallel architectures, motivates the research on Parallel Genetic Algorithms (PGAs). Cloud computing may be a valid option for parallelisation, since there is no need to own the physical hardware, which can be purchased from cloud providers for the desired time, quantity and quality. There are different employable cloud technologies and approaches for this purpose, but they all introduce communication overhead. Thus, one might wonder if, and possibly when, specific approaches, environments and models show better performance than sequential versions in terms of execution time and resource usage. This thesis investigates if and when GAs can scale in the cloud using specific approaches. First, Hadoop MapReduce is used to design and develop an open-source framework, elephant56, that reduces the effort of developing GAs and speeds them up using three parallel models. The performance of the framework is then evaluated through an empirical study. Second, software containers and message queues are employed to develop, deploy and execute PGAs in the cloud, and the devised system is evaluated with an empirical study on a commercial cloud provider. Finally, cloud technologies are also explored for the parallelisation of other EAs, designing and developing cCube, a collaborative microservices architecture for machine learning problems. [edited by author] XV n.s.

    Evolutionary Model Discovery: Automating Causal Inference for Generative Models of Human Social Behavior

    The desire to understand the causes of complex societal phenomena is fundamental to the social sciences. Society, at a macro scale, has many measurable characteristics in the form of statistical distributions and aggregate measures; data which is increasingly abundant with the proliferation of online social media, mobile devices, and the internet of things. However, the decision-making processes and limits of the individuals who interact to generate these statistical patterns are often difficult to unravel. Furthermore, multiple causal factors often interact to determine the outcome of a particular behavior. Quantifying the importance of these causal factors and their interactions, which make up a particular decision-making process, towards a societal outcome of interest helps extract explanations that provide a deeper understanding of social behavior. Holistic, generative modeling techniques, in particular agent-based modeling, are able to 'grow' artificial societies that replicate emergent patterns seen in the real world. Driving the autonomous agents of these models are rules, generalized hypotheses of human behavior, which upon validation against real-world data, help assemble theories of human behavior. Yet often, multiple hypothetical causal factors can be suggested for the construction of these rules. With traditional agent-based modeling, it is often up to the modeler's discretion to decide which combination of factors best represents the rule at hand. Yet, due to the aforementioned lack of insight, the modeled agent rule is often one out of a vast space of possible rules. In this dissertation, I introduce Evolutionary Model Discovery, a novel framework for automated causal inference, which treats such artificial societies as sandboxes for rule discovery and causal factor importance evaluation. Evolutionary Model Discovery consists of two major phases.
Firstly, a rule of interest of a given agent-based model is genetically programmed with combinations of hypothesized factors, attempting to find rules which enable the agent-based model to more closely mimic real-world phenomena. Secondly, the data produced through genetic programming, regarding the correspondence of factor presence in the rule to fitness, is used to train a random forest regressor for importance evaluation. Besides its scientific contributions, this work has also led to the contribution of two Python open-source software libraries for high performance computing with NetLogo, Evolutionary Model Discovery and NL4Py. The results of applying Evolutionary Model Discovery for the causal inference of three very different cases of human social behavior are discussed, revisiting the rules underlying two widely studied models in the literature, the Artificial Anasazi and Schelling's Segregation, and an ensemble model of diffusion of information and information overload. First, previously unconsidered factors driving the socio-agricultural behavior of an ancient Pueblo society are discovered, assisting in the construction of a more robust and accurate version of the Artificial Anasazi model. Second, factors that contribute to the coexistence of mixed patterns of segregation and integration are discovered on a recent extension of Schelling's Segregation model. Finally, causal factors important to the prioritization of social media notifications under loss of attention due to information overload are discovered on an ensemble of a model of Extended Working Memory and the Multi-Action Cascade Model of conversation.