Search CORE

7 research outputs found

A scalable system for factored learning in the cloud

Author: Derby Owen C
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2013

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 79-81).

This work presents FlexGP, a new system designed for scalable machine learning in the cloud. FlexGP presents a learner-agnostic, data-parallel approach to cloud-based distributed learning using existing single-machine algorithms, without any dependence on distributed file systems or shared memory between instances. We design and implement asynchronous and decentralized launch and peer discovery protocols to start and configure a distributed network of learners. Through a unique process of factoring the data and parameters across the learners, FlexGP ensures this network consists of heterogeneous learners producing diverse models. These models are then filtered and fused to produce a meta-model for prediction. Using a thoughtfully designed test framework, FlexGP is run on a real-world regression problem from a large database. The results demonstrate the reliability and robustness of the system, even when learning from very little training data and multiple factorings, and demonstrate FlexGP as a vital tool to effectively leverage the cloud for machine learning tasks.
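The factor, filter, and fuse idea in this abstract can be illustrated with a small sketch. This is not FlexGP's implementation: the one-feature least-squares learner, the function names (`train_factored`, `filter_and_fuse`), the averaging rule, and the synthetic data are all assumptions chosen to keep the example self-contained.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return a, my - a * mx

def train_factored(rows, targets, n_learners, sample_frac, rng):
    """Hypothetical factoring: each learner gets a random row subset and one random feature."""
    k = max(2, int(sample_frac * len(rows)))
    models = []
    for _ in range(n_learners):
        idx = rng.sample(range(len(rows)), k)
        f = rng.randrange(len(rows[0]))
        a, b = fit_line([rows[i][f] for i in idx], [targets[i] for i in idx])
        models.append((f, a, b))
    return models

def predict(model, row):
    f, a, b = model
    return a * row[f] + b

def mse(model, rows, targets):
    return statistics.fmean((predict(model, r) - t) ** 2 for r, t in zip(rows, targets))

def filter_and_fuse(models, val_rows, val_targets, keep):
    """Filter: keep the `keep` lowest-error models. Fuse: average their predictions."""
    best = sorted(models, key=lambda m: mse(m, val_rows, val_targets))[:keep]
    return lambda row: statistics.fmean(predict(m, row) for m in best)

# Synthetic regression problem (assumed): y depends only on feature 0.
rng = random.Random(0)
rows = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
targets = [3 * r[0] + 1 for r in rows]

models = train_factored(rows[:150], targets[:150], n_learners=20, sample_frac=0.1, rng=rng)
fused = filter_and_fuse(models, rows[150:], targets[150:], keep=3)
print(round(fused([0.5, 0.0]), 3))
```

Each learner sees only about 10% of the training rows and a single feature, so the individual models differ. The validation-based filter discards learners that were handed the uninformative feature before their predictions are averaged.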

DSpace@MIT

Multiple levels of parallelism in distributed machine learning via genetic programming

Author: Sherry Dylan J. (Dylan Jacob)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2013

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 105-107).

This thesis presents FlexGP 2.0, a distributed cloud-backed machine learning system. FlexGP 2.0 features multiple levels of parallelism which provide a significant improvement in accuracy vs. elapsed time. The amount of computational resources in FlexGP 2.0 can be scaled along several dimensions to support large, complex data. FlexGP 2.0's core genetic programming (GP) learner includes multithreaded C++ model evaluation and a multi-objective optimization algorithm which is extensible to pursue any number of objectives simultaneously in parallel. FlexGP 2.0 parallelizes the entire learner to obtain a large distributed population size and leverages communication between learners to increase performance via transferral of search progress between learners. FlexGP 2.0 factors training data to boost performance and enable support for increased data size and complexity. Several experiments are performed which verify the efficacy of FlexGP 2.0's multilevel parallelism. The experiments are run on a large dataset from a real-world regression problem. The results demonstrate both less time to achieve the same accuracy and overall increased accuracy, and illustrate the value of FlexGP 2.0 as a platform for machine learning.

DSpace@MIT

The Seamless Peer and Cloud Evolution Framework

Author: Chong F. S.
Desell T.
Duda J.
Nolfi S.
Rivas V. M.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 10/08/2016

Evolutionary algorithms are increasingly being applied to problems that are too computationally expensive to run on a single personal computer due to costly fitness function evaluations and/or large numbers of fitness evaluations. Here, we introduce the Seamless Peer And Cloud Evolution (SPACE) framework, which leverages bleeding-edge web technologies to allow the computational resources necessary for running large-scale evolutionary experiments to be made available to amateur and professional researchers alike, in a scalable and cost-effective manner, directly from their web browsers. The SPACE framework accomplishes this by distributing fitness evaluations across a heterogeneous pool of cloud compute nodes and peer computers. As a proof of concept, this framework has been attached to the RoboGen open-source platform for the co-evolution of robot bodies and brains, but importantly the framework has been built in a modular fashion such that it can be easily coupled with other evolutionary computation systems.
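Distributing fitness evaluations across a pool of workers, as this abstract describes, can be sketched in a few lines. This is not the SPACE framework's code: a local thread pool stands in for the cloud nodes and peer browsers, and `fitness` and `evaluate_population` are hypothetical names with a toy objective.

```python
from concurrent.futures import ThreadPoolExecutor

def fitness(genome):
    """Hypothetical stand-in for an expensive evaluation: count the 1-bits."""
    return sum(genome)

def evaluate_population(population, n_workers=4):
    """Farm evaluations out to a worker pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fitness, population))

population = [[1, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1]]
scores = evaluate_population(population)
best = population[scores.index(max(scores))]
print(scores, best)
```

Because each evaluation is independent, the evolutionary loop only needs the returned scores and does not depend on which worker computed them. That independence is what makes a heterogeneous worker pool practical.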

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Machine Learning Technologies and Their Applications for Science and Engineering Domains Workshop -- Summary Report

Author: Ambur Manjula
Mavris Dimitri N.
Schwartz Katherine G.

The fields of machine learning and big data analytics have made significant advances in recent years, which has created an environment where cross-fertilization of methods and collaborations can achieve previously unattainable outcomes. The Comprehensive Digital Transformation (CDT) Machine Learning and Big Data Analytics team planned a workshop at NASA Langley in August 2016 to unite leading experts in the field of machine learning and NASA scientists and engineers. The primary goal for this workshop was to assess the state of the art in this field, introduce these leading experts to the aerospace and science subject matter experts, and develop opportunities for collaboration. The workshop was held over a three-day period with lectures from 15 leading experts followed by significant interactive discussions. This report provides an overview of the 15 invited lectures and a summary of the key discussion topics that arose during both formal and informal discussion sections. Four key workshop themes were identified after the closure of the workshop and are also highlighted in the report. Furthermore, several workshop attendees provided their feedback on how they are already utilizing machine learning algorithms to advance their research, new methods they learned about during the workshop, and collaboration opportunities they identified during the workshop.

NASA Technical Reports Server

Parallel genetic algorithms in the cloud

Author: Salza Pasquale
Publication venue: Universita degli studi di Salerno
Publication date: 21/04/2017

2015 - 2016

Genetic Algorithms (GAs) are a metaheuristic search technique belonging to the class of Evolutionary Algorithms (EAs). They have been proven to be effective in addressing several problems in many fields but also suffer from scalability issues that may not let them find a valid application for real world problems. Thus, the aim of providing highly scalable GA-based solutions, together with the reduced costs of parallel architectures, motivates the research on Parallel Genetic Algorithms (PGAs). Cloud computing may be a valid option for parallelisation, since there is no need of owning the physical hardware, which can be purchased from cloud providers, for the desired time, quantity and quality. There are different employable cloud technologies and approaches for this purpose, but they all introduce communication overhead. Thus, one might wonder if, and possibly when, specific approaches, environments and models show better performance than sequential versions in terms of execution time and resource usage. This thesis investigates if and when GAs can scale in the cloud using specific approaches. First, Hadoop MapReduce is exploited to design and develop an open source framework, i.e., elephant56, that reduces the effort in developing and speeds up GAs using three parallel models. The performance of the framework is then evaluated through an empirical study. Second, software containers and message queues are employed to develop, deploy and execute PGAs in the cloud, and the devised system is evaluated with an empirical study on a commercial cloud provider. Finally, cloud technologies are also explored for the parallelisation of other EAs, designing and developing cCube, a collaborative microservices architecture for machine learning problems. [edited by author]

XV n.s.

EleA@UniSA - Università degli Studi di Salerno

Evolutionary Model Discovery: Automating Causal Inference for Generative Models of Human Social Behavior

Author: Gunaratne Chathika
Publication venue: University of Central Florida
Publication date: 01/01/2019

The desire to understand the causes of complex societal phenomena is fundamental to the social sciences. Society, at a macro-scale has many measurable characteristics in the form of statistical distributions and aggregate measures; data which is increasingly abundant with the proliferation of online social media, mobile devices, and the internet of things. However, the decision-making processes and limits of the individuals who interact to generate these statistical patterns are often difficult to unravel. Furthermore, multiple causal factors often interact to determine the outcome of a particular behavior. Quantifying the importance of these causal factors and their interactions, which make up a particular decision-making process, towards a societal outcome of interest helps extract explanations that provide a deeper understanding of social behavior. Holistic, generative modeling techniques, in particular agent-based modeling, are able to 'grow' artificial societies that replicate emergent patterns seen in the real world. Driving the autonomous agents of these models are rules, generalized hypotheses of human behavior, which upon validation against real-world data, help assemble theories of human behavior. Yet often, multiple hypothetical causal factors can be suggested for the construction of these rules. With traditional agent-based modeling, it is often up to the modeler's discretion to decide which combination of factors best represent the rule at hand. Yet, due to the aforementioned lack of insight, the modeled agent rule is often one out of a vast space of possible rules. In this dissertation, I introduce Evolutionary Model Discovery, a novel framework for automated causal inference, which treats such artificial societies as sandboxes for rule discovery and causal factor importance evaluation. Evolutionary Model Discovery consists of two major phases.
Firstly, a rule of interest of a given agent-based model is genetically programmed with combinations of hypothesized factors, attempting to find rules which enable the agent-based model to more closely mimic real-world phenomena. Secondly, the data produced through genetic programming, regarding the correspondence of factor presence in the rule to fitness, is used to train a random forest regressor for importance evaluation. Besides its scientific contributions, this work has also led to the contribution of two Python open-source software libraries for high performance computing with NetLogo, Evolutionary Model Discovery and NL4Py. The results of applying Evolutionary Model Discovery for the causal inference of three very different cases of human social behavior are discussed, revisiting the rules underlying two widely studied models in the literature, the Artificial Anasazi and Schelling's Segregation, and an ensemble model of diffusion of information and information overload. First, previously unconsidered factors driving the socio-agricultural behavior of an ancient Pueblo society are discovered, assisting in the construction of a more robust and accurate version of the Artificial Anasazi model. Second, factors that contribute to the coexistence of mixed patterns of segregation and integration are discovered on a recent extension of Schelling's Segregation model. Finally, causal factors important to the prioritization of social media notifications under loss of attention due to information overload are discovered on an ensemble of a model of Extended Working Memory and the Multi-Action Cascade Model of conversation.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)