Search CORE

65,240 research outputs found

Recommended from our members

Chippe : a system for constraint driven behavioral synthesis

Author: Brewer Forrest
Gajski Daniel
Publication venue: eScholarship, University of California
Publication date: 02/04/1988
Field of study

This report describes the Chippe system, gives some background previous work and describes several sample design runs of the system. Also presented are the sources of the design tradeoffs used by Chippe, and overview of the internal design model, and experiences using the system

eScholarship - University of California

An Expressive Language and Efficient Execution System for Software Agents

Author: Barish G.
Knoblock C. A.
Publication venue: 'AI Access Foundation'
Publication date: 09/09/2011
Field of study

Software agents can be used to automate many of the tedious, time-consuming information processing tasks that humans currently have to complete manually. However, to do so, agent plans must be capable of representing the myriad of actions and control flows required to perform those tasks. In addition, since these tasks can require integrating multiple sources of remote information ? typically, a slow, I/O-bound process ? it is desirable to make execution as efficient as possible. To address both of these needs, we present a flexible software agent plan language and a highly parallel execution system that enable the efficient execution of expressive agent plans. The plan language allows complex tasks to be more easily expressed by providing a variety of operators for flexibly processing the data as well as supporting subplans (for modularity) and recursion (for indeterminate looping). The executor is based on a streaming dataflow model of execution to maximize the amount of operator and data parallelism possible at runtime. We have implemented both the language and executor in a system called THESEUS. Our results from testing THESEUS show that streaming dataflow execution can yield significant speedups over both traditional serial (von Neumann) as well as non-streaming dataflow-style execution that existing software and robot agent execution systems currently support. In addition, we show how plans written in the language we present can represent certain types of subtasks that cannot be accomplished using the languages supported by network query engines. Finally, we demonstrate that the increased expressivity of our plan language does not hamper performance; specifically, we show how data can be integrated from multiple remote sources just as efficiently using our architecture as is possible with a state-of-the-art streaming-dataflow network query engine

arXiv.org e-Print Archive

Crossref

A methodology for the decomposition of discrete event models for parallel simulation

Author: Taylor S J E
Wilson D R
Winter S C
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1993
Field of study

Parallel simulation has presented the possibility of performing high-speed simulation. However, when attempting to make a link between the requirements of parallel simulation and discrete event simulation used in commercial areas such as manufacturing, a major problem arises. This lies in the decomposition of the simulation into a series of concurrently executing objects. Using the activity cycle diagram simulation technique as an illustrative example, this paper suggests a solution to this decomposition problem. This is discussed within the context of providing a conceptually seamless methodology for translating simulation models into a form which can exploit the benefits of parallel computing

CiteSeerX

Brunel University Research Archive

A compiler approach to scalable concurrent program design

Author: Foster Ian
Taylor Stephen
Publication venue: 'California Institute of Technology Library'
Publication date: 01/01/1992
Field of study

The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support. The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics

CiteSeerX

Caltech Authors

Petuum: A New Platform for Distributed Machine Learning on Big Data

Author: Dai Wei
Ho Qirong
Kim Jin Kyu
Kumar Abhimanu
Lee Seunghak
Wei Jinliang
Xie Pengtao
Xing Eric P.
Yu Yaoliang
Zheng Xun
Publication venue
Publication date: 01/01/2015
Field of study

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML, by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions. This presents unique opportunities for an integrative system design, such as bounded-error network synchronization and dynamic scheduling based on ML program structure. We demonstrate the efficacy of these system designs versus well-known implementations of modern ML algorithms, allowing ML programs to run in much less time and at considerably larger model sizes, even on modestly-sized compute clusters.Comment: 15 pages, 10 figures, final version in KDD 2015 under the same titl

arXiv.org e-Print Archive

CiteSeerX

Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms

Author: Arampatzis Giorgos
Katsoulakis Markos A.
Plechac Petr
Taufer Michela
Xu Lifan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011
Field of study

We present a mathematical framework for constructing and analyzing parallel algorithms for lattice Kinetic Monte Carlo (KMC) simulations. The resulting algorithms have the capacity to simulate a wide range of spatio-temporal scales in spatially distributed, non-equilibrium physiochemical processes with complex chemistry and transport micro-mechanisms. The algorithms can be tailored to specific hierarchical parallel architectures such as multi-core processors or clusters of Graphical Processing Units (GPUs). The proposed parallel algorithms are controlled-error approximations of kinetic Monte Carlo algorithms, departing from the predominant paradigm of creating parallel KMC algorithms with exactly the same master equation as the serial one. Our methodology relies on a spatial decomposition of the Markov operator underlying the KMC algorithm into a hierarchy of operators corresponding to the processors' structure in the parallel architecture. Based on this operator decomposition, we formulate Fractional Step Approximation schemes by employing the Trotter Theorem and its random variants; these schemes, (a) determine the communication schedule} between processors, and (b) are run independently on each processor through a serial KMC simulation, called a kernel, on each fractional step time-window. Furthermore, the proposed mathematical framework allows us to rigorously justify the numerical and statistical consistency of the proposed algorithms, showing the convergence of our approximating schemes to the original serial KMC. The approach also provides a systematic evaluation of different processor communicating schedules.Comment: 34 pages, 9 figure

arXiv.org e-Print Archive

ScholarWorks@UMass Amherst

ACMAC