65,240 research outputs found
Recommended from our members
Chippe : a system for constraint driven behavioral synthesis
This report describes the Chippe system, gives some background previous work and describes several sample design runs of the system. Also presented are the sources of the design tradeoffs used by Chippe, and overview of the internal design model, and experiences using the system
An Expressive Language and Efficient Execution System for Software Agents
Software agents can be used to automate many of the tedious, time-consuming
information processing tasks that humans currently have to complete manually.
However, to do so, agent plans must be capable of representing the myriad of
actions and control flows required to perform those tasks. In addition, since
these tasks can require integrating multiple sources of remote information ?
typically, a slow, I/O-bound process ? it is desirable to make execution as
efficient as possible. To address both of these needs, we present a flexible
software agent plan language and a highly parallel execution system that enable
the efficient execution of expressive agent plans. The plan language allows
complex tasks to be more easily expressed by providing a variety of operators
for flexibly processing the data as well as supporting subplans (for
modularity) and recursion (for indeterminate looping). The executor is based on
a streaming dataflow model of execution to maximize the amount of operator and
data parallelism possible at runtime. We have implemented both the language and
executor in a system called THESEUS. Our results from testing THESEUS show that
streaming dataflow execution can yield significant speedups over both
traditional serial (von Neumann) as well as non-streaming dataflow-style
execution that existing software and robot agent execution systems currently
support. In addition, we show how plans written in the language we present can
represent certain types of subtasks that cannot be accomplished using the
languages supported by network query engines. Finally, we demonstrate that the
increased expressivity of our plan language does not hamper performance;
specifically, we show how data can be integrated from multiple remote sources
just as efficiently using our architecture as is possible with a
state-of-the-art streaming-dataflow network query engine
A methodology for the decomposition of discrete event models for parallel simulation
Parallel simulation has presented the possibility of performing high-speed simulation. However, when attempting to make a link between the requirements of parallel simulation and discrete event simulation used in commercial areas such as manufacturing, a major problem arises. This lies in the decomposition of the simulation into a series of concurrently executing objects. Using the activity cycle diagram simulation technique as an illustrative example, this paper suggests a solution to this decomposition problem. This is discussed within the context of providing a conceptually seamless methodology for translating simulation models into a form which can exploit the benefits of parallel computing
A compiler approach to scalable concurrent program design
The programmer's most powerful tool for controlling complexity in program design is abstraction. We seek to use abstraction in the design of concurrent programs, so as to
separate design decisions concerned with decomposition, communication, synchronization, mapping, granularity, and load balancing. This paper describes programming and compiler techniques intended to facilitate this design strategy. The programming techniques are based on a core programming notation with two important properties: the ability to separate concurrent programming concerns, and extensibility with reusable programmer-defined
abstractions. The compiler techniques are based on a simple transformation system together with a set of compilation transformations and portable run-time support. The
transformation system allows programmer-defined abstractions to be defined as source-to-source transformations that convert abstractions into the core notation. The same
transformation system is used to apply compilation transformations that incrementally transform the core notation toward an abstract concurrent machine. This machine can be implemented on a variety of concurrent architectures using simple run-time support.
The transformation, compilation, and run-time system techniques have been implemented and are incorporated in a public-domain program development toolkit. This
toolkit operates on a wide variety of networked workstations, multicomputers, and shared-memory
multiprocessors. It includes a program transformer, concurrent compiler, syntax checker, debugger, performance analyzer, and execution animator. A variety of substantial
applications have been developed using the toolkit, in areas such as climate modeling and fluid dynamics
Petuum: A New Platform for Distributed Machine Learning on Big Data
What is a systematic way to efficiently apply a wide spectrum of advanced ML
programs to industrial scale problems, using Big Models (up to 100s of billions
of parameters) on Big Data (up to terabytes or petabytes)? Modern
parallelization strategies employ fine-grained operations and scheduling beyond
the classic bulk-synchronous processing paradigm popularized by MapReduce, or
even specialized graph-based execution that relies on graph representations of
ML programs. The variety of approaches tends to pull systems and algorithms
design in different directions, and it remains difficult to find a universal
platform applicable to a wide range of ML programs at scale. We propose a
general-purpose framework that systematically addresses data- and
model-parallel challenges in large-scale ML, by observing that many ML programs
are fundamentally optimization-centric and admit error-tolerant,
iterative-convergent algorithmic solutions. This presents unique opportunities
for an integrative system design, such as bounded-error network synchronization
and dynamic scheduling based on ML program structure. We demonstrate the
efficacy of these system designs versus well-known implementations of modern ML
algorithms, allowing ML programs to run in much less time and at considerably
larger model sizes, even on modestly-sized compute clusters.Comment: 15 pages, 10 figures, final version in KDD 2015 under the same titl
Hierarchical fractional-step approximations and parallel kinetic Monte Carlo algorithms
We present a mathematical framework for constructing and analyzing parallel
algorithms for lattice Kinetic Monte Carlo (KMC) simulations. The resulting
algorithms have the capacity to simulate a wide range of spatio-temporal scales
in spatially distributed, non-equilibrium physiochemical processes with complex
chemistry and transport micro-mechanisms. The algorithms can be tailored to
specific hierarchical parallel architectures such as multi-core processors or
clusters of Graphical Processing Units (GPUs). The proposed parallel algorithms
are controlled-error approximations of kinetic Monte Carlo algorithms,
departing from the predominant paradigm of creating parallel KMC algorithms
with exactly the same master equation as the serial one.
Our methodology relies on a spatial decomposition of the Markov operator
underlying the KMC algorithm into a hierarchy of operators corresponding to the
processors' structure in the parallel architecture. Based on this operator
decomposition, we formulate Fractional Step Approximation schemes by employing
the Trotter Theorem and its random variants; these schemes, (a) determine the
communication schedule} between processors, and (b) are run independently on
each processor through a serial KMC simulation, called a kernel, on each
fractional step time-window.
Furthermore, the proposed mathematical framework allows us to rigorously
justify the numerical and statistical consistency of the proposed algorithms,
showing the convergence of our approximating schemes to the original serial
KMC. The approach also provides a systematic evaluation of different processor
communicating schedules.Comment: 34 pages, 9 figure
- …