Executing matrix multiply on a process oriented data flow machine
The Process-Oriented Dataflow System (PODS) is an execution model that combines the von Neumann and dataflow models of computation to gain the benefits of each. Central to PODS is the concept of array distribution and its effects on the partitioning and mapping of processes. In PODS, arrays are partitioned by simply assigning an equal number of consecutive elements to each processing element (PE). Since PODS uses single assignment, each element has exactly one producer. The producing PE owns that element and performs the computations needed to assign it. With this approach, the loop that fills an array is distributed across the PEs. This simple partitioning and mapping scheme gives excellent results for executing scientific code on MIMD machines. In this way, PODS lets MIMD machines exploit vector and data parallelism easily while retaining the flexibility of MIMD over SIMD for multi-user systems. In this paper, the classic matrix multiply algorithm, with 1024 data points, is executed on a PODS simulator, and the results are presented and discussed. Matrix multiply is a good example because it has several interesting properties: there are multiple code-blocks; a new array must be dynamically allocated and distributed; there is a loop-carried dependency in the innermost loop; the two input arrays have different access patterns; and the sizes of the input arrays are not known at compile time. Matrix multiply also forms the basis of many important scientific algorithms, such as LU decomposition, convolution, and the Fast Fourier Transform. The results show that PODS is comparable to both Iannucci's Hybrid Architecture and MIT's TTDA in terms of overhead and instruction power. They also show that PODS easily distributes the workload evenly across the PEs. The key result is that PODS scales matrix multiply in a near-linear fashion until there is little or no work left for each PE; at that point, overhead and message passing become a major component of the execution time. With larger problems (e.g., ≥16k data points), this limit would be reached at around 256 PEs.
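The partitioning rule described above (equal runs of consecutive elements per PE, with the single producer owning each element) can be sketched as a small sequential simulation. This is not the PODS simulator: the helper names `owner` and `distributed_matmul`, the row-major flattening of the result array, and the 32×32 problem size are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def owner(index: int, n_elems: int, n_pes: int) -> int:
    """PE owning a flattened array element: each PE gets one run of
    ceil(n_elems / n_pes) consecutive elements."""
    block = -(-n_elems // n_pes)  # ceiling division
    return index // block


def distributed_matmul(a: Matrix, b: Matrix, n_pes: int) -> Tuple[Matrix, List[int]]:
    """Compute C = A x B, letting the owner of each C element produce it.

    Returns C and the number of multiply-adds charged to each PE.
    PEs are simulated one element at a time, not actually in parallel.
    """
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    n_elems = n * p
    c: Matrix = [[0] * p for _ in range(n)]
    assigned = [[False] * p for _ in range(n)]
    work = [0] * n_pes
    for e in range(n_elems):  # row-major order over the result array
        pe = owner(e, n_elems, n_pes)  # the single producer of element e
        i, j = divmod(e, p)
        acc = 0
        for k in range(m):  # innermost loop carries a dependency through acc
            acc += a[i][k] * b[k][j]
        assert not assigned[i][j], "single assignment violated"
        c[i][j] = acc
        assigned[i][j] = True
        work[pe] += m
    return c, work


if __name__ == "__main__":
    size = 32  # 32 x 32 = 1024 result elements (illustrative choice)
    a = [[i + j for j in range(size)] for i in range(size)]
    b = [[i - j for j in range(size)] for i in range(size)]
    c, work = distributed_matmul(a, b, n_pes=8)
    print(work)  # multiply-adds charged to each PE
```

With 1024 result elements on 8 PEs, each PE owns 128 consecutive elements and is charged the same number of multiply-adds, which is the even load distribution this scheme aims for.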
Code Generation for Efficient Query Processing in Managed Runtimes
In this paper we examine opportunities arising from the convergence of two trends in data management: in-memory database systems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous ‘impedance mismatch’ problem). Language-integrated query not only gives application developers a more convenient way to query external data sources like IMDBs, but also lets them use the same query language to query an application’s in-memory collections. The latter offers further transparency to developers, as the query language and all data are represented in the data model of the host programming language. However, compared to IMDBs, this additional freedom comes at a higher cost for query evaluation. Our vision is to improve in-memory query processing of application objects by introducing database technologies to managed runtimes. We focus on querying, and we leverage query compilation to improve query processing on application objects. We explore different query compilation strategies and study how they improve the performance of query processing over application data. We take C# as the host programming language, as it supports language-integrated query through the LINQ framework. Our techniques deliver significant performance improvements over the default LINQ implementation. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in memory, and will be written in a single programming language employing language-integrated query and IMDB-inspired runtimes to provide transparent and highly efficient querying.
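To make the contrast between default evaluation and query compilation concrete, here is a minimal Python analogue. It is not the paper's C#/LINQ system. A query plan is assumed to be a list of `("where", expr)` / `("select", expr)` steps written as Python source strings over a row variable `x`; `interpret` chains one lazy iterator per operator, while the hypothetical `compile_query` generates and compiles a single fused loop. Both evaluate the expression strings as code, so they are only suitable for trusted input.

```python
from typing import Iterable, List, Sequence, Tuple

# A query plan: (operator, expression) pairs; expressions use the row variable `x`.
Plan = Sequence[Tuple[str, str]]


def interpret(plan: Plan, source: Iterable) -> List:
    """Operator-at-a-time evaluation: one lazy iterator stage per operator."""
    rows = iter(source)
    for op, expr in plan:
        fn = eval(f"lambda x: {expr}")
        if op == "where":
            rows = filter(fn, rows)
        elif op == "select":
            rows = map(fn, rows)
        else:
            raise ValueError(f"unknown operator: {op}")
    return list(rows)


def compile_query(plan: Plan):
    """Generate and compile one fused loop for the whole plan."""
    body = ["def run(source):", "    out = []", "    for x in source:"]
    pad = " " * 8
    for op, expr in plan:
        if op == "where":
            body.append(f"{pad}if not ({expr}):")
            body.append(f"{pad}    continue")
        elif op == "select":
            body.append(f"{pad}x = {expr}")
        else:
            raise ValueError(f"unknown operator: {op}")
    body.append(f"{pad}out.append(x)")
    body.append("    return out")
    namespace: dict = {}
    exec("\n".join(body), namespace)
    return namespace["run"]


if __name__ == "__main__":
    plan = [("where", "x % 2 == 0"), ("select", "x * x"), ("where", "x > 10")]
    print(interpret(plan, range(10)))      # [16, 36, 64]
    print(compile_query(plan)(range(10)))  # [16, 36, 64]
```

The compiled form removes the per-operator iterator and call overhead by fusing the pipeline into one loop, which is the general idea behind query compilation; the specific strategies the paper evaluates are not reproduced here.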
Automatic data/program partitioning using the single assignment principle
Loosely coupled MIMD architectures do not suffer from memory contention; hence, large numbers of processors may be utilized. The main problem, however, is how to partition data and programs in order to exploit the available parallelism. In this paper we show that efficient schemes for automatic data/program partitioning and synchronization may be employed if single assignment is used. Using simulations of program loops common to scientific computations (the Livermore Loops), we demonstrate that only a small fraction of data accesses are remote, and thus the degradation in network performance due to multiprocessing is minimal.
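Why only a small fraction of accesses end up remote can be illustrated by counting reads for the hydro fragment usually listed as Livermore kernel 1, `x[k] = q + y[k]*(r*z[k+10] + t*z[k+11])`. This sketch assumes, for simplicity, that `x`, `y`, and `z` all have length `n` with the same contiguous block layout, and that the owner of `x[k]` computes it (one producer per element); the function names are hypothetical and the counts are not the paper's measurements.

```python
from typing import Tuple


def block_owner(index: int, n: int, n_pes: int) -> int:
    """Owner PE of element `index` when n elements are split into
    contiguous blocks of ceil(n / n_pes) elements."""
    return index // -(-n // n_pes)


def kernel1_remote_reads(n: int, n_pes: int) -> Tuple[int, int]:
    """Count remote reads for x[k] = q + y[k] * (r * z[k + 10] + t * z[k + 11])
    under owner-computes with a shared block layout.
    Returns (remote_reads, total_reads)."""
    remote = total = 0
    for k in range(n - 11):  # keep z[k + 11] in bounds
        pe = block_owner(k, n, n_pes)  # the PE that writes x[k]
        for idx in (k, k + 10, k + 11):  # reads of y[k], z[k+10], z[k+11]
            total += 1
            if block_owner(idx, n, n_pes) != pe:
                remote += 1
    return remote, total


if __name__ == "__main__":
    remote, total = kernel1_remote_reads(1024, 8)
    print(remote, total, remote / total)
```

Only iterations near a block boundary read a neighbour's `z` elements: each boundary contributes 10 remote reads of `z[k+10]` and 11 of `z[k+11]`, so with 1024 elements on 8 PEs this gives 147 remote reads out of 3039, roughly 5%.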
Fundamentals of Traffic Flow
From single-vehicle data, a number of new empirical results concerning the
density-dependence of the velocity distribution and its moments as well as the
characteristics of their temporal fluctuations have been determined. These are
utilized for the specification of some fundamental relations of traffic flow
and compared with existing traffic theories. Comment: For related work see
http://www.theo2.physik.uni-stuttgart.de/helbing.htm
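One way single-vehicle detector records can be turned into density-dependent velocity statistics is sketched below. The procedure is an assumption, not necessarily the one used in the paper: records are grouped into fixed time intervals, flow is vehicles per interval, density is estimated through the fundamental relation q = ρV with V taken as the space-mean speed (the harmonic mean of the spot speeds), and speed moments are then collected per density bin. The function names, units, and sample records are hypothetical.

```python
from collections import defaultdict
from statistics import harmonic_mean, mean, pvariance
from typing import Dict, List, Tuple

Record = Tuple[float, float]  # (passage time in s, speed in m/s), speed > 0
Sample = Tuple[float, float, List[float]]  # (density veh/m, flow veh/s, speeds)


def aggregate(records: List[Record], interval: float) -> List[Sample]:
    """Group single-vehicle records from one detector into time intervals,
    estimating density as flow / space-mean speed."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for t, v in records:
        buckets[int(t // interval)].append(v)
    samples = []
    for key in sorted(buckets):
        speeds = buckets[key]
        flow = len(speeds) / interval
        density = flow / harmonic_mean(speeds)  # rho = q / V_space-mean
        samples.append((density, flow, speeds))
    return samples


def moments_by_density(samples: List[Sample], bin_width: float) -> Dict[float, Tuple[float, float]]:
    """Mean and variance of the speed distribution in each density bin."""
    bins: Dict[int, List[float]] = defaultdict(list)
    for density, _flow, speeds in samples:
        bins[int(density // bin_width)].extend(speeds)
    return {k * bin_width: (mean(v), pvariance(v)) for k, v in sorted(bins.items())}


if __name__ == "__main__":
    # Synthetic records for illustration only: light traffic, then a denser minute.
    records = [(0.0, 30.0), (20.0, 28.0), (40.0, 32.0),
               (60.0, 12.0), (65.0, 10.0), (70.0, 14.0), (75.0, 11.0)]
    samples = aggregate(records, interval=60.0)
    print(moments_by_density(samples, bin_width=0.002))
```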
Viscous to Inertial Crossover in Liquid Drop Coalescence
Using an electrical method and high-speed imaging we probe drop coalescence
down to 10 ns after the drops touch. By varying the liquid viscosity over two
decades, we conclude that at sufficiently low approach velocity where
deformation is not present, the drops coalesce with an unexpectedly late
crossover time between a regime dominated by viscous and one dominated by
inertial effects. We argue that the late crossover, not accounted for in the
theory, can be explained by an appropriate choice of length-scales present in
the flow geometry. Comment: 4 pages, 4 figures
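For orientation, a naive crossover time can be obtained by intersecting two scaling laws commonly quoted in the coalescence literature for the radius $r(t)$ of the liquid bridge. These are assumptions, not taken from this abstract: a viscous regime $r \approx C_v (\gamma/\mu)\,t$ (ignoring its logarithmic correction) and an inertial regime $r \approx C_i (\gamma R/\rho)^{1/4} t^{1/2}$, with surface tension $\gamma$, viscosity $\mu$, liquid density $\rho$, drop radius $R$, and order-one prefactors $C_v$, $C_i$. The abstract argues that the observed crossover requires a different choice of length scales, so this is only a baseline estimate.

```latex
\begin{aligned}
C_v \frac{\gamma}{\mu}\, t_c &= C_i \left(\frac{\gamma R}{\rho}\right)^{1/4} t_c^{1/2} \\
t_c^{1/2} &= \frac{C_i}{C_v}\,\frac{\mu}{\gamma}\left(\frac{\gamma R}{\rho}\right)^{1/4} \\
t_c &= \left(\frac{C_i}{C_v}\right)^{2} \frac{\mu^{2} R^{1/2}}{\rho^{1/2}\,\gamma^{3/2}}
\end{aligned}
```

Because this estimate scales as $\mu^2$, varying the viscosity over two decades shifts it by about four decades in time.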
Simulation for human factors research. A central question: Fidelity
General outlines of simulation for human factors research are presented. Recent trends in aeronautical simulation are described, and some criteria for effective training devices are given. Full-system/full-mission simulation in aviation and space human factors research is also presented.
Open boundary conditions in stochastic transport processes with pair-factorized steady states
Using numerical methods, we discuss the effects of open boundary conditions on
condensation phenomena in the zero-range process (ZRP) and transport processes
with pair-factorized steady states (PFSS), an extended model of the ZRP with
nearest-neighbor interaction. For the zero-range process we compare to
analytical results in the literature with respect to criticality and
condensation. For the extended model we find a similar phase structure, but
observe supercritical phases with droplet formation for strong boundary drives. Comment: conference contribution for the 27th Annual CSP Workshop on "Recent
Developments in Computer Simulation Studies in Condensed Matter Physics", CSP
2014. 5 pages, 5 figures
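A zero-range process with open boundaries can be simulated with the Gillespie method, as in the minimal sketch below. It covers only the plain ZRP, not the pair-factorized extension. The following are all assumptions for illustration: the hop rate u(n) = 1 + b/n (a choice often used because it can produce condensation), injection at the left end with rate `alpha`, and extraction from the right end with rate `beta * u(n)`. The function names are hypothetical.

```python
import random
from typing import List, Tuple


def hop_rate(n: int, b: float) -> float:
    """Zero-range hop rate u(n) = 1 + b/n for an occupied site, 0 if empty."""
    return 0.0 if n == 0 else 1.0 + b / n


def simulate_open_zrp(length: int, alpha: float, beta: float, b: float,
                      t_max: float, seed: int = 0) -> Tuple[List[int], int, int]:
    """Totally asymmetric ZRP on an open chain, simulated with the Gillespie method.

    Particles enter site 0 at rate alpha, hop from site i to i + 1 at rate u(n_i),
    and leave the last site at rate beta * u(n_last).
    Returns final occupations and the numbers of injected and extracted particles.
    """
    rng = random.Random(seed)
    occ = [0] * length
    injected = extracted = 0
    t = 0.0
    while True:
        # Event 0 is an injection; event i + 1 moves a particle out of site i.
        rates = [alpha] + [hop_rate(n, b) for n in occ]
        rates[-1] *= beta  # the last site's hop is an extraction
        total = sum(rates)
        if total == 0.0:
            break  # nothing can happen any more
        t += rng.expovariate(total)
        if t > t_max:
            break
        event = rng.choices(range(length + 1), weights=rates)[0]
        if event == 0:
            occ[0] += 1
            injected += 1
        else:
            site = event - 1
            occ[site] -= 1
            if site + 1 < length:
                occ[site + 1] += 1
            else:
                extracted += 1
    return occ, injected, extracted


if __name__ == "__main__":
    occ, injected, extracted = simulate_open_zrp(length=20, alpha=0.5, beta=0.5,
                                                 b=3.0, t_max=200.0, seed=1)
    print(occ, injected, extracted)
```

Events with zero rate (hops out of empty sites) are never selected, because `random.choices` only picks items with positive weight; scanning different `alpha` and `beta` values is how one would look for boundary-driven condensates in such a model.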
Multiple transient memories in sheared suspensions: robustness, structure, and routes to plasticity
Multiple transient memories, originally discovered in charge-density-wave
conductors, are a remarkable and initially counterintuitive example of how a
system can store information about its driving. In this class of memories, a
system can learn multiple driving inputs, nearly all of which are eventually
forgotten despite their continual input. If sufficient noise is present, the
system regains plasticity so that it can continue to learn new memories
indefinitely. Recently, Keim & Nagel showed how multiple transient memories
could be generalized to a generic driven disordered system with noise, giving
as an example simulations of a simple model of a sheared non-Brownian
suspension. Here, we further explore simulation models of suspensions under
cyclic shear, focussing on three main themes: robustness, structure, and
overdriving. We show that multiple transient memories are a robust feature
independent of many details of the model. The steady-state spatial distribution
of the particles is sensitive to the driving algorithm; nonetheless, the memory
formation is independent of such a change in particle correlations. Finally, we
demonstrate that overdriving provides another means for controlling memory
formation and retention.
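The kind of simulation discussed above can be illustrated with a heavily simplified toy: 2D particles under reversible affine shear, where any particle that would overlap a neighbour during a cycle receives a small random kick, repeated until no collisions remain. This is a sketch in the spirit of random-organization models, not the authors' model. The open (non-periodic) plane, the sizes, the kick amplitude, and all function names are assumptions, and the toy only shows a single memory being stored and read out, not multiple transient memories.

```python
import math
import random
from typing import List, Optional, Set, Tuple

Point = Tuple[float, float]


def min_separation_sq(dx: float, dy: float, gamma: float) -> float:
    """Smallest squared distance between two particles whose relative position
    (dx, dy) is sheared affinely, dx -> dx + s * dy, for s in [0, gamma]."""
    s = 0.0 if dy == 0 else min(max(-dx / dy, 0.0), gamma)
    ex = dx + s * dy
    return ex * ex + dy * dy


def colliding(pos: List[Point], diameter: float, gamma: float) -> Set[int]:
    """Particles that overlap some neighbour during a shear of amplitude gamma."""
    hit: Set[int] = set()
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            if min_separation_sq(dx, dy, gamma) < diameter * diameter:
                hit.update((i, j))
    return hit


def train(pos: List[Point], diameter: float, gamma: float, kick: float,
          rng: random.Random, max_cycles: int) -> Optional[int]:
    """Shear cycles of amplitude gamma; every colliding particle gets a random
    displacement. Returns the cycle with no collisions left, or None."""
    for cycle in range(max_cycles):
        active = colliding(pos, diameter, gamma)
        if not active:
            return cycle
        for i in active:
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = rng.uniform(0.0, kick)
            pos[i] = (pos[i][0] + r * math.cos(angle), pos[i][1] + r * math.sin(angle))
    return None


def readout(pos: List[Point], diameter: float, gamma: float) -> float:
    """Fraction of particles that would collide at amplitude gamma."""
    return len(colliding(pos, diameter, gamma)) / len(pos)


if __name__ == "__main__":
    rng = random.Random(0)
    pos = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(60)]
    cycles = train(pos, diameter=0.5, gamma=1.0, kick=0.2, rng=rng, max_cycles=300)
    print("no collisions after", cycles, "cycles")
    for g in (0.25, 0.5, 1.0, 1.5, 2.0):
        print(g, readout(pos, 0.5, g))
```

If training reaches a collision-free state, the readout is zero at and below the training amplitude by construction, so the stored amplitude can appear as the point where the readout first rises.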
Two-lane traffic rules for cellular automata: A systematic approach
Microscopic modeling of multi-lane traffic is usually done by applying
heuristic lane changing rules, and often with unsatisfying results. Recently, a
cellular automaton model for two-lane traffic was able to overcome some of
these problems and to produce a correct density inversion at densities somewhat
below the maximum flow density. In this paper, we summarize different
approaches to lane changing and their results, and propose a general scheme,
according to which realistic lane changing rules can be developed. We test this
scheme by applying it to several different lane changing rules, which, in spite
of their differences, generate similar and realistic results. We thus conclude
that, for producing realistic results, the logical structure of the lane
changing rules, as proposed here, is at least as important as the microscopic
details of the rules.
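The structure of a lane-changing rule (an incentive criterion plus a safety criterion, applied before the usual single-lane update) can be sketched on top of the Nagel-Schreckenberg cellular automaton. The specific thresholds below are assumptions chosen for illustration, not the rules proposed in the paper: a car is blocked if its gap ahead is below `min(v + 1, vmax)`, and a change is safe if the target cell is empty with at least `vmax` free cells behind it. All names are hypothetical.

```python
import random
from typing import List

EMPTY = -1
Lane = List[int]  # cell -> speed of the car in it, or EMPTY


def gap_ahead(lane: Lane, x: int) -> int:
    """Empty cells in front of position x before the next car (periodic road)."""
    size, d = len(lane), 1
    while d < size and lane[(x + d) % size] == EMPTY:
        d += 1
    return d - 1


def gap_behind(lane: Lane, x: int) -> int:
    """Empty cells behind position x before the previous car (periodic road)."""
    size, d = len(lane), 1
    while d < size and lane[(x - d) % size] == EMPTY:
        d += 1
    return d - 1


def change_lanes(lanes: List[Lane], vmax: int) -> List[Lane]:
    """Parallel, symmetric lane-change step (incentive and safety criteria)."""
    new = [lane[:] for lane in lanes]
    for own in (0, 1):
        other = 1 - own
        for x, v in enumerate(lanes[own]):
            if v == EMPTY:
                continue
            gap_own = gap_ahead(lanes[own], x)
            incentive = gap_own < min(v + 1, vmax) and gap_ahead(lanes[other], x) > gap_own
            safe = lanes[other][x] == EMPTY and gap_behind(lanes[other], x) >= vmax
            if incentive and safe:
                new[other][x] = v
                new[own][x] = EMPTY
    return new


def nasch_step(lane: Lane, vmax: int, p: float, rng: random.Random) -> Lane:
    """One parallel Nagel-Schreckenberg update of a single lane."""
    size = len(lane)
    new = [EMPTY] * size
    for x, v in enumerate(lane):
        if v == EMPTY:
            continue
        v = min(v + 1, vmax)            # accelerate
        v = min(v, gap_ahead(lane, x))  # brake to avoid the car ahead
        if v > 0 and rng.random() < p:  # random slowdown
            v -= 1
        new[(x + v) % size] = v         # move
    return new


def simulate(length: int = 100, density: float = 0.2, vmax: int = 5,
             p: float = 0.25, steps: int = 200, seed: int = 0) -> List[Lane]:
    """Two-lane ring road: lane changes first, then a NaSch step in each lane."""
    rng = random.Random(seed)
    lanes = []
    for _ in range(2):
        lane = [EMPTY] * length
        for x in rng.sample(range(length), int(density * length)):
            lane[x] = 0
        lanes.append(lane)
    for _ in range(steps):
        lanes = change_lanes(lanes, vmax)
        lanes = [nasch_step(lane, vmax, p, rng) for lane in lanes]
    return lanes


if __name__ == "__main__":
    lanes = simulate()
    print([sum(1 for v in lane if v != EMPTY) for lane in lanes])  # cars per lane
```

Because the incentive and safety criteria are separate predicates, alternative rules (asymmetric lanes, different look-ahead distances) can be tested by swapping just those two lines, in line with the abstract's point that the logical structure matters as much as the details.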
- …