Executing matrix multiply on a process oriented data flow machine
The Process-Oriented Dataflow System (PODS) is an execution model that combines the von Neumann and dataflow models of computation to gain the benefits of each. Central to PODS is the concept of array distribution and its effects on partitioning and mapping of processes. In PODS, arrays are partitioned by simply assigning consecutive elements to each processing element (PE) equally. Since PODS uses single assignment, there will be only one producer of each element. This producing PE owns that element and will perform the necessary computations to assign it. Using this approach, the filling loop is distributed across the PEs. This simple partitioning and mapping scheme provides excellent results for executing scientific code on MIMD machines. In this way PODS allows MIMD machines to exploit vector and data parallelism easily while still providing the flexibility of MIMD over SIMD for multi-user systems. In this paper, the classic matrix multiply algorithm, with 1024 data points, is executed on a PODS simulator and the results are presented and discussed. Matrix multiply is a good example because it has several interesting properties: there are multiple code-blocks; a new array must be dynamically allocated and distributed; there is a loop-carried dependency in the innermost loop; the two input arrays have different access patterns; and the sizes of the input arrays are not known at compile time. Matrix multiply also forms the basis for many important scientific algorithms such as LU decomposition, convolution, and the Fast Fourier Transform. The results show that PODS is comparable to both Iannucci's Hybrid Architecture and MIT's TTDA in terms of overhead and instruction power. They also show that PODS easily distributes the work load evenly across the PEs. The key result is that PODS can scale matrix multiply in a near-linear fashion until there is little or no work to be performed for each PE. Then overhead and message passing become a major component of the execution time. With larger problems (e.g., ≥16k data points) this limit would be reached at around 256 PEs.
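The partitioning rule described in this abstract (consecutive elements assigned to each PE, with the owning PE producing each element exactly once) can be illustrated with a small sketch. This is not PODS code: the names `owner` and `matmul_owner_computes` are hypothetical, the PEs are simulated by an ordinary sequential loop, and ceiling-sized blocks are an assumed way of splitting the array "equally".

```python
def owner(index, n_elements, n_pes):
    """Return the PE owning element `index` when consecutive elements
    are split into equal-sized blocks (block size is an assumption)."""
    block = -(-n_elements // n_pes)  # ceiling division
    return index // block


def matmul_owner_computes(a, b, n_pes):
    """Multiply two square matrices; each simulated PE fills only the
    elements of the result that it owns."""
    n = len(a)
    c = [[None] * n for _ in range(n)]  # every slot is written exactly once
    work = [0] * n_pes                  # elements produced per PE
    for pe in range(n_pes):             # sequential stand-in for parallel PEs
        for flat in range(n * n):
            if owner(flat, n * n, n_pes) != pe:
                continue
            i, j = divmod(flat, n)
            acc = 0
            for k in range(n):          # loop-carried dependency on acc
                acc += a[i][k] * b[k][j]
            c[i][j] = acc               # single assignment by the owner
            work[pe] += 1
    return c, work


if __name__ == "__main__":
    c, work = matmul_owner_computes([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2)
    print(c, work)
```

In this sketch the `work` list shows how evenly the filling loop is spread across PEs. When there are more PEs than blocks of work, some entries stay at zero, which loosely mirrors the abstract's remark that scaling stops once there is little or no work per PE.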
Code Generation for Efficient Query Processing in Managed Runtimes
In this paper we examine opportunities arising from the convergence of two trends in data management: in-memory database systems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous ‘impedance mismatch’ problem). Language-integrated query not only gives application developers a more convenient way to query external data sources like IMDBs, but also lets them use the same query language to query an application’s in-memory collections. The latter offers further transparency to developers, as the query language and all data are represented in the data model of the host programming language. However, compared to IMDBs, this additional freedom comes at a higher cost for query evaluation. Our vision is to improve in-memory query processing of application objects by introducing database technologies to managed runtimes. We focus on querying and we leverage query compilation to improve query processing on application objects. We explore different query compilation strategies and study how they improve the performance of query processing over application data. We take C# as the host programming language as it supports language-integrated query through the LINQ framework. Our techniques deliver significant performance improvements over the default LINQ implementation. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in-memory, and will be written in a single programming language employing language-integrated query and IMDB-inspired runtimes to provide transparent and highly efficient querying.
Fundamentals of Traffic Flow
From single vehicle data a number of new empirical results concerning the
density-dependence of the velocity distribution and its moments as well as the
characteristics of their temporal fluctuations have been determined. These are
utilized for the specification of some fundamental relations of traffic flow
and compared with existing traffic theories. Comment: For related work see
http://www.theo2.physik.uni-stuttgart.de/helbing.htm
Automatic data/program partitioning using the single assignment principle
Loosely-coupled MIMD architectures do not suffer from memory contention; hence large numbers of processors may be utilized. The main problem, however, is how to partition data and programs in order to exploit the available parallelism. In this paper we show that efficient schemes for automatic data/program partitioning and synchronization may be employed if single assignment is used. Using simulations of program loops common to scientific computations (the Livermore Loops), we demonstrate that only a small fraction of data accesses are remote and thus the degradation in network performance due to multiprocessing is minimal.
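The claim that only a small fraction of accesses are remote can be made concrete with a toy count. The sketch below is not one of the Livermore Loops and does not reproduce the paper's simulation: it assumes a hypothetical loop in which element `i` is produced from elements at fixed `offsets`, with arrays split into consecutive equal-sized blocks and each element computed on the PE that owns it. The function name `remote_fraction` is illustrative only.

```python
def remote_fraction(n, n_pes, offsets):
    """Fraction of reads that cross a PE boundary when element i is
    produced on its owning PE and reads elements i + off for each
    off in `offsets` (out-of-range reads are skipped)."""
    block = -(-n // n_pes)  # ceiling division: elements per PE
    remote = total = 0
    for i in range(n):
        producer = i // block          # PE that owns and computes element i
        for off in offsets:
            j = i + off
            if 0 <= j < n:
                total += 1
                if j // block != producer:
                    remote += 1        # operand lives on another PE
    return remote / total if total else 0.0


if __name__ == "__main__":
    # Nearest-neighbour reads on 1024 elements spread over 16 PEs.
    print(remote_fraction(1024, 16, (-1, 1)))
```

Under these assumptions only reads adjacent to a block boundary are remote, so the fraction shrinks as the block size grows relative to the reach of the offsets.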
Memory formation in matter
Memory formation in matter is a theme of broad intellectual relevance; it
sits at the interdisciplinary crossroads of physics, biology, chemistry, and
computer science. Memory connotes the ability to encode, access, and erase
signatures of past history in the state of a system. Once the system has
completely relaxed to thermal equilibrium, it is no longer able to recall
aspects of its evolution. Memory of initial conditions or previous training
protocols will be lost. Thus many forms of memory are intrinsically tied to
far-from-equilibrium behavior and to transient response to a perturbation. This
general behavior arises in diverse contexts in condensed matter physics and
materials: phase change memory, shape memory, echoes, memory effects in
glasses, return-point memory in disordered magnets, as well as related contexts
in computer science. Yet, as opposed to the situation in biology, there is
currently no common categorization and description of the memory behavior that
appears to be prevalent throughout condensed-matter systems. Here we focus on
material memories. We will describe the basic phenomenology of a few of the
known behaviors that can be understood as constituting a memory. We hope that
this will be a guide towards developing the unifying conceptual underpinnings
for a broad understanding of memory effects that appear in materials.
Two-lane traffic rules for cellular automata: A systematic approach
Microscopic modeling of multi-lane traffic is usually done by applying
heuristic lane changing rules, and often with unsatisfying results. Recently, a
cellular automaton model for two-lane traffic was able to overcome some of
these problems and to produce a correct density inversion at densities somewhat
below the maximum flow density. In this paper, we summarize different
approaches to lane changing and their results, and propose a general scheme,
according to which realistic lane changing rules can be developed. We test this
scheme by applying it to several different lane changing rules, which, in spite
of their differences, generate similar and realistic results. We thus conclude
that, for producing realistic results, the logical structure of the lane
changing rules, as proposed here, is at least as important as the microscopic
details of the rules.
Breakdown and recovery in traffic flow models
Most car-following models show a transition from laminar to "congested"
flow and vice versa. Deterministic models often have a density range where a
disturbance needs a sufficiently large critical amplitude to move the flow from
the laminar into the congested phase. In stochastic models, it may be assumed
that the size of this amplitude gets translated into a waiting time, i.e.,
until fluctuations sufficiently add up to trigger the transition. A recently
introduced model of traffic flow, however, does not show this behavior: in the
density regime where the jam solution co-exists with the high-flow state, the
intrinsic stochasticity of the model is not sufficient to cause a transition
into the jammed regime, at least not within relevant time scales. In addition,
models can be differentiated by the stability of the outflow interface. We
demonstrate that this additional criterion is not related to the stability of
the flow. The combination of these criteria makes it possible to characterize
commonalities and differences between many existing models for traffic in a new
way.
Viscous to Inertial Crossover in Liquid Drop Coalescence
Using an electrical method and high-speed imaging we probe drop coalescence
down to 10 ns after the drops touch. By varying the liquid viscosity over two
decades, we conclude that at sufficiently low approach velocity where
deformation is not present, the drops coalesce with an unexpectedly late
crossover time between a regime dominated by viscous and one dominated by
inertial effects. We argue that the late crossover, not accounted for in the
theory, can be explained by an appropriate choice of length-scales present in
the flow geometry. Comment: 4 pages, 4 figures
