Exploiting iteration-level parallelism in dataflow programs
The term "dataflow" generally encompasses three distinct aspects of computation: a data-driven model of computation, a functional/declarative programming language, and a special-purpose multiprocessor architecture. In this paper we decouple the language and architecture issues by demonstrating that declarative programming is a suitable vehicle for programming conventional distributed-memory multiprocessors. This is achieved by applying several transformations to the compiled declarative program to achieve iteration-level (rather than instruction-level) parallelism. The transformations first group individual instructions into sequential lightweight processes, and then insert primitives to: (1) distribute array allocation over multiple processors, and (2) cause computation to follow the data distribution by inserting an index-filtering mechanism into a given loop and spawning a copy of it on all processing elements (PEs); the filter causes each instance of that loop to operate on a different subrange of the index variable. The underlying model of computation is a dataflow/von Neumann hybrid: execution within a process is control-driven, while the creation, blocking, and activation of processes is data-driven. The performance of this process-oriented dataflow system (PODS) is demonstrated using the hydrodynamics simulation benchmark SIMPLE, where a 19-fold speedup on a 32-processor architecture has been achieved.
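The index-filtering idea described above can be sketched as follows. This is a hypothetical illustration, not the PODS implementation; `owner` and `run_on_pe` are invented names. Every PE receives a copy of the loop, and a block-ownership filter restricts each copy to the subrange of the index variable whose array elements that PE owns.

```python
# Hypothetical sketch of index filtering for iteration-level parallelism.
# A block distribution assigns consecutive indices to PEs; each PE runs the
# full loop but its filter lets through only the indices it owns.

def owner(i, n, num_pes):
    """Block distribution: consecutive elements are assigned to PEs equally."""
    block = (n + num_pes - 1) // num_pes
    return i // block

def run_on_pe(pe, n, num_pes, body):
    """Iterate the full index range, executing the body only for owned
    indices (the index filter)."""
    for i in range(n):
        if owner(i, n, num_pes) == pe:
            body(i)

# Example: fill a distributed array a[i] = i * i across 4 PEs.
n, num_pes = 10, 4
a = [None] * n
for pe in range(num_pes):            # conceptually concurrent PEs
    run_on_pe(pe, n, num_pes, lambda i: a.__setitem__(i, i * i))

print(a)  # every element written exactly once, by its owning PE
```

Because each index has exactly one owner, every element is produced exactly once, which is what lets the transformed loop run on all PEs without synchronization on writes.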
Executing matrix multiply on a process oriented data flow machine
The Process-Oriented Dataflow System (PODS) is an execution model that combines the von Neumann and dataflow models of computation to gain the benefits of each. Central to PODS is the concept of array distribution and its effects on the partitioning and mapping of processes. In PODS, arrays are partitioned by simply assigning consecutive elements to each processing element (PE) equally. Since PODS uses single assignment, there is only one producer of each element. This producing PE owns that element and performs the necessary computations to assign it. Using this approach, the filling loop is distributed across the PEs. This simple partitioning and mapping scheme provides excellent results for executing scientific code on MIMD machines. In this way PODS allows MIMD machines to exploit vector and data parallelism easily, while still providing the flexibility of MIMD over SIMD for multi-user systems. In this paper, the classic matrix multiply algorithm, with 1024 data points, is executed on a PODS simulator and the results are presented and discussed. Matrix multiply is a good example because it has several interesting properties: there are multiple code-blocks; a new array must be dynamically allocated and distributed; there is a loop-carried dependency in the innermost loop; the two input arrays have different access patterns; and the sizes of the input arrays are not known at compile time. Matrix multiply also forms the basis for many important scientific algorithms such as LU decomposition, convolution, and the fast Fourier transform. The results show that PODS is comparable to both Iannucci's Hybrid Architecture and MIT's TTDA in terms of overhead and instruction power. They also show that PODS easily distributes the workload evenly across the PEs. The key result is that PODS can scale matrix multiply in a near-linear fashion until there is little or no work to be performed by each PE; beyond that point, overhead and message passing become a major component of the execution time. With larger problems (e.g., ≥16k data points) this limit would be reached at around 256 PEs.
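The owner-computes partitioning described above can be illustrated for matrix multiply with a short sketch. This is a serial emulation with invented names, not the PODS simulator: the result matrix is block-distributed by rows, and each (conceptually concurrent) PE produces only the rows it owns.

```python
# Hypothetical sketch of PODS-style owner-computes partitioning for matrix
# multiply: each result element has exactly one producer (single assignment),
# namely the PE that owns its row block.

def matmul_partitioned(A, B, num_pes):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    block = (n + num_pes - 1) // num_pes   # rows per PE (block distribution)
    for pe in range(num_pes):              # conceptually concurrent PEs
        for i in range(pe * block, min((pe + 1) * block, n)):  # owned rows
            for j in range(p):
                C[i][j] = sum(A[i][k] * B[k][j] for k in range(m))
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_partitioned(A, B, num_pes=2))  # [[19, 22], [43, 50]]
```

The result is independent of `num_pes`, which mirrors the abstract's point: the same partitioning scheme works whether the array sizes are known at compile time or only at run time.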
Automatic data/program partitioning using the single assignment principle
Loosely coupled MIMD architectures do not suffer from memory contention; hence large numbers of processors may be utilized. The main problem, however, is how to partition data and programs in order to exploit the available parallelism. In this paper we show that efficient schemes for automatic data/program partitioning and synchronization may be employed if single assignment is used. Using simulations of program loops common to scientific computations (the Livermore Loops), we demonstrate that only a small fraction of data accesses are remote, and thus the degradation in network performance due to multiprocessing is minimal.
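The single-assignment principle these abstracts rely on can be illustrated with a minimal write-once cell (an illustrative sketch, not the paper's mechanism; `WriteOnce` is an invented name): because each location is assigned exactly once, its producer is statically known, which is what makes automatic partitioning and lock-free ownership possible.

```python
# Minimal sketch of the single-assignment principle: a cell that may be
# written exactly once. Re-assignment and premature reads are errors.

class WriteOnce:
    _EMPTY = object()          # sentinel distinguishing "unset" from any value

    def __init__(self):
        self._value = WriteOnce._EMPTY

    def set(self, value):
        if self._value is not WriteOnce._EMPTY:
            raise ValueError("single-assignment violation")
        self._value = value

    def get(self):
        if self._value is WriteOnce._EMPTY:
            raise ValueError("read before assignment")
        return self._value

cell = WriteOnce()
cell.set(42)
print(cell.get())  # 42
```

In a real single-assignment runtime, `get` on an empty cell would block the reading process rather than raise, which is the data-driven process activation described in the PODS abstracts above.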
Cellular Automata Simulating Experimental Properties of Traffic Flows
A model for 1D traffic flow is developed, which is discrete in space and
time. Like the cellular automaton model by Nagel and Schreckenberg [J. Phys. I
France 2, 2221 (1992)], it is simple, fast, and can describe stop-and-go
traffic. Due to its relation to the optimal velocity model by Bando et al.
[Phys. Rev. E 51, 1035 (1995)], its instability mechanism is of deterministic
nature. The model can be easily calibrated to empirical data and displays the
experimental features of traffic data recently reported by Kerner and Rehborn
[Phys. Rev. E 53, R1297 (1996)].
Comment: For related work see
http://www.theo2.physik.uni-stuttgart.de/helbing.html and
http://traffic.comphys.uni-duisburg.de/member/home_schreck.htm
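For reference, the Nagel-Schreckenberg cellular automaton cited above can be sketched in a few lines. This is a minimal illustration of that reference model, not the paper's own optimal-velocity-based model, and the parameter values are arbitrary.

```python
import random

# Minimal Nagel-Schreckenberg cellular automaton: cars on a circular road of
# L cells with integer velocities 0..vmax, updated in parallel each step by
# four rules: accelerate, brake to the gap ahead, random slowdown, move.

def nasch_step(pos, vel, L, vmax=5, p_slow=0.3, rng=random):
    order = sorted(range(len(pos)), key=lambda c: pos[c])
    new_pos, new_vel = list(pos), list(vel)
    for idx, c in enumerate(order):
        ahead = order[(idx + 1) % len(order)]
        gap = (pos[ahead] - pos[c] - 1) % L   # empty cells to next car
        v = min(vel[c] + 1, vmax)             # 1. accelerate
        v = min(v, gap)                       # 2. brake to avoid collision
        if v > 0 and rng.random() < p_slow:
            v -= 1                            # 3. random slowdown (noise)
        new_vel[c] = v
        new_pos[c] = (pos[c] + v) % L         # 4. move
    return new_pos, new_vel

# Example: 5 cars on a 20-cell ring, 10 update steps.
rng = random.Random(0)
pos, vel = [0, 4, 8, 12, 16], [0] * 5
for _ in range(10):
    pos, vel = nasch_step(pos, vel, L=20, rng=rng)
print(pos, vel)
```

The stochastic rule 3 is what produces spontaneous stop-and-go jams in this model; the paper's model replaces that stochastic mechanism with a deterministic instability inherited from the optimal velocity model.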
Standardized image interpretation and post-processing in cardiovascular magnetic resonance - 2020 update: Society for Cardiovascular Magnetic Resonance (SCMR): Board of Trustees Task Force on Standardized Post-Processing
With mounting data on its accuracy and prognostic value, cardiovascular magnetic resonance (CMR) is becoming an increasingly important diagnostic tool with growing utility in clinical routine. Given its versatility and wide range of quantitative parameters, however, agreement on specific standards for the interpretation and post-processing of CMR studies is required to ensure consistent quality and reproducibility of CMR reports. This document addresses this need by providing consensus recommendations developed by the Task Force for Post-Processing of the Society for Cardiovascular Magnetic Resonance (SCMR). The aim of the Task Force is to recommend requirements and standards for image interpretation and post-processing enabling qualitative and quantitative evaluation of CMR images. Furthermore, pitfalls of CMR image analysis are discussed where appropriate. It is an update of the original recommendations published in 2013.
Endovascular Treatment for Acute Isolated Internal Carotid Artery Occlusion: A Propensity Score-Matched Multicenter Study.
The benefit of endovascular treatment (EVT) in patients with acute symptomatic isolated occlusion of the internal carotid artery (ICA) without involvement of the middle and anterior cerebral arteries is unclear. We aimed to compare clinical and safety outcomes of best medical treatment (BMT) versus EVT + BMT in patients with stroke due to isolated ICA occlusion.
We conducted a retrospective multicenter study involving patients with isolated ICA occlusion between January 2016 and December 2020. We stratified patients by BMT versus EVT and matched the groups using propensity score matching (PSM). We assessed the effect of treatment strategy on favorable outcome (modified Rankin scale ≤ 2) 90 days after treatment and compared reduction in NIHSS score at discharge, rates of symptomatic intracranial hemorrhage (sICH) and 3‑month mortality.
In total, we included 149 patients with isolated ICA occlusion. To address imbalances, we matched 45 patients from each group using PSM. The rate of favorable outcomes at 90 days was 56% for EVT and 38% for BMT (odds ratio, OR 1.89, 95% confidence interval, CI 0.84-4.24; p = 0.12). Patients treated with EVT showed a median reduction in NIHSS score at discharge of 6 points compared to 1 point for BMT patients (p = 0.02). Rates of symptomatic intracranial hemorrhage (7% vs. 4%; p = 0.66) and 3‑month mortality (11% vs. 13%; p = 0.74) did not differ between treatment groups. Periprocedural complications of EVT with early neurological deterioration occurred in 7% of cases.
Although the benefit on functional outcome did not reach statistical significance, the results for NIHSS score improvement and safety support the use of EVT in patients with stroke due to isolated ICA occlusion.
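The 1:1 propensity score matching step described in the methods can be illustrated schematically. This is a generic nearest-neighbor sketch with invented data and an assumed caliper; the study's actual matching procedure may differ.

```python
# Generic sketch of 1:1 propensity score matching without replacement:
# given an estimated propensity score per patient, each treated patient is
# greedily matched to the nearest untreated patient within a caliper.

def match_1to1(treated, control, caliper=0.1):
    """treated/control: lists of (patient_id, propensity_score) pairs."""
    available = dict(control)                 # id -> score, still unmatched
    pairs = []
    for pid, score in sorted(treated, key=lambda t: t[1]):
        if not available:
            break
        best = min(available, key=lambda c: abs(available[c] - score))
        if abs(available[best] - score) <= caliper:
            pairs.append((pid, best))
            del available[best]               # match without replacement
    return pairs

# Invented example data (not from the study): EVT vs. BMT patients.
evt = [("E1", 0.30), ("E2", 0.55), ("E3", 0.90)]
bmt = [("B1", 0.32), ("B2", 0.50), ("B3", 0.20)]
print(match_1to1(evt, bmt))
```

Patients whose nearest available counterpart lies outside the caliper stay unmatched and are dropped, which is why the matched cohorts (45 per group here) are smaller than the full sample of 149.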
Non-LTE modeling of supernova-fallback disks
We present a first detailed spectrum synthesis calculation of a
supernova-fallback disk composed of iron. We assume a geometrically thin disk
with a radial structure described by the classical alpha-disk model. The disk
is represented by concentric rings radiating as plane-parallel slabs. The
vertical structure and emission spectrum of each ring are computed in a fully
self-consistent manner by solving the structure equations simultaneously with
the radiation transfer equations under non-LTE conditions. We describe the
properties of a specific disk model and discuss various effects on the emergent
UV/optical spectrum.
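As a point of contrast with the full non-LTE calculation, the blackbody baseline can be sketched: a classical alpha-disk decomposed into concentric annuli, each radiating as a blackbody with the standard T(r) ∝ r^(-3/4) profile and a simple cos(i) foreshortening in place of true limb darkening. All parameter values and function names here are illustrative.

```python
import math

# Illustrative blackbody ring decomposition of a classical alpha-disk
# (not the paper's non-LTE code). SI units throughout.
H = 6.626e-34   # Planck constant
C = 2.998e8     # speed of light
KB = 1.381e-23  # Boltzmann constant

def planck_nu(nu, T):
    """Planck specific intensity B_nu(T)."""
    x = H * nu / (KB * T)
    return 2 * H * nu**3 / C**2 / math.expm1(x)

def ring_disk_flux(nu, T_in=1e5, r_in=1e4, r_out=1e6, n_rings=100, incl=0.0):
    """Sum annulus area * B_nu(T(r)), foreshortened by cos(inclination)."""
    total = 0.0
    # logarithmically spaced ring edges from r_in to r_out
    edges = [r_in * (r_out / r_in) ** (k / n_rings) for k in range(n_rings + 1)]
    for r0, r1 in zip(edges, edges[1:]):
        r_mid = math.sqrt(r0 * r1)
        T = T_in * (r_mid / r_in) ** (-0.75)   # alpha-disk profile
        area = math.pi * (r1**2 - r0**2)
        total += area * planck_nu(nu, T)
    return total * math.cos(incl)

print(ring_disk_flux(1e15))            # face-on
print(ring_disk_flux(1e15, incl=1.0))  # inclined: reduced by cos(i)
```

In this crude sketch inclination only rescales the spectrum; the paper's point is that real limb darkening and iron-line blanketing change the spectral shape itself, by up to a factor of four, relative to such a blackbody model.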
We find that strong iron-line blanketing causes broad absorption features
over the whole spectral range. Limb darkening changes the spectral distribution
by up to a factor of four, depending on the inclination angle. Consequently, such
differences also occur between a blackbody spectrum and our model. The overall
spectral shape is independent of the exact chemical composition as long as iron
is the dominant species. A pure iron composition cannot be distinguished from
silicon-burning ash. Non-LTE effects are small and restricted to a few spectral
features.
Comment: ApSS, accepted, Proceedings of Isolated Neutron Stars: from the
Interior to the Surface, April 24-28, 2006, London, U
Avalanche dynamics, surface roughening and self-organized criticality: experiments on a three-dimensional pile of rice
We present a two-dimensional system which exhibits features of self-organized
criticality. The avalanches which occur on the surface of a pile of rice are
found to exhibit finite size scaling in their probability distribution. The
critical exponents are τ = 1.21(2) for the avalanche size distribution and
D = 1.99(2) for the cut-off size. Furthermore, the geometry of the avalanches
is studied, leading to a fractal dimension of the active sites of d_f =
1.58(2). Using a set of scaling relations, we can calculate the roughness
exponent α = 0.41(3) and the dynamic exponent z = 1.56(8). This result is
compared with that obtained from a power spectrum analysis of the surface
roughness, which yields α = 0.42(3) and z = 1.5(1), in excellent agreement
with those obtained from the scaling relations.
Comment: 7 pages, 8 figures, accepted for publication in PR
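The power-law fitting underlying such avalanche statistics can be illustrated with synthetic data. This is a generic sketch, unrelated to the paper's measurements; it draws sizes from a pure power law by inverse-transform sampling and recovers the exponent with the standard continuous maximum-likelihood (Hill-type) estimator.

```python
import math
import random

# Generic sketch of power-law exponent estimation for avalanche sizes:
# P(s) ~ s**(-tau) for s >= s_min (continuous approximation).

def sample_power_law(tau, s_min, n, rng):
    """Inverse-transform sampling: survival function (s/s_min)^(1-tau)."""
    return [s_min * (1 - rng.random()) ** (-1 / (tau - 1)) for _ in range(n)]

def mle_tau(samples, s_min):
    """Continuous MLE: tau = 1 + n / sum(ln(s_i / s_min))."""
    return 1 + len(samples) / sum(math.log(s / s_min) for s in samples)

rng = random.Random(42)
sizes = sample_power_law(tau=1.21, s_min=1.0, n=50000, rng=rng)
print(mle_tau(sizes, 1.0))  # close to the input exponent 1.21
```

In experiments like the one above, the distribution is additionally cut off by the system size, so the exponent is extracted together with a finite-size scaling collapse rather than from a raw fit alone.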