Exploiting iteration-level parallelism in dataflow programs
The term "dataflow" generally encompasses three distinct aspects of computation: a data-driven model of computation, a functional/declarative programming language, and a special-purpose multiprocessor architecture. In this paper we decouple the language and architecture issues by demonstrating that declarative programming is a suitable vehicle for programming conventional distributed-memory multiprocessors. This is achieved by applying several transformations to the compiled declarative program to achieve iteration-level (rather than instruction-level) parallelism. The transformations first group individual instructions into sequential light-weight processes, and then insert primitives to: (1) distribute array allocation over multiple processors, and (2) cause computation to follow the data distribution by inserting an index-filtering mechanism into a given loop and spawning a copy of it on all PEs; the filter causes each instance of that loop to operate on a different subrange of the index variable. The underlying model of computation is a dataflow/von Neumann hybrid in that execution within a process is control-driven, while the creation, blocking, and activation of processes is data-driven. The performance of this process-oriented dataflow system (PODS) is demonstrated using the hydrodynamics simulation benchmark SIMPLE, where a 19-fold speedup on a 32-processor architecture has been achieved.
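The index-filtering transformation described above can be sketched as follows. This is an illustrative toy, not the paper's compiler output; the function names and the consecutive-block ownership rule are our own assumptions.

```python
# Hypothetical sketch of index filtering: every PE receives a copy of the
# same loop, and a filter restricts each copy to a different subrange of
# the index variable, so iterations never overlap.

def owned_range(n, num_pes, pe):
    """Consecutive-block ownership (assumed): PE `pe` owns a contiguous
    subrange of indices 0..n-1."""
    chunk = (n + num_pes - 1) // num_pes
    lo = pe * chunk
    hi = min(lo + chunk, n)
    return range(lo, hi)

def filtered_loop(body, n, num_pes):
    """Spawn one copy of the loop per PE; the index filter makes each copy
    operate only on the subrange that PE owns."""
    results = {}
    for pe in range(num_pes):                    # stands in for spawning on each PE
        for i in owned_range(n, num_pes, pe):    # the index filter
            results[i] = body(i)
    return results

# Example: a data-parallel loop a[i] = i * i distributed over 4 PEs.
out = filtered_loop(lambda i: i * i, n=10, num_pes=4)
```

Because the subranges partition the index set, each iteration executes exactly once even though every PE runs the same loop code.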
Executing matrix multiply on a process oriented data flow machine
The Process-Oriented Dataflow System (PODS) is an execution model that combines the von Neumann and dataflow models of computation to gain the benefits of each. Central to PODS is the concept of array distribution and its effects on the partitioning and mapping of processes. In PODS, arrays are partitioned by simply assigning consecutive elements to each processing element (PE) equally. Since PODS uses single assignment, there is only one producer of each element. This producing PE owns that element and performs the computations necessary to assign it. Using this approach, the filling loop is distributed across the PEs. This simple partitioning and mapping scheme provides excellent results for executing scientific code on MIMD machines. In this way PODS allows MIMD machines to exploit vector and data parallelism easily while still providing the flexibility of MIMD over SIMD for multi-user systems. In this paper, the classic matrix multiply algorithm, with 1024 data points, is executed on a PODS simulator and the results are presented and discussed. Matrix multiply is a good example because it has several interesting properties: there are multiple code blocks; a new array must be dynamically allocated and distributed; there is a loop-carried dependency in the innermost loop; the two input arrays have different access patterns; and the sizes of the input arrays are not known at compile time. Matrix multiply also forms the basis for many important scientific algorithms, such as LU decomposition, convolution, and the Fast Fourier Transform. The results show that PODS is comparable to both Iannucci's Hybrid Architecture and MIT's TTDA in terms of overhead and instruction power. They also show that PODS easily distributes the workload evenly across the PEs. The key result is that PODS can scale matrix multiply in a near-linear fashion until there is little or no work to be performed by each PE; beyond that point, overhead and message passing become a major component of the execution time. With larger problems (e.g., ≥16k data points) this limit would be reached at around 256 PEs.
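The owner-computes rule behind this partitioning can be illustrated with a small sketch. The function names and row-block mapping below are our own illustrative choices, not code from the paper; the point is that single assignment lets each PE fill only the result elements it owns, with no write conflicts.

```python
# Owner-computes sketch (assumed names): the result array is partitioned by
# assigning consecutive rows to each PE; the PE that owns C[i][j] performs
# the computation that assigns it, as single assignment guarantees exactly
# one producer per element.

def multiply_on_pe(A, B, pe, num_pes):
    """Each PE computes only the rows of C it owns (consecutive-block map)."""
    n = len(A)
    chunk = (n + num_pes - 1) // num_pes
    rows = range(pe * chunk, min((pe + 1) * chunk, n))
    partial = {}
    for i in rows:                                   # owned rows only
        for j in range(len(B[0])):
            partial[(i, j)] = sum(A[i][k] * B[k][j] for k in range(len(B)))
    return partial

def distributed_multiply(A, B, num_pes=4):
    """Merge per-PE results; single assignment makes the parts disjoint."""
    C = {}
    for pe in range(num_pes):
        C.update(multiply_on_pe(A, B, pe, num_pes))
    return [[C[(i, j)] for j in range(len(B[0]))] for i in range(len(A))]
```

With more PEs than rows, surplus PEs simply receive empty index ranges, which mirrors the observation that scaling flattens once there is little work left per PE.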
Automatic data/program partitioning using the single assignment principle
Loosely-coupled MIMD architectures do not suffer from memory contention; hence large numbers of processors may be utilized. The main problem, however, is how to partition data and programs in order to exploit the available parallelism. In this paper we show that efficient schemes for automatic data/program partitioning and synchronization may be employed if single assignment is used. Using simulations of program loops common to scientific computations (the Livermore Loops), we demonstrate that only a small fraction of data accesses are remote, and thus the degradation in network performance due to multiprocessing is minimal.
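The claim that only a small fraction of accesses are remote can be made concrete with a toy count. This is not the paper's simulator; it is a sketch, under the assumption of consecutive-block ownership, for a first-order recurrence of the kind found in the Livermore Loops, `a[i] = b[i] + b[i-1]`, where the only off-PE reads occur at block boundaries.

```python
# Illustrative count (our own, not from the paper): under consecutive-block
# ownership, the PE that owns a[i] reads b[i] locally and b[i-1] remotely
# only when i-1 falls in the previous PE's block.

def remote_fraction(n, num_pes):
    chunk = (n + num_pes - 1) // num_pes
    owner = lambda i: min(i // chunk, num_pes - 1)
    remote = local = 0
    for i in range(1, n):
        local += 1                       # b[i] is always local to a[i]'s owner
        if owner(i - 1) == owner(i):
            local += 1                   # b[i-1] in the same block
        else:
            remote += 1                  # b[i-1] crosses a block boundary
    return remote / (remote + local)

# For 10,000 elements on 32 PEs only 31 of the ~20,000 reads are remote.
```

Only one read per block boundary crosses PEs, so the remote fraction shrinks as the problem size grows relative to the PE count.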
Exploiting iteration-level parallelism in declarative programs
In order to achieve viable parallel processing, three basic criteria must be met: (1) the system must provide a programming environment which hides the details of parallel processing from the programmer; (2) the system must execute efficiently on the given hardware; and (3) the system must be economically attractive. The first criterion can be met by providing the programmer with an implicit rather than explicit programming paradigm. In this way all of the synchronization and distribution are handled automatically. To meet the second criterion, the system must perform synchronization and distribution in such a way that the available computing resources are used to their utmost. And to meet the third criterion, the system must not require esoteric or expensive hardware to achieve efficient utilization. This dissertation reports on the Process-Oriented Dataflow System (PODS), which meets all of the above criteria. PODS uses a hybrid von Neumann/dataflow model of computation supported by an automatic partitioning and distribution scheme. The new partitioning and distribution algorithm is presented along with its underlying principles. Four new mechanisms for distribution are presented: (1) a distributed array allocation operator for data distribution; (2) a distributed L operator for code distribution; (3) a range filter for restricting index ranges on different PEs; and (4) a specialized apply operator for functional parallelism. Simulations show that PODS balances communication overhead with distributed processing to achieve efficient parallel execution on distributed-memory multiprocessors. This is partially due to a new software array-caching scheme, called remote caching, which greatly reduces the number of remote memory reads. PODS is designed to use off-the-shelf components, with no specialized hardware, so a real PODS machine can be built quickly and cost-effectively. The system is currently being retargeted to the Intel iPSC/2 so that it can run on commercially available equipment.
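The remote-caching idea lends itself to a short sketch. The class and names below are our own illustration, not the dissertation's implementation; the key observation it encodes is that single assignment means a fetched array element can never be overwritten, so a cached copy needs no invalidation.

```python
# Hedged sketch of a software array cache in the spirit of "remote caching"
# (names and structure are illustrative assumptions): the first read of an
# element pays one remote fetch; every later read is served locally, and
# single assignment makes the cached value permanently valid.

class RemoteCache:
    def __init__(self, fetch):
        self.fetch = fetch          # models a remote memory read
        self.cache = {}
        self.remote_reads = 0

    def read(self, i):
        if i not in self.cache:     # first touch: one remote read
            self.remote_reads += 1
            self.cache[i] = self.fetch(i)
        return self.cache[i]        # repeat touches are local

# A single-assignment array living on a remote PE.
remote_array = [x * x for x in range(8)]
cache = RemoteCache(lambda i: remote_array[i])

# 32 reads over 8 distinct elements: only 8 go over the network.
total = sum(cache.read(i % 8) for i in range(32))
```

In this toy run, 24 of 32 reads are absorbed locally, which is the effect the abstract credits for reducing remote memory traffic.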
μ-SLS of Metals: Design of the Powder Spreader, Powder Bed Actuators and Optics for the System
Nanopowders have a tendency to form agglomerates due to their high surface energy and the presence of attractive van der Waals forces. To overcome this problem, we present a powder-spreading mechanism design that can alleviate this phenomenon by using vibration compaction to produce a uniform powder distribution in the bed. Most SLS machines employ either a roller or a blade to spread the powder over the powder bed. However, in order to achieve layer thicknesses of a few microns, a new design for the spreading mechanism which combines a precision blade and a precision roller is employed. Also, the design of a linear actuating system for displacing the powder bed with a resolution of a few tens of nanometers is presented for the μ-SLS system. Finally, the paper presents a novel optical system that can drastically increase the throughput of the system. The detailed design of these systems is presented in this paper.
The Lag Structure of the Impact of Business Ownership on Economic Performance in OECD Countries
This paper investigates the impact of changes in the number of business owners on three measures of economic performance, viz. employment growth, GDP growth and labor productivity growth. Particular attention is devoted to the lag structure. The analysis is performed at the country level for 21 OECD countries. Our results confirm earlier evidence of three stages in the impact of entry on economic performance: an initial direct positive effect, followed by a negative effect due to exiting capacities, and finally a stage of positive supply-side effects. The net effect is positive for employment and GDP growth. Changes in the number of business owners have no effect on labor productivity.
Allocation and Productivity of Time in New Ventures of Female and Male Entrepreneurs
[Please note that there exists an updated version of this publication at http://hdl.handle.net/1765/8989] This study investigates the factors explaining the number of hours invested in new ventures, making a distinction between the effect of preference for work time versus leisure time and that of productivity of work time. Using data on 1,247 Dutch entrepreneurs, we find that time invested in the business is determined by various aspects of human, financial and social capital, availability of other income, outsourcing, side activities and gender. We show that some of the identified factors relate to preferences and others to productivity. Women appear to invest less time in the business as a result of a range of indirect productivity effects.
Does Entrepreneurship Reduce Unemployment?
The relationship between unemployment and entrepreneurship has been shrouded in ambiguity. There is assumed to be a two-way causation between changes in the level of entrepreneurship and that of unemployment: a "Schumpeter" effect of entrepreneurship reducing unemployment, and a "refugee" or "shopkeeper" effect of unemployment stimulating entrepreneurship. The purpose of this paper is to try to reconcile the ambiguities found in the relationship between unemployment and entrepreneurship. We do this by introducing a two-equation model where changes in unemployment and in the number of business owners are linked to subsequent changes in those variables, for a panel of 23 OECD countries over the period 1974-1998. The existence of two distinct and separate relationships between unemployment and entrepreneurship is identified, including significant "Schumpeter" and "refugee" effects.
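The structure of the two-equation model can be sketched schematically. The coefficients and function below are invented for illustration only; the paper's estimated model and magnitudes are not reproduced here. The sketch shows how the two opposing effects are kept in separate equations rather than collapsed into one net coefficient.

```python
# Illustrative two-equation structure (coefficients are made-up assumptions):
# the change in unemployment dU depends on the lagged change in business
# ownership dE (a negative "Schumpeter" effect), while dE depends on the
# lagged dU (a positive "refugee"/"shopkeeper" effect).

def step(dU, dE, schumpeter=-0.3, refugee=0.2):
    """One period of the sketched model: each variable responds to the
    other's lagged change through its own equation."""
    next_dU = schumpeter * dE    # more entrepreneurship -> less unemployment
    next_dE = refugee * dU       # more unemployment -> more start-ups
    return next_dU, next_dE

# A one-unit rise in both changes: unemployment falls, ownership rises.
dU1, dE1 = step(1.0, 1.0)
```

Keeping the two channels in separate equations is what lets both effects be identified simultaneously instead of cancelling into an ambiguous single correlation.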
Allocation and Productivity of Time in New Ventures of Female and Male Entrepreneurs
This paper investigates time allocation decisions in new ventures of female and male entrepreneurs using a model that distinguishes between effects of preferences and productivity on the number of working hours. Using data of 1,158 entrepreneurs we find that the preference for work time in new ventures relates to start-up motivation, propensity to take risk and availability of other income. Productivity of work time relates to human, financial and social capital endowments and the prevalence of outsourcing activities. This study also evaluates actual profit effects one year after start-up. We find that on average women invest less time in the business than men. This can be attributed to both a lower preference for work time (driven by risk aversion and availability of other income) and a lower productivity per hour worked (due to lower endowments of human, social and financial capital)