Automated problem scheduling and reduction of synchronization delay effects
It is anticipated that, in order to make effective use of many future high-performance architectures, programs will have to exhibit at least medium-grained parallelism. A framework is presented for partitioning very sparse triangular systems of linear equations that is designed to produce favorable performance results in a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because (1) they provide a useful model problem for exploring heuristics for the aggregation, mapping and scheduling of relatively fine-grained computations whose data dependencies are specified by directed acyclic graphs, and (2) such efficient methods can find direct application in the development of parallel algorithms for scientific computation. Simple expressions are derived that describe how to schedule computational work with varying degrees of granularity. The Encore Multimax was used as a hardware simulator to investigate the performance effects of using the presented partitioning techniques in shared-memory architectures with varying relative synchronization costs.
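As an illustration of how the dependency graph of a sparse triangular system exposes parallelism, the following is a minimal sketch of level scheduling, a standard technique for such systems. It is not the partitioning framework of the paper above; the function name `level_schedule` and the input layout (for each row, the list of earlier rows it depends on) are assumptions made for this example.

```python
def level_schedule(deps):
    """Group the rows of a sparse lower-triangular system into levels.

    deps[i] lists the rows j < i whose solution x[j] is needed to
    compute x[i] (the off-diagonal nonzeros of row i). This input
    layout is an assumption of the sketch. Rows in the same level do
    not depend on each other, so they can be solved in parallel;
    levels must be processed in order, with a synchronization
    point between consecutive levels.
    """
    level = [0] * len(deps)
    for i, row_deps in enumerate(deps):
        # A row sits one level after its deepest dependency;
        # rows without dependencies sit in level 0.
        level[i] = 1 + max((level[j] for j in row_deps), default=-1)

    groups = {}
    for i, lv in enumerate(level):
        groups.setdefault(lv, []).append(i)
    return [groups[lv] for lv in sorted(groups)]


# Hypothetical 5x5 system: rows 0 and 1 are independent,
# rows 2 and 3 need earlier rows, row 4 needs rows 2 and 3.
example_deps = [[], [], [0], [0, 1], [2, 3]]
schedule = level_schedule(example_deps)
print(schedule)
```

Each level boundary costs one synchronization, so a schedule with few, wide levels suits machines where synchronization is expensive relative to computation.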
A Survey of Pipelined Workflow Scheduling: Models and Algorithms
A large class of applications needs to execute the same workflow on different data sets of identical size. Efficient execution of such applications necessitates intelligent distribution of the application components and tasks on a parallel machine, and the execution can be orchestrated by utilizing task, data, pipelined, and/or replicated parallelism. The scheduling problem that encompasses all of these techniques is called pipelined workflow scheduling, and it has been widely studied in the last decade. Multiple models and algorithms have flourished to tackle various programming paradigms, constraints, machine behaviors or optimization goals. This paper surveys the field by summing up and structuring known results and approaches.
Joint Routing and STDMA-based Scheduling to Minimize Delays in Grid Wireless Sensor Networks
In this report, we study the issue of delay optimization and energy
efficiency in grid wireless sensor networks (WSNs). We focus on STDMA (Spatial
Reuse TDMA) scheduling, where a predefined cycle is repeated, and where each
node has fixed transmission opportunities during specific slots (defined by
colors). We assume a STDMA algorithm that takes advantage of the regularity of
grid topology to also provide a spatially periodic coloring ("tiling" of the
same color pattern). In this setting, the key challenges are: 1) minimizing the
average routing delay by ordering the slots in the cycle, and 2) being energy
efficient. Our work follows two directions. First, the baseline performance is
evaluated when nothing specific is done and the colors are randomly ordered in
the STDMA cycle. Then, we propose a solution, ORCHID, that deliberately
constructs an efficient STDMA schedule. It proceeds in two steps. In the first
step, ORCHID starts from a colored grid and builds a hierarchical routing based
on these colors. In the second step, ORCHID builds a color ordering, by
considering jointly both routing and scheduling so as to ensure that any node
will reach a sink in a single STDMA cycle. We study the performance of these
solutions by means of simulations and modeling. Results show the excellent
performance of ORCHID in terms of delays and energy compared to a shortest path
routing that uses the delay as a heuristic. We also present the adaptation of
ORCHID to general networks under the SINR interference model.
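To make concrete why the ordering of colors in the STDMA cycle affects routing delay, here is a minimal sketch under a toy model. It is not the ORCHID algorithm; the function `route_delay`, the color names, and the assumption that each hop takes exactly one slot are all hypothetical choices made for this example.

```python
def route_delay(hop_colors, slot_order):
    """Slots needed for a packet to traverse a route under STDMA.

    hop_colors: color of the transmitting node at each hop, in order.
    slot_order: the colors in the order their slots occur in the cycle.
    Toy model (an assumption of this sketch): the packet is ready at
    the start of slot 0, a node may transmit only in the slot of its
    own color, and each transmission takes exactly one slot.
    """
    position = {color: slot for slot, color in enumerate(slot_order)}
    cycle = len(slot_order)
    time = 0  # absolute slot index at which the packet is ready to send
    for color in hop_colors:
        # Wait for the next occurrence of this node's slot, then send.
        wait = (position[color] - time) % cycle
        time += wait + 1
    return time


route = ["A", "B", "C"]  # hypothetical three-hop route to a sink
aligned = route_delay(route, ["A", "B", "C"])
reversed_order = route_delay(route, ["C", "B", "A"])
print(aligned, reversed_order)
```

In this toy model, a slot order aligned with the route delivers the packet within one cycle, while the reversed order makes the packet wait for nearly a full cycle at every hop after the first.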
Customer Engagement Plans for Peak Load Reduction in Residential Smart Grids
In this paper, we propose and study the effectiveness of customer engagement
plans that clearly specify the amount of intervention in customers' load
settings by the grid operator for peak load reduction. We suggest two
types of plans: Constant Deviation Plans (CDPs) and Proportional
Deviation Plans (PDPs). We define an adjustable reference temperature for both
CDPs and PDPs to limit the output temperature of each thermostat load and to
control the number of devices eligible to participate in the Demand Response
Program (DRP). We model thermostat loads as power throttling devices and design
algorithms to evaluate the impact of power throttling states and plan
parameters on peak load reduction. Based on the simulation results, we
recommend PDPs to the customers of a residential community with variable
thermostat set point preferences, while CDPs are suitable for customers with
similar thermostat set point preferences. If thermostat loads have multiple
power throttling states, customer engagement plans with smaller temperature
deviations from thermostat set points are recommended. Contrary to classical
ON/OFF control, higher temperature deviations are required to achieve a similar
amount of peak load reduction. Several other interesting tradeoffs and useful
guidelines for designing mutually beneficial incentives for both the grid
operator and customers can also be identified.
A unified modulo scheduling and register allocation technique for clustered processors
This work presents a modulo scheduling framework for clustered ILP processors that integrates the cluster assignment, instruction scheduling and register allocation steps in a single phase. This unified approach is more effective than traditional approaches based on sequentially performing some (or all) of the three steps, since it allows optimizing the global code generation problem instead of searching for optimal solutions to each individual step. It also avoids the iterative nature of traditional approaches, which require repeated applications of the three steps until a valid solution is found. The proposed framework includes a mechanism to insert spill code on-the-fly and heuristics to evaluate the quality of partial schedules, simultaneously considering inter-cluster communications, memory pressure and register pressure. Transformations that allow trading pressure on one type of resource for pressure on another are also included. We show that the proposed technique outperforms previously proposed techniques. For instance, the average speed-up for SPECfp95 is 36% for a 4-cluster configuration.
A parallel implementation of a multisensor feature-based range-estimation method
There are many proposed vision-based methods to perform obstacle detection and avoidance for autonomous or semi-autonomous vehicles. All methods, however, will require very high processing rates to achieve real-time performance. A system capable of supporting autonomous helicopter navigation will need to extract obstacle information from imagery at rates varying from ten frames per second to thirty or more frames per second, depending on the vehicle speed. Such a system will need to sustain billions of operations per second. To reach such high processing rates using current technology, a parallel implementation of the obstacle detection/ranging method is required. This paper describes an efficient and flexible parallel implementation of a multisensor feature-based range-estimation algorithm, targeted for helicopter flight, realized on both a distributed-memory and a shared-memory parallel computer.