242,210 research outputs found
Using a functional language and graph reduction to program multiprocessor machines or functional control of imperative programs
Journal ArticleThis paper describes an effective means for programming shared memory multiprocessors whereby a set of sequential activities are linked together for execution in parallel. The glue for this linkage is provided by a functional language implemented via graph reduction and demand evaluation. The full power of functional programming is used to obtain succinct, high level specifications of parallel computations. The imperative procedures that constitute the sequential activities facilitate efficient utilization of individual processing elements, while the mechanisms inherent in graph reduction synchronize and schedule these activities. The main contributions of this paper are: 1) an evaluation of the performance implications of parallel graph reduction; 2) a demonstration that the mechanisms of graph reduction can obtain multiprocessor performance uniformly surpassing the best uni-processor implementation of sequential algorithms running on a single node of the same machine, and 3) an illustration of our method used to program a real world fluid flow simulation problem
Doctor of Philosophy
dissertationDataflow pipeline models are widely used in visualization systems. Despite recent advancements in parallel architecture, most systems still support only a single CPU or a small collection of CPUs such as a SMP workstation. Even for systems that are specifically tuned towards parallel visualization, their execution models only provide support for data-parallelism while ignoring taskparallelism and pipeline-parallelism. With the recent popularization of machines equipped with multicore CPUs and multi-GPU units, these visualization systems are undoubtedly falling further behind in reaching maximum efficiency. On the other hand, there exist several libraries that can schedule program executions on multiple CPUs and/or multiple GPUs. However, due to differences in executing a task graph and a pipeline along with their APIs being considerably low-level, it still remains a challenge to integrate these run-time libraries into current visualization systems. Thus, there is a need for a redesigned dataflow architecture to fully support and exploit the power of highly parallel machines in large-scale visualization. The new design must be able to schedule executions on heterogeneous platforms while at the same time supporting arbitrarily large datasets through the use of streaming data structures. The primary goal of this dissertation work is to develop a parallel dataflow architecture for streaming large-scale visualizations. The framework includes supports for platforms ranging from multicore processors to clusters consisting of thousands CPUs and GPUs. We achieve this in our system by introducing the notion of Virtual Processing Elements and Task-Oriented Modules along with a highly customizable scheduler that controls the assignment of tasks to elements dynamically. This creates an intuitive way to maintain multiple CPU/GPU kernels yet still provide coherency and synchronization across module executions. We have implemented these techniques into HyperFlow which is made of an API with all basic dataflow constructs described in the dissertation, and a distributed run-time library that can be used to deploy those pipelines on multicore, multi-GPU and cluster-based platforms
Scheduling multiple divisible loads on a linear processor network
Min, Veeravalli, and Barlas have recently proposed strategies to minimize the
overall execution time of one or several divisible loads on a heterogeneous
linear network, using one or more installments. We show on a very simple
example that their approach does not always produce a solution and that, when
it does, the solution is often suboptimal. We also show how to find an optimal
schedule for any instance, once the number of installments per load is given.
Then, we formally state that any optimal schedule has an infinite number of
installments under a linear cost model as the one assumed in the original
papers. Therefore, such a cost model cannot be used to design practical
multi-installment strategies. Finally, through extensive simulations we
confirmed that the best solution is always produced by the linear programming
approach, while solutions of the original papers can be far away from the
optimal
Reclaiming the energy of a schedule: models and algorithms
We consider a task graph to be executed on a set of processors. We assume
that the mapping is given, say by an ordered list of tasks to execute on each
processor, and we aim at optimizing the energy consumption while enforcing a
prescribed bound on the execution time. While it is not possible to change the
allocation of a task, it is possible to change its speed. Rather than using a
local approach such as backfilling, we consider the problem as a whole and
study the impact of several speed variation models on its complexity. For
continuous speeds, we give a closed-form formula for trees and series-parallel
graphs, and we cast the problem into a geometric programming problem for
general directed acyclic graphs. We show that the classical dynamic voltage and
frequency scaling (DVFS) model with discrete modes leads to a NP-complete
problem, even if the modes are regularly distributed (an important particular
case in practice, which we analyze as the incremental model). On the contrary,
the VDD-hopping model leads to a polynomial solution. Finally, we provide an
approximation algorithm for the incremental model, which we extend for the
general DVFS model.Comment: A two-page extended abstract of this work appeared as a short
presentation in SPAA'2011, while the long version has been accepted for
publication in "Concurrency and Computation: Practice and Experience
Speed-scaling with no Preemptions
We revisit the non-preemptive speed-scaling problem, in which a set of jobs
have to be executed on a single or a set of parallel speed-scalable
processor(s) between their release dates and deadlines so that the energy
consumption to be minimized. We adopt the speed-scaling mechanism first
introduced in [Yao et al., FOCS 1995] according to which the power dissipated
is a convex function of the processor's speed. Intuitively, the higher is the
speed of a processor, the higher is the energy consumption. For the
single-processor case, we improve the best known approximation algorithm by
providing a -approximation algorithm,
where is a generalization of the Bell number. For the
multiprocessor case, we present an approximation algorithm of ratio
improving the best known result by a factor of
. Notice that our
result holds for the fully heterogeneous environment while the previous known
result holds only in the more restricted case of parallel processors with
identical power functions
Chance-Constrained Outage Scheduling using a Machine Learning Proxy
Outage scheduling aims at defining, over a horizon of several months to
years, when different components needing maintenance should be taken out of
operation. Its objective is to minimize operation-cost expectation while
satisfying reliability-related constraints. We propose a distributed
scenario-based chance-constrained optimization formulation for this problem. To
tackle tractability issues arising in large networks, we use machine learning
to build a proxy for predicting outcomes of power system operation processes in
this context. On the IEEE-RTS79 and IEEE-RTS96 networks, our solution obtains
cheaper and more reliable plans than other candidates
- …