977 research outputs found
Mentat: An object-oriented macro data flow system
Mentat, an object-oriented macro data flow system designed to facilitate parallelism in distributed systems, is presented. The macro data flow model is a model of computation similar to the data flow model with two principal differences: the computational complexity of the actors is much greater than in traditional data flow systems, and there are persistent actors that maintain state information between executions. Mentat is a system that combines the object-oriented programming paradigm and the macro data flow model of computation. Mentat programs use a dynamic structure called a future list to represent the future of computations
Solving the Ghost-Gluon System of Yang-Mills Theory on GPUs
We solve the ghost-gluon system of Yang-Mills theory using Graphics
Processing Units (GPUs). Working in Landau gauge, we use the Dyson-Schwinger
formalism for the mathematical description as this approach is well-suited to
directly benefit from the computing power of the GPUs. With the help of a
Chebyshev expansion for the dressing functions and a subsequent appliance of a
Newton-Raphson method, the non-linear system of coupled integral equations is
linearized. The resulting Newton matrix is generated in parallel using OpenMPI
and CUDA(TM). Our results show, that it is possible to cut down the run time by
two orders of magnitude as compared to a sequential version of the code. This
makes the proposed techniques well-suited for Dyson-Schwinger calculations on
more complicated systems where the Yang-Mills sector of QCD serves as a
starting point. In addition, the computation of Schwinger functions using GPU
devices is studied.Comment: 19 pages, 7 figures, additional figure added, dependence on
block-size is investigated in more detail, version accepted by CP
Packing Sporadic Real-Time Tasks on Identical Multiprocessor Systems
In real-time systems, in addition to the functional correctness recurrent
tasks must fulfill timing constraints to ensure the correct behavior of the
system. Partitioned scheduling is widely used in real-time systems, i.e., the
tasks are statically assigned onto processors while ensuring that all timing
constraints are met. The decision version of the problem, which is to check
whether the deadline constraints of tasks can be satisfied on a given number of
identical processors, has been known -complete in the strong sense.
Several studies on this problem are based on approximations involving resource
augmentation, i.e., speeding up individual processors. This paper studies
another type of resource augmentation by allocating additional processors, a
topic that has not been explored until recently. We provide polynomial-time
algorithms and analysis, in which the approximation factors are dependent upon
the input instances. Specifically, the factors are related to the maximum ratio
of the period to the relative deadline of a task in the given task set. We also
show that these algorithms unfortunately cannot achieve a constant
approximation factor for general cases. Furthermore, we prove that the problem
does not admit any asymptotic polynomial-time approximation scheme (APTAS)
unless when the task set has constrained deadlines, i.e.,
the relative deadline of a task is no more than the period of the task.Comment: Accepted and to appear in ISAAC 2018, Yi-Lan, Taiwa
Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling
Though the GPGPU concept is well-known
in image processing, much more work remains to be done
to fully exploit GPUs as an alternative computation
engine. This paper investigates the computation-to-core
mapping strategies to probe the efficiency and scalability
of the robust facet image modeling algorithm on GPUs.
Our fine-grained computation-to-core mapping scheme
shows a significant performance gain over the standard
pixel-wise mapping scheme. With in-depth performance
comparisons across the two different mapping schemes,
we analyze the impact of the level of parallelism on
the GPU computation and suggest two principles for
optimizing future image processing applications on the
GPU platform
08071 Abstracts Collection -- Scheduling
From 10.02. to 15.02., the Dagstuhl Seminar 08071 ``Scheduling\u27\u27 was held
in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available
Design of testbed and emulation tools
The research summarized was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software based set of hierarchical tools was chosen to provide maximum flexibility, to ease in moving to new computers as technology improves and to take advantage of the inherent reliability and availability of commercially available computing systems
Beyond Dataflow
This paper presents some recent advanced dataflow architectures. While the dataflow concept offers the potential of high performance, the performance of an actual dataflow implementation can be restricted by a limited number of functional units, limited memory bandwidth, and the need to associatively match pending operations with available functional units. Since the early 1970s, there have been significant developments in both fundamental research and practical realizations of dataflow models of computation. In particular, there has been active research and development in multithreaded architectures that evolved from the dataflow model. Also some other techniques for combining control-flow and dataflow emerged, such as coarse-grain dataflow, dataflow with complex machine operations, RISC dataflow, and micro dataflow. These developments have also had certain impact on the conception of highperformance superscalar processors in the “post-RISC” era
On the implementation of real-time slotbased task-splitting scheduling algorithms for multiprocessor systems
In this paper we discuss challenges and design principles of an implementation of slot-based tasksplitting
algorithms into the Linux 2.6.34 version. We show that this kernel version is provided with
the required features for implementing such scheduling algorithms. We show that the real behavior of
the scheduling algorithm is very close to the theoretical. We run and discuss experiments on 4-core and
24-core machines
- …