A Multi-GPU Programming Library for Real-Time Applications
We present MGPU, a C++ programming library targeted at single-node multi-GPU
systems. Such systems combine disproportionate floating point performance with
high data locality and are thus well suited to implement real-time algorithms.
We describe the library design, programming interface and implementation
details in light of this specific problem domain. The core concepts of this
work are a novel kind of container abstraction and MPI-like communication
methods for intra-system communication. We further demonstrate how MGPU is used
as a framework for porting existing GPU libraries to multi-device
architectures. Putting our library to the test, we accelerate an iterative
non-linear image reconstruction algorithm for real-time magnetic resonance
imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs
and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us
to conclude that multi-GPU systems are a viable solution for real-time MRI
reconstruction as well as signal-processing applications in general.
Comment: 15 pages, 10 figures
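To put the reported speed-ups in context, the short sketch below converts them to parallel efficiency (speed-up divided by device count). It assumes the speed-ups are measured against a single-GPU baseline, which the abstract does not state explicitly; the function name is illustrative only.

```python
def parallel_efficiency(speedup, devices):
    """Fraction of ideal linear scaling achieved (1.0 means perfect scaling)."""
    return speedup / devices


# Speed-ups quoted in the abstract; the single-GPU baseline is an assumption.
reported = [(2, 1.7), (4, 2.1)]

for gpus, speedup in reported:
    eff = parallel_efficiency(speedup, gpus)
    print(f"{gpus} GPUs: speed-up {speedup}, efficiency {eff:.3f}")
```

Under that assumption the efficiency drops from roughly 85% on 2 GPUs to roughly 53% on 4, so most of the gain comes from the second device.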
Novel Multidimensional Models of Opinion Dynamics in Social Networks
Unlike many complex networks studied in the literature, social networks
rarely exhibit unanimous behavior, or consensus. This requires the development of
mathematical models that are sufficiently simple to be examined and capture, at
the same time, the complex behavior of real social groups, where opinions and
actions related to them may form clusters of different size. One such model,
proposed by Friedkin and Johnsen, extends the idea of the conventional consensus
algorithm (also referred to as iterative opinion pooling) to take into
account the actors' prejudices, caused by some exogenous factors and leading to
disagreement in the final opinions.
In this paper, we offer a novel multidimensional extension, describing the
evolution of the agents' opinions on several topics. Unlike the existing
models, these topics are interdependent, and hence the opinions being formed on
these topics are also mutually dependent. We rigorously examine stability
properties of the proposed model, in particular, convergence of the agents'
opinions. Although our model assumes synchronous communication among the
agents, we show that the same final opinions may be reached "on average" via
asynchronous gossip-based protocols.
Comment: Accepted by IEEE Transactions on Automatic Control (to be published in
May 2017)
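The opinion-pooling idea described above can be sketched in a few lines. The code below is a minimal illustration, not the paper's formulation. It iterates the classic Friedkin-Johnsen update, in which each agent mixes a weighted average of its neighbours' opinions with its own fixed prejudice, and it allows an optional topic-coupling matrix `C`. The way `C` enters the update, and all names (`W`, `lam`, `u`, `C`), are assumptions made for illustration; with `C` left as the identity the topics evolve independently.

```python
def fj_step(W, lam, u, x, C):
    """One synchronous update of every agent's opinion vector."""
    n, m = len(x), len(x[0])
    new_x = []
    for i in range(n):
        # Opinion pooling: weighted average of all agents' current opinions.
        pooled = [sum(W[i][j] * x[j][t] for j in range(n)) for t in range(m)]
        # Assumed coupling: mix the pooled opinions across topics through C.
        coupled = [sum(C[t][s] * pooled[s] for s in range(m)) for t in range(m)]
        # Blend social influence with the agent's fixed prejudice u[i].
        new_x.append([lam[i] * coupled[t] + (1 - lam[i]) * u[i][t]
                      for t in range(m)])
    return new_x


def friedkin_johnsen(W, lam, u, C=None, steps=200):
    """Iterate from x(0) = u for a fixed number of steps.

    W   : row-stochastic influence matrix (n x n)
    lam : per-agent susceptibility to influence, each in [0, 1]
    u   : prejudices, one opinion vector (length m) per agent
    C   : hypothetical topic-coupling matrix (m x m); identity if omitted
    """
    m = len(u[0])
    if C is None:
        C = [[1.0 if t == s else 0.0 for s in range(m)] for t in range(m)]
    x = [row[:] for row in u]
    for _ in range(steps):
        x = fj_step(W, lam, u, x, C)
    return x


# Three agents, one topic, every agent half-attached to its prejudice.
W = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
final = friedkin_johnsen(W, lam=[0.5, 0.5, 0.5], u=[[0.0], [0.5], [1.0]])
print(final)
```

In this small example the opinions settle at distinct values rather than merging, which is the disagreement effect the abstract attributes to prejudices. Setting every susceptibility to 1 recovers the plain consensus iteration.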
A study of the effect of process malleability in the energy efficiency on GPU-based clusters
The adoption of graphics processing units (GPUs) in high-performance computing (HPC) infrastructures determines, in many cases, the energy consumption of those facilities. For this reason, efficient management and administration of GPU-enabled clusters is crucial for their optimum operation. The main aim of this work is to study and design efficient mechanisms for job scheduling across GPU-enabled clusters by leveraging process malleability techniques, which can reconfigure running jobs depending on the cluster status. This paper presents a model that improves the energy efficiency when processing a batch of jobs in an HPC cluster. The model is validated through the MPDATA algorithm, a representative example of stencil computation used in numerical weather prediction. The proposed solution applies the efficiency metrics obtained in a new reconfiguration policy aimed at job arrays. This solution reduces the processing time of workloads by up to 4.8 times and the energy consumption of the cluster by up to 2.4 times compared to traditional job management, where jobs are not reconfigured during their execution.
Asymmetric Load Balancing on a Heterogeneous Cluster of PCs
In recent years, high performance computing with commodity clusters of personal computers has become an active area of research. Many organizations build them because they need the computational speedup provided by parallel processing but cannot afford to purchase a supercomputer. On commercial supercomputers and homogeneous clusters of PCs, applications that can be statically load balanced are balanced by assigning equal tasks to each processor. With heterogeneous clusters, system designers have the option of quickly adding newer hardware that is more powerful than the existing hardware. When this is done, assigning equal tasks to each processor results in suboptimal performance. This research addresses techniques for matching the size of the tasks assigned to each processor to the processor's capability, so that the more powerful processors do more work and the less powerful processors do less. We find that when the range of processing power is narrow, some benefit can be achieved with asymmetric load balancing. When the range of processing power is broad, dramatic improvements in performance are realized: our experiments have shown up to 92% improvement when asymmetrically load balancing a modified version of the NAS Parallel Benchmarks' LU application.
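The contrast between equal and asymmetric assignment can be illustrated with a small sketch. It is not the method from the paper: it assumes that each processor's relative speed is known in advance and that work divides into identical units, and it splits the units in proportion to speed, using largest-remainder rounding. The function names and the example speeds are hypothetical.

```python
def proportional_split(total_units, speeds):
    """Assign work units to processors in proportion to their relative speed."""
    total_speed = sum(speeds)
    exact = [total_units * s / total_speed for s in speeds]
    shares = [int(e) for e in exact]
    # Hand out units lost to rounding, largest fractional remainder first.
    leftover = total_units - sum(shares)
    order = sorted(range(len(speeds)),
                   key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def makespan(shares, speeds):
    """Finish time of the slowest processor (units of work / units per second)."""
    return max(share / speed for share, speed in zip(shares, speeds))


# Hypothetical cluster: two older nodes and one node four times as fast.
speeds = [1, 1, 4]
total = 600

equal = [total // len(speeds)] * len(speeds)
asymmetric = proportional_split(total, speeds)

print("equal split:     ", equal, "makespan", makespan(equal, speeds))
print("asymmetric split:", asymmetric, "makespan", makespan(asymmetric, speeds))
```

With these made-up speeds the equal split is held back by the slow nodes, while the proportional split lets all three finish together. The gap widens as the spread in processing power grows, in line with the trend the abstract reports.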