5,811 research outputs found

    A Multi-GPU Programming Library for Real-Time Applications

    Full text link
    We present MGPU, a C++ programming library targeted at single-node multi-GPU systems. Such systems combine disproportionate floating point performance with high data locality and are thus well suited to implement real-time algorithms. We describe the library design, programming interface and implementation details in light of this specific problem domain. The core concepts of this work are a novel kind of container abstraction and MPI-like communication methods for intra-system communication. We further demonstrate how MGPU is used as a framework for porting existing GPU libraries to multi-device architectures. Putting our library to the test, we accelerate an iterative non-linear image reconstruction algorithm for real-time magnetic resonance imaging using multiple GPUs. We achieve a speed-up of about 1.7 using 2 GPUs and reach a final speed-up of 2.1 with 4 GPUs. These promising results lead us to conclude that multi-GPU systems are a viable solution for real-time MRI reconstruction as well as signal-processing applications in general.Comment: 15 pages, 10 figure

    Novel Multidimensional Models of Opinion Dynamics in Social Networks

    Full text link
    Unlike many complex networks studied in the literature, social networks rarely exhibit unanimous behavior, or consensus. This requires a development of mathematical models that are sufficiently simple to be examined and capture, at the same time, the complex behavior of real social groups, where opinions and actions related to them may form clusters of different size. One such model, proposed by Friedkin and Johnsen, extends the idea of conventional consensus algorithm (also referred to as the iterative opinion pooling) to take into account the actors' prejudices, caused by some exogenous factors and leading to disagreement in the final opinions. In this paper, we offer a novel multidimensional extension, describing the evolution of the agents' opinions on several topics. Unlike the existing models, these topics are interdependent, and hence the opinions being formed on these topics are also mutually dependent. We rigorous examine stability properties of the proposed model, in particular, convergence of the agents' opinions. Although our model assumes synchronous communication among the agents, we show that the same final opinions may be reached "on average" via asynchronous gossip-based protocols.Comment: Accepted by IEEE Transaction on Automatic Control (to be published in May 2017

    An study of the effect of process malleability in the energy efficiency on GPU‑based clusters

    Get PDF
    The adoption of graphic processor units (GPU) in high-performance computing (HPC) infrastructures determines, in many cases, the energy consumption of those facilities. For this reason, an efficient management and administration of the GPU-enabled clusters is crucial for the optimum operation of the cluster. The main aim of this work is to study and design efficient mechanisms of job scheduling across GPU-enabled clusters by leveraging process malleability techniques, able to reconfigure running jobs, depending on the cluster status. This paper presents a model that improves the energy efficiency when processing a batch of jobs in an HPC cluster. The model is validated through the MPDATA algorithm, as a representative example of stencil computation used in numerical weather prediction. The proposed solution applies the efficiency metrics obtained in a new reconfiguration policy aimed at job arrays. This solution allows the reduction in the processing time of workloads up to 4.8 times and reduction in the energy consumption up to 2.4 times the cluster compared to the traditional job management, where jobs are not reconfigured during their execution

    Asymmetric Load Balancing on a Heterogeneous Cluster of PCs

    Get PDF
    In recent years, high performance computing with commodity clusters of personal computers has become an active area of research. Many organizations build them because they need the computational speedup provided by parallel processing but cannot afford to purchase a supercomputer. With commercial supercomputers and homogenous clusters of PCs, applications that can be statically load balanced are done so by assigning equal tasks to each processor. With heterogeneous clusters, the system designers have the option of quickly adding newer hardware that is more powerful than the existing hardware. When this is done, the assignment of equal tasks to each processor results in suboptimal performance. This research addresses techniques by which the size of the tasks assigned to processors is a suitable match to the processors themselves, in which the more powerful processors can do more work, and the less powerful processors perform less work. We find that when the range of processing power is narrow, some benefit can be achieved with asymmetric load balancing. When the range of processing power is broad, dramatic improvements in performance are realized our experiments have shown up to 92% improvement when asymmetrically load balancing a modified version of the NAS Parallel Benchmarks\u27 LU application
    • …
    corecore