
    Boosting the performance of remote GPU virtualization using InfiniBand Connect-IB and PCIe 3.0

    Full text link
    © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    [EN] A clear trend has emerged involving the acceleration of scientific applications by using GPUs. However, the capabilities of these devices are still generally underutilized. Remote GPU virtualization techniques can help increase GPU utilization rates, while reducing acquisition and maintenance costs. The overhead of using a remote GPU instead of a local one is introduced mainly by the difference in performance between the internode network and the intranode PCIe link. In this paper we show how using the new InfiniBand Connect-IB network adapters (attaining similar throughput to that of the most recently emerged GPUs) boosts the performance of remote GPU virtualization, reducing the overhead to a mere 0.19% in the application tested.

    This work was funded by the Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II. This material is based upon work supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (SC-21), under Contract No. DE-AC02-06CH11357. Authors from the Universitat Politècnica de València and Universitat Jaume I are grateful for the generous support provided by Mellanox Technologies.

    Reaño González, C.; Silla Jiménez, F.; Peña Monferrer, AJ.; Shainer, G.; Schultz, S.; Castelló Gimeno, A.; Quintana Orti, ES.... (2014). Boosting the performance of remote GPU virtualization using InfiniBand Connect-IB and PCIe 3.0. En 2014 IEEE International Conference on Cluster Computing (CLUSTER). IEEE. 266-267. doi:10.1109/CLUSTER.2014.6968737
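
    The overhead described above is dominated by the cost of moving data between the application's host memory and the (possibly remote) GPU. A minimal host-to-device transfer microbenchmark, sketched below, makes that comparison concrete: the same unmodified binary can be run against a local GPU and against a remote GPU exposed through a virtualization layer such as rCUDA, and the difference in measured bandwidth reflects the PCIe-versus-network gap. This is only an illustrative sketch, not the benchmark used in the paper; the buffer size and timing method are arbitrary choices.

        // Minimal host-to-device bandwidth microbenchmark (illustrative sketch).
        // Under remote GPU virtualization the same unmodified code runs against a
        // remote GPU, so the reported bandwidth exposes the network vs. PCIe gap.
        #include <cstdio>
        #include <cuda_runtime.h>

        int main() {
            const size_t bytes = 256UL << 20;               // 256 MiB payload
            float *h_buf = nullptr, *d_buf = nullptr;
            cudaMallocHost((void **)&h_buf, bytes);         // pinned host memory
            cudaMalloc((void **)&d_buf, bytes);

            cudaEvent_t start, stop;
            cudaEventCreate(&start);
            cudaEventCreate(&stop);

            cudaEventRecord(start);
            cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);

            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            printf("H2D bandwidth: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

            cudaEventDestroy(start);
            cudaEventDestroy(stop);
            cudaFree(d_buf);
            cudaFreeHost(h_buf);
            return 0;
        }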

    Reducing the Costs of Teaching CUDA in Laboratories while Maintaining the Learning Experience Quality

    Full text link
    Graphics Processing Units (GPUs) have become widely used to accelerate scientific applications; therefore, it is important that Computer Science and Computer Engineering curricula include the fundamentals of parallel computing with GPUs. Regarding the practical part of the training, one important concern is how to introduce GPUs into a laboratory: installing GPUs in all the computers of the lab may not be affordable, while sharing a remote GPU server among several students may result in a poor learning experience because of its associated overhead. In this paper we propose a solution to address this problem: the use of the rCUDA (remote CUDA) middleware, which enables programs being executed in a computer to make concurrent use of GPUs located in remote servers. Hence, students would be able to concurrently and transparently share a single remote GPU from their local machines in the laboratory without having to log into the remote server. In order to demonstrate that our proposal is feasible, we present results of a real scenario. The results show that the cost of the laboratory is noticeably reduced while the learning experience quality is maintained.

    Reaño González, C.; Silla Jiménez, F. (2015). Reducing the Costs of Teaching CUDA in Laboratories while Maintaining the Learning Experience Quality. En INTED2015 Proceedings. IATED. 3651-3660. http://hdl.handle.net/10251/70229
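
    To make the "transparent sharing" claim concrete, the sketch below shows the kind of completely ordinary CUDA program a student would compile and run on a GPU-less lab workstation; nothing in the source refers to rCUDA, since the remote GPU is selected outside the program. The environment-variable names in the leading comments follow the rCUDA user guide as best recalled and should be treated as assumptions, and the server hostname is hypothetical.

        // Unmodified CUDA program run through rCUDA from a workstation without a
        // local GPU. The client setup (assumed variable names, hypothetical host):
        //   export RCUDA_DEVICE_COUNT=1
        //   export RCUDA_DEVICE_0=gpu-server.lab.example.org
        //   ./vector_add
        #include <cstdio>
        #include <cstdlib>
        #include <cuda_runtime.h>

        __global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) c[i] = a[i] + b[i];
        }

        int main() {
            const int n = 1 << 20;
            const size_t bytes = n * sizeof(float);
            float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
            for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

            float *d_a, *d_b, *d_c;
            cudaMalloc((void **)&d_a, bytes);
            cudaMalloc((void **)&d_b, bytes);
            cudaMalloc((void **)&d_c, bytes);
            cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
            cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

            vectorAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
            cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

            printf("c[0] = %.1f\n", h_c[0]);   // expect 3.0
            cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
            free(h_a); free(h_b); free(h_c);
            return 0;
        }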

    Simulation of reaction-diffusion processes in three dimensions using CUDA

    Get PDF
    Numerical solution of reaction-diffusion equations in three dimensions is one of the most challenging applied mathematical problems. Since these simulations are very time consuming, any ideas and strategies aiming at the reduction of CPU time are important topics of research. A general and robust idea is the parallelization of source codes/programs. Recently, the technological development of graphics hardware created a possibility to use desktop video cards to solve numerically intensive problems. We present a powerful parallel computing framework to solve reaction-diffusion equations numerically using the Graphics Processing Units (GPUs) with CUDA. Four different reaction-diffusion problems, (i) diffusion of chemically inert compound, (ii) Turing pattern formation, (iii) phase separation in the wake of a moving diffusion front and (iv) air pollution dispersion were solved, and additionally both the Shared method and the Moving Tiles method were tested. Our results show that parallel implementation achieves typical acceleration values in the order of 5-40 times compared to CPU using a single-threaded implementation on a 2.8 GHz desktop computer.

    Comment: 8 figures, 5 tables
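
    As an illustration of the kind of kernel such a framework is built around, the sketch below performs one explicit forward-Euler update of the diffusion term on a regular 3D grid, one thread per interior voxel. It is a naive global-memory version written for clarity, not the Shared or Moving Tiles kernels evaluated in the paper; the grid size, diffusion coefficient and time step are arbitrary example values chosen to satisfy the explicit stability condition D*dt/h^2 <= 1/6.

        // One explicit diffusion step on a 3D grid (illustrative sketch only).
        #include <cstdio>
        #include <cuda_runtime.h>

        __global__ void diffusionStep(const float *u, float *u_new,
                                      int nx, int ny, int nz,
                                      float D, float dt, float h)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            int z = blockIdx.z * blockDim.z + threadIdx.z;
            // Skip the outer boundary layer (kept fixed here for simplicity).
            if (x < 1 || y < 1 || z < 1 || x >= nx - 1 || y >= ny - 1 || z >= nz - 1)
                return;

            int i = (z * ny + y) * nx + x;
            // 7-point Laplacian stencil on a uniform grid of spacing h.
            float lap = (u[i - 1] + u[i + 1]
                       + u[i - nx] + u[i + nx]
                       + u[i - nx * ny] + u[i + nx * ny]
                       - 6.0f * u[i]) / (h * h);
            u_new[i] = u[i] + dt * D * lap;          // forward-Euler time step
        }

        int main() {
            const int nx = 128, ny = 128, nz = 128;
            const size_t bytes = (size_t)nx * ny * nz * sizeof(float);
            float *u, *u_new;
            cudaMalloc((void **)&u, bytes);
            cudaMalloc((void **)&u_new, bytes);
            cudaMemset(u, 0, bytes);                 // trivial initial condition

            dim3 block(8, 8, 8);
            dim3 grid((nx + 7) / 8, (ny + 7) / 8, (nz + 7) / 8);
            for (int step = 0; step < 100; ++step) {
                diffusionStep<<<grid, block>>>(u, u_new, nx, ny, nz, 1.0f, 0.1f, 1.0f);
                float *tmp = u; u = u_new; u_new = tmp;   // swap buffers
            }
            cudaDeviceSynchronize();
            printf("done: %s\n", cudaGetErrorString(cudaGetLastError()));
            cudaFree(u); cudaFree(u_new);
            return 0;
        }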

    On the Deployment and Characterization of CUDA Teaching Laboratories

    Full text link
    When teaching CUDA in laboratories, an important issue is the economic cost of GPUs, which may prevent some universities from building large enough labs to teach CUDA. In this paper we propose an efficient solution to build CUDA labs reducing the number of GPUs. It is based on the use of the rCUDA (remote CUDA) middleware, which enables programs being executed in a computer to concurrently use GPUs located in remote servers. To study the viability of our proposal, we first characterize the use of GPUs in this kind of labs with statistics taken from real users, and then present results of sharing GPUs in a real teaching lab. The experiments validate the feasibility of our proposal, showing an overhead under 5% with respect to having a GPU at each of the students’ computers. These results clearly improve alternative approaches, such as logging into remote GPU servers, which presents an overhead of about 30%.

    This work was partially funded by Escola Tècnica Superior d’Enginyeria Informàtica de la Universitat Politècnica de València and by Departament d'Informàtica de Sistemes i Computadors de la Universitat Politècnica de València.

    Reaño González, C.; Silla Jiménez, F. (2015). On the Deployment and Characterization of CUDA Teaching Laboratories. En EDULEARN15 Proceedings. IATED. http://hdl.handle.net/10251/70225

    Reduciendo el coste económico de las prácticas de CUDA manteniendo la calidad del aprendizaje

    Get PDF
    Resumen: La computación de propósito general con tarjetas gráficas se basa en el uso de estas tarjetas (GPUs) para realizar cálculos computacionales que tradicionalmente son realizados por los procesadores (CPUs). Debido al creciente uso de las GPUs, es importante que los planes de estudio de informática incluyan los fundamentos de la computación paralela con GPUs, al tiempo que se equipan los laboratorios docentes con GPUs a un coste razonable. En este sentido, instalar GPUs en todos los ordenadores del laboratorio puede resultar costoso a nivel económico, mientras que compartir un servidor remoto con GPU entre los estudiantes puede derivar en unas malas condiciones de aprendizaje. En este trabajo proponemos una solución eficaz a este problema: el uso de la tecnología rCUDA (CUDA remoto), que permite a las aplicaciones de un ordenador utilizar, de forma concurrente y transparente, GPUs instaladas en servidores remotos. De esta manera los estudiantes pueden, desde sus puestos de trabajo, compartir una misma GPU instalada en un servidor remoto sin tener que iniciar sesión en el mismo. Para demostrar que nuestra propuesta es factible, presentamos experimentos en un escenario real que muestran cómo el coste del laboratorio es notablemente reducido, mientras que la calidad del aprendizaje se mantiene.

    Abstract: General-purpose computing on Graphics Processing Units consists in using Graphics Processing Units (GPUs) to perform the computation of applications traditionally handled by regular processors (CPUs). Due to their increasing use, it is important that Computer Engineering and Computer Science curricula include the basics of this new computing trend. As regards the practical part of the training, one major issue is how to introduce GPUs into a laboratory: buying GPUs for all the workstations of the lab may be too expensive, whereas installing one GPU in a server and requesting the students to log into this server may lead to a low teaching quality due to its associated overhead. In this paper we suggest a new solution to introduce GPUs into a laboratory: the rCUDA (remote CUDA) framework, which allows applications running in a computer to use GPUs installed in remote servers. Hence, students will be capable of sharing a remote GPU (concurrently and transparently) from their local workstations in the lab, without logging into the server. To prove that our approach is possible, we show experiments in a real laboratory. The experiments demonstrate that our proposal reduces the cost of the laboratory, whereas the teaching quality is maintained.

    Improving the management efficiency of GPU workloads in data centers through GPU virtualization

    Full text link
    [EN] Graphics processing units (GPUs) are currently used in data centers to reduce the execution time of compute-intensive applications. However, the use of GPUs presents several side effects, such as increased acquisition costs and larger space requirements. Furthermore, GPUs require a nonnegligible amount of energy even while idle. Additionally, GPU utilization is usually low for most applications. In a similar way to the use of virtual machines, using virtual GPUs may address the concerns associated with the use of these devices. In this regard, the remote GPU virtualization mechanism could be leveraged to share the GPUs present in the computing facility among the nodes of the cluster. This would increase overall GPU utilization, thus reducing the negative impact of the increased costs mentioned before. Reducing the amount of GPUs installed in the cluster could also be possible. However, in the same way as job schedulers map GPU resources to applications, virtual GPUs should also be scheduled before job execution. Nevertheless, current job schedulers are not able to deal with virtual GPUs. In this paper, we analyze the performance attained by a cluster using the remote Compute Unified Device Architecture middleware and a modified version of the Slurm scheduler, which is now able to assign remote GPUs to jobs. Results show that cluster throughput, measured as jobs completed per time unit, is doubled at the same time that the total energy consumption is reduced up to 40%. GPU utilization is also increased.

    Generalitat Valenciana, Grant/Award Number: PROMETEO/2017/077; MINECO and FEDER, Grant/Award Number: TIN2014-53495-R, TIN2015-65316-P and TIN2017-82972-R.

    Iserte, S.; Prades, J.; Reaño González, C.; Silla, F. (2021). Improving the management efficiency of GPU workloads in data centers through GPU virtualization. Concurrency and Computation: Practice and Experience. 33(2):1-16. https://doi.org/10.1002/cpe.5275
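
    A point worth making explicit is that the scheduling logic lives in the modified Slurm and in the rCUDA middleware, not in the applications: a job sees whatever virtual GPUs it was granted as if they were ordinary local devices. The sketch below, which simply enumerates the devices visible to a job, is an illustrative example of that application-side view rather than code from the paper.

        // Enumerate the GPUs visible to this job. Under rCUDA these may be remote
        // virtual GPUs assigned by the scheduler; the code cannot tell the difference.
        #include <cstdio>
        #include <cuda_runtime.h>

        int main() {
            int count = 0;
            cudaGetDeviceCount(&count);     // virtual (possibly remote) GPUs granted to the job
            for (int dev = 0; dev < count; ++dev) {
                cudaDeviceProp prop;
                cudaGetDeviceProperties(&prop, dev);
                printf("GPU %d: %s, %.1f GiB\n", dev, prop.name,
                       prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
            }
            return 0;
        }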

    Mixing multi-core CPUs and GPUs for scientific simulation software

    Get PDF
    Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some applications paradigms. Software languages and systems such as NVIDIA's CUDA and Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary. We describe our use of hybrid applications using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data and discuss multi-threading software issues for the applications level programmer and offer some suggested areas for language development and integration between coarse-grained and fine-grained multi-thread systems. We discuss results from three common simulation algorithmic areas including: partial differential equations; graph cluster metric calculations and random number generation. We report on programming experiences and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs; a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scientific applications developers.
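
    One common way to realize this kind of hybrid is to dedicate one host thread per GPU: each thread binds to its device with cudaSetDevice() and then issues its own allocations, kernel launches and synchronizations independently, while the pool of threads supplies the coarse-grained CPU parallelism. The sketch below shows that general pattern with a trivial placeholder kernel; it is not the authors' simulation code.

        // One host thread per GPU: coarse-grained CPU threading driving
        // independent data-parallel work on each device (illustrative sketch).
        #include <cstdio>
        #include <thread>
        #include <vector>
        #include <cuda_runtime.h>

        __global__ void scale(float *data, int n, float factor) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) data[i] *= factor;
        }

        void worker(int dev, int n) {
            cudaSetDevice(dev);                        // bind this host thread to one GPU
            float *d = nullptr;
            cudaMalloc((void **)&d, n * sizeof(float));
            cudaMemset(d, 0, n * sizeof(float));
            scale<<<(n + 255) / 256, 256>>>(d, n, 2.0f);
            cudaDeviceSynchronize();
            cudaFree(d);
            printf("device %d finished\n", dev);
        }

        int main() {
            int devices = 0;
            cudaGetDeviceCount(&devices);
            std::vector<std::thread> pool;
            for (int dev = 0; dev < devices; ++dev)
                pool.emplace_back(worker, dev, 1 << 20);   // one host thread per GPU
            for (auto &t : pool) t.join();
            return 0;
        }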

    CU2rCU: A CUDA-to-rCUDA Converter

    Full text link
    [ES] Las GPUs (Graphics Processor Units, unidades de procesamiento gráfico) están siendo cada vez más utilizadas en el campo de la HPC (High Performance Computing, computación de altas prestaciones) como una forma eficaz de reducir el tiempo de ejecución de las aplicaciones mediante la aceleración de determinadas partes de las mismas. CUDA (Compute Unified Device Architecture, arquitectura de dispositivos de cómputo unificado) es una tecnología desarrollada por NVIDIA que permite llevar a cabo dicha aceleración, proporcionando para ello una arquitectura de cálculo paralelo. Sin embargo, la utilización de GPUs en el ámbito de la HPC presenta ciertas desventajas, principalmente, en el coste de adquisición y el aumento de energía que introducen. Para hacer frente a estos inconvenientes se desarrolló rCUDA (remote CUDA, CUDA remoto), una tecnología que permite compartir dispositivos CUDA de forma remota, reduciendo así tanto el coste de adquisición como el consumo de energía. En las versiones iniciales de rCUDA quedó demostrada su viabilidad, pero también se identificaron algunos aspectos susceptibles de ser mejorados en relación con su usabilidad. Ésta se veía afectada por el hecho de que rCUDA no soporta las extensiones de CUDA al lenguaje C. De esta forma, era necesario convertir manualmente las aplicaciones CUDA eliminando dichas extensiones, y utilizando únicamente C plano. En este documento presentamos una herramienta que realiza estas conversiones de manera automática, permitiendo así adaptar las aplicaciones CUDA a rCUDA de una manera sencilla.

    [EN] GPUs (Graphics Processor Units) are being increasingly embraced by the high performance computing and computational communities as an effective way of considerably reducing application execution time by accelerating significant parts of their codes. CUDA (Compute Unified Device Architecture) is a new technology developed by NVIDIA which leverages the parallel compute engine in GPUs. However, the use of GPUs in current HPC clusters presents certain negative side-effects, mainly related with acquisition costs and power consumption. rCUDA (remote CUDA) was recently developed as a software solution to address these concerns. Specifically, it is a middleware that allows transparently sharing a reduced number of CUDA-compatible GPUs among the nodes in a cluster, reducing acquisition costs and power consumption. While the initial prototype versions of rCUDA demonstrated its functionality, they also revealed several concerns related with usability and performance. With respect to usability, the rCUDA framework was limited by its lack of support for the CUDA extensions to the C language. Thus, it was necessary to manually convert the original CUDA source code into plain C code that is functionally identical but does not include such extensions. For this purpose, in this document we present a new component of the rCUDA suite that allows an automatic transformation of any CUDA source code into plain C code, so that it can be effectively accommodated within the rCUDA technology.

    Reaño González, C. (2012). CU2rCU: A CUDA-to-rCUDA Converter. http://hdl.handle.net/10251/27435
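
    To illustrate what removing the CUDA extensions to C means in practice, the sketch below contrasts a kernel launch written with the triple-angle-bracket syntax against an equivalent launch expressed through the plain runtime-API call cudaLaunchKernel(). The helper function names are invented for the example, and the exact plain C code emitted by CU2rCU may differ, especially for CUDA versions predating cudaLaunchKernel(); the point is only the shape of the transformation.

        // Kernel launch with CUDA C extensions vs. an extension-free launch
        // through the runtime API (illustrative sketch, not CU2rCU output).
        #include <cstdio>
        #include <cuda_runtime.h>

        __global__ void saxpy(int n, float a, const float *x, float *y) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) y[i] = a * x[i] + y[i];
        }

        // CUDA C extension syntax, as written in the original application.
        void launch_with_extensions(int n, float a, const float *x, float *y) {
            saxpy<<<(n + 255) / 256, 256>>>(n, a, x, y);
        }

        // Equivalent launch without the <<<>>> extension: arguments are passed
        // as an array of pointers to each parameter.
        void launch_plain_c(int n, float a, const float *x, float *y) {
            void *args[] = { &n, &a, &x, &y };
            dim3 grid((n + 255) / 256), block(256);
            cudaLaunchKernel((const void *)saxpy, grid, block, args, 0, 0);
        }

        int main() {
            const int n = 1 << 20;
            float *x, *y;
            cudaMalloc((void **)&x, n * sizeof(float));
            cudaMalloc((void **)&y, n * sizeof(float));
            launch_with_extensions(n, 2.0f, x, y);
            launch_plain_c(n, 2.0f, x, y);
            cudaDeviceSynchronize();
            printf("status: %s\n", cudaGetErrorString(cudaGetLastError()));
            cudaFree(x); cudaFree(y);
            return 0;
        }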

    Structure evolution of spin-coated phase separated EC/HPC films

    Get PDF
    Porous phase-separated films made of ethylcellulose (EC) and hydroxypropylcellulose (HPC) are commonly used for controlled drug release. The structure of these thin films controls the drug transport from the core to the surrounding liquid in the stomach or intestine. However, a detailed understanding of the structure evolution is lacking. In this work, we use spin-coating, a widely applied technique for making thin uniform polymer films, to mimic the industrial manufacturing process of fluidized bed spraying. The aim of this work is to understand the structure evolution and phase separation kinetics of single-layer and multi-layer spin-coated EC/HPC films. The structure evolution is characterized using confocal laser scanning microscopy (CLSM) and image analysis.

    The influence of spin-coating parameters and EC:HPC ratio on the final phase-separated structure and the film thickness was determined. Varying spin speed and EC:HPC ratio gave us precise control over the characteristic length scale and thickness of the film. The results show that the phase separation occurs through spinodal decomposition and that the characteristic length scale increases with decreasing spin speed and with increasing HPC ratio. The thickness of the spin-coated film decreases with increasing spin speed.

    Furthermore, optimized spin-coating parameters were selected to study the kinetics of phase separation in situ, in particular the coarsening mechanisms and the time dependence of domain growth as a function of EC:HPC ratio. We identified two main coarsening mechanisms: interfacial-tension-driven hydrodynamic growth for the bicontinuous structures and diffusion-driven coalescence for the discontinuous structures. In addition, we obtained information on the wetting, the shrinkage, and the evaporation process by examining a film cross section, which allowed an estimation of the binodal of the phase diagram.

    The findings from this work give a good understanding of the mechanisms responsible for the morphology development and open the road towards tailoring thin EC/HPC film structures for controlled drug release.