253 research outputs found

    Power And Hotspot Modeling For Modern GPUs

    Get PDF
    As General Purpose GPUs (GPGPU) are increasingly becoming a prominent component of high performance computing platforms, power and thermal dissipation are getting more attention. The trade-offs among performance, power, and heat must be well modeled and evaluated from the early stage of GPU design. This necessitates a tool that allows GPU architects to quickly and accurately evaluate their design. There are a few models for GPU power but most of them estimate power at a higher level than architecture, which are therefore missing hardware reconfigurability. In this thesis, we propose a framework that models power and heat dissipation at the hardware architecture level, which allows for configuring and investigating individual hardware components. Our framework is also capable of visualizing the heat map of the processor over different clock cycles. To the best of our knowledge, this is the first comprehensive framework that integrates and visualizes power consumption and heat dissipation of GPUs

    Computational Sprinting: Exceeding Sustainable Power in Thermally Constrained Systems

    Get PDF
    Although process technology trends predict that transistor sizes will continue to shrink for a few more generations, voltage scaling has stalled and thus future chips are projected to be increasingly more power hungry than previous generations. Particularly in mobile devices which are severely cooling constrained, it is estimated that the peak operation of a future chip could generate heat ten times faster than than the device can sustainably vent. However, many mobile applications do not demand sustained performance; rather they comprise short bursts of computation in response to sporadic user activity. To improve responsiveness for such applications, this dissertation proposes computational sprinting, in which a system greatly exceeds sustainable power margins (by up to 10Ã?) to provide up to a few seconds of high-performance computation when a user interacts with the device. Computational sprinting exploits the material property of thermal capacitance to temporarily store the excess heat generated when sprinting. After sprinting, the chip returns to sustainable power levels and dissipates the stored heat when the system is idle. This dissertation: (i) broadly analyzes thermal, electrical, hardware, and software considerations to analyze the feasibility of engineering a system which can provide the responsiveness of a plat- form with 10Ã? higher sustainable power within today\u27s cooling constraints, (ii) leverages existing sources of thermal capacitance to demonstrate sprinting on a real system today, and (iii) identifies the energy-performance characteristics of sprinting operation to determine runtime sprint pacing policies

    Real-time fluid simulations under smoothed particle hydrodynamics for coupled kinematic modelling in robotic applications

    Get PDF
    Although solids and fluids can be conceived as continuum media, applications of solid and fluid dynamics differ greatly from each other in their theoretical models and their physical behavior. That is why the computer simulators of each turn to be very disparate and case-oriented. The aim of this research work, captured in this thesis book, is to find a fluid dynamics model that can be implemented in near real-time with GPU processing and that can be adapted to typically large scales found in robotic devices in action with fluid media. More specifically, the objective is to develop these fast fluid simulations, comprising different solid body dynamics, to find a viable time kinematic solution for robotics. The tested cases are: i) the case of a fluid in a closed channel flowing across a cylinder, ii) the case of a fluid flowing across a controlled profile, and iii), the case of a free surface fluid control during pouring. The implementation of the former cases settles the formulations and constraints to the latter applications. The results will allow the reader not only to sustain the implemented models but also to break down the software implementation concepts for better comprehension. A fast GPU-based fluid dynamics simulation is detailed in the main implementation. The results show that it can be used in real-time to allow robotics to perform a blind pouring task with a conventional controller and no special sensing systems nor knowledge-driven prediction models would be necessary.Aunque los sólidos y los fluidos pueden concebirse como medios continuos, las aplicaciones de la dinámica de sólidos y fluidos difieren mucho entre sí en sus modelos teóricos y su comportamiento físico. Es por eso que los simuladores por computadora de cada uno son muy dispares y están orientados al caso de su aplicación. El objetivo de este trabajo de investigación, capturado en este libro de tesis, es encontrar un modelo de dinámica de fluidos que se pueda implementar cercano al tiempo real con procesamiento GPU y que se pueda adaptar a escalas típicamente grandes que se encuentran en dispositivos robóticos en acción con medios fluidos. Específicamente, el propósito es desarrollar estas simulaciones de fluidos rápidos, que comprenden diferentes dinámicas de cuerpos sólidos, para encontrar una solución cinemática viable para robótica. Los casos probados son: i) el caso de un fluido en canal cerrado que fluye a través de un cilindro, ii) el caso de un fluido que fluye a través de un alabe controlado, y iii), el caso del control de un fluido de superficie libre durante el vertido. La implementación de estos primeros casos establece las formulaciones y limitaciones de aplicaciones futuras. Los resultados permitirán al lector no solo sostener los modelos implementados sino también desglosar los conceptos de la implementación en software para una mejor comprensión. En la implementación principal se consigue una simulación rápida de dinámica de fluidos basada en GPU. Los resultados muestran que esta implementación se puede utilizar en tiempo real para permitir que la robótica realice una tarea de vertido ciego con un controlador convencional sin que sea necesario algún sistema de sensado especial ni algún modelo predictivo basados en el conocimiento.Programa de Doctorado en Ingeniería Eléctrica, Electrónica y Automática por la Universidad Carlos III de MadridPresidente: Carmen Martínez Arévalo.- Secretario: Luis Santiago Garrido Bullón.- Vocal: Benjamín Hernández Arreguí

    Simulated Annealing

    Get PDF
    The book contains 15 chapters presenting recent contributions of top researchers working with Simulated Annealing (SA). Although it represents a small sample of the research activity on SA, the book will certainly serve as a valuable tool for researchers interested in getting involved in this multidisciplinary field. In fact, one of the salient features is that the book is highly multidisciplinary in terms of application areas since it assembles experts from the fields of Biology, Telecommunications, Geology, Electronics and Medicine

    Real-Time Simulation and Prognosis of Smoke Propagation in Compartments Using a GPU

    Get PDF
    The evaluation of life safety in buildings in case of fire is often based on smoke spread calculations. However, recent simulation models – in general, based on computational fluid dynamics – often require long execution times or high-performance computers to achieve simulation results in or faster than real-time. Therefore, the objective of this study is the development of a concept for the real-time and prognosis simulation of smoke propagation in compartments using a graphics processing unit (GPU). The developed concept is summarized in an expandable open source software basis, called JuROr (Jülich's Real-time simulation within ORPHEUS). JuROr simulates buoyancy-driven, turbulent smoke spread based on a reduced modeling approach using finite differences and a Large Eddy Simulation turbulence model to solve the incompressible Navier-Stokes and energy equations. This reduced model is fully adapted to match the target hardware of highly parallel computer architectures. Thereby, the code is written in the object-oriented programming language C++ and the pragma-based programming model OpenACC. This model ensures to maintain a single source code, which can be executed in serial and parallel on various architectures. Further, the study provides a proof of JuROr's concept to balance sufficient accuracy and practicality. First, the code was successfully verified using unit and (semi-) analytical tests. Then, the underlying model was validated by comparing the numerical results to the experimental results of scenarios relevant for fire protection. Thereby, verification and validation showed acceptable accuracy for JuROr's application. Lastly, the performance criteria of JuROr – being real-time and prognosis capable with comparable performance across various architectures – was successfully evaluated. Here, JuROr also showed high speedup results on a GPU and faster time-to-solution compared to the established Fire Dynamics Simulator. These results show JuROr's practicality.Die Bewertung der Personensicherheit bei Feuer in Gebäuden basiert häufig auf Berechnungen zur Rauchausbreitung. Bisherige Simulationsmodelle – im Allgemeinen basierend auf numerischer Strömungsdynamik – erfordern jedoch lange Ausführungszeiten oder Hochleistungsrechner, um Simulationsergebnisse in und schneller als Echtzeit liefern zu können. Daher ist das Ziel dieser Arbeit die Entwicklung eines Konzeptes für die Echtzeit- und Prognosesimulation der Rauchausbreitung in Gebäuden mit Hilfe eines Grafikprozessors (GPU). Zusammengefasst ist das entwickelte Konzept in einer erweiterbaren Open-Source-Software, genannt JuROr (Jülich's Real-time Simulation in ORPHEUS). JuROr simuliert die Ausbreitung von auftriebsgetriebenem, turbulentem Rauch basierend auf einem reduzierten Modellierungsansatz mit finiten Differenzen und einem Large Eddy Simulation Turbulenzmodell, um inkompressible Navier- Stokes und Energiegleichungen zu lösen. Das reduzierte Modell ist voll- ständig angepasst an hochparallele Computerarchitekturen. Dabei ist der Code implementiert mit C++ und OpenACC. Dies hat den Vorteil mit nur einem Quellcode verschiedenste serielle und parallele Ausführungen des Programms für unterschiedliche Architekturen erstellen zu können. Die Studie liefert weiterhin einen Konzeptnachweis dafür, ausreichende Genauigkeit und Praktikabilität im Gleichgewicht zu halten. Zunächst wurde der Code erfolgreich mit Modul- und (semi-) analytischen Tests verifiziert. Dann wurde das zugrundeliegende Modell durch einen Vergleich der numerischen mit den experimentellen Ergebnissen für den Brandschutz relevanter Szenarien validiert. Die Verifizierung und Validierung zeigten dabei ausreichende Genauigkeit für JuROr. Zuletzt, wurden die Kriterien von JuROr – echtzeit- und prognosefähig zu sein mit vergleichbarer Leistung auf unterschiedlichsten Architekturen – erfolgreich geprüft. Zudem zeigte JuROr hohe Beschleunigungseffekte auf einer GPU und schnellere Lösungszeiten im Vergleich zum etablierten Fire Dynamics Simulator. Diese Ergebnisse zeigen JuROr's Praktikabilität

    Heterogeneous Computing with Focus on Mechanical Engineering

    Get PDF
    During the past few years there has been a revolution in the design of desktop computers. Most processors today include more than one processor core, allowing parallel execution of programs. Furthermore, most commodity computers include a graphical processor that outperforms the central processor by at least one order of magnitude. Tapping into this vast resource is commonly referred to as heterogeneous computing. The change in hardware invalidates old software-design truths. There is therefore need for new algorithms, and research into adapting existing algorithms to these architectures. Our main focus has been to accelerate algorithms relevant for mechanical engineering. In this dissertation we present four algorithms devoted to take advantage of the computational strengths of heterogeneous architectures. Each work is based on state-of-the-art hardware available at the time the research was performed. First we describe an algorithm for high-quality visualization of parametric surfaces. This is useful in a CAD setting, were an accurate rendering is important for visual validation of model quality. We further describe simulation of shallow-water waves using a state-of-the-art numerical scheme. Our accelerated implementation gave a speedup of up to 40 times compared to an optimized reference implementation. Our implementation features real time simulation and visualization of semi-realistic nonlinear wave effects. Finally we present two algorithms for shape simplification of 3D-models. The algorithms aim at reducing time spent on preparing models for finite element analysis. Finite element analysis is important to determine mechanical properties of objects prior to manufacture. Such analysis can be used to investigate thermal behavior and determine the strengths and weaknesses of physical components. Before the analysis can take place the models must undergo a preparation phase where shape simplification plays an important role. The first work we describe for shape simplification is a hybrid algorithm, using graphics hardware for the computationally demanding operations, and the main processor for maintaining the data structure. Our second work describes a shape simplification algorithm highly suitable for heterogeneous architectures and a reference implementation on the Cell BE

    GPU Based Real-time Welding Simulation with Smoothed-Particle Hydrodynamics

    Get PDF
    Welding training is essential in the development of industrialization. A good welder will build robust workpieces that ensure the safety and stability of the product. However, training a welder requires lots of time and access professional welding equipment. Therefore, it is desirable to have a training system that is economical and easy to use. After decades development of computer graphics, sophisticated methodologies are developed in simulation fields, along the advanced hardware, enables the possibility of simulation welding with software. In this thesis, a novel prototype of welding training system is proposed. We use smoothed-particle hydrodynamics (SPH) method to simulate fluid as well as heat transfer and phase changing. In order to accelerate the processing to reach the level of real-time, we adopt CUDA to implement the SPH solver on GPU. Plus, Leap Motion is utilized as the input device to control the welding gun. As the result, the simulation reaches decent frame rate that allows the user control the simulation system interactively. The input device permits the user to adapt to the system in less than 5 minutes. This prototype shows a new direction in the training system that combines VR, graphics, and physics simulation. The further development of VR output device like Oculus Rift will enable the training system to a more immersive level

    Computational modelling of diffusion magnetic resonance imaging based on cardiac histology

    Get PDF
    The exact relationship between changes in myocardial microstructure as a result of heart disease and the signal measured using diffusion tensor cardiovascular magnetic resonance (DT-CMR) is currently not well understood. Computational modelling of diffusion in combination with realistic numerical phantoms offers the unique opportunity to study effects of pathologies or the efficacy of improvements to acquisition protocols in a controlled in-silico environment. In this work, Monte Carlo random walk (MCRW) methods are used to simulate diffusion in a histology-based 3D model of the myocardium. Sensitivity of typical DT-CMR sequences to changes in tissue properties is assessed. First, myocardial tissue is analysed to identify important geometric features and diffusion parameters. A two-compartment model is considered where intra-cellular compartments with a reduced bulk diffusion coefficient are separated from extra-cellular space by permeable membranes. Secondary structures like groups of cardiomyocyte (sheetlets) must also be included, and different methods are developed to automatically generate realistic histology-based substrates. Next, in-silico simulation of DT-CMR is reviewed and a tool to generate idealised versions of common pulse sequences is discussed. An efficient GPU-based numerical scheme for obtaining a continuum solution to the Bloch--Torrey equations is presented and applied to domains directly extracted from histology images. In order to verify the numerical methods used throughout this work, an analytical solution to the diffusion equation in 1D is described. It relies on spectral analysis of the diffusion operator and requires that all roots of a complex transcendental equation are found. To facilitate a fast and reliable solution, a novel root finding algorithm based on Chebyshev polynomial interpolation is proposed. To simulate realistic 3D geometries MCRW methods are employed. A parallel simulator for both grid-based and surface mesh--based geometries is presented. The presence of permeable membranes requires special treatment. For this, a commonly used transit model is analysed. Finally, the methods above are applied to study the effect of various model and sequence parameters on DT-CMR results. Simulations with impermeable membranes reveal sequence-specific sensitivity to extra-cellular volume fraction and diffusion coefficients. By including membrane permeability, DT-CMR results further approach values expected in vivo.Open Acces

    A differentiable physics engine for deep learning in robotics

    Get PDF
    An important field in robotics is the optimization of controllers. Currently, robots are often treated as a black box in this optimization process, which is the reason why derivative-free optimization methods such as evolutionary algorithms or reinforcement learning are omnipresent. When gradient-based methods are used, models are kept small or rely on finite difference approximations for the Jacobian. This method quickly grows expensive with increasing numbers of parameters, such as found in deep learning. We propose an implementation of a modern physics engine, which can differentiate control parameters. This engine is implemented for both CPU and GPU. Firstly, this paper shows how such an engine speeds up the optimization process, even for small problems. Furthermore, it explains why this is an alternative approach to deep Q-learning, for using deep learning in robotics. Finally, we argue that this is a big step for deep learning in robotics, as it opens up new possibilities to optimize robots, both in hardware and software
    corecore