
    GPU devices for safety-critical systems: a survey

    Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, the integration of complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources contributes to several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices’ random hardware failures, systematic failures, and independence of execution. This work has been partially supported by the European Research Council under Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence, and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020-045931-I).

    Performance engineering of data-intensive applications

    Data-intensive programs deal with big chunks of data and often exhibit compute-intensive characteristics. Among various HPC application domains, big data analytics, machine learning and the more recent deep-learning models are well-known data-intensive applications. An efficient design of such applications demands extensive knowledge of the target hardware and software, particularly the memory/cache hierarchy and the data communication among threads/processes. Such a requirement makes code development an arduous task, as inappropriate data structures and algorithm design may result in superfluous runtime, let alone hardware incompatibilities while porting the code to other platforms. In this dissertation, we introduce a set of tools and methods for the performance engineering of parallel data-intensive programs. We start with performance profiling to gain insights on thread communications and relevant code optimizations. Then, by narrowing down our scope to deep-learning applications, we introduce our tools for enhancing the performance portability and scalability of convolutional neural networks (ConvNet) at inference and training phases. Our first contribution is a novel performance-profiling method to unveil potential communication bottlenecks caused by data-access patterns and thread interactions. Our findings show that the data shared between a pair of threads should be reused within reasonably short intervals to preserve data locality, yet existing profilers neglect this and mainly report the communication volume. We propose new hardware-independent metrics to characterize thread communication and provide suggestions for applying appropriate optimizations on a specific code region. Our experiments show that applying relevant optimizations improves the performance in Rodinia benchmarks by up to 56%. For the next contribution, we developed a framework for the automatic generation of efficient and performance-portable convolution kernels, including Winograd convolutions, for various GPU platforms. We employed a synergy of meta-programming, symbolic execution, and auto-tuning. The results demonstrate efficient kernels generated through an automated optimization pipeline with runtimes close to vendor deep-learning libraries, and the minimal required programming effort confirms the performance portability of our approach. Furthermore, our symbolic execution method exploits repetitive patterns in Winograd convolutions, enabling us to reduce the number of arithmetic operations by up to 62% without compromising numerical stability. Lastly, we investigate possible methods to scale the performance of ConvNets in the training and inference phases. Our specialized training platform, equipped with a novel topology-aware network pruning algorithm, enables rapid training, neural architecture search, and network compression. Thus, the training of an AI model can easily be scaled to a multitude of compute nodes, leading to faster model design with lower operating costs. Furthermore, the network compression component scales a ConvNet model down by removing redundant layers, preparing the model for a leaner deployment. Altogether, this work demonstrates the necessity and shows the benefit of performance engineering and parallel programming methods in accelerating emerging data-intensive workloads. With the help of the proposed tools and techniques, we pinpoint data communication bottlenecks and achieve performance portability and scalability in data-intensive applications.
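    The arithmetic savings mentioned for Winograd convolutions come from minimal-filtering algorithms that trade multiplications for additions. As a hedged illustration only (not the kernels generated by the dissertation's framework; the names and values below are illustrative), the following Python sketch implements the classic one-dimensional Winograd F(2,3) transform, which produces two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by direct computation. The same transform structure generalizes to the 2D tiles used in ConvNet layers.

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd F(2,3): two outputs of a 3-tap filter using 4 multiplications.

    d: 4 input samples, g: 3 filter taps (correlation, as used in ConvNets).
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform (can be precomputed once per filter).
    G = (g0, (g0 + g1 + g2) / 2.0, (g0 - g1 + g2) / 2.0, g2)
    # Element-wise products in the transformed domain: 4 multiplications.
    m0 = (d0 - d2) * G[0]
    m1 = (d1 + d2) * G[1]
    m2 = (d2 - d1) * G[2]
    m3 = (d1 - d3) * G[3]
    # Inverse transform back to the two outputs.
    return np.array([m0 + m1 + m2, m1 - m2 - m3])

def direct_f23(d, g):
    """Direct sliding-window computation: 6 multiplications for 2 outputs."""
    d, g = np.asarray(d, dtype=float), np.asarray(g, dtype=float)
    return np.array([d[0:3] @ g, d[1:4] @ g])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, -1.0, 0.25])
assert np.allclose(winograd_f23(d, g), direct_f23(d, g))
```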

    Evaluating the performance of HPC-style SYCL applications


    Interactive Three-Dimensional Simulation and Visualisation of Real Time Blood Flow in Vascular Networks

    One of the challenges in cardiovascular disease management is the clinical decision-making process. When a clinician is dealing with complex and uncertain situations, the decision on whether or how to intervene is made based upon distinct information from diverse sources. There are several variables that can affect how the vascular system responds to treatment. These include: the extent of the damage and scarring, the efficiency of blood flow remodelling, and any associated pathology. Moreover, the effect of an intervention may lead to further unforeseen complications (e.g. another stenosis may be “hidden” further along the vessel). Currently, there is no tool for predicting or exploring such scenarios. This thesis explores the development of a highly adaptive real-time simulation of blood flow that considers patient-specific data and clinician interaction. The simulation should model blood realistically and accurately through complex vascular networks in real time, providing robust flow scenarios that can be incorporated into the clinical decision-making and planning tool set. The focus is on specific regions of the anatomy, where accuracy is of the utmost importance and the flow can develop into specific patterns, with the aim of better understanding the patient's condition and predicting factors of its future evolution. Results from the validation of the simulation showed promising comparisons with the literature and demonstrated viability for clinical use.

    Real-time Visual Flow Algorithms for Robotic Applications

    Vision offers important sensor cues to modern robotic platforms. Applications such as control of aerial vehicles, visual servoing, simultaneous localization and mapping, navigation and, more recently, learning are examples where visual information is fundamental to accomplish tasks. However, the use of computer vision algorithms carries the computational cost of extracting useful information from the stream of raw pixel data. The most sophisticated algorithms use complex mathematical formulations, typically leading to computationally expensive and consequently slow implementations. Even with modern computing resources, high-speed and high-resolution video feeds can only be used for basic image processing operations. For a vision algorithm to be integrated on a robotic system, the output of the algorithm should be provided in real time, that is, at least at the same frequency as the control logic of the robot. With robotic vehicles becoming more dynamic and ubiquitous, this places higher requirements on the vision-processing pipeline. This thesis addresses the problem of estimating dense visual flow information in real time. The contributions of this work are threefold. First, it introduces a new filtering algorithm for the estimation of dense optical flow at frame rates as fast as 800 Hz for 640x480 image resolution. The algorithm follows an update-prediction architecture to estimate dense optical flow fields incrementally over time. A fundamental component of the algorithm is the modeling of the spatio-temporal evolution of the optical flow field by means of partial differential equations. Numerical predictors can implement such PDEs to propagate the current estimate of the flow forward in time. Experimental validation of the algorithm is provided using a high-speed ground-truth image dataset as well as real-life video data at 300 Hz. The second contribution is a new type of visual flow named structure flow. Mathematically, structure flow is the three-dimensional scene flow scaled by the inverse depth at each pixel in the image. Intuitively, it is the complete velocity field associated with image motion, including both optical flow and the scale change, or apparent divergence, of the image. Analogously to optical flow, structure flow provides a robotic vehicle with perception of the motion of the environment as seen by the camera. However, structure flow encodes the full 3D image motion of the scene, whereas optical flow only encodes the component on the image plane. An algorithm to estimate structure flow from image and depth measurements is proposed based on the same filtering idea used to estimate optical flow. The final contribution is the spherepix data structure for processing spherical images. This data structure is the numerical back-end used for the real-time implementation of the structure flow filter. It consists of a set of overlapping patches covering the surface of the sphere. Each individual patch approximately holds properties such as orthogonality and equidistance of points, thus allowing efficient implementations of low-level classical 2D convolution-based image processing routines such as Gaussian filters and numerical derivatives. These algorithms are implemented on GPU hardware and can be integrated into future Robotic Embedded Vision systems to provide fast visual information to robotic vehicles.
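    To make the definition of structure flow concrete, the sketch below shows, under the simplifying assumption of a pinhole camera (the thesis itself works on spherical images via spherepix), how per-pixel structure flow follows from 3D scene velocity and depth, and how optical flow is recovered as its image-plane component. All function names and values are hypothetical.

```python
import numpy as np

def structure_flow(scene_velocity, depth):
    """Structure flow: 3D scene velocity at each pixel scaled by inverse depth.

    scene_velocity: (H, W, 3) array of point velocities (Vx, Vy, Vz) in the
    camera frame; depth: (H, W) array of depths Z. Returns an (H, W, 3) field.
    """
    return scene_velocity / depth[..., None]

def optical_flow_from_structure_flow(w, x, y, f):
    """Image-plane component of structure flow for a pinhole camera.

    With pixel coordinates x = f*X/Z and y = f*Y/Z, differentiating in time
    gives u = f*wx - x*wz and v = f*wy - y*wz, where w = (wx, wy, wz) is the
    structure flow. Illustrative only; the thesis operates on the sphere.
    """
    u = f * w[..., 0] - x * w[..., 2]
    v = f * w[..., 1] - y * w[..., 2]
    return np.stack([u, v], axis=-1)

# Tiny usage example on a 2x2 image (hypothetical values).
H, W, f = 2, 2, 500.0
V = np.full((H, W, 3), [0.1, 0.0, 0.5])   # scene moving right and away
Z = np.full((H, W), 4.0)                  # constant depth of 4 m
ys, xs = np.mgrid[0:H, 0:W] - 0.5         # pixel coords about the principal point
w = structure_flow(V, Z)
uv = optical_flow_from_structure_flow(w, xs, ys, f)
```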

    Visualization challenges in distributed heterogeneous computing environments

    Large-scale computing environments are important for many aspects of modern life. They drive scientific research in biology and physics, facilitate industrial rapid prototyping, and provide information relevant to everyday life such as weather forecasts. Their computational power grows steadily to provide faster response times and to satisfy the demand for higher complexity in simulation models as well as more details and higher resolutions in visualizations. For some years now, the prevailing trend for these large systems is the utilization of additional processors, like graphics processing units. These heterogeneous systems, that employ more than one kind of processor, are becoming increasingly widespread since they provide many benefits, like higher performance or increased energy efficiency. At the same time, they are more challenging and complex to use because the various processing units differ in their architecture and programming model. This heterogeneity is often addressed by abstraction but existing approaches often entail restrictions or are not universally applicable. As these systems also grow in size and complexity, they become more prone to errors and failures. Therefore, developers and users become more interested in resilience besides traditional aspects, like performance and usability. While fault tolerance is well researched in general, it is mostly dismissed in distributed visualization or not adapted to its special requirements. Finally, analysis and tuning of these systems and their software is required to assess their status and to improve their performance. The available tools and methods to capture and evaluate the necessary information are often isolated from the context or not designed for interactive use cases. These problems are amplified in heterogeneous computing environments, since more data is available and required for the analysis. Additionally, real-time feedback is required in distributed visualization to correlate user interactions to performance characteristics and to decide on the validity and correctness of the data and its visualization. This thesis presents contributions to all of these aspects. Two approaches to abstraction are explored for general purpose computing on graphics processing units and visualization in heterogeneous computing environments. The first approach hides details of different processing units and allows using them in a unified manner. The second approach employs per-pixel linked lists as a generic framework for compositing and simplifying order-independent transparency for distributed visualization. Traditional methods for fault tolerance in high performance computing systems are discussed in the context of distributed visualization. On this basis, strategies for fault-tolerant distributed visualization are derived and organized in a taxonomy. Example implementations of these strategies, their trade-offs, and resulting implications are discussed. For analysis, local graph exploration and tuning of volume visualization are evaluated. Challenges in dense graphs like visual clutter, ambiguity, and inclusion of additional attributes are tackled in node-link diagrams using a lens metaphor as well as supplementary views. An exploratory approach for performance analysis and tuning of parallel volume visualization on a large, high-resolution display is evaluated. This thesis takes a broader look at the issues of distributed visualization on large displays and heterogeneous computing environments for the first time. 
While the presented approaches each solve individual challenges and are successfully employed in this context, together they form a solid basis for future research in this young field. In its entirety, this thesis presents building blocks for robust distributed visualization on current and future heterogeneous visualization environments.
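    Per-pixel linked lists, mentioned above as the basis of the compositing framework, are a standard GPU technique for order-independent transparency: every rasterized fragment is appended to a list anchored at its pixel, and each list is later sorted by depth and alpha-blended. The Python sketch below is a CPU-side illustration of the two phases under assumed data layouts, not the thesis's GPU implementation.

```python
from dataclasses import dataclass, field

@dataclass
class FragmentLists:
    """CPU sketch of per-pixel linked lists for order-independent transparency."""
    width: int
    height: int
    head: list = field(init=False)   # per-pixel index of the most recent fragment
    nodes: list = field(init=False)  # shared node pool of (rgba, depth, next) tuples

    def __post_init__(self):
        self.head = [-1] * (self.width * self.height)
        self.nodes = []

    def append(self, x, y, rgba, depth):
        """Phase 1: append a fragment; on a GPU this is an atomic head swap."""
        pixel = y * self.width + x
        self.nodes.append((rgba, depth, self.head[pixel]))
        self.head[pixel] = len(self.nodes) - 1

    def resolve(self, x, y, background=(0.0, 0.0, 0.0)):
        """Phase 2: walk the list, sort front-to-back, alpha-blend over background."""
        frags, i = [], self.head[y * self.width + x]
        while i != -1:
            rgba, depth, nxt = self.nodes[i]
            frags.append((depth, rgba))
            i = nxt
        color, transmittance = [0.0, 0.0, 0.0], 1.0
        for _, (r, g, b, a) in sorted(frags, key=lambda f: f[0]):
            for c, s in enumerate((r, g, b)):
                color[c] += transmittance * a * s
            transmittance *= 1.0 - a
        return tuple(color[c] + transmittance * background[c] for c in range(3))

buf = FragmentLists(4, 4)
buf.append(1, 1, (1.0, 0.0, 0.0, 0.5), depth=2.0)  # red fragment behind
buf.append(1, 1, (0.0, 0.0, 1.0, 0.5), depth=1.0)  # blue fragment in front
print(buf.resolve(1, 1))  # correct blend regardless of insertion order
```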

    Real-time fluid simulations under smoothed particle hydrodynamics for coupled kinematic modelling in robotic applications

    Although solids and fluids can both be conceived as continuum media, applications of solid and fluid dynamics differ greatly from each other in their theoretical models and their physical behavior. That is why the computer simulators for each tend to be very disparate and case-oriented. The aim of this research work, captured in this thesis, is to find a fluid dynamics model that can be implemented in near real-time with GPU processing and that can be adapted to the typically large scales found in robotic devices acting on fluid media. More specifically, the objective is to develop these fast fluid simulations, comprising different solid-body dynamics, to find a kinematic solution for robotics that is viable in real time. The tested cases are: i) the case of a fluid in a closed channel flowing across a cylinder, ii) the case of a fluid flowing across a controlled profile, and iii) the case of free-surface fluid control during pouring. The implementation of the former cases establishes the formulations and constraints for the latter application. The results will allow the reader not only to substantiate the implemented models but also to break down the software implementation concepts for better comprehension. A fast GPU-based fluid dynamics simulation is detailed in the main implementation. The results show that it can be used in real time to allow a robot to perform a blind pouring task with a conventional controller, requiring neither special sensing systems nor knowledge-driven prediction models.
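    Smoothed particle hydrodynamics, the method named in the title, represents the fluid as particles whose field quantities are kernel-weighted sums over neighbouring particles. The sketch below is a minimal, generic SPH density and pressure-acceleration step in Python; the kernel choices and constants are assumptions for illustration and do not reflect the thesis's GPU implementation.

```python
import numpy as np

# Illustrative SPH constants (assumptions for the sketch, not thesis values).
H = 0.1                  # smoothing length [m]
MASS = 0.02              # particle mass [kg]
REST_DENSITY = 1000.0    # rest density of water [kg/m^3]
STIFFNESS = 1000.0       # equation-of-state stiffness
POLY6 = 315.0 / (64.0 * np.pi * H**9)
SPIKY_GRAD = -45.0 / (np.pi * H**6)

def densities(pos):
    """SPH density: rho_i = sum_j m * W_poly6(|r_i - r_j|)."""
    diff = pos[:, None, :] - pos[None, :, :]
    r2 = np.sum(diff**2, axis=-1)
    w = np.where(r2 < H**2, POLY6 * (H**2 - r2)**3, 0.0)
    return MASS * w.sum(axis=1)

def pressure_accel(pos, rho):
    """Pressure acceleration using a linear equation of state and spiky gradient."""
    p = STIFFNESS * (rho - REST_DENSITY)
    diff = pos[:, None, :] - pos[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(r, np.inf)                       # exclude self-interaction
    grad_mag = np.where(r < H, SPIKY_GRAD * (H - r)**2, 0.0)
    grad = grad_mag[..., None] * diff / r[..., None]  # gradient of W at particle i
    shared = MASS * (p[:, None] + p[None, :]) / (2.0 * rho[None, :])
    force = -(shared[..., None] * grad).sum(axis=1)
    return force / rho[:, None]

# Tiny usage example: a few particles in 3D.
pos = np.random.default_rng(0).uniform(0.0, 0.05, size=(8, 3))
rho = densities(pos)
acc = pressure_accel(pos, rho)
```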