1,010 research outputs found

    Efficient Point based Global Illumination on Intel MIC Architecture

    Get PDF
    International audiencePoint-Based Global Illumination (PBGI) is a popular rendering method in special effects and motion picture productions. The tree-cut computation is in general the most time consuming part of this algorithm, but it can be formulated for efficient parallel execution, in particular regarding wide-SIMD hardware. In this context, we propose several vectorization schemes, namely single, packet and hybrid, to maximize the utilization of modern CPU architectures. While for the single scheme, 16 nodes from the hierarchy are processed for a single receiver in parallel, the packet scheme handles one node for 16 receivers. These two schemes work well for scenes having smooth geometry and diffuse material. When the scene contains high frequency bumps maps and glossy reflections, we use a hybrid vectorization method. We conduct experiments on an Intel Many Integrated Corearchitecture and report preliminary results on several scenes, showing that up to a 3x speedup can be achieved when compared with non-vectorized execution

    An evaluation of the GAMA/StarPU frameworks for heterogeneous platforms : the progressive photon mapping algorithm

    Get PDF
    Dissertação de mestrado em Engenharia InformáticaRecent evolution of high performance computing moved towards heterogeneous platforms: multiple devices with different architectures, characteristics and programming models, share application workloads. To aid the programmer to efficiently explore these heterogeneous platforms several frameworks have been under development. These dynamically manage the available computing resources through workload scheduling and data distribution, dealing with the inherent difficulties of different programming models and memory accesses. Among other frameworks, these include GAMA and StarPU. The GAMA framework aims to unify the multiple execution and memory models of each different device in a computer system, into a single, hardware agnostic model. It was designed to efficiently manage resources with both regular and irregular applications, and currently only supports conventional CPU devices and CUDA-enabled accelerators. StarPU has similar goals and features with a wider user based community, but it lacks a single programming model. The main goal of this dissertation was an in-depth evaluation of a heterogeneous framework using a complex application as a case study. GAMA provided the starting vehicle for training, while StarPU was the selected framework for a thorough evaluation. The progressive photon mapping irregular algorithm was the selected case study. The evaluation goal was to assert the StarPU effectiveness with a robust irregular application, and make a high-level comparison with the still under development GAMA, to provide some guidelines for GAMA improvement. Results show that two main factors contribute to the performance of applications written with StarPU: the consideration of data transfers in the performance model, and chosen scheduler. The study also allowed some caveats to be found within the StarPU API. Although this have no effect on performance, they present a challenge for new coming developers. Both these analysis resulted in a better understanding of the framework, and a comparative analysis with GAMA could be made, pointing out the aspects where GAMA could be further improved upon.A recente evolução da computação de alto desempenho é em direção ao uso de plataformas heterogéneas: múltiplos dispositivos com diferentes arquiteturas, características e modelos de programação, partilhando a carga computacional das aplicações. De modo a ajudar o programador a explorar eficientemente estas plataformas, várias frameworks têm sido desenvolvidas. Estas frameworks gerem os recursos computacionais disponíveis, tratando das dificuldades inerentes dos diferentes modelos de programação e acessos à memória. Entre outras frameworks, estas incluem o GAMA e o StarPU. O GAMA tem o objetivo de unificar os múltiplos modelos de execução e memória de cada dispositivo diferente num sistema computacional, transformando-os num único modelo, independente do hardware utilizado. A framework foi desenhada de forma a gerir eficientemente os recursos, tanto para aplicações regulares como irregulares, e atualmente suporta apenas CPUs convencionais e aceleradores CUDA. O StarPU tem objetivos e funcionalidades idênticos, e também uma comunidade mais alargada, mas não possui um modelo de programação único O objetivo principal desta dissertação foi uma avaliação profunda de uma framework heterogénea, usando uma aplicação complexa como caso de estudo. O GAMA serviu como ponto de partida para treino e ambientação, enquanto que o StarPU foi a framework selecionada para uma avaliação mais profunda. O algoritmo irregular de progressive photon mapping foi o caso de estudo escolhido. O objetivo da avaliação foi determinar a eficácia do StarPU com uma aplicação robusta, e fazer uma análise de alto nível com o GAMA, que ainda está em desenvolvimento, para forma a providenciar algumas sugestões para o seu melhoramento. Os resultados mostram que são dois os principais factores que contribuem para a performance de aplicação escritas com auxílio do StarPU: a avaliação dos tempos de transferência de dados no modelo de performance, e a escolha do escalonador. O estudo permitiu também avaliar algumas lacunas na API do StarPU. Embora estas não tenham efeitos visíveis na eficiencia da framework, eles tornam-se um desafio para recém-chegados ao StarPU. Ambas estas análisos resultaram numa melhor compreensão da framework, e numa análise comparativa com o GAMA, onde são apontados os possíveis aspectos que o este tem a melhorar.Fundação para a Ciência e a Tecnologia (FCT) - Program UT Austin | Portuga

    Interactive High Performance Volume Rendering

    Get PDF
    This thesis is about Direct Volume Rendering on high performance computing systems. As direct rendering methods do not create a lower-dimensional geometric representation, the whole scientific dataset must be kept in memory. Thus, this family of algorithms has a tremendous resource demand. Direct Volume Rendering algorithms in general are well suited to be implemented for dedicated graphics hardware. Nevertheless, high performance computing systems often do not provide resources for hardware accelerated rendering, so that the visualization algorithm must be implemented for the available general-purpose hardware. Ever growing datasets that imply copying large amounts of data from the compute system to the workstation of the scientist, and the need to review intermediate simulation results, make porting Direct Volume Rendering to high performance computing systems highly relevant. The contribution of this thesis is twofold. As part of the first contribution, after devising a software architecture for general implementations of Direct Volume Rendering on highly parallel platforms, parallelization issues and implementation details for various modern architectures are discussed. The contribution results in a highly parallel implementation that tackles several platforms. The second contribution is concerned with the display phase of the “Distributed Volume Rendering Pipeline”. Rendering on a high performance computing system typically implies displaying the rendered result at a remote location. This thesis presents a remote rendering technique that is capable of hiding latency and can thus be used in an interactive environment

    Security Management for The Internet of Things

    Get PDF
    The expansion of Internet connected automation provides a number of opportunities and applications that were not imaginable before. A prominent example is the Internet of things (IoT). IoT is a network system that consists of many wired or wireless smart sensors and applications. The development of IoT has been taking decades. However, cyberattacks threat the IoT since the day it was born; different threats and attacks may cause serious disasters to the network system without the essential security protection. Thus, the security and the management of the IoT security system become quite significant. This research work into security management of IoT involves five sections. We first point out the conception and background of the IoT. Then, the security requirements for the IoT have been discussed intensively. Next a proposed layered-security management architecture has been outlined and described. An example of how conveniently this proposed architecture can be used to come up with the security management for a network of the IoT is explained in detail. Finally, summarise the results of implementing the proposed security functions architecture to obtain the efficient and strong security in an IoT environment

    Masivně paralelní implementace algoritmů počítačové grafiky

    Get PDF
    Computer graphics, since its inception in the 1960s, has made great progress. It has become part of everyday life. We can see it all around us, from smartwatches and smartphones, where graphic accelerators are already part of the chips and can render not only interactive menus but also demanding graphic applications, to laptops and personal computers as well as to high-performance visualization servers and supercomputers that can display demanding simulations in real time. In this dissertation we focus on one of the most computationally demanding area of computer graphics and that is the computation of global illumination. One of the most widely used methods for simulating global illumination is the path tracing method. Using this method, we can visualize, for example, scientific or medical data. The path tracing method can be accelerated using multiple graphical accelerators, which we will focus on in this work. We will present a solution for path tracing of massive scenes on multiple GPUs. Our approach analyzes the memory access pattern of the path tracer and defines how the scene data should be distributed across up to 16 GPUs with minimal performance impact. The key concept is that the parts of the scene that have the highest number of memory accesses are replicated across all GPUs. We present two methods for maximizing the performance of path tracing when dealing with partially distributed scene data. Both methods operate at the memory management level, and therefore the path tracing data structures do not need to be redesigned. We implemented this new out-of-core mechanism in the open-source Blender Cycles path tracer, which we also extended with technologies that support running on supercomputers and can take advantage of all accelerators allocated on multiple nodes. In this work, we also introduce a new service that uses our extended version of the Blender Cycles renderer to simplify sending and running jobs directly from Blender.Počítačová grafika od svého vzniku v 60. letech 20. století udělala velký pokrok. Stala se součástí každodenního života. Můžeme ji vidět všude kolem nás, od chytrých hodinek a smartphonů, kde jsou grafické akcelerátory již součástí čipů a dokáží vykreslovat nejen interaktivní menu, ale i náročné grafické aplikace, přes notebooky a osobní počítače až po výkonné vizualizační servery nebo superpočítače, které dokáží zobrazovat náročné simulace v reálném čase. V této disertační práci se zaměříme na jednu z výpočetně nejnáročnějších oblastí počítačové grafiky, a tou je výpočet globálního osvětlení. Jednou z nejpoužívanějších metod pro simulaci globálního osvětlení je metoda sledování cesty. Pomocí této metody můžeme vizualizovat např. vědecká nebo lékařská data. Metodu sledování cest lze urychlit pomocí několika grafických akcelerátorů, na které se v této práci zaměříme. Představíme řešení pro vykreslování masivních scén na více GPU. Náš přístup analyzuje vzory přístupů k paměti a definuje, jak by měla být data scény rozdělena mezi grafickými akcelerátory s minimální ztrátou výkonu. Klíčovým konceptem je, že části scény, které mají nejvyšší počet přístupů do paměti, jsou replikovány na všech grafických akcelerátorech. Představíme dvě metody pro maximalizaci výkonu vykreslování při práci s částečně distribuovanými daty scény. Obě metody pracují na úrovni správy paměti, a proto není třeba datové struktury přepracovávat. Tento nový out-of-core mechanismus jsme implementovali do open-source path traceru Blender Cycles, který jsme také rozšířili o technologie podporující běh na superpočítačích a schopné využít všechny akcelerátory alokované na více uzlech. V této práci také představíme novou službu, která využívá naši rozšířenou verzi Blender Cycles a zjednodušuje odesílání a spouštění úloh přímo z programu Blender.96220 - Laboratoř pro výzkum infrastrukturyvyhově

    Heterogeneity, High Performance Computing, Self-Organization and the Cloud

    Get PDF
    application; blueprints; self-management; self-organisation; resource management; supply chain; big data; PaaS; Saas; HPCaa

    Faster data structures and graphics hardware techniques for high performance rendering

    Get PDF
    Computer generated imagery is used in a wide range of disciplines, each with different requirements. As an example, real-time applications such as computer games have completely different restrictions and demands than offline rendering of feature films. A game has to render quickly using only limited resources, yet present visually adequate images. Film and visual effects rendering may not have strict time requirements but are still required to render efficiently utilizing huge render systems with hundreds or even thousands of CPU cores. In real-time rendering, with limited time and hardware resources, it is always important to produce as high rendering quality as possible given the constraints available. The first paper in this thesis presents an analytical hardware model together with a feed-back system that guarantees the highest level of image quality subject to a limited time budget. As graphics processing units grow more powerful, power consumption becomes a critical issue. Smaller handheld devices have only a limited source of energy, their battery, and both small devices and high-end hardware are required to minimize energy consumption not to overheat. The second paper presents experiments and analysis which consider power usage across a range of real-time rendering algorithms and shadow algorithms executed on high-end, integrated and handheld hardware. Computing accurate reflections and refractions effects has long been considered available only in offline rendering where time isn’t a constraint. The third paper presents a hybrid approach, utilizing the speed of real-time rendering algorithms and hardware with the quality of offline methods to render high quality reflections and refractions in real-time. The fourth and fifth paper present improvements in construction time and quality of Bounding Volume Hierarchies (BVH). Building BVHs faster reduces rendering time in offline rendering and brings ray tracing a step closer towards a feasible real-time approach. Bonsai, presented in the fourth paper, constructs BVHs on CPUs faster than contemporary competing algorithms and produces BVHs of a very high quality. Following Bonsai, the fifth paper presents an algorithm that refines BVH construction by allowing triangles to be split. Although splitting triangles increases construction time, it generally allows for higher quality BVHs. The fifth paper introduces a triangle splitting BVH construction approach that builds BVHs with quality on a par with an earlier high quality splitting algorithm. However, the method presented in paper five is several times faster in construction time
    corecore