785 research outputs found

    Parallel Hierarchical Radiosity on Hybrid Platforms

    The final publication is available at Springer via http://dx.doi.org/10.1007/s11227-011-0592-6
    [Abstract] Achieving efficient, realistic illumination is an important research aim in computer graphics. This paper presents a new parallel global illumination method for hybrid systems, based on the hierarchical radiosity method. The solution exploits systems that combine independent nodes with multiple cores per node: multiple nodes work in parallel to compute the global illumination of the same scene and, within each node, all available cores are used through a shared-memory multithreading approach. The good speedups obtained on several distributed-memory and shared-memory configurations show the versatility of the hybrid proposal.
    Funding: Xunta de Galicia, INCITE08PXIB105161PR; Ministerio de Educación y Ciencia, MEC TIN 2010-16735; Xunta de Galicia, 08TIC001206P

    Memory-aware list scheduling for hybrid platforms

    This report provides memory-aware heuristics to schedule task graphs onto heterogeneous resources, such as a dual-memory cluster equipped with multicores and a dedicated accelerator (FPGA or GPU). Each task has a different processing time on each resource. The optimization objective is to schedule the graph so as to minimize execution time, given the memory available for each resource type. In addition to ordering the tasks, we must also decide on which resource to execute each of them, given its computation requirement and the memory currently available on each resource. The major contributions of this report are twofold: (i) the derivation of an intricate integer linear program formulation for this scheduling problem; and (ii) the design of memory-aware heuristics, which outperform the reference heuristics HEFT and MinMin on a wide variety of problem instances. The absolute performance of these heuristics is assessed for small graphs, with up to 30 tasks, thanks to the linear program.
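    The two decisions described above (task order and resource choice under a memory limit) can be illustrated with a minimal greedy sketch. This is not the report's heuristic: the function name `memory_aware_schedule`, the dictionary layout, and the resource names `cpu` and `acc` are hypothetical, each resource runs one task at a time, transfers between memories are ignored, and a task's output is assumed to occupy memory until its last successor has been scheduled.

```python
from graphlib import TopologicalSorter


def memory_aware_schedule(tasks, deps, capacity):
    """Greedy sketch (assumed model, not the report's algorithm).

    tasks:    {name: {"time": {resource: duration}, "mem": output size}}
    deps:     {name: set of predecessor names}
    capacity: {resource: memory capacity}
    Returns (placement, makespan).
    """
    # Number of successors that still need each task's output.
    succ_left = {t: 0 for t in tasks}
    for preds in deps.values():
        for p in preds:
            succ_left[p] += 1

    free_at = {r: 0.0 for r in capacity}  # time at which each resource is idle
    used = {r: 0.0 for r in capacity}     # memory currently held on each resource
    finish, placed = {}, {}

    for t in TopologicalSorter(deps).static_order():
        mem = tasks[t]["mem"]
        preds = deps.get(t, ())
        data_ready = max((finish[p] for p in preds), default=0.0)

        # Keep only resources with enough free memory, then take the
        # one that gives the earliest finish time.
        options = []
        for r in capacity:
            if used[r] + mem <= capacity[r]:
                start = max(free_at[r], data_ready)
                options.append((start + tasks[t]["time"][r], r))
        if not options:
            raise RuntimeError(f"no resource has enough free memory for {t!r}")
        end, r = min(options)

        finish[t], placed[t] = end, r
        free_at[r] = end
        used[r] += mem

        # Release inputs whose last consumer has now been scheduled.
        for p in preds:
            succ_left[p] -= 1
            if succ_left[p] == 0:
                used[placed[p]] -= tasks[p]["mem"]
        if succ_left[t] == 0:  # final output, nothing waits for it
            used[r] -= mem

    return placed, max(finish.values(), default=0.0)
```

    In a chain `a -> b -> c` where the accelerator is faster for every task but can hold only one large output, the sketch places `b` on the CPU because `a`'s output still occupies the accelerator memory.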

    Scheduling on Hybrid Platforms: Improved Approximability Window

    Modern platforms use accelerators in conjunction with standard processing units in order to reduce the running time of specific operations, such as matrix operations, and improve their performance. Scheduling on such hybrid platforms is a challenging problem since the algorithms used for homogeneous resources do not adapt well. In this paper we consider the problem of scheduling a set of tasks subject to precedence constraints on hybrid platforms composed of two types of processing units. We propose a $(3+2\sqrt{2})$-approximation algorithm and a conditional lower bound of 3 on the approximation ratio. These results improve upon the 6-approximation algorithm proposed by Kedad-Sidhoum et al. as well as the lower bound of 2 due to Svensson for identical machines. Our algorithm is inspired by the former and separates the allocation and scheduling phases. However, we propose a different allocation procedure which, although it is less efficient for the allocation sub-problem, leads to an improved approximation ratio for the whole scheduling problem. This approximation ratio actually decreases when the numbers of processing units of each type are close, and matches the conditional lower bound when they are equal.
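    As a quick numerical check of the stated improvement, the proposed ratio can be evaluated and compared with the earlier ratio of 6 and the conditional lower bound of 3 (a worked evaluation only, not part of the abstract):

```latex
3 + 2\sqrt{2} = (1+\sqrt{2})^{2} \approx 5.83,
\qquad
3 \;<\; 3 + 2\sqrt{2} \;<\; 6 .
```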

    FireNN: Neural Networks Reliability Evaluation on Hybrid Platforms

    The growing complexity of neural networks has led to the adoption of hardware accelerators to cope with the computational power required by new architectures. The possibility of adapting a network to different platforms has increased the interest of safety-critical applications. The reliability evaluation of neural networks is still immature and requires platforms that can measure the safety standards demanded by mission-critical applications. For this reason, interest in studying the reliability of neural networks is growing. We propose a new approach for evaluating the resiliency of neural networks using hybrid platforms. The approach relies on reconfigurable hardware to emulate the target hardware platform and perform the fault injection process. Its main advantage is that it involves the on-hardware execution of the neural network in the reliability analysis, without any intrusion into the network algorithm, while addressing specific fault models. The implementation of FireNN, the platform based on the proposed approach, is described in the paper. Experimental analyses are performed using fault injection on AlexNet. The analyses are carried out on the FireNN platform and the results are compared with the outcome of traditional software-level evaluations. Results are discussed considering the insight into the hardware level achieved using FireNN.

    Magnetic Particle-Based Hybrid Platforms for Bioanalytical Sensors

    Biomagnetic nano- and microparticle platforms have attracted considerable interest in the field of biological sensors due to their physico-chemical properties, high specific surface area, good mechanical stability and the opportunities they offer for generating magneto-switchable devices. This review discusses recent advances in the development and characterization of active biomagnetic nanoassemblies, their interaction with biological molecules and their use in bioanalytical sensors.

    Model and complexity results for tree traversals on hybrid platforms

    We study the complexity of traversing tree-shaped workflows whose tasks require large I/O files. We target a heterogeneous architecture with two resources of different types, where each resource has its own memory, such as a multicore node equipped with a dedicated accelerator (FPGA or GPU). Tasks in the workflow are tagged with the type of resource needed for their processing. Moreover, a task can be processed on a given resource only if all its input files and output files can be stored in the corresponding memory. At a given execution step, the amount of data stored in each memory strongly depends upon the order in which the tasks are executed, and upon when communications between the two memories are scheduled. The objective is to determine an efficient traversal that minimizes the maximum amount of memory of each type needed to traverse the whole tree. In this paper, we establish the complexity of this two-memory scheduling problem, provide inapproximability results, and show how to determine the optimal depth-first traversal. Altogether, these results lay the foundations for memory-aware scheduling algorithms on heterogeneous platforms.
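    The dependence of memory usage on task order can be seen in a small simulation. The sketch below is an assumed, simplified model rather than the paper's: the function name `postorder_peaks` and the tree layout are hypothetical, each task produces one output file in the memory of its own resource type, and a child's file is moved to the parent's memory only at the moment the parent runs.

```python
def postorder_peaks(tree, root):
    """Peak usage of each memory for a depth-first (postorder) traversal.

    tree: {name: {"color": memory/resource type,
                  "out": output file size,
                  "children": [names, in visiting order]}}
    Assumed model: a task needs its children's outputs plus its own
    output in its memory while it runs; inputs are freed afterwards.
    """
    usage, peak = {}, {}

    def bump(mem, delta):
        # Apply a change to one memory and record its running maximum.
        usage[mem] = usage.get(mem, 0) + delta
        peak[mem] = max(peak.get(mem, 0), usage[mem])

    def visit(node):
        info = tree[node]
        for child in info["children"]:
            visit(child)
        here = info["color"]
        # Move every child's output into this task's memory.
        for child in info["children"]:
            c = tree[child]
            bump(c["color"], -c["out"])
            bump(here, c["out"])
        bump(here, info["out"])  # allocate this task's output
        for child in info["children"]:
            bump(here, -tree[child]["out"])  # inputs are consumed

    visit(root)
    return peak


def example_tree(first, second):
    # Hypothetical four-task tree; only the root's child order varies.
    return {
        "root": {"color": "cpu", "out": 1, "children": [first, second]},
        "x": {"color": "acc", "out": 4, "children": []},
        "y": {"color": "cpu", "out": 2, "children": ["z"]},
        "z": {"color": "acc", "out": 3, "children": []},
    }
```

    On this example, visiting `x` before `y` gives a peak of 7 in the accelerator memory, because `x`'s file waits there while `z` runs. Visiting `y` first lowers that peak to 4, while the CPU peak stays at 7 in both orders.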

    Towards a Smart Selection of Hybrid Platforms for Multimedia Processing

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
    Images and videos are now present everywhere: they come directly from cameras and mobile devices, or from other people who share them. They are used to illustrate different objects in a large number of situations. This makes image and video processing algorithms a very important tool in domains related to computer vision, such as video surveillance, medical imaging, and image and video database indexing. The performance of these algorithms has been reduced by the highly intensive computation required by new image and video standards. In this paper, we propose a new framework that allows users to select, in a smart and efficient way, the processing units (GPU and/or CPU) within heterogeneous systems when treating different kinds of multimedia objects: a single image, multiple images, multiple videos, and real-time video. The framework provides image and video primitive functions implemented on GPU, such as shape (silhouette) detection, motion tracking using optical flow estimation, and edge and corner detection. We have exploited these functions in several situations, such as indexing videos, segmenting vertebrae in X-ray and MR images, and detecting and localizing events in multi-user scenarios. Experiments showed speedups ranging from 6 to 118 compared with sequential implementations. Moreover, the parallel and heterogeneous implementations offered lower power consumption as a result of the faster processing.
    European Cooperation in Science and Technology. COS

    The GPU vs Phi Debate: Risk Analytics Using Many-Core Computing

    The risk of reinsurance portfolios covering globally occurring natural catastrophes, such as earthquakes and hurricanes, is quantified by employing simulations. These simulations are computationally intensive and require large amounts of data to be processed. The use of many-core hardware accelerators, such as the Intel Xeon Phi and the NVIDIA Graphics Processing Unit (GPU), is desirable for achieving high-performance risk analytics. In this paper, we investigate how accelerators can be employed in risk analytics, focusing on developing parallel algorithms for Aggregate Risk Analysis, a simulation which computes the Probable Maximum Loss of a portfolio taking both primary and secondary uncertainties into account. The key result is that both hardware accelerators are useful in different contexts: without taking data transfer times into account, the Phi had the lowest execution times when used independently, and the GPU along with a host in a hybrid platform yielded the best performance.
    Comment: A modified version of this article is accepted to the Computers and Electrical Engineering Journal under the title - "The Hardware Accelerator Debate: A Financial Risk Case Study Using Many-Core Computing"; Blesson Varghese, "The Hardware Accelerator Debate: A Financial Risk Case Study Using Many-Core Computing," Computers and Electrical Engineering, 201