3,533 research outputs found

    Parallelized Particle and Gaussian Sum Particle Filters for Large Scale Freeway Traffic Systems

    Large-scale traffic systems require techniques able to: 1) deal with large amounts of data and heterogeneous data coming from different types of sensors, 2) provide robustness in the presence of sparse sensor data, 3) incorporate different models that can deal with various traffic regimes, 4) cope with multimodal conditional probability density functions for the states. Centralized architectures often face challenges due to high communication demands. This paper develops new estimation techniques able to cope with these problems of large traffic network systems. These are Parallelized Particle Filters (PPFs) and a Parallelized Gaussian Sum Particle Filter (PGSPF) that are suitable for on-line traffic management. We show how complex probability density functions of the high-dimensional traffic state can be decomposed into functions with simpler forms and the whole estimation problem solved in an efficient way. The proposed approach is general, with limited interactions, which reduces the computational time and provides high estimation accuracy. The efficiency of the PPFs and PGSPFs is evaluated in terms of accuracy, complexity and communication demands and compared with the case where all processing is centralized.
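
For readers unfamiliar with particle filtering, the following is a minimal sketch of one predict / weight / resample cycle. It is a plain centralized bootstrap filter on a scalar random-walk state, not the PPF or PGSPF of the paper; the function names, the random-walk model and the Gaussian noise levels are all assumptions made for illustration.

```python
import math
import random


def bootstrap_step(particles, observation, rng, process_std=1.0, obs_std=1.0):
    """One predict / weight / resample cycle for a scalar random-walk state."""
    # Predict: push every particle through the (assumed) random-walk model.
    predicted = [p + rng.gauss(0.0, process_std) for p in particles]

    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the predicted particles.
    estimate = sum(w * p for w, p in zip(weights, predicted))

    # Resample: draw a new, equally weighted population.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, weights, estimate


def run_filter(observations, n_particles=500, seed=0):
    """Run the filter over a sequence of scalar observations."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        particles, _, est = bootstrap_step(particles, z, rng)
        estimates.append(est)
    return estimates
```

A decomposed scheme of the kind the abstract describes would, roughly speaking, run several such filters on lower-dimensional sub-states and exchange limited information between them; that coupling is not shown here.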

    A Comparative Analysis of STM Approaches to Reduction Operations in Irregular Applications

    As a recently consolidated paradigm for optimistic concurrency in modern multicore architectures, Transactional Memory (TM) can help exploit parallelism in irregular applications when data dependence information is not available until run time. This paper presents and discusses how to leverage TM to exploit parallelism in an important class of irregular applications, the class that exhibits irregular reduction patterns. In order to test and compare our techniques with other solutions, we implemented them in a software TM system called ReduxSTM, which acts as a proof of concept. ReduxSTM combines two major ideas: a sequential-equivalent ordering of transaction commits that ensures the correct result, and an extension of the underlying TM privatization mechanism to reduce unnecessary overhead due to reduction memory updates as well as unnecessary aborts and rollbacks. A comparative study of STM solutions, including ReduxSTM, and other more classical approaches to the parallelization of reduction operations is presented in terms of time, memory and overhead. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
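
To make the notion of an irregular reduction concrete, here is a small sketch of the loop pattern `acc[indices[i]] += values[i]` together with one classical parallelization, array privatization followed by a merge. This is not ReduxSTM and involves no transactional memory; the function names are hypothetical, and in CPython the threads illustrate the structure rather than deliver a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential_reduction(n_bins, indices, values):
    """Reference loop: acc[indices[i]] += values[i], an irregular reduction."""
    acc = [0] * n_bins
    for i, v in zip(indices, values):
        acc[i] += v
    return acc


def privatized_reduction(n_bins, indices, values, n_workers=4):
    """Each worker accumulates into a private copy; copies are merged at the end."""
    def work(chunk):
        start, stop = chunk
        private = [0] * n_bins          # private copy: no conflicts while accumulating
        for k in range(start, stop):
            private[indices[k]] += values[k]
        return private

    n = len(indices)
    step = max(1, -(-n // n_workers))   # ceiling division
    chunks = [(s, min(s + step, n)) for s in range(0, n, step)]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(work, chunks))

    # Merge phase: element-wise sum of the private copies.
    acc = [0] * n_bins
    for private in partials:
        for b in range(n_bins):
            acc[b] += private[b]
    return acc
```

The memory cost of one private array per worker is the kind of overhead that a comparison "in terms of time, memory and overhead" would weigh against transactional alternatives.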

    Benchmarking Memory Management Capabilities within ROOT-Sim

    In parallel discrete event simulation techniques, the simulation model is partitioned into objects, concurrently executing events on different CPUs and/or multiple CPU cores. In such a context, run-time support for logical time synchronization across the different simulation objects plays a central role in determining the effectiveness of the specific parallel simulation environment. In this paper we present an experimental evaluation of the memory management capabilities offered by the ROme OpTimistic Simulator (ROOT-Sim). This is an open source parallel simulation environment transparently supporting optimistic synchronization via recoverability (based on incremental log/restore techniques) of any type of memory operation affecting the state of simulation objects, i.e., memory allocation, deallocation and update operations. The experimental study is based on a synthetic benchmark which mimics different read/write patterns inside the dynamic memory map associated with the state of simulation objects. This allows sensitivity analysis of time and space effects due to the memory management subsystem while varying the type and the locality of the accesses associated with event processing.
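
The idea of incremental log/restore can be illustrated with a small undo log: each update records the value it overwrites, so the state can be rolled back to an earlier simulation time. This is only a sketch under stated assumptions, not ROOT-Sim's mechanism or API; the `LoggedState` class is hypothetical, it works on a dictionary rather than raw memory, and it assumes operations are logged in nondecreasing timestamp order.

```python
class LoggedState:
    """Dictionary-like state that records old values so updates can be undone."""

    _MISSING = object()  # marks "key did not exist before this write"

    def __init__(self):
        self._data = {}
        self._log = []  # entries: (timestamp, key, previous value or _MISSING)

    def write(self, timestamp, key, value):
        """Allocate or update a key, logging whatever was there before."""
        previous = self._data.get(key, self._MISSING)
        self._log.append((timestamp, key, previous))
        self._data[key] = value

    def delete(self, timestamp, key):
        """Deallocate a key, logging its value so it can be restored."""
        if key in self._data:
            self._log.append((timestamp, key, self._data.pop(key)))

    def read(self, key, default=None):
        return self._data.get(key, default)

    def rollback(self, timestamp):
        """Undo every logged operation with a timestamp >= the given one."""
        while self._log and self._log[-1][0] >= timestamp:
            _, key, previous = self._log.pop()
            if previous is self._MISSING:
                self._data.pop(key, None)
            else:
                self._data[key] = previous

    def snapshot(self):
        return dict(self._data)
```

Only touched entries are logged, so the cost of both logging and restoring scales with the number of writes rather than with the size of the state.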

    Modelling Fluid Structure Interaction problems using Boundary Element Method

    This dissertation investigates the application of Boundary Element Methods (BEM) to Fluid Structure Interaction (FSI) problems from three main perspectives. The work is divided into three main parts: i) the derivation of a BEM for the Laplace equation and its application to analyze ship-wave interaction problems, ii) the implementation of efficient and parallel BEM solvers addressing the newest challenges of High Performance Computing, iii) the development of a BEM for the Stokes system and its application to study micro-swimmers. First we develop a BEM for the Laplace equation and we apply it to predict ship-wave interactions making use of an innovative coupling with Finite Element Method stabilization techniques. As is well known, the wave pattern around a body depends on the Froude number associated with the flow. Thus, we thoroughly investigate the robustness and accuracy of the developed methodology, assessing the dependence of the solution on this parameter. To improve the performance and tackle problems with a higher number of unknowns, the BEM developed for the Laplace equation is parallelized using OpenSOURCE technique in a hybrid distributed-shared memory environment. We perform several tests to demonstrate both the accuracy and the performance of the parallel BEM developed. In addition, we explore two different possibilities to reduce the overall computational cost from O(N^2) to O(N). Firstly we couple the library with a Fast Multipole Method that allows us to reach a better order of complexity and higher efficiency. Then we perform a preliminary study on the implementation of a parallel Non Uniform Fast Fourier Transform to be coupled with the newly developed algorithm Sparse Cardinal Sine Decomposition (SCSD). Finally we consider the application of the BEM framework to a different kind of FSI problem represented by the Stokes flow of a liquid medium surrounding swimming micro-organisms. We maintain the parallel structure derived for the Laplace equation even in the Stokes setting. Our implementation is able to simulate both prokaryotic and eukaryotic organisms, matching literature and experimental benchmarks. We finally present a deep analysis of the importance of hydrodynamic interactions between the different parts of micro-swimmers in the prediction of optimal swimming conditions, focusing our attention on the study of flagellated “robotic” composite swimmers.

    LiveCap: Real-time Human Performance Capture from Monocular Video

    We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to background-subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting non-linear optimization problems per frame are solved with specially tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. Our method yields accuracy comparable to off-line performance capture techniques, while being orders of magnitude faster.
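
As a reminder of what a Gauss-Newton iteration does, the sketch below fits a single parameter b in the toy model y = exp(b x). It is a textbook illustration with a hypothetical function name, unrelated to the energy terms or the data-parallel GPU solvers of the paper.

```python
import math


def gauss_newton_exp(xs, ys, b0, iterations=20, tol=1e-12):
    """Fit y = exp(b * x) by Gauss-Newton on the residuals r_i = y_i - exp(b * x_i)."""
    b = b0
    for _ in range(iterations):
        residuals = [y - math.exp(b * x) for x, y in zip(xs, ys)]
        jacobian = [-x * math.exp(b * x) for x in xs]   # d r_i / d b
        jtj = sum(j * j for j in jacobian)
        jtr = sum(j * r for j, r in zip(jacobian, residuals))
        if jtj == 0.0:
            break                                       # degenerate problem, no update possible
        step = -jtr / jtj                               # solves (J^T J) step = -J^T r
        b += step
        if abs(step) < tol:
            break
    return b
```

The sums over data points are independent per term, which is the property that makes such solvers amenable to data-parallel execution.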

    Parallelization Strategies for Markerless Human Motion Capture

    Markerless Motion Capture (MMOCAP) is the problem of determining the pose of a person from images captured by one or several cameras simultaneously without using markers on the subject. Evaluation of the solutions is frequently the most time-consuming task, making most of the proposed methods inapplicable in real-time scenarios. This paper presents an efficient approach to parallelize the evaluation of the solutions on CPUs and GPUs. Our proposal is experimentally compared on six sequences of the HumanEva-I dataset using the CMA-ES algorithm. Multiple configurations of the algorithm were tested to analyze the best trade-off between accuracy and computing time. The proposed methods obtain speedups of 8× on multi-core CPUs, 30× on a single GPU and up to 110× using 4 GPUs.
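
In population-based optimizers such as CMA-ES, the candidates of one generation can be scored independently, which is what makes their evaluation easy to parallelize. The sketch below shows that pattern only; the `sphere` fitness function is a stand-in (a real system would compare a candidate pose against camera images), all names are hypothetical, and a thread pool is used here merely to show the structure.

```python
from concurrent.futures import ThreadPoolExecutor


def sphere(candidate):
    """Stand-in fitness function; lower is better."""
    return sum(x * x for x in candidate)


def evaluate_population(population, fitness=sphere, n_workers=4):
    """Evaluate all candidates concurrently, preserving their order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(fitness, population))


def best_candidate(population, scores):
    """Return the (candidate, score) pair with the lowest score."""
    index = min(range(len(scores)), key=scores.__getitem__)
    return population[index], scores[index]
```

For CPU-bound fitness functions in Python, swapping in `ProcessPoolExecutor` (with a module-level fitness function) would be the usual way to obtain an actual multi-core speedup.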

    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms. Comment: 18 pages, 4 figures, accepted for publication in Scientific Programming.
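
As an illustration of what "higher-order finite differences" means, the sketch below compares the standard fourth-order central stencil for a first derivative with the second-order one. It is a generic textbook stencil with hypothetical function names, not code generated by Chemora.

```python
import math


def d1_fourth_order(f, x, h):
    """Fourth-order central difference approximation of f'(x)."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12 * h)


def d1_second_order(f, x, h):
    """Second-order central difference, for comparison."""
    return (f(x + h) - f(x - h)) / (2 * h)


if __name__ == "__main__":
    exact = math.cos(1.0)
    for h in (0.1, 0.05):
        err2 = abs(d1_second_order(math.sin, 1.0, h) - exact)
        err4 = abs(d1_fourth_order(math.sin, 1.0, h) - exact)
        print(f"h={h}: second-order error {err2:.2e}, fourth-order error {err4:.2e}")
```

Halving h should reduce the second-order error by roughly a factor of 4 and the fourth-order error by roughly a factor of 16, at the price of a wider stencil.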