
    Optimized Local Path Planner Implementation for GPU-Accelerated Embedded Systems

    Autonomous vehicles are latency-sensitive systems. The planning phase is a critical component of such systems, during which the in-vehicle compute platform determines the future maneuvers the vehicle will follow. In this paper, we present an optimized GPU-accelerated implementation of the Frenet Path Planner, a well-known path planning algorithm. Unlike current state-of-the-art implementations, ours accelerates the entire algorithm, including the path generation and collision avoidance phases. We measure the execution time of our implementation and demonstrate dramatic speedups compared to the CPU baseline implementation. Additionally, we evaluate the impact of different precision types (double, float, half) on trajectory errors to investigate the tradeoff between completion latency and computation precision.
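    In the usual Frenet-frame formulation, each candidate lateral maneuver is a quintic polynomial d(t) that matches the current lateral offset, velocity and acceleration at t = 0 and a target state at t = T; the planner samples many such candidates, scores them and checks them for collisions. The sketch below is not the paper's GPU implementation. It computes one lateral quintic in double precision and then evaluates it with every operation rounded to float or half through Python's `struct` module, a CPU emulation assumed here only to illustrate how the precision choice shows up as trajectory error. The function names and the example maneuver are hypothetical.

```python
import struct

def round_to(x: float, fmt: str) -> float:
    """Round an IEEE double to the format named by a struct code:
    'd' = double, 'f' = single (float), 'e' = half."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def quintic_coeffs(d0, v0, acc0, d1, v1, acc1, T):
    """Coefficients c[0..5] of d(t) = sum_k c[k] * t**k that match position,
    velocity and acceleration at t = 0 and t = T (computed in double)."""
    c0, c1, c2 = d0, v0, acc0 / 2.0
    # What the cubic, quartic and quintic terms must still supply at t = T.
    r0 = d1 - (c0 + c1 * T + c2 * T**2)
    r1 = v1 - (c1 + 2.0 * c2 * T)
    r2 = acc1 - 2.0 * c2
    c3 = (10.0 * r0 - 4.0 * r1 * T + 0.5 * r2 * T**2) / T**3
    c4 = (-15.0 * r0 + 7.0 * r1 * T - r2 * T**2) / T**4
    c5 = (6.0 * r0 - 3.0 * r1 * T + 0.5 * r2 * T**2) / T**5
    return [c0, c1, c2, c3, c4, c5]

def eval_poly(coeffs, t, fmt="d"):
    """Horner evaluation that rounds every operand and intermediate result
    to `fmt`, emulating arithmetic carried out in that precision."""
    t = round_to(t, fmt)
    acc = round_to(coeffs[-1], fmt)
    for c in reversed(coeffs[:-1]):
        acc = round_to(round_to(acc * t, fmt) + round_to(c, fmt), fmt)
    return acc

def max_error(coeffs, T, fmt, samples=50):
    """Largest deviation from the double-precision trajectory over [0, T]."""
    ts = [T * i / samples for i in range(samples + 1)]
    return max(abs(eval_poly(coeffs, t, fmt) - eval_poly(coeffs, t, "d"))
               for t in ts)

# Hypothetical maneuver: return from a 1.2 m lateral offset (moving outward
# at 0.3 m/s) to the lane centre within 4 s.
coeffs = quintic_coeffs(1.2, 0.3, 0.0, 0.0, 0.0, 0.0, 4.0)
for fmt, name in (("d", "double"), ("f", "float"), ("e", "half")):
    print(name, max_error(coeffs, 4.0, fmt))
```

    Running it reports zero error for double and an error that grows from float to half. Errors on an actual GPU also depend on hardware rounding behaviour and fused multiply-add instructions, which this emulation does not model.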

    Contending memory in heterogeneous SoCs: Evolution in NVIDIA Tegra embedded platforms

    Modern embedded platforms are constrained by size, weight and power (SWaP) requirements. In such contexts, reaching the desired performance-per-watt target calls for increasing the number of processors rather than raising their voltage and frequency. Hence, generation after generation, modern heterogeneous Systems on Chip (SoCs) integrate more cores in their CPU complexes as well as a wider variety of accelerators that leverage massively parallel compute architectures. Previous literature has shown that, while increasing parallelism is theoretically optimal for improving average performance, shared memory hierarchies (i.e., caches and system DRAM) act as a bottleneck: they expose the platform's processors to severe contention on memory accesses, dramatically impacting both performance and timing predictability. In this work, we characterize how successive generations of embedded platforms from the NVIDIA Tegra family have balanced the increasing parallelism of their processors against the resulting higher potential for memory interference. We also present open-source software for generating test scenarios aimed at measuring memory contention in highly heterogeneous SoCs.

    The RSPO–LGR4/5–ZNRF3/RNF43 module controls liver zonation and size

    LGR4/5 receptors and their cognate RSPO ligands potentiate Wnt/β-catenin signalling and promote proliferation and tissue homeostasis in epithelial stem cell compartments. In the liver, metabolic zonation requires a Wnt/β-catenin signalling gradient, but the instructive mechanism controlling its spatiotemporal regulation is not known. We have now identified the RSPO-LGR4/5-ZNRF3/RNF43 module as a master regulator of Wnt/β-catenin-mediated metabolic liver zonation. Liver-specific LGR4/5 loss of function (LOF) or RSPO blockade disrupted hepatic Wnt/β-catenin signalling and zonation. Conversely, pathway activation in ZNRF3/RNF43 LOF mice or with recombinant RSPO1 protein expanded the hepatic Wnt/β-catenin signalling gradient in a reversible and LGR4/5-dependent manner. Recombinant RSPO1 protein increased liver size and improved liver regeneration, whereas LGR4/5 LOF caused the opposite effects, resulting in hypoplastic livers. Furthermore, we show that LGR4(+) hepatocytes throughout the lobule contribute to liver homeostasis without zonal dominance. Taken together, our results indicate that the RSPO-LGR4/5-ZNRF3/RNF43 module controls metabolic liver zonation and is a hepatic growth/size rheostat during development, homeostasis and regeneration.

    vkpolybench: A crossplatform Vulkan Compute port of the PolyBench/GPU benchmark suite

    PolyBench is a well-known set of benchmarks characterized by embarrassingly parallel kernels that can run on Graphics Processing Units (GPUs). While the PolyBench/GPU kernels rely on well-established GPGPU APIs such as CUDA and OpenCL, in this paper we present vkpolybench, a cross-platform PolyBench/GPU port built on top of Vulkan. Vulkan is a recent Khronos standard API for heterogeneous CPU–GPU computing that is gaining significant traction. Compared to CUDA and OpenCL, the Vulkan API improves GPU utilization while reducing CPU overheads.
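    As an illustration of what "embarrassingly parallel" means here, the following Python sketch writes out the GEMM kernel (C = alpha·A·B + beta·C), one of the PolyBench kernels. It is a plain CPU reference, not vkpolybench's Vulkan shader, and the function names are hypothetical. Each output element depends only on the inputs and never on another output element, which is why a GPU port can assign one thread per (i, j) pair.

```python
def gemm_element(i, j, alpha, beta, A, B, C):
    """One output element of C = alpha * A @ B + beta * C.
    Reads only inputs, so it can run independently of every other (i, j)."""
    acc = beta * C[i][j]
    for k in range(len(B)):  # len(B) == NK, the shared inner dimension
        acc += alpha * A[i][k] * B[k][j]
    return acc

def gemm(alpha, beta, A, B, C):
    """Sequential reference: the two outer loops are the part a GPU port
    would map onto a grid of independent threads."""
    ni, nj = len(C), len(C[0])
    return [[gemm_element(i, j, alpha, beta, A, B, C) for j in range(nj)]
            for i in range(ni)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 1], [1, 1]]
print(gemm(1, 0, A, B, C))  # plain matrix product A @ B
```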

    Optimization strategies for GPUs: an overview of architectural approaches

    Modern Cyber-Physical Systems (CPS) applications require hardware with high performance-per-watt efficiency. This is usually obtained through massively parallel accelerators such as the GPU. Recent research is therefore investigating novel designs that optimize GPU energy consumption and performance for applications in the Internet of Things, autonomous navigation, and industrial robotics domains. This paper presents a survey of current state-of-the-art approaches for optimizing GPU performance metrics, offering a complete and up-to-date summary of ideas, mechanisms, and potential improvements for next-generation GPU devices.

    Building Time-Triggered Schedules for Typed-DAG Tasks with Alternative Implementations

    Real-time and latency-sensitive applications, such as autonomous driving, have an increasing need for computational power that traditional multi-core platforms cannot provide. For this purpose, many heterogeneous embedded platforms have recently been released. They offer a set of diverse processing elements (e.g., GPUs, DSPs, ASICs) to meet the computational demands of data-hungry applications, so the system engineer can choose the most suitable processing element for each specific subtask. In this context, timing constraints and the related task models are of paramount importance. The HPC-DAG (Heterogeneous Parallel Directed Acyclic Graph) task model has recently been proposed to capture real-time workload execution on modern heterogeneous platforms. It expresses the Instruction Set Architecture (ISA) heterogeneity across the different compute accelerators, as well as their differences in possible scheduling policies, such as preemption. In this paper, we propose a time-table scheduling approach that allocates and schedules a set of HPC-DAG tasks onto a set of heterogeneous cores by means of Integer Linear Programming (ILP). Our design allows the system engineer to handle heterogeneity of resources, of online execution costs, and of part of the allocation of tasks and sub-tasks to cores. It improves solving time compared to the state of the art by gradually exploring the design space.
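    The paper builds its time table with an ILP, and the sketch below does not reproduce that formulation. Instead, under simplifying assumptions, it checks the constraints such a time table has to satisfy: every task is placed exactly once on a core whose type offers an implementation of it, DAG precedences hold, and slots on the same core do not overlap. It assumes non-preemptive execution and a fixed execution cost per (task, core type) pair; `Slot`, `is_valid_timetable` and `makespan` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    task: str
    core: str
    start: int
    cost: int  # execution cost of the implementation used on this core type

def is_valid_timetable(slots, edges, core_type, impls):
    """slots: list of Slot; edges: set of (pred, succ) DAG edges;
    core_type: core -> type (e.g. 'cpu', 'gpu');
    impls: task -> {core type: cost} (alternative implementations)."""
    by_task = {s.task: s for s in slots}
    # Every task is scheduled exactly once.
    if len(by_task) != len(slots) or set(by_task) != set(impls):
        return False
    # The chosen core type must offer an implementation with that cost.
    for s in slots:
        if impls[s.task].get(core_type[s.core]) != s.cost:
            return False
    # Precedence: a successor starts only after its predecessor finishes.
    for pred, succ in edges:
        p, q = by_task[pred], by_task[succ]
        if p.start + p.cost > q.start:
            return False
    # Non-preemptive execution: slots on one core must not overlap.
    per_core = {}
    for s in slots:
        per_core.setdefault(s.core, []).append(s)
    for core_slots in per_core.values():
        core_slots.sort(key=lambda s: s.start)
        for a, b in zip(core_slots, core_slots[1:]):
            if a.start + a.cost > b.start:
                return False
    return True

def makespan(slots):
    """Completion time of the last slot in the time table."""
    return max(s.start + s.cost for s in slots)
```

    An ILP solver explores the space of such tables and minimizes an objective like `makespan`; the checker only states what "feasible" means, which makes it a convenient test oracle for any schedule a solver returns.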

    Exploiting Traffic Lights to Manage Auction-Based Crossings

    Auction-based crossing management approaches are used to design coordination policies for autonomous vehicles and to improve smart intersections by providing differentiated latencies. In this paper, we propose and exploit an auction-based mechanism for managing urban traffic light infrastructure in which participating vehicles are either equipped or non-equipped. The difference between the two categories is that only equipped vehicles can actively participate in auctions through in-vehicle IoT devices, i.e., they are able to communicate with the surrounding urban infrastructure. In this way, we aim to study the transitional period that will precede the complete adoption of autonomous or strongly connected vehicles. Through extensive experiments and simulations comparing our mechanism with the traditional fixed-time traffic light control approach, we study its benefits and limitations, in terms of waiting and trip times, when varying the subset of equipped vehicles and the budget available for participating in auctions.
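    The abstract does not specify the bidding rule, so the following is only a hypothetical sketch of one auction round at a single traffic light. Equipped vehicles bid a fixed percentage of their budget, non-equipped vehicles (no budget entry) contribute nothing, the approach with the highest total bid gets the green phase, and its bidders pay their bids. The percentage, the first-price payment, the queue-length tie-break and the name `run_auction` are illustrative assumptions, not the authors' mechanism.

```python
def run_auction(queues, budgets, bid_percent=10):
    """One hypothetical auction round for the next green phase.
    queues: approach -> list of waiting vehicle ids
    budgets: vehicle id -> integer credit (equipped vehicles only)
    Returns (winning approach, bids per approach, updated budgets)."""
    bids = {}
    for approach, vehicles in queues.items():
        # Non-equipped vehicles have no budget entry and therefore bid 0.
        bids[approach] = sum(budgets.get(v, 0) * bid_percent // 100
                             for v in vehicles)
    # Highest total bid wins; queue length breaks ties, which also covers
    # rounds in which no equipped vehicle is waiting at all.
    winner = max(queues, key=lambda a: (bids[a], len(queues[a])))
    # First-price payment (assumption): winning bidders pay what they bid.
    new_budgets = dict(budgets)
    for v in queues[winner]:
        if v in new_budgets:
            new_budgets[v] -= new_budgets[v] * bid_percent // 100
    return winner, bids, new_budgets

queues = {"north": ["v1", "v2"], "east": ["v3", "v4", "v5"]}
print(run_auction(queues, {"v1": 100, "v3": 20}))
```

    Integer credits keep the bookkeeping exact. Falling back to queue length when nobody bids loosely mirrors the mixed-traffic setting, where part of the demand cannot express itself through auctions.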

    Coordinated traffic lights and auction intersection management in a mixed scenario

    IoT (Internet of Things) devices can be exploited to connect vehicles to a smart city infrastructure, allowing vehicles to share their intentions while retrieving contextual information about diverse aspects of urban mobility. Such a complex system aims to improve city life by mitigating the effects of traffic congestion and, consequently, stress and pollution. We place ourselves in a transitional scenario in which next-generation vehicles that can communicate with the surrounding infrastructure coexist with traditional vehicles with limited or no IoT capabilities. In this work, we focus on intersection management and, in particular, on reusing existing traffic lights empowered by a new management system. We propose an auction-based system in which traffic lights exchange contextual information with vehicles and with nearby traffic lights, with the aim of reducing average waiting times at intersections and, consequently, overall trip times. We evaluate our proposal with the well-known MATSim transport simulator, using a synthetic Manhattan map and a new map we built of an urban area in our town in Northern Italy. In this area, IoT instrumentation has been set up as part of a European research project. Results show that the proposal performs better than both the classical Fixed Time Control system currently adopted for traffic lights and auction strategies that do not exploit coordination among nearby traffic lights.