893 research outputs found

    Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices

    Full text link
    For many applications in low-power real-time robotics, stereo cameras are the sensors of choice for depth perception as they are typically cheaper and more versatile than their active counterparts. Their biggest drawback, however, is that they do not directly sense depth maps; instead, these must be estimated through data-intensive processes. Therefore, appropriate algorithm selection plays an important role in achieving the desired performance characteristics. Motivated by applications in space and mobile robotics, we implement and evaluate a FPGA-accelerated adaptation of the ELAS algorithm. Despite offering one of the best trade-offs between efficiency and accuracy, ELAS has only been shown to run at 1.5-3 fps on a high-end CPU. Our system preserves all intriguing properties of the original algorithm, such as the slanted plane priors, but can achieve a frame rate of 47fps whilst consuming under 4W of power. Unlike previous FPGA based designs, we take advantage of both components on the CPU/FPGA System-on-Chip to showcase the strategy necessary to accelerate more complex and computationally diverse algorithms for such low power, real-time systems.Comment: 8 pages, 7 figures, 2 table

    A unified approach for managing heterogeneous processing elements on FPGAs

    Get PDF
    FPGA designs do not typically include all available processing elements, e.g., LUTs, DSPs and embedded cores. Additional work is required to manage their different implementations and behaviour, which can unbalance parallel pipelines and complicate development. In this paper we introduce a novel management architecture to unify heterogeneous processing elements into compute pools. A pool formed of E processing elements, each implementing the same function, serves D parallel function calls. A call-and-response approach to computation allows for different processing element implementations, connections, latencies and non-deterministic behaviour. Our rotating scheduler automatically arbitrates access to processing elements, uses greatly simplified routing, and scales linearly with D parallel accesses to the compute pool. Processing elements can easily be added to improve performance, or removed to reduce resource use and routing, facilitating higher operating frequencies. Migrating to larger or smaller FPGAs thus comes at a known performance cost. We assess our framework with a range of neural network activation functions (ReLU, LReLU, ELU, GELU, sigmoid, swish, softplus and tanh) on the Xilinx Alveo U280

    Improving low latency applications for reconfigurable devices

    Get PDF
    This thesis seeks to improve low latency application performance via architectural improvements in reconfigurable devices. This is achieved by improving resource utilisation and access, and by exploiting the different environments within which reconfigurable devices are deployed. Our first contribution leverages devices deployed at the network level to enable the low latency processing of financial market data feeds. Financial exchanges transmit messages via two identical data feeds to reduce the chance of message loss. We present an approach to arbitrate these redundant feeds at the network level using a Field-Programmable Gate Array (FPGA). With support for any messaging protocol, we evaluate our design using the NASDAQ TotalView-ITCH, OPRA, and ARCA data feed protocols, and provide two simultaneous outputs: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. Our second contribution is a new ring-based architecture for low latency, parallel access to FPGA memory. Traditional FPGA memory is formed by grouping block memories (BRAMs) together and accessing them as a single device. Our architecture accesses these BRAMs independently and in parallel. Targeting memory-based computing, which stores pre-computed function results in memory, we benefit low latency applications that rely on: highly-complex functions; iterative computation; or many parallel accesses to a shared resource. We assess square root, power, trigonometric, and hyperbolic functions within the FPGA, and provide a tool to convert Python functions to our new architecture. Our third contribution extends the ring-based architecture to support any FPGA processing element. We unify E heterogeneous processing elements within compute pools, with each element implementing the same function, and the pool serving D parallel function calls. Our implementation-agnostic approach supports processing elements with different latencies, implementations, and pipeline lengths, as well as non-deterministic latencies. Compute pools evenly balance access to processing elements across the entire application, and are evaluated by implementing eight different neural network activation functions within an FPGA.Open Acces

    Networking Applications for Embedded Systems

    Get PDF

    Chapter Networking Applications for Embedded Systems

    Get PDF
    Embedded system

    Design of a High Capacity, Scalable, and Green Wireless Communication System Leveraging the Unlicensed Spectrum

    Get PDF
    The stunning demand for mobile wireless data that has been recently growing at an exponential rate requires a several fold increase in spectrum. The use of unlicensed spectrum is thus critically needed to aid the existing licensed spectrum to meet such a huge mobile wireless data traffic growth demand in a cost effective manner. The deployment of Long Term Evolution (LTE) in the unlicensed spectrum (LTE-U) has recently been gaining significant industry momentum. The lower transmit power regulation of the unlicensed spectrum makes LTE deployment in the unlicensed spectrum suitable only for a small cell. A small cell utilizing LTE-L (LTE in licensed spectrum), and LTE-U (LTE in unlicensed spectrum) will therefore significantly reduce the total cost of ownership (TCO) of a small cell, while providing the additional mobile wireless data offload capacity from Macro Cell to small cell in LTE Heterogeneous Networks (HetNet), to meet such an increase in wireless data demand. The U.S. 5 GHz Unlicensed National Information Infrastructure (U-NII) bands that are currently under consideration for LTE deployment in the unlicensed spectrum contain only a limited number of 20 MHZ channels. Thus in a dense multi-operator deployment scenario, one or more LTE-U small cells have to co-exist and share the same 20 MHz unlicensed channel with each other and with the incumbent Wi-Fi. This dissertation presents a proactive small cell interference mitigation strategy for improving the spectral efficiency of LTE networks in the unlicensed spectrum. It describes the scenario and demonstrate via simulation results, that in the absence of an explicit interference mitigation mechanism, there will be a significant degradation in the overall LTE-U system performance for LTE-U co-channel co-existence in countries such as U.S. that do not mandate Listen-Before-Talk (LBT) regulations. An unlicensed spectrum Inter Cell Interference Coordination (usICIC) mechanism is then presented as a time-domain multiplexing technique for interference mitigation for the sharing of an unlicensed channel by multi-operator LTE-U small cells. Through extensive simulation results, it is demonstrated that the proposed usICIC mechanism will result in 40% or more improvement in the overall LTE-U system performance (throughput) leading to increased wireless communication system capacity. The ever increasing demand for mobile wireless data is also resulting in a dramatic expansion of wireless network infrastructure by all service providers resulting in significant escalation in energy consumption by the wireless networks. This not only has an impact on the recurring operational expanse (OPEX) for the service providers, but importantly the resulting increase in greenhouse gas emission is not good for the environment. Energy efficiency has thus become one of the critical tenets in the design and deployment of Green wireless communication systems. Consequently the market trend for next-generation communication systems has been towards miniaturization to meet this stunning ever increasing demand for mobile wireless data, leading towards the need for scalable distributed and parallel processing system architecture that is energy efficient, and high capacity. Reducing cost and size while increasing capacity, ensuring scalability, and achieving energy efficiency requires several design paradigm shifts. This dissertation presents the design for a next generation wireless communication system that employs new energy efficient distributed and parallel processing system architecture to achieve these goals while leveraging the unlicensed spectrum to significantly increase (by a factor of two) the capacity of the wireless communication system. This design not only significantly reduces the upfront CAPEX, but also the recurring OPEX for the service providers to maintain their next generation wireless communication networks

    Implementation of the specification and schematics design for the Ethernet Fronthaul Module

    Get PDF
    Abstract. This Master’s Thesis covers theory and implementation of a device which is designed using a small base station as a reference. The theory chapter consists of the description and theory of a cloud radio access network architecture, a high data rate interface, an active antenna system and a designed device itself. This theory chapter is used to give reasons why the device is designed. The implementation chapter is divided into two chapters, which explains how the implementation specification is done and how the schematics were drawn. The schematics chapter covers the modifications, which are done to the hardware of the original small base station. Ethernet Fronthaul Moduulin implementointispesifikaation ja piirikaavion suunnittelu. Tiivistelmä. Tämä diplomityö käsittelee pienen tukiasemalaitteen pohjalta suunniteltavan laitteen, siihen liittyvän teorian sekä toteutuksen. Laitteeseen liittyvä teoria muodostuu neljästä kappaleesta, jotka käsittelevät cloud radio access network -arkkitehtuuria, nopean data määrän rajapintaa, aktiivi antenni systeemiä sekä itse suunnitellun laitteen teoriaa. Teorialla pyritään pohjustamaan syitä siihen, minkä vuoksi kyseinen laite on haluttu toteuttaa. Laitteen toteutusta käsittelevä kappale on jaettu kahteen osioon, joissa kuvataan implementointispesifikaation toteutus ja piirikaavioiden piirto. Piirikaavio kappaleessa käsitellään muutokset, jotka on tehty pohjana käytettävän tukiaseman laitteistolle

    The "MIND" Scalable PIM Architecture

    Get PDF
    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture
    corecore