706 research outputs found

    Design and testing methodologies for signal processing systems using DICE

    Get PDF
    The design and integration of embedded systems in heterogeneous programming environments is still largely done in an ad hoc fashion making the overall development process more complicated, tedious and error-prone. In this work, we propose enhancements to existing design flows that utilize model-based design to verify cross-platform correctness of individual actors. The DSPCAD Integrative Command Line Environment (DICE) is a realization of managing these enhancements. We demonstrate this design flow with two case studies. By using DICE's novel test framework on modules of a triggering system in the Large Hadron Collider, we demonstrate how the cross-platform model-based approach, automatic testbench creation and integration of testing in the design process alleviate the rigors of developing such a complex digital system. The second case study is an exploration study into the required precision for eigenvalue decomposition using the Jacobi algorithm. This case study is a demonstration of the use of dataflow modeling in early stage application exploration and the use of DICE in the overall design flow

    Improving low latency applications for reconfigurable devices

    Get PDF
    This thesis seeks to improve low latency application performance via architectural improvements in reconfigurable devices. This is achieved by improving resource utilisation and access, and by exploiting the different environments within which reconfigurable devices are deployed. Our first contribution leverages devices deployed at the network level to enable the low latency processing of financial market data feeds. Financial exchanges transmit messages via two identical data feeds to reduce the chance of message loss. We present an approach to arbitrate these redundant feeds at the network level using a Field-Programmable Gate Array (FPGA). With support for any messaging protocol, we evaluate our design using the NASDAQ TotalView-ITCH, OPRA, and ARCA data feed protocols, and provide two simultaneous outputs: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods. Our second contribution is a new ring-based architecture for low latency, parallel access to FPGA memory. Traditional FPGA memory is formed by grouping block memories (BRAMs) together and accessing them as a single device. Our architecture accesses these BRAMs independently and in parallel. Targeting memory-based computing, which stores pre-computed function results in memory, we benefit low latency applications that rely on: highly-complex functions; iterative computation; or many parallel accesses to a shared resource. We assess square root, power, trigonometric, and hyperbolic functions within the FPGA, and provide a tool to convert Python functions to our new architecture. Our third contribution extends the ring-based architecture to support any FPGA processing element. We unify E heterogeneous processing elements within compute pools, with each element implementing the same function, and the pool serving D parallel function calls. Our implementation-agnostic approach supports processing elements with different latencies, implementations, and pipeline lengths, as well as non-deterministic latencies. Compute pools evenly balance access to processing elements across the entire application, and are evaluated by implementing eight different neural network activation functions within an FPGA.Open Acces

    Novel VLSI architecture of motion estimation and compensation for H.264 standard

    Get PDF
    This thesis presents a high performance novel VLSI architecture of a H.264 motion estimator, which can be used as a building block for real-time H.264 video compression. Full-search block matching algorithm was used in this design. Pipeline structure was developed for variable block size processing units to work in parallel. The speed at 125MHz is good for real time motion estimation with 25/sec frame rate and 640x480 resolutions. The processing speed is also independent of the threshold level of Sum of Absolute Difference (SAD), which is used to determine the size of the macro block. The architecture is implemented with Register Transfer Level VHDL codes then synthesized with Synopsys Design Compiler, using TSMC 0.25um technology. The synthesized Application Specific Integrated Circuits (ASIC\u27s) has an area of 664um x 664um
    • …
    corecore