
    Maximizing Data Reuse for Minimizing Memory Space Requirements and Execution Cycles
    Abstract — Embedded systems in the form of vehicles and mobile devices such as wireless phones, automatic banking machines, and new multi-modal devices operate under tight memory and power constraints. Their performance demands must therefore be balanced carefully against their memory space requirements and power consumption. Automatic tools that optimize for memory space utilization and performance are expected to grow in importance as ever larger portions of embedded designs are implemented in software. In this paper, we describe a novel optimization framework that can be used in two different ways: (i) deciding a suitable on-chip memory capacity for a given code, and (ii) restructuring the application code to make better use of the available on-chip memory space. While prior proposals have addressed these two questions, the solutions proposed in this paper are very aggressive in extracting and exploiting all data reuse in the application code, restricted only by inherent data dependences.
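    The kind of restructuring the abstract describes can be illustrated with a loop-tiling sketch. This is a hypothetical example, not the paper's actual transformation: the function name `mv_tiled`, the `TILE` capacity, and the explicit `scratch` array (standing in for an on-chip buffer) are all assumptions. The point is that each `TILE`-sized slice of the vector is filled into the small buffer once and then reused across every row, so the off-chip traffic for the vector shrinks by a factor tied to the chosen buffer capacity.

    ```c
    #include <string.h>

    #define N    64   /* problem size (assumed) */
    #define TILE 16   /* hypothetical on-chip buffer capacity, in elements */

    /* Tiled matrix-vector multiply: each TILE-sized slice of x is copied
       into a small scratchpad once and reused across all N rows, instead
       of being re-fetched for every row. Only inherent data dependences
       (the accumulation into y[i]) limit the reuse. */
    void mv_tiled(double A[N][N], double x[N], double y[N]) {
        double scratch[TILE];                 /* models the on-chip buffer */
        memset(y, 0, N * sizeof(double));
        for (int jt = 0; jt < N; jt += TILE) {
            /* one off-chip fill per tile... */
            memcpy(scratch, &x[jt], TILE * sizeof(double));
            /* ...then the tile is reused N times from the buffer */
            for (int i = 0; i < N; i++)
                for (int j = 0; j < TILE; j++)
                    y[i] += A[i][jt + j] * scratch[j];
        }
    }
    ```

    Read in the other direction, the same sketch answers the paper's question (i): picking `TILE` is exactly the choice of a suitable on-chip memory capacity for this code.
    
    
    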

    MEMORY INTERFACE SYNTHESIS FOR FPGA-BASED COMPUTING

    This dissertation describes a methodology for the generation of a custom memory interface and associated direct memory access (DMA) controller for FPGA-based kernels that have a regular access pattern. The interface provides explicit support for the following features: (1) memory latency hiding, (2) static access scheduling, and (3) data reuse. The target platform is a multi-FPGA platform, the Convey HC-1, which has an advanced memory system that presents the user logic with three critical design challenges: the memory system itself does not perform caching or prefetching, memory operations are arbitrarily reordered, and the memory performance depends on the access order provided by the user logic. The objective of the interface is to reconcile these three problems and maximize overall interface performance. This dissertation proposes three memory access orders, explores buffering and blocking techniques, and exploits data reuse for the synthesis of custom memory interfaces for specific types of kernels. We evaluate our techniques with two types of benchmark kernels: matrix-vector multiplication and 6- and 27-point stencil operations. Experimental results show that the proposed memory interface designs, which combine memory latency hiding, access scheduling, and data reuse, achieve an overall performance speedup of 1.6× for matrix-vector multiplication, 2.2× for a 6-point stencil, and 9.5× for a 27-point stencil compared to a naïve memory interface.
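    The data-reuse feature for stencil kernels can be sketched in software with a sliding-window buffer. This is a minimal illustrative model, not the dissertation's hardware design: the function name `stencil3`, the 3-point kernel, and the window registers `w0`/`w1`/`w2` are assumptions. It shows the essential idea that a reuse buffer lets each input element be fetched from memory exactly once, even though three different outputs consume it, which is where the large stencil speedups come from.

    ```c
    /* 3-point stencil with a sliding window: each input element is read
       from "memory" (the in[] array) exactly once and reused for the
       three outputs that need it, mimicking the shift-register reuse
       buffers a custom interface would keep in on-chip block RAM. */
    void stencil3(const int *in, int *out, int n) {
        int w0 = in[0], w1 = in[1];       /* window registers */
        for (int i = 1; i < n - 1; i++) {
            int w2 = in[i + 1];           /* one new fetch per iteration */
            out[i] = w0 + w1 + w2;        /* w0 and w1 are reused, not re-fetched */
            w0 = w1;                      /* shift the window forward */
            w1 = w2;
        }
    }
    ```

    A naïve interface would issue three reads per output here; with the window, reads drop to one per output, and for the 27-point (3-D) case the same principle eliminates proportionally more traffic, consistent with the larger speedup reported for that kernel.
    
    
    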