    Synthesis of heterogeneous distributed architectures for memory-intensive applications

    Abstract: Memory-intensive applications present unique challenges to an ASIC designer in terms of the choice of memory organization, memory size requirements, bandwidth, access latencies, etc. The high potential of single-chip distributed logic-memory architectures in addressing many of these issues has been recognized in general-purpose computing, and more recently in ASIC design. However, such architectures will be adopted widely by designers only when general techniques and tools for efficient high-level synthesis (HLS) of multi-partitioned ASICs become available. The techniques presented in this paper are motivated by the fact that many memory-intensive applications exhibit irregular array data access patterns (due to conditionals in loop nests, etc.). Synthesis should, therefore, be capable of determining a partitioned architecture in which array data and computations may have to be heterogeneously distributed to achieve the best performance speedup. Furthermore, the synthesis methodology should not be restricted by the nature of the array index functions (affine or otherwise) in a behavior. Our methodology therefore employs simulation to gather information about the access patterns of array data references in a behavior, and this information is used by the rest of our analysis. We use a combination of clustering and min-cut style partitioning techniques to partition the behavior into sub-behaviors while considering various factors, including data access locality, balanced workloads, and inter-partition communication. Finally, we employ an iterative improvement strategy to determine the best way of distributing array data into the physical memory of each partition. Our experiments with several benchmark applications show that the proposed techniques can yield partitioned architectures that achieve performance speedups over conventional HLS solutions, as well as over the best feasible homogeneous partitioning solution.
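    The idea of profiling array accesses by simulation and then distributing data heterogeneously can be illustrated with a toy sketch. This is not the paper's algorithm: the loop nest, the statement names `S1` and `S2`, the block size, and the helper functions are all hypothetical, and a simple per-block majority rule stands in for the clustering, min-cut, and iterative improvement steps the abstract mentions.

```python
from collections import Counter

BLOCK_SIZE = 16  # assumed granularity for splitting the array into blocks


def profile_accesses(n=64):
    """Simulate a toy loop nest and count accesses per (statement, array block)."""
    counts = Counter()
    for i in range(n):
        if i % 3 == 0:
            # The conditional and the non-affine index make the pattern irregular.
            j = (i * i) % n
            counts[("S1", j // BLOCK_SIZE)] += 1
        else:
            counts[("S2", i // BLOCK_SIZE)] += 1
    return counts


def assign_blocks(counts, placement):
    """Place each array block in the partition whose statements access it most."""
    votes = {}
    for (stmt, blk), c in counts.items():
        votes.setdefault(blk, Counter())[placement[stmt]] += c
    # Ties are broken in favour of the lowest partition id.
    return {blk: min(sorted(v), key=lambda p: -v[p]) for blk, v in votes.items()}


def remote_accesses(counts, placement, block_home):
    """Count accesses that cross a partition boundary."""
    return sum(c for (stmt, blk), c in counts.items()
               if placement[stmt] != block_home[blk])


counts = profile_accesses()
placement = {"S1": 0, "S2": 1}                     # assumed statement-to-partition map
hetero = assign_blocks(counts, placement)          # profile-driven data distribution
homog = {blk: blk % 2 for blk in range(64 // BLOCK_SIZE)}  # block-cyclic baseline

print("remote accesses (profile-driven):", remote_accesses(counts, placement, hetero))
print("remote accesses (block-cyclic):  ", remote_accesses(counts, placement, homog))
```

    For a fixed placement of statements, the majority rule minimizes the number of inter-partition accesses per block, so the profile-driven distribution never does worse than the block-cyclic one on this metric. It ignores workload balance and memory capacity, which the abstract lists among the factors the actual methodology considers.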