2 research outputs found

    Improving Compute & Data Efficiency of Flexible Architectures

    Get PDF

    Reduction operator for wide-SIMDs reconsidered

    Get PDF
    It has been shown that wide Single Instruction Multiple Data architectures (wide-SIMDs) can achieve high energy efficiency, especially in domains such as image and vision processing. In these and various other application domains, reduction is a frequently encountered operation, where multiple input elements need to be combined into a single element by an associative operation, e.g. addition or multiplication. There are many applications that require reduction such as: partial histogram merging, matrix multiplication and min/max-finding. Wide-SIMDs contain a large number of processing elements (PEs), which in general are connected by a minimal form of interconnect for scalability reasons. To efficiently support reduction operations on wide-SIMDs with such a minimal interconnect, we introduce two novel reduction algorithms which do not rely on complex communication networks or any dedicated hardware. The proposed approaches are compared with both dedicated hardware and other software solutions in terms of performance, area, and energy consumption. A practical case study demonstrates that the proposed software approach has much better generality, flexibility and no additional hardware cost. Compared to a dedicated hardware adder tree, the proposed software approach saves 6.8% area with a performance penalty of only 6.5%
    corecore