An optimized predication execution for SIMD extensions

Abstract

Vector processing is a widely used technique to improve performance and energy efficiency in modern processors. Most of them rely on predication to support divergence control. However, performance and energy consumption in predicated instructions are usually independent on the number of true values in a mask. This means that the efficiency of the system becomes sub-optimal as vector length increases. In this work we propose the Optimized Predication Execution (OPE) technique. OPE delays the execution of sparse masked vector instructions sharing the same PC, extracts their active elements and creates a new dense instruction with a higher mask density. After executing such dense instruction, results are restored to the original sparse instructions. Our approach improves performance by up to 25% and reduces dynamic energy consumption by up to 43% on real applications with predication.This work has been partially supported by the RoMoL ERC Advanced Grant (GA 321253), the European HiPEAC Network of Excellence and the Spanish Government (contract TIN2015-65316-P). A. Barredo has been supported by the Spanish Government under Formación del Personal Investigador fellowship number BES-2017-080635. M. Moretó has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104.Peer ReviewedPostprint (author's final draft

    Similar works