6 research outputs found

    An integrated vector-scalar design on an in-order ARM core

    In the low-end mobile processor market, power, energy, and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. We implemented a classic vector unit and compared its results against our integrated design. Our integrated design improves the performance (more than 6×) and energy consumption (up to 5×) of a scalar in-order core with negligible area overhead (only 4.7% when using a vector register with 32 elements). In contrast, the area overhead of the classic vector unit can be significant (around 44%) if a dedicated vector floating-point unit is incorporated. Our block-based vector execution outperforms the classic vector unit for all kernels with floating-point data and also consumes less energy. We also complement the integrated design with three energy/performance-efficient techniques that further reduce power and increase performance. 
The first proposal covers the design and implementation of chaining logic that is optimized to work with the cache hierarchy through vector memory instructions, the second proposal reduces the number of reads/writes from/to the vector register file, and the third idea optimizes complex memory access patterns with the memory shape instruction and unified indexed vector load. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no. 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. This research has also been supported by the Agency for Management of University and Research Grants (AGAUR - FI-DGR 2014). O. Palomar is funded by a Royal Society Newton International Fellowship. Peer Reviewed. Postprint (author's final draft)

    POSTER: An Integrated Vector-Scalar Design on an In-order ARM Core

    In the low-end mobile processor market, power, energy and area budgets are significantly lower than in other markets (e.g. servers or high-end mobile markets). It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no. 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. This research has also been supported by the Agency for Management of University and Research Grants (AGAUR - FI-DGR 2014). Peer Reviewed. Postprint (author's final draft)

    Introduction to SVE Architecture evaluation in gem5

    ABSTRACT: The Scalable Vector Extension (SVE) is an extension of the ARM ISA for vector processing that allows the size of the vector registers to be scaled flexibly. The gem5 simulator enables architectural modelling of computers by simulating different configurations on various ISAs, including ARM. This Final Degree Project provides an introduction to evaluating the SVE architecture in gem5. To that end, it gives a detailed description of the methodology needed to run Full-System simulations in the gem5 environment, using the tools developed by the Computer Architecture and Technology Group (ATC) of the University of Cantabria. These simulations allow the performance of different benchmarks to be evaluated while scaling both the SVE vector length and the number of cores. Two benchmarks were developed for this evaluation: matrix, which multiplies two matrices, and gauss, which applies a Gaussian filter to a pixel matrix. The preliminary results for the vectorized code generally show better performance when scaling the vector length rather than the number of cores. Degree: Grado en Ingeniería Informática (Computer Engineering)
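
    To illustrate what scaling the vector length means for a kernel such as the matrix benchmark, here is a minimal sketch in plain Python. It is not the thesis code and uses neither gem5 nor SVE intrinsics; the function names and the `vl` parameter (vector length counted in elements) are assumptions made for this example. The inner loop handles each output row in chunks of at most `vl` elements, with a shorter final chunk standing in for the inactive lanes of a predicated vector loop, so the same code gives the same result for any vector length.

```python
def matmul_strip_mined(a, b, vl):
    """Multiply a (n x k) by b (k x m), handling each output row in
    chunks of at most `vl` elements, as a vector loop would.

    `vl` is a hypothetical vector length in elements, not in bits."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j0 in range(0, m, vl):
            # The last chunk of a row may have fewer active lanes.
            active = min(vl, m - j0)
            acc = [0] * active
            for p in range(k):
                scalar = a[i][p]  # one element of a, broadcast to all lanes
                row = b[p]
                for lane in range(active):
                    acc[lane] += scalar * row[j0 + lane]
            c[i][j0:j0 + active] = acc
    return c


def matmul_reference(a, b):
    """Plain scalar triple loop, used only to check the result."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[1, 0, 2, 1, 3], [0, 1, 1, 2, 0], [4, 1, 0, 1, 2]]
    for vl in (1, 2, 4, 8):
        assert matmul_strip_mined(a, b, vl) == matmul_reference(a, b)
    print(matmul_strip_mined(a, b, 4))
```

    In this sketch, a larger `vl` only reduces the number of chunk iterations per row; the arithmetic performed is unchanged, which is the property that lets one vectorized binary be evaluated at several vector lengths.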

    An integrated vector-scalar design on an in-order ARM core

    No full text
    In the low-end mobile processor market, power, energy, and area budgets are significantly lower than in the server/desktop/laptop/high-end mobile markets. It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. We implemented a classic vector unit and compared its results against our integrated design. Our integrated design improves the performance (more than 6×) and energy consumption (up to 5×) of a scalar in-order core with negligible area overhead (only 4.7% when using a vector register with 32 elements). In contrast, the area overhead of the classic vector unit can be significant (around 44%) if a dedicated vector floating-point unit is incorporated. Our block-based vector execution outperforms the classic vector unit for all kernels with floating-point data and also consumes less energy. We also complement the integrated design with three energy/performance-efficient techniques that further reduce power and increase performance. 
The first proposal covers the design and implementation of chaining logic that is optimized to work with the cache hierarchy through vector memory instructions, the second proposal reduces the number of reads/writes from/to the vector register file, and the third idea optimizes complex memory access patterns with the memory shape instruction and unified indexed vector load. The research leading to these results has received funding from the RoMoL ERC Advanced Grant GA no. 321253 and is supported in part by the European Union (FEDER funds) under contract TIN2015-65316-P. This research has also been supported by the Agency for Management of University and Research Grants (AGAUR - FI-DGR 2014). O. Palomar is funded by a Royal Society Newton International Fellowship. Peer Reviewed

    POSTER: an integrated vector-scalar design on an in-order ARM core

    No full text
    In the low-end mobile processor market, power, energy and area budgets are significantly lower than in other markets (e.g. servers or high-end mobile markets). It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. Peer Reviewed. Postprint (published version)

    POSTER: an integrated vector-scalar design on an in-order ARM core

    No full text
    In the low-end mobile processor market, power, energy and area budgets are significantly lower than in other markets (e.g. servers or high-end mobile markets). It has been shown that vector processors are a highly energy-efficient way to increase performance; however, adding support for them incurs area and power overheads that would not be acceptable for low-end mobile processors. In this work, we propose an integrated vector-scalar design for the ARM architecture that mostly reuses scalar hardware to support the execution of vector instructions. The key element of the design is our proposed block-based model of execution that groups vector computational instructions together to execute them in a coordinated manner. Peer Reviewed