Algorithms for Improving the Automatically Synthesized Instruction Set of an Extensible Processor
Processors with extensible instruction sets are often used today as
programmable hardware accelerators for various domains. When extending RISC-V
and other similar extensible processor architectures, the task of designing
specialized instructions arises. This task can be solved automatically by using
instruction synthesis algorithms. In this paper, we consider algorithms that
complement the known approaches and improve the synthesized instruction sets
in two ways. The common operations clustering algorithm recomputes common
operations of a program (those whose result is consumed by multiple
operations) inside clustered synthesized instructions. The subsuming functions
algorithm identifies redundant synthesized instructions (those that have
equivalents among the other instructions).
Experimental evaluations of the developed algorithms are presented for tests
from the domains of cryptography and three-dimensional graphics. For the
Magma cipher test, the common operations clustering algorithm reduces the size
of the compiled code by 9%, and the subsuming functions algorithm halves the
size of the synthesized instruction set extension. For the AES cipher test,
the common operations clustering algorithm reduces the size of the compiled
code by 10%, and the subsuming functions algorithm reduces the size of the
synthesized instruction set extension by a factor of 2.5. Finally, for the
instruction set extension from the Volume Ray-Casting test, the additional use
of the subsuming functions algorithm reduces the problem-specific instruction
set extension from 5 instructions to only 2 without losing its
functionality.
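The abstract does not describe how the subsuming functions algorithm works internally. As a rough illustration of the general idea only (finding synthesized instructions that have equivalents among the others), the sketch below compares candidates by their input-output behaviour. The 4-bit operand width, the candidate instructions, and all function names are hypothetical and are not taken from the paper.

```python
from itertools import product

WIDTH = 4                   # assumption: tiny operands so exhaustive checking is cheap
MASK = (1 << WIDTH) - 1


def rotl1(x):
    """Rotate a WIDTH-bit value left by one bit."""
    return ((x << 1) | (x >> (WIDTH - 1))) & MASK


# Hypothetical candidate instructions, each a two-operand function.
candidates = {
    "xor_then_rot": lambda a, b: rotl1(a ^ b),
    "rot_then_xor": lambda a, b: rotl1(a) ^ rotl1(b),   # same function as above
    "add_wrap":     lambda a, b: (a + b) & MASK,
}


def signature(fn):
    """Full truth table of fn over all WIDTH-bit operand pairs."""
    return tuple(fn(a, b) for a, b in product(range(MASK + 1), repeat=2))


def drop_redundant(cands):
    """Keep the first instruction seen for each distinct behaviour."""
    kept, seen = {}, set()
    for name, fn in cands.items():
        sig = signature(fn)
        if sig not in seen:
            seen.add(sig)
            kept[name] = fn
    return kept


remaining = list(drop_redundant(candidates))
```

Here `rot_then_xor` is dropped because rotation distributes over XOR, so it computes the same function as `xor_then_rot`. Exhaustive comparison only scales to very small operand widths; a real flow would need a symbolic or structural equivalence check.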
Methodology for the Automatic Generation and Evaluation of Application-Specific Hardware
In the field of bioinformatics we can find applications that pose a challenge for the design of new processor architectures in terms of performance, since their characteristics differ from those of general-purpose applications. We therefore propose a new architecture with reconfigurable functional units for a specific application domain. The first step in defining the new architecture is to create the processor's new ISA, which will consist of extensions to the original ISA. To achieve this goal, we present a methodology for automatically identifying instruction patterns and generating prototypes of the functional units that execute them. We have implemented the methodology experimentally with the support of the Trimaran infrastructure for identifying ISA extensions, the DWARV tool for generating VHDL code, and the MOLEN platform for evaluating the automatically generated application-specific hardware prototypes. In initial evaluations of the prototypes generated for a case-study application, ClustalW, a speed-up of up to 8.54x was obtained for a single accelerator, while the speed-up of the whole application is above 2x.
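The gap between the 8.54x accelerator speed-up and the roughly 2x application speed-up is what Amdahl's law would predict when only part of the runtime is accelerated. The sketch below is a back-of-the-envelope reading, assuming a single accelerated region accounts for the whole gain; the abstract does not state this, and the function names are illustrative.

```python
def overall_speedup(fraction, accel_speedup):
    """Amdahl's law: whole-program speed-up when `fraction` of the
    original runtime is sped up by `accel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / accel_speedup)


def fraction_needed(target_overall, accel_speedup):
    """Fraction of the original runtime that must be accelerated to
    reach `target_overall` (Amdahl's law solved for the fraction)."""
    return (1.0 - 1.0 / target_overall) / (1.0 - 1.0 / accel_speedup)


# Under the single-region assumption, a 2x overall speed-up with an
# 8.54x accelerator needs a bit more than half of the runtime covered.
f = fraction_needed(2.0, 8.54)
```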
On the Feasibility and Limitations of Just-in-Time Instruction Set Extension for FPGA-Based Reconfigurable Processors
Reconfigurable instruction set processors make it possible to tailor the instruction set of a CPU to a particular application. While this customization process could be performed at runtime in order to adapt the CPU to the currently executed workload, this use case has hardly been investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation between the expected speedups and the associated overheads. To this end, we present a tool flow that is tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes on custom instruction identification and hardware generation. These overheads are compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times on the order of days to amortize the overheads.
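The trade-off between speedup and customization overhead can be illustrated with a simple break-even model: specialization pays off once the time saved exceeds the time spent generating the custom instructions. This is a minimal sketch under the assumption of a single uniform speedup applied to the whole run; the function name is hypothetical. The abstract's own break-even figures come from per-application measurements, so plugging the reported averages into this model is not expected to reproduce them.

```python
def break_even_runtime(overhead, speedup):
    """Original (unaccelerated) runtime at which specialization breaks even.

    Solves  overhead + T / speedup = T  for T, i.e.
    T = overhead / (1 - 1 / speedup). Units follow `overhead`.
    """
    if speedup <= 1.0:
        raise ValueError("no break-even point without a speedup above 1")
    return overhead / (1.0 - 1.0 / speedup)


# Assumed inputs: 50 minutes of overhead and a uniform 5x speedup.
t_embedded = break_even_runtime(50.0, 5.0)

# At the same overhead, a 1.2x speedup needs six times the overhead
# in original runtime before it pays off.
t_scientific = break_even_runtime(50.0, 1.2)
```

The model shows why small speedups are so costly to amortize: as the speedup approaches 1, the denominator approaches zero and the break-even runtime grows without bound.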
ASAM : Automatic Architecture Synthesis and Application Mapping; dl. 3.2: Instruction set synthesis
No abstract
A DAG-Based Design Approach for Reconfigurable VLIW Processors
Embedded system design is a complex activity that requires trading off a number of parameters such as cost, performance, flexibility, power, and time-to-market. This paper explores the possibility of making the instruction set of VLIW processors for embedded applications partially customisable by exploiting FPGA technology. In particular, a formal methodology is presented for selecting the critical parts of the application whose implementation as Reconfigurable Functional Units (RFUs) reduces the overall execution time. Experiments conducted on representative benchmarks show the viability of the proposed approach.