19 research outputs found
Customising compilers for customisable processors
The automatic generation of instruction set extensions to provide application-specific acceleration
for embedded processors has been a productive area of research in recent years. There
have been incremental improvements in the quality of the algorithms that discover and select
which instructions to add to a processor. The use of automatic algorithms, however, result in
instructions which are radically different from those found in conventional, human-designed,
RISC or CISC ISAs. This has resulted in a gap between the hardware’s capabilities and the
compiler’s ability to exploit them.
This thesis proposes and investigates the use of a high-level compiler pass that uses graph-subgraph
isomorphism checking to exploit these complex instructions. Operating in a separate
pass permits techniques to be applied that are uniquely suited for mapping complex instructions,
but unsuitable for conventional instruction selection. The existing, mature, compiler
back-end can then handle the remainder of the compilation. With this method, the high-level
pass was able to use 1965 different automatically produced instructions to obtain an initial average
speed-up of 1.11x over 179 benchmarks evaluated on a hardware-verified cycle-accurate
simulator.
This result was improved following an investigation of how the produced instructions were
being used by the compiler. It was established that the models the automatic tools were using to
develop instructions did not take account of how well the compiler could realistically use them.
Adding additional parameters to the search heuristic to account for compiler issues increased
the speed-up from 1.11x to 1.24x. An alternative approach using a re-designed hardware interface
was also investigated and this achieved a speed-up of 1.26x while reducing hardware and
compiler complexity.
A complementary, high-level, method of exploiting dual memory banks was created to increase
memory bandwidth to accommodate the increased data-processing bandwidth provided
by extension instructions. Finally, the compiler was considered for use in a non-conventional
role where rather than generating code it is used to apply source-level transformations prior to
the generation of extension instructions and thus affect the shape of the instructions that are
generated
OpenISA, um conjunto de instruções híbrido
Orientador: Edson BorinTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: OpenISA é concebido como a interface de processadores que pretendem ser altamente flexíveis. Isto é conseguido por meio de três estratégias: em primeiro lugar, o ISA é empiricamente escolhido para ser facilmente traduzido para outros, possibilitando flexibilidade do software no caso de um processador OpenISA físico não estar disponível. Neste caso, não há nenhuma necessidade de aplicar um processador virtual OpenISA em software. O ISA está preparado para ser estaticamente traduzido para outros ISAs. Segundo, o ISA não é um ISA concreto nem um ISA virtual, mas um híbrido com a capacidade de admitir modificações nos opcodes sem afetar a compatibilidade retroativa. Este mecanismo permite que as futuras versões do ISA possam sofrer modificações em vez de extensões simples das versões anteriores, um problema comum com ISA concretos, como o x86. Em terceiro lugar, a utilização de uma licença permissiva permite o ISA ser usado livremente por qualquer parte interessada no projeto. Nesta tese de doutorado, concentramo-nos nas instruções de nível de usuário do OpenISA. A tese discute (1) alternativas para ISAs, alternativas para distribuição de programas e o impacto de cada opção, (2) características importantes de OpenISA para atingir seus objetivos e (3) fornece uma completa avaliação do ISA escolhido com respeito a emulação de desempenho em duas CPUs populares, uma projetada pela Intel e outra pela ARM. Concluímos que a versão do OpenISA apresentada aqui pode preservar desempenho próximo do nativo quando traduzida para outros hospedeiros, funcionando como um modelo promissor para ISAs flexíveis da próxima geração que podem ser facilmente estendidos preservando a compatibilidade. Ainda, também mostramos como isso pode ser usado como um formato de distribuição de programas no nível de usuárioAbstract: OpenISA is designed as the interface of processors that aim to be highly flexible. This is achieved by means of three strategies: first, the ISA is empirically chosen to be easily translated to others, providing software flexibility in case a physical OpenISA processor is not available. Second, the ISA is not a concrete ISA nor a virtual ISA, but a hybrid one with the capability of admitting modifications to opcodes without impacting backwards compatibility. This mechanism allows future versions of the ISA to have real changes instead of simple extensions of previous versions, a common problem with concrete ISAs such as the x86. Third, the use of a permissive license allows the ISA to be freely used by any party interested in the project. In this PhD. thesis, we focus on the user-level instructions of OpenISA. The thesis discusses (1) ISA alternatives, program distribution alternatives and the impact of each choice, (2) important features of OpenISA to achieve its goals and (3) provides a thorough evaluation of the chosen ISA with respect to emulation performance on two popular host CPUs, one from Intel and another from ARM. We conclude that the version of OpenISA presented here can preserve close-to-native performance when translated to other hosts, working as a promising model for next-generation, flexible ISAs that can be easily extended while preserving backwards compatibility. Furthermore, we show how this can also be a program distribution format at user-levelDoutoradoCiência da ComputaçãoDoutor em Ciência da Computação2011/09630-1FAPES
Adding state to declarative languages to enable web applications
On the web, media tend to be encoded in declarative formats, which facilitate accessibility, reuse, and transformation. Web applications, on the other hand, are created with more procedural technology and do not enjoy these benefits. In this thesis we examine how this can be fixed. We examine a small part of the problem space, adaptive time based applications, and investigate how we can extend existing declarative languages to fa
Customizing the Computation Capabilities of Microprocessors.
Designers of microprocessor-based systems must constantly improve
performance and increase computational efficiency in their designs to
create value. To this end, it is increasingly common to see
computation accelerators in general-purpose processor
designs. Computation accelerators collapse portions of an
application's dataflow graph, reducing the critical path of
computations, easing the burden on processor resources, and reducing
energy consumption in systems. There are many problems associated with
adding accelerators to microprocessors, though. Design of
accelerators, architectural integration, and software support all
present major challenges.
This dissertation tackles these challenges in the context of
accelerators targeting acyclic and cyclic patterns of
computation. First, a technique to identify critical computation
subgraphs within an application set is presented. This technique is
hardware-cognizant and effectively generates a set of instruction set
extensions given a domain of target applications. Next, several
general-purpose accelerator structures are quantitatively designed
using critical subgraph analysis for a broad application set.
The next challenge is architectural integration of
accelerators. Traditionally, software invokes accelerators by
statically encoding new instructions into the application binary. This
is incredibly costly, though, requiring many portions of hardware and
software to be redesigned. This dissertation develops strategies to
utilize accelerators, without changing the instruction set. In the
proposed approach, the microarchitecture translates applications at
run-time, replacing computation subgraphs with microcode to utilize
accelerators. We explore the tradeoffs in performing difficult aspects
of the translation at compile-time, while retaining run-time
replacement. This culminates in a simple microarchitectural interface
that supports a plug-and-play model for integrating accelerators into
a pre-designed microprocessor.
Software support is the last challenge in dealing with computation
accelerators. The primary issue is difficulty in generating
high-quality code utilizing accelerators. Hand-written assembly code
is standard in industry, and if compiler support does exist, simple
greedy algorithms are common. In this work, we investigate more
thorough techniques for compiling for computation accelerators. Where
greedy heuristics only explore one possible solution, the techniques
in this dissertation explore the entire design space, when
possible. Intelligent pruning methods ensure that compilation is both
tractable and scalable.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/57633/2/ntclark_1.pd