17,832 research outputs found
Resource Sharing in Custom Instruction Set Extensions
Customised processor performance generally increases as additional custom instructions are added. However, performance is not the only metric that modern systems must take into account; die area and energy efficiency are equally important. Resource sharing during synthesis of instruction set extensions (ISEs) can reduce significantly the die area and energy consumption of a customized processor. This may increase the number of custom instructions that can be synthesized with a given area budget. Resource sharing involves combining the graph representations of two or more ISEs which contain a similar sub-graph. This coupling of multiple sub-graphs, if performed naively, can increase the latency of the extension instructions considerably. And yet, as we show in this paper, an appropriate level of resource sharing provides a significantly simpler design with only modest increases in average latency for extension instructions. Based on existing resource-sharing techniques, this study presents a new heuristic that controls the degree of resource sharing between a given set of custom instructions. Our main contributions are the introduction of a parametric method for exploring the trade-offs that can be achieved between instruction latency and implementation complexity, and the coupling of design-space exploration with fast area-delay models for the operators comprising each ISE. We present experimental evidence that our heuristic exposes a broad range of design points, allowing advantageous trade-offs between die area and latency to be found and exploited
HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA
Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host
processor with programmable manycore accelerators (PMCAs) to combine
general-purpose computing with domain-specific, efficient processing
capabilities. While leading companies successfully advance their HESoC
products, research lags behind due to the challenges of building a prototyping
platform that unites an industry-standard host processor with an open research
PMCA architecture. In this work we introduce HERO, an FPGA-based research
platform that combines a PMCA composed of clusters of RISC-V cores, implemented
as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host
processor. The PMCA architecture mapped on the FPGA is silicon-proven,
scalable, configurable, and fully modifiable. HERO includes a complete software
stack that consists of a heterogeneous cross-compilation toolchain with support
for OpenMP accelerator programming, a Linux driver, and runtime libraries for
both host and PMCA. HERO is designed to facilitate rapid exploration on all
software and hardware layers: run-time behavior can be accurately analyzed by
tracing events, and modifications can be validated through fully automated hard
ware and software builds and executed tests. We demonstrate the usefulness of
HERO by means of case studies from our research
S-FaaS: Trustworthy and Accountable Function-as-a-Service using Intel SGX
Function-as-a-Service (FaaS) is a recent and already very popular paradigm in
cloud computing. The function provider need only specify the function to be
run, usually in a high-level language like JavaScript, and the service provider
orchestrates all the necessary infrastructure and software stacks. The function
provider is only billed for the actual computational resources used by the
function invocation. Compared to previous cloud paradigms, FaaS requires
significantly more fine-grained resource measurement mechanisms, e.g. to
measure compute time and memory usage of a single function invocation with
sub-second accuracy. Thanks to the short duration and stateless nature of
functions, and the availability of multiple open-source frameworks, FaaS
enables non-traditional service providers e.g. individuals or data centers with
spare capacity. However, this exacerbates the challenge of ensuring that
resource consumption is measured accurately and reported reliably. It also
raises the issues of ensuring computation is done correctly and minimizing the
amount of information leaked to service providers.
To address these challenges, we introduce S-FaaS, the first architecture and
implementation of FaaS to provide strong security and accountability guarantees
backed by Intel SGX. To match the dynamic event-driven nature of FaaS, our
design introduces a new key distribution enclave and a novel transitive
attestation protocol. A core contribution of S-FaaS is our set of resource
measurement mechanisms that securely measure compute time inside an enclave,
and actual memory allocations. We have integrated S-FaaS into the popular
OpenWhisk FaaS framework. We evaluate the security of our architecture, the
accuracy of our resource measurement mechanisms, and the performance of our
implementation, showing that our resource measurement mechanisms add less than
6.3% latency on standardized benchmarks
ASAM : Automatic Architecture Synthesis and Application Mapping; dl. 3.2: Instruction set synthesis
No abstract
An assessment of the connection machine
The CM-2 is an example of a connection machine. The strengths and problems of this implementation are considered as well as important issues in the architecture and programming environment of connection machines in general. These are contrasted to the same issues in Multiple Instruction/Multiple Data (MIMD) microprocessors and multicomputers
- …