117,290 research outputs found
Fast Recompilation of Object Oriented Modules
Once a program file is modified, the recompilation time should be minimized,
without sacrificing execution speed or high level object oriented features. The
recompilation time is often a problem for the large graphical interactive
distributed applications tackled by modern OO languages. A compilation server
and fast code generator were developed and integrated with the SRC Modula-3
compiler and Linux ELF dynamic linker. The resulting compilation and
recompilation speedups are impressive. The impact of different language
features, processor speed, and application size are discussed
GCC-Plugin for Automated Accelerator Generation and Integration on Hybrid FPGA-SoCs
In recent years, architectures combining a reconfigurable fabric and a
general purpose processor on a single chip became increasingly popular. Such
hybrid architectures allow extending embedded software with application
specific hardware accelerators to improve performance and/or energy efficiency.
Aiding system designers and programmers at handling the complexity of the
required process of hardware/software (HW/SW) partitioning is an important
issue. Current methods are often restricted, either to bare-metal systems, to
subsets of mainstream programming languages, or require special coding
guidelines, e.g., via annotations. These restrictions still represent a high
entry barrier for the wider community of programmers that new hybrid
architectures are intended for. In this paper we revisit HW/SW partitioning and
present a seamless programming flow for unrestricted, legacy C code. It
consists of a retargetable GCC plugin that automatically identifies code
sections for hardware acceleration and generates code accordingly. The proposed
workflow was evaluated on the Xilinx Zynq platform using unmodified code from
an embedded benchmark suite.Comment: Presented at Second International Workshop on FPGAs for Software
Programmers (FSP 2015) (arXiv:1508.06320
Investigating SRAM PUFs in large CPUs and GPUs
Physically unclonable functions (PUFs) provide data that can be used for
cryptographic purposes: on the one hand randomness for the initialization of
random-number generators; on the other hand individual fingerprints for unique
identification of specific hardware components. However, today's off-the-shelf
personal computers advertise randomness and individual fingerprints only in the
form of additional or dedicated hardware.
This paper introduces a new set of tools to investigate whether intrinsic
PUFs can be found in PC components that are not advertised as containing PUFs.
In particular, this paper investigates AMD64 CPU registers as potential PUF
sources in the operating-system kernel, the bootloader, and the system BIOS;
investigates the CPU cache in the early boot stages; and investigates shared
memory on Nvidia GPUs. This investigation found non-random non-fingerprinting
behavior in several components but revealed usable PUFs in Nvidia GPUs.Comment: 25 pages, 6 figures. Code in appendi
Virtual cluster scheduling through the scheduling graph
This paper presents an instruction scheduling and cluster assignment approach for clustered processors. The proposed technique makes use of a novel representation named the scheduling graph which describes all possible schedules. A powerful deduction process is applied to this graph, reducing at each step the set of possible schedules. In contrast to traditional list scheduling techniques, the proposed scheme tries to establish relations among instructions rather than assigning each instruction to a particular cycle. The main advantage is that wrong or poor schedules can be anticipated and discarded earlier. In addition, cluster assignment of instructions is performed using another novel concept called virtual clusters, which define sets of instructions that must execute in the same cluster. These clusters are managed during the deduction process to identify incompatibilities among instructions. The mapping of virtual to physical clusters is postponed until the scheduling of the instructions has finalized. The advantages this novel approach features include: (1) accurate scheduling information when assigning, and, (2) accurate information of the cluster assignment constraints imposed by scheduling decisions. We have implemented and evaluated the proposed scheme with superblocks extracted from Speclnt95 and MediaBench. The results show that this approach produces better schedules than the previous state-of-the-art. Speed-ups are up to 15%, with average speed-ups ranging from 2.5% (2-Clusters) to 9.5% (4-Clusters).Peer ReviewedPostprint (published version
- …