Search CORE

7,716 research outputs found

CIDPro: Custom Instructions for Dynamic Program Diversification

Author: Biswas Arnab Kumar
Fell Alexander
Lam Siew-Kei
Pham Thinh Hung
Veeranna Nandeesha
Publication venue
Publication date: 01/01/2018
Field of study

Timing side-channel attacks pose a major threat to embedded systems due to their ease of accessibility. We propose CIDPro, a framework that relies on dynamic program diversification to mitigate timing side-channel leakage. The proposed framework integrates the widely used LLVM compiler infrastructure and the increasingly popular RISC-V FPGA soft-processor. The compiler automatically generates custom instructions in the security critical segments of the program, and the instructions execute on the RISC-V custom co-processor to produce diversified timing characteristics on each execution instance. CIDPro has been implemented on the Zynq7000 XC7Z020 FPGA device to study the performance overhead and security tradeoffs. Experimental results show that our solution can achieve 80% and 86% timing side-channel capacity reduction for two benchmarks with an acceptable performance overhead compared to existing solutions. In addition, the proposed method incurs only a negligible hardware area overhead of 1% slices of the entire RISC-V system

arXiv.org e-Print Archive

Crossref

DR-NTU (Digital Repository of NTU)

Explore Bristol Research

Recommended from our members

Behavioral synthesis from VHDL using structured modeling

Author: Gajski Daniel D.
Lis Joseph S.
Publication venue: eScholarship, University of California
Publication date: 01/01/1991
Field of study

This dissertation describes work in behavioral synthesis involving the development of a VHDL Synthesis System VSS which accepts a VHDL behavioral input specification and performs technology independent synthesis to generate a circuit netlist of generic components. The VHDL language is used for input and output descriptions. An intermediate representation which incorporates signal typing and component attributes simplifies compilation and facilitates design optimization.A Structured Modeling methodology has been developed to suggest standard VHDL modeling practices for synthesis. Structured modeling provides recommendations for the use of available VHDL description styles so that optimal designs will be synthesized.A design composed of generic components is synthesized from the input description through a process of Graph Compilation, Graph Criticism, and Design Compilation. Experiments were performed to demonstrate the effects of different modeling styles on the quality of the design produced by VSS. Several alternative VHDL models were examined for each benchmark, illustrating the improvements in design quality achieved when Structured Modeling guidelines were followed

eScholarship - University of California

Mapping and Scheduling of Directed Acyclic Graphs on An FPFA Tile

Author: Guo Y.
Smit G.J.M.
Publication venue: STW Technology Foundation
Publication date: 01/01/2002
Field of study

An architecture for a hand-held multimedia device requires components that are energy-efficient, flexible, and provide high performance. In the CHAMELEON [4] project we develop a coarse grained reconfigurable device for DSP-like algorithms, the so-called Field Programmable Function Array (FPFA). The FPFA devices are reminiscent to FPGAs, but with a matrix of Processing Parts (PP) instead of CLBs. The design of the FPFA focuses on: (1) Keeping each PP small to maximize the number of PPs that can fit on a chip; (2) providing sufficient flexibility; (3) Low energy consumption; (4) Exploiting the maximum amount of parallelism; (5) A strong support tool for FPFA-based applications. The challenge in providing compiler support for the FPFA-based design stems from the flexibility of the FPFA structure. If we do not use the characteristics of the FPFA structure properly, the advantages of an FPFA may become its disadvantages. The GECKO1project focuses on this problem. In this paper, we present a mapping and scheduling scheme for applications running on one FPFA tile. Applications are written in C and C code is translated to a Directed Acyclic Graphs (DAG) [4]. This scheme can map a DAG directly onto the reconfigurable PPs of an FPFA tile. It tries to achieve low power consumption by exploiting locality of reference and high performance by exploiting maximum parallelism

University of Twente Research Information

A Micro Power Hardware Fabric for Embedded Computing

Author: Mehta Gayatri
Publication venue
Publication date: 25/09/2009
Field of study

Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor

D-Scholarship@Pitt

Recommended from our members

EXTEND-L : an input language for extensible register transfer compilation

Author: Dutt Nikil D.
Gajski Daniel D.
Publication venue: eScholarship, University of California
Publication date: 15/04/1988
Field of study

This report discusses the model and input language for EXTEND, a synthesis system that permits extensible register transfer synthesis. EXTEND-L fills the need for a language that bridges the gap between existing behavioral input descriptions, which are too abstract, and structural schematics, which cannot capture the high-level behavior. The report first discusses previous work in behavioral synthesis and summarizes the deficiencies of these behavioral specifications. The report then describes the proposed langauge in detail, and concludes with a few examples that show its utility

eScholarship - University of California

Recommended from our members

Silicon compilation

Author: Dutt Nikil D.
Gajski Daniel D.
Pangrle Barry M.
Publication venue: eScholarship, University of California
Publication date: 01/01/1987
Field of study

Silicon compilation is a term used for many different purposes. In this paper we define silicon compilation as a mapping from some higher level description into layout. We define the basic issues in structural and behavioral silicon compilation and some possible solutions to those issues. Finally, we define the concept of an intelligent silicon compiler in which the compiler evaluates the quality of the generated design and attempts to improve it if it is not satisfactory

eScholarship - University of California

Recommended from our members

Exploiting iteration-level parallelism in dataflow programs

Author: Bic Lubomir
Nagel Mark
Roy John M.A.
Publication venue: eScholarship, University of California
Publication date: 01/01/1991
Field of study

The term "dataflow" generally encompasses three distinct aspects of computation - a data-driven model of computation, a functional/declarative programming language, and a special-purpose multiprocessor architecture. In this paper we decouple the language and architecture issues by demonstrating that declarative programming is a suitable vehicle for the programming of conventional distributed-memory multiprocessors.This is achieved by appling several transformations to the compiled declarative program to achieve iteration-level (rather than instruction-level) parallelism. The transformations first group individual instructions into sequential light-weight processes, and then insert primitives to: (1) cause array allocation to be distributed over multiple processors, (2) cause computation to follow the data distribution by inserting an index filtering mechanism into a given loop and spawning a copy of it on all PEs; the filter causes each instance of that loop to operate on a different subrange of the index variable.The underlying model of computation is a dataflow/von Neumann hybrid in that exection within a process is control-driven while the creation, blocking, and activation of processes is data-driven.The performance of this process-oriented dataflow system (PODS) is demonstrated using the hydrodynamics simulation benchmark called SIMPLE, where a 19-fold speedup on a 32-processor architecture has been achieved

eScholarship - University of California

Content addressable memory project

Author: Hall J. Storrs
Levy Saul
Miyake Keith M.
Smith Donald E.
Publication venue
Publication date
Field of study

A parameterized version of the tree processor was designed and tested (by simulation). The leaf processor design is 90 percent complete. We expect to complete and test a combination of tree and leaf cell designs in the next period. Work is proceeding on algorithms for the computer aided manufacturing (CAM), and once the design is complete we will begin simulating algorithms for large problems. The following topics are covered: (1) the practical implementation of content addressable memory; (2) design of a LEAF cell for the Rutgers CAM architecture; (3) a circuit design tool user's manual; and (4) design and analysis of efficient hierarchical interconnection networks

NASA Technical Reports Server