1,090 research outputs found
Hardware-based Security for Virtual Trusted Platform Modules
Virtual Trusted Platform modules (TPMs) were proposed as a software-based
alternative to the hardware-based TPMs to allow the use of their cryptographic
functionalities in scenarios where multiple TPMs are required in a single
platform, such as in virtualized environments. However, virtualizing TPMs,
especially virtualizing the Platform Configuration Registers (PCRs), strikes
against one of the core principles of Trusted Computing, namely the need for a
hardware-based root of trust. In this paper we show how the strength of
hardware-based security can be gained in virtual PCRs by binding them to their
corresponding hardware PCRs. We propose two approaches for such a binding. For
this purpose, the first variant uses binary hash trees, whereas the other
variant uses incremental hashing. In addition, we present an FPGA-based
implementation of both variants and evaluate their performance.
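The abstract does not spell out how the binding is constructed, so the following is only a minimal sketch of the hash-tree idea under stated assumptions: each virtual PCR is a 20-byte SHA-1 value, the tree pairs an odd node with itself, and the root is extended into a single hardware PCR. The names `merkle_root`, `vpcrs`, and `hw_pcr` are hypothetical, and the hardware PCR is simulated in software.

```python
import hashlib


def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM 1.2 style extend: new value = SHA-1(old value || measurement)."""
    return sha1(pcr + measurement)


def merkle_root(leaves: list[bytes]) -> bytes:
    """Root of a binary hash tree over the given leaves.

    Assumption: an odd node at any level is paired with itself.
    """
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha1(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [sha1(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Hypothetical setup: four virtual PCRs, all initially zero.
vpcrs = [bytes(20) for _ in range(4)]
root_before = merkle_root(vpcrs)

# A guest measurement changes one virtual PCR, which changes the tree root.
vpcrs[2] = extend(vpcrs[2], sha1(b"guest kernel image"))
root_after = merkle_root(vpcrs)

# Binding step (simulated): the root is extended into one hardware PCR.
hw_pcr = extend(bytes(20), root_after)
```

With a tree, a verifier that trusts the hardware PCR only needs the sibling hashes on one path to check a single virtual PCR, rather than all virtual PCR values.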
A Switch Architecture for Real-Time Multimedia Communications
In this paper we present a switch that can be used to transfer multimedia traffic. The switch provides guaranteed throughput and bounded latency. We focus on the design of a prototype switching element using the new technology opportunities being offered today. The architecture meets the multimedia requirements but still has low complexity and needs a minimum amount of hardware. A main item of this paper is the background of the architectural design decisions made. These include the interconnection topology, buffer organization, routing and scheduling. Implementing the switching fabric with FPGAs allows us to experiment with switching mode, routing strategy and scheduling policy in a multimedia environment. The switching elements are interconnected in a Kautz topology. Kautz graphs have interesting properties, such as a small diameter, a degree that is independent of the network size, fault tolerance, and a simple routing algorithm.
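The Kautz properties listed above can be illustrated with a small sketch. It assumes the usual definition: nodes are words of length n over d + 1 symbols with no two adjacent symbols equal, and each node links to the words obtained by shifting in a new last symbol. The routing shown is the textbook overlap-and-shift scheme, not necessarily the one used in the prototype; the function names are hypothetical.

```python
from itertools import product


def kautz_nodes(d: int, n: int) -> list[tuple[int, ...]]:
    """All words of length n over d + 1 symbols with no equal adjacent symbols."""
    return [w for w in product(range(d + 1), repeat=n)
            if all(a != b for a, b in zip(w, w[1:]))]


def successors(node: tuple[int, ...], d: int) -> list[tuple[int, ...]]:
    """Out-neighbours: drop the first symbol, append any symbol != the last."""
    return [node[1:] + (x,) for x in range(d + 1) if x != node[-1]]


def route(src: tuple[int, ...], dst: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Shift-based routing: reuse the longest suffix of src that is a prefix of dst."""
    n = len(src)
    overlap = next(k for k in range(n, -1, -1) if src[n - k:] == dst[:k])
    path = [src]
    for symbol in dst[overlap:]:
        path.append(path[-1][1:] + (symbol,))
    return path


nodes = kautz_nodes(2, 3)                 # degree 2, word length 3
example = route((0, 1, 2), (2, 0, 1))     # two hops via (1, 2, 0)
```

Every route takes at most n hops regardless of how many nodes the graph has, and each node has exactly d outgoing links, which is why the degree stays fixed as the network grows.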
Delayed Dynamical Systems: Networks, Chimeras and Reservoir Computing
We present a systematic approach to reveal the correspondence between time
delay dynamics and networks of coupled oscillators. After early demonstrations
of the usefulness of spatio-temporal representations of time-delay system
dynamics, extensive research on optoelectronic feedback loops has revealed
their immense potential for realizing complex system dynamics such as chimeras
in rings of coupled oscillators and applications to reservoir computing.
Delayed dynamical systems have been enriched in recent years through the
application of digital signal processing techniques. Very recently, we have
shown that one can significantly extend the capabilities and implement
networks with arbitrary topologies through the use of field programmable gate
arrays (FPGAs). This architecture allows the design of appropriate filters and
multiple time delays which greatly extend the possibilities for exploring
synchronization patterns in arbitrary topological networks. This has enabled us
to explore complex dynamics on networks with nodes that can be perfectly
identical, introduce parameter heterogeneities and multiple time delays, as
well as change network topologies to control the formation and evolution of
patterns of synchrony.
FPGA based remote code integrity verification of programs in distributed embedded systems
The explosive growth of networked embedded systems has made ubiquitous and pervasive computing a reality. However, there are still a number of challenges to its widespread adoption, including scalability, availability, and, especially, security of software. Among the different challenges in software security, the problem of remote code integrity verification is still waiting for efficient solutions. This paper proposes the use of reconfigurable computing to build a consistent architecture for generating attestations (proofs) of code integrity for an executing program and for delivering them to the designated verification entity. Remote dynamic update of reconfigurable devices is also exploited to increase the complexity of mounting attacks in a real-world environment. The proposed solution fits embedded devices well, since they are nowadays commonly equipped with reconfigurable hardware components that are used to solve different computational problems.
Efficient Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT Coordinators
Security management for IoT applications is a critical research field, especially when taking into account the performance variation over the very different IoT devices. In this paper, we present high-performance client/server coordinators on low-cost SoC-FPGA devices for secure IoT data collection. Security is ensured by using the Transport Layer Security (TLS) protocol based on the TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 cipher suite. The hardware architecture of the proposed coordinators is based on SW/HW co-design, implementing within the hardware accelerator core Elliptic Curve Scalar Multiplication (ECSM), which is the core operation of Elliptic Curve Cryptosystems (ECC). Meanwhile, the control of the overall TLS scheme is performed in software by an ARM Cortex-A9 microprocessor. In fact, the implementation of the ECC accelerator core around an ARM microprocessor allows not only the improvement of ECSM execution but also the performance enhancement of the overall cryptosystem. The integration of the ARM processor makes it possible to exploit embedded Linux features for high system flexibility. As a result, the proposed ECC accelerator requires limited area, with only 3395 LUTs on the Zynq device used to perform high-speed, 233-bit ECSMs in 413 µs, with a 50 MHz clock. Moreover, the generation of a 384-bit TLS handshake secret key between client and server coordinators requires 67.5 ms on a low-cost Zynq 7Z007S device.
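For readers unfamiliar with ECSM, the operation computes kP by repeated point doubling and addition. The sketch below is a software illustration only: it uses a tiny prime-field textbook curve (y^2 = x^3 + 2x + 2 over GF(17), base point (5, 1)) chosen for readability, not the 233-bit curve or the hardware datapath of the accelerator described above. All names are hypothetical.

```python
# Toy curve parameters (assumption: a small textbook curve, not the paper's).
P_MOD = 17
A = 2
G = (5, 1)   # base point; None represents the point at infinity


def point_add(p1, p2):
    """Add two points on y^2 = x^3 + A*x + 2 over GF(P_MOD)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                      # p2 is the inverse of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k: int, point):
    """Right-to-left double-and-add: scan k from its least significant bit."""
    result = None
    addend = point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


double_g = scalar_mult(2, G)   # (6, 3) on this toy curve
```

A k-bit scalar costs about k doublings plus one addition per set bit, which is why this loop is the natural candidate for hardware acceleration. The modular inverse via `pow(x, -1, m)` requires Python 3.8 or later.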
Parallel Tempering Simulation of the three-dimensional Edwards-Anderson Model with Compact Asynchronous Multispin Coding on GPU
Monte Carlo simulations of the Ising model play an important role in the
field of computational statistical physics, and they have revealed many
properties of the model over the past few decades. However, the effect of
frustration due to random disorder, in particular the possible spin glass
phase, remains a crucial but poorly understood problem. One of the obstacles in
the Monte Carlo simulation of random frustrated systems is their long
relaxation time making an efficient parallel implementation on state-of-the-art
computation platforms highly desirable. The Graphics Processing Unit (GPU) is
such a platform that provides an opportunity to significantly enhance the
computational performance and thus gain new insight into this problem. In this
paper, we present optimization and tuning approaches for the CUDA
implementation of the spin glass simulation on GPUs. We discuss the integration
of various design alternatives, such as GPU kernel construction with minimal
communication, memory tiling, and look-up tables. We present a binary data
format, Compact Asynchronous Multispin Coding (CAMSC), which provides an
additional speedup compared with the traditionally used Asynchronous
Multispin Coding (AMSC). Our overall design sustains a performance of 33.5
picoseconds per spin flip attempt for simulating the three-dimensional
Edwards-Anderson model with parallel tempering, which significantly improves
the performance over existing GPU implementations. Comment: 15 pages, 18 figures
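The abstract does not describe the CAMSC layout, so the sketch below shows only the general idea behind multispin coding, under the assumption that bit r of every word belongs to replica r: one XOR then evaluates a bond in all 64 replicas at once. Spins and couplings are encoded as bit 0 for +1 and bit 1 for -1; the function names are hypothetical.

```python
import random

WORD = 64
MASK = (1 << WORD) - 1


def unsatisfied_bonds(s_i: int, s_j: int, j_ij: int) -> int:
    """Bit r is 1 where the bond i-j is unsatisfied in replica r.

    With the 0 -> +1, 1 -> -1 encoding, the sign of J * s_i * s_j is the
    XOR of the three bits, so one XOR handles all replicas in the word.
    """
    return (s_i ^ s_j ^ j_ij) & MASK


def unsatisfied_scalar(s_i: int, s_j: int, j_ij: int, r: int) -> int:
    """Reference check for a single replica r using explicit +1/-1 values."""
    def value(word: int) -> int:
        return 1 - 2 * ((word >> r) & 1)
    return int(value(j_ij) * value(s_i) * value(s_j) < 0)


rng = random.Random(0)
s_i, s_j, j_ij = (rng.getrandbits(WORD) for _ in range(3))
packed = unsatisfied_bonds(s_i, s_j, j_ij)
```

A full update would also need a bit-sliced count of unsatisfied bonds over the six neighbours of a site in the three-dimensional lattice, which is omitted here.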
Improve the Usability of Polar Codes: Code Construction, Performance Enhancement and Configurable Hardware
Error-correcting codes (ECC) have been widely used for forward error correction (FEC) in modern communication systems to dramatically reduce the signal-to-noise ratio (SNR) needed to achieve a given bit error rate (BER). Newly invented polar codes have attracted much interest because of their capacity-achieving potential, efficient encoder and decoder implementation, and flexible architecture design space. This dissertation is aimed at improving the usability of polar codes by providing a practical code design method, new approaches to improve the performance of polar codes, and a configurable hardware design that adapts to various specifications.
State-of-the-art polar codes are used to achieve extremely low error rates. In this work, high-performance FPGA is used in prototyping polar decoders to catch rare-case errors for error-correcting performance verification and error analysis. To discover the polarization characteristics and error patterns of polar codes, an FPGA emulation platform for belief-propagation (BP) decoding is built by a semi-automated construction flow. The FPGA-based emulation achieves significant speedup in large-scale experiments involving trillions of data frames. The platform is a key enabler of this work.
The frozen set selection of polar codes, known as bit selection, is critical to the error-correcting performance of polar codes. A simulation-based in-order bit selection method is developed to evaluate the error rate of each bit using Monte Carlo simulations. The frozen set is selected based on the bit reliability ranking. The resulting code construction exhibits up to 1 dB coding gain with respect to the conventional bit selection.
To further improve the coding gain of the BP decoder for low-error-rate applications, the decoding error mechanisms are studied and analyzed, and the errors are classified based on their distinct signatures. Error detection is enabled by low-cost CRC concatenation, and post-processing algorithms targeting each type of error are designed to mitigate the vast majority of the decoding errors. The post-processor incurs only a small implementation overhead, but it provides more than an order of magnitude improvement in error-correcting performance.
The regularity of the BP decoder structure offers many hardware architecture choices. Silicon area, power consumption, throughput and latency can be traded to reach the optimal design points for practical use cases. A comprehensive design space exploration reveals several practical architectures at different design points. The scalability of each architecture is also evaluated based on the implementation candidates.
For dynamic communication channels, such as wireless channels in the upcoming 5G applications, multiple codes of different lengths and code rates are needed to fit varying channel conditions. To minimize implementation cost, a universal decoder architecture is proposed to support multiple codes through hardware reuse. A 40nm length- and rate-configurable polar decoder ASIC is demonstrated to fit various communication environments and service requirements.
PHD, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/140817/1/shuangsh_1.pd
Qubit Data Structures for Analyzing Computing Systems
Qubit models and methods for improving the performance of software and
hardware for analyzing digital devices through increasing the dimension of the
data structures and memory are proposed. The basic concepts, terminology and
definitions necessary for the implementation of quantum computing when
analyzing virtual computers are introduced. The investigation results
concerning design and modeling computer systems in a cyberspace based on the
use of two-component structure are presented. Comment: 9 pages, 4 figures, Proceeding of the Third International Conference
on Data Mining & Knowledge Management Process (CDKP 2014)
Range-enhanced packet classification to improve computational performance on field programmable gate array
Multi-field packet classification is a powerful classification engine that classifies input packets into different fields based on predefined rules. As the demand for the internet increases, efficient network routers can support many network features like quality of service (QoS), firewalls, security, multimedia communications, and virtual private networks. However, the traditional packet classification methods do not fulfill today's network functionality and requirements efficiently. In this article, an efficient range enhanced packet classification (REPC) module is designed using a range bit-vector encoding method, which provides a unique design to store the precomputed values in memory. In addition, the REPC supports range-to-prefix features to match the packets to the corresponding header fields. The synthesis and implementation results of REPC are analyzed and tabulated in detail. The REPC module utilizes 3% of the slices on an Artix-7 field programmable gate array (FPGA) and works at 99.87 Gbps throughput with a latency of 3 clock cycles. The proposed REPC is compared with existing packet classification approaches and shows improvements in hardware constraints.
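The abstract mentions range-to-prefix support without giving the method. As a rough illustration, the sketch below uses the standard greedy expansion of an inclusive integer range into the smallest set of aligned prefixes, written as bit strings with `*` wildcards. The function name and the string output format are assumptions made for this example, not the REPC design.

```python
def range_to_prefixes(lo: int, hi: int, width: int) -> list[str]:
    """Split the inclusive range [lo, hi] of width-bit values into prefixes.

    Each step takes the largest power-of-two block that is aligned at lo
    and still fits inside the remaining range.
    """
    if not 0 <= lo <= hi < (1 << width):
        raise ValueError("range must lie within the field width")
    prefixes = []
    while lo <= hi:
        size = (lo & -lo) if lo else (1 << width)   # alignment of lo
        while size > hi - lo + 1:                   # shrink to fit the range
            size >>= 1
        wild = size.bit_length() - 1                # number of wildcard bits
        fixed = format(lo, f"0{width}b")[:width - wild]
        prefixes.append(fixed + "*" * wild)
        lo += size
    return prefixes


# A 4-bit port range 1..14 expands to six prefixes:
# ['0001', '001*', '01**', '10**', '110*', '1110']
example = range_to_prefixes(1, 14, 4)
```

A w-bit range can expand to as many as 2w - 2 prefixes, as the 4-bit example shows with six entries. That expansion cost is a common reason for classifiers to encode ranges directly.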