93 research outputs found
Recommended from our members
Securing Network Processors with Hardware Monitors
As an essential part of modern society, the Internet has fundamentally changed our lives during the last decade. Novel applications and technologies, such as online shopping, social networking, cloud computing, mobile networking, etc, have sprung up at an astonishing pace. These technologies not only influence modern life styles but also impact Internet infrastructure. Numerous new network applications and services require better programmability and flexibility for network devices, such as routers and switches. Since traditional fixed function network routers based on application specific integrated circuits (ASICs) have difficulty keeping pace with the growing demands of next-generation Internet applications, there is an ongoing shift in the industry toward implementing network devices using programmable network processors (NPs).
While network processors offer great benefits in terms of flexibility, their reprogrammable nature exposes potential security risks. Similar to network end-systems, such as general-purpose computers, software-based network processors have security vulnerabilities that can be attacked remotely. Recent research has shown that a new type of data plane attack is able to modify the functionality of a network processor and cause a denial-of-service (DoS) attack by sending a single malformed UDP packet. Since this attack relies solely on data plane access and does not need access to the control plane, it can be particularly difficult to control.
Hardware security monitors have been introduced to identify and eliminate these malicious packets before they can propagate and cause devastating effects in the network. However, previous work on hardware monitors only focus on single core systems with static (or very slowly changing) workloads. In network processors that use up to hundreds of parallel processor cores and have processing workloads that can change dynamically based on the network traffic, the realization of a complete multicore hardware monitoring system remains a critical challenge. Our research work in this thesis provides a comprehensive solution to this problem.
Our first contribution is the design and prototype implementation of a Scalable Hardware Monitoring Grid (SHMG). This scalable architecture balances area cost and performance overhead by using a clustered approach for multicore NP systems. In order to adapt to dynamically changing network traffic, a resource reallocation algorithm is designed to reassign the processing resources in SHMG to different network applications at runtime. An evaluation of the prototype SHMG on an Altera DE4 board demonstrates low resource and performance overheads. The functionality and performance of a runtime resource reallocation algorithm are tested using a simulation environment.
A second significant contribution of this work is a network system-level security solution for multicore network processors with hardware monitors. It addresses two key problems: (1) how to securely manage and reprogram processor cores and monitors in a deployed router in the network, and (2) how to prevent the large number of identical router devices in the network from an attack that can circumvent one specific monitoring system and lead to Internet-scale failures. A Secure Dynamic Multicore Hardware Monitoring System (SDMMon) is designed based on cryptographic principles and suitable key management to ensure the secure installation of processor binaries and monitor graphs. We present a Merkle tree based parameterizable high performance hash function that can be configured to perform a variety of functions in different devices via a 32-bit configuration parameter. A prototype system composed of both the SDMMon and the parameterizable hash is implemented and evaluated on an Altera DE4 board.
Finally, a fully-functional, comprehensive Multicore NP Security Platform, which integrates both the SHMG and the SDMMon security features, has been implemented on an Altera DE5 board
Performance and Limitation Review of Secure Hash Function Algorithm
A cryptographic hash work is a phenomenal class of hash work that has certain properties which make it fitting for use in cryptography. It is a numerical figuring that maps information of emotional size to a bit string of a settled size (a hash) and is expected to be a confined limit, that is, a limit which is infeasible to adjust. Hash Functions are significant instrument in information security over the web. The hash functions that are utilized in different security related applications are called cryptographic hash functions. This property is additionally valuable in numerous different applications, for example, production of digital signature and arbitrary number age and so on. The vast majority of the hash functions depend on Merkle-Damgard development, for example, MD-2, MD-4, MD-5, SHA-1, SHA-2, SHA-3 and so on, which are not hundred percent safe from assaults. The paper talks about a portion of the secure hash function, that are conceivable on this development, and accordingly on these hash functions additionally face same attacks
HAL-ASOS accelerator model: evolutive elasticity by design
To address the integration of software threads and hardware accelerators into the Linux Operating System (OS) programming models, an accelerator architecture is proposed, based on micro-programmable hardware system calls, which fully export these resources into the Linux OS user-space through a design-specific virtual file system. The proposed HAL-ASOS accelerator model is split into a user-defined Hardware Task and a parameterizable Hardware Kernel with three differentiated transfer channels, aiming to explore distinct BUS technology interfaces and promote the accelerator to a first-class computing unit. This paper focuses on the Hardware Kernel and mainly its microcode control unit, which will leverage the elasticity to naturally evolve with Linux OS through key differentiating capabilities of field programmable gate arrays (FPGAs) when compared to the state of the art. To comply with the evolutive nature of Linux OS, or any Hardware Task incremental features, the proposed model generates page-faults signaling runtime errors that are handled at the kernel level as part of the virtual file system runtime. To evaluate the accelerator model’s programmability and its performance, a client-side application based on the AES 128-bit algorithm was implemented. Experiments demonstrate a flexible design approach in terms of hardware and software reconfiguration and significant performance increases consistent with rising processing demands or clock design frequencies.This work has been supported by FCT-Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020
Identification of dynamic circuit specialization opportunities in RTL code
Dynamic Circuit Specialization (DCS) optimizes a Field-Programmable Gate Array (FPGA) design by assuming a set of its input signals are constant for a reasonable amount of time, leading to a smaller and faster FPGA circuit. When the signals actually change, a new circuit is loaded into the FPGA through runtime reconfiguration. The signals the design is specialized for are called parameters. For certain designs, parameters can be selected so the DCS implementation is both smaller and faster than the original implementation. However, DCS also introduces an overhead that is difficult for the designer to take into account, making it hard to determine whether a design is improved by DCS or not. This article presents extensive results on a profiling methodology that analyses Register-Transfer Level (RTL) implementations of applications to check if DCS would be beneficial. It proposes to use the functional density as a measure for the area efficiency of an implementation, as this measure contains both the overhead and the gains of a DCS implementation. The first step of the methodology is to analyse the dynamic behaviour of signals in the design, to find good parameter candidates. The overhead of DCS is highly dependent on this dynamic behaviour. A second stage calculates the functional density for each candidate and compares it to the functional density of the original design. The profiling methodology resulted in three implementations of a profiling tool, the DCS-RTL profiler. The execution time, accuracy, and the quality of each implementation is assessed based on data from 10 RTL designs. All designs, except for the two 16-bit adaptable Finite Impulse Response (FIR) filters, are analysed in 1 hour or less
HAL-ASOS - Linux com aceleração em hardware para sistemas operativos dedicados à aplicação
Programa doutoral em Engenharia Eletrónica e de Computadores (PDEEC) (especialidade de Informática Industrial e Sistemas Embebidos)O ecossistema de sistemas embebidos de hoje tornou-se enorme, cobrindo vários e diferentes sistemas,
exigindo desempenho e mobilidade completa enquanto atingem autonomias de bateria cada vez maiores.
Mas a crescente frequência de relógio que resultou em dispositivos cada vez mais rápidos começou a
estagnar antes dos transístores pararem de encolher. Plataformas Field Programmable Gate Array (FPGA)
são uma solução alternativa para a implementação de sistemas completos e reconfiguráveis. Fornecem
desempenho e eficiência computacional para satisfazer requisitos da aplicação e do sistema embebido.
Vários Sistemas Operativos (SO) assistidos por FPGA foram propostos, mas ao estreitar seu foco na síntese
do datapath do acelerador de hardware, a grande maioria ignora a integração semântica destes no
SO. Ambientes de síntese de alto nível (HLS) elevaram a abstração além da linguagem de transferência de
registo (RTL), seguindo uma abordagem específica de domínio enquanto misturam software e abstrações
de hardware ad hoc, que dificultam as otimizações. Além disso, os modelos de programação para software
e hardware reconfigurável carecem de semelhanças, o que com o tempo dificultará a Exploração
do Ambiente de Design (DSE) e diminuirá o potencial de reutilização de código. Para responder a estas
necessidades, propomos HAL-ASOS, uma ferramenta para implementar sistemas embebidos baseados
em Linux que fornece (1) elasticidade no design em conformidade com a natureza evolutiva deste SO, (2)
integração semântica profunda de tarefas de hardware nos modelos de programação do Linux, (3) facilidade
na gestão de complexidade através de metodologia e ferramentas para apoiar o design, verificação
e implementação, (4) orientada por princípios de design híbridos e eficiência no sistema. Para avaliar as
funcionalidades da ferramenta, foi implementado um aplicativo criptográfico que demonstra alcance de
desempenho enquanto se emprega a metodologia de design. Novos níveis de desempenho são atingidos
numa aplicação de Visão por Computador que explora recursos de programação assíncrona-síncrona. Os
resultados demonstram uma abordagem flexível na reconfiguração entre hardware e software, e desempenho
que aumenta consistentemente com acréscimo de recursos ou frequência de relógio.Today’s embedded systems ecosystem became huge while covering several and different computer-based
systems, demanding for performance and complete mobility while experiencing longer battery lives. But
the rampant frequency that resulted in faster devices began hitting a wall even before transistors stopped
shrinking. Field Programmable Gate Array (FPGA) platforms are an alternative solution towards implementing
complete reconfigurable systems. They provide computational power, efficiency, in a lightweight
solution to serve the application requirements and increase performance in the overall system. Several
FPGA-assisted Operating Systems (OS) have been proposed, but by narrowing their focus on datapath
synthesis of the hardware accelerator, they completely ignore the deep semantic integration of these accelerators
into the OS. State-of-the-art High-Level Synthesis (HLS) environments have raised the level of
abstraction beyond Register Transfer Language (RTL) by following a domain-specific approach while mixing
ad hoc software and hardware abstractions, making harder for performance optimizations. Furthermore,
the programming models for software and reconfigurable hardware lack commonalities, which in time will
hinder the Design Space Exploration (DSE) and lower the potential for code reuse. To overcome these
issues, we propose HAL-ASOS, a framework to implement Linux-based Embedded systems which provides
(1) elasticity by design to comply with the evolutive nature of Linux, (2) deep semantic integration of the
hardware tasks in the Linux programming models, (3) easy complexity management using methodology
and tools to fully support design, verification and deployment, (4) hybrid and efficiency-oriented design
principles. To evaluate the framework functionalities, a cryptographic application was implemented and
demonstrates performance achievements while using the promoted application-driven design methodology.
To demonstrate new levels of performance that can be achieved, a Computer Vision application
explores several mixed asynchronous-synchronous programming features. Experiments demonstrate a
flexible design approach in terms of hardware and software reconfiguration, and significant performance
that increases consistently with the rising in processing resources or clock frequencies.Financial support received from Portuguese Foundation for Science and Technology (FCT) with the PhD grant SFRH/BD/82732/2011
The Development of TIGRA: A Zero Latency Interface For Accelerator Communication in RISC-V Processors
Field programmable gate arrays (FPGA) give developers the ability to design application specific hardware by means of software, providing a method of accelerating algorithms with higher power efficiency when compared to CPU or GPU accelerated applications. FPGA accelerated applications tend to follow either a loosely coupled or tightly coupled design. Loosely coupled designs often use OpenCL to utilize the FPGA as an accelerator much like a GPU, which provides a simplifed design flow with the trade-off of increased overhead and latency due to bus communication. Tightly coupled designs modify an existing CPU to introduce instruction set extensions to provide a minimal latency accelerator at the cost of higher programming effort to include the custom design.
This dissertation details the design of the Tightly Integrated, Generic RISC-V Accelerator (TIGRA) interface which provides the benefits of both loosely and tightly coupled accelerator designs. TIGRA enabled designs incur zero latency with a simple-to-use interface that reduces programming effort when implementing custom logic within a processor. This dissertation shows the incorporation of TIGRA into the simple PicoRV32 processor, the highly customizable Rocket Chip generator, and the FPGA optimized Taiga processor. Each processor design is tested with AES 128-bit encryption and posit arithmetic to demonstrate TIGRA functionality.
After a one time programming cost to incorporate a TIGRA interface into an existing processor, new functional units can be added with up to a 75% reduction in the lines of code required when compared to non-TIGRA enabled designs. Additionally, each functional unit created is co-compatible with each processor as the TIGRA interface remains constant between each design. The results prove that using the TIGRA interface introduces no latency and is capable of incorporating existing custom logic designs without modification for all three processors tested. When compared to the PicoRV32 coprocessor interface (PCPI), TIGRA coupled designs complete one clock cycle faster. Similarly, TIGRA outperforms the Rocket Chip custom coprocessor (RoCC) interface by an average of 6.875 clock cycles per instruction. The Taiga processor\u27s decoupled execution units allow for instructions to execute concurrently and uses a tag management system that is similar to out-of-order processors. The inclusion of the TIGRA interface within this processor abstracts the tag management from the user and demonstrates that the TIGRA interface can be applied to out-of-order processors.
When coupled with partial reconfiguration, the flexibility and modularity of TIGRA drastically increases. By creating a reprogrammable region for the custom logic connected via TIGRA, users can swap out the connected design at runtime to customize the processor for a given application. Further, partial reconfiguration allows users to only compile the custom logic design as opposed to the entire CPU, resulting in an 18.1% average reduction of compilation during the design process in the case studies. Paired with the programming effort saved by using TIGRA, partial reconfiguration improves the time to design and test new functionality timelines for a processor
A Framework and Protocol for Dynamic Management of Fault Tolerant Systems in Harsh Environments
Robots can be used to deal with hazardous materials like nuclear waste. Unfortunately, electronic components are also susceptible to radiation effects. Current proposals to tackle this issue solve only parts of the problems for the specific scenarios and the specific types of radiation. At the same time, current computational devices should provide run-time capabilities to monitor and adapt to different situations. In this paper, we target a possible solution presenting a framework which provides the flexibility to employ fault-tolerant techniques on distributed systems. As proof of concept, we target a fault-tolerant technique to extend the operating time of systems in harsh environments. Results show a very low overhead, of few microseconds, to execute a majority voter with replicated tasks
Design of secure and trustworthy system-on-chip architectures using hardware-based root-of-trust techniques
Cyber-security is now a critical concern in a wide range of embedded computing modules, communications systems, and connected devices. These devices are used in medical electronics, automotive systems, power grid systems, robotics, and avionics. The general consensus today is that conventional approaches and software-only schemes are not sufficient to provide desired security protections and trustworthiness.
Comprehensive hardware-software security solutions so far have remained elusive. One major challenge is that in current system-on-chip (SoCs) designs, processing elements (PEs) and executable codes with varying levels of trust, are all integrated on the same computing platform to share resources. This interdependency of modules creates a fertile attack ground and represents the Achilles’ heel of heterogeneous SoC architectures.
The salient research question addressed in this dissertation is “can one design a secure computer system out of non-secure or untrusted computing IP components and cores?”. In response to this question, we establish a generalized, user/designer-centric set of design principles which intend to advance the construction of secure heterogeneous multi-core computing systems. We develop algorithms, models of computation, and hardware security primitives to integrate secure and non-secure processing elements into the same chip design while aiming for: (a) maintaining individual core’s security; (b) preventing data leakage and corruption; (c) promoting data and resource sharing among the cores; and (d) tolerating malicious behaviors from untrusted processing elements and software applications.
The key contributions of this thesis are:
1. The introduction of a new architectural model for integrating processing elements with different security and trust levels, i.e., secure and non-secure cores with trusted and untrusted provenances;
2. A generalized process isolation design methodology for the new architecture model that covers both the software and hardware layers to (i) create hardware-assisted virtual logical zones, and (ii) perform both static and runtime security, privilege level and trust authentication checks;
3. A set of secure protocols and hardware root-of-trust (RoT) primitives to support the process isolation design and to provide the following functionalities: (i) hardware immutable identities – using physical unclonable functions, (ii) core hijacking and impersonation resistance – through a blind signature scheme, (iii) threshold-based data access control – with a robust and adaptive secure secret sharing algorithm, (iv) privacy-preserving authorization verification – by proposing a group anonymous authentication algorithm, and (v) denial of resource or denial of service attack avoidance – by developing an interconnect network routing algorithm and a memory access mechanism according to user-defined security policies.
4. An evaluation of the security of the proposed hardware primitives in the post-quantum era, and possible extensions and algorithmic modifications for their post-quantum resistance.
In this dissertation, we advance the practicality of secure-by-construction methodologies in SoC architecture design. The methodology allows for the use of unsecured or untrusted processing elements in the construction of these secure architectures and tries to extend their effectiveness into the post-quantum computing era
Dataflow Computing with Polymorphic Registers
Heterogeneous systems are becoming increasingly popular for data processing. They improve performance of simple kernels applied to large amounts of data. However, sequential data loads may have negative impact. Data parallel solutions such as Polymorphic Register Files (PRFs) can potentially accelerate applications by facilitating high speed, parallel access to performance-critical data. Furthermore, by PRF customization, specific data path features are exposed to the programmer in a very convenient way. PRFs allow additional control over the registers dimensions, and the number of elements which can be simultaneously accessed by computational units. This paper shows how PRFs can be integrated in dataflow computational platforms. In particular, starting from an annotated source code, we present a compiler-based methodology that automatically generates the customized PRFs and the enhanced computational kernels that efficiently exploit them
- …