192 research outputs found
Recommended from our members
Exploitation from Malicious PCI Express Peripherals
The thesis of this dissertation is that, despite widespread belief in the security community, systems are still vulnerable to attacks from malicious peripherals delivered over the PCI Express (PCIe) protocol.
Malicious peripherals can be plugged directly into internal PCIe slots, or connected via an external Thunderbolt connection.
To prove this thesis, we designed and built a new PCIe attack platform.
We discovered that a simple platform was insufficient to carry out complex attacks, so created the first PCIe attack platform that runs a full, conventional OS.
To allows us to conduct attacks against higher-level OS functionality built on PCIe, we made the attack platform emulate in detail the behaviour of an Intel 82574L Network Interface Controller (NIC), by using a device model extracted from the QEMU emulator.
We discovered a number of vulnerabilities in the PCIe protocol itself, and with the way that the defence mechanisms it provides are used by modern OSs.
The principal defence mechanism provided is the Input/Output Memory Management Unit (IOMMU).
The remaps the address space used by peripherals in 4KiB chunks, and can prevent access to areas of address space that a peripheral should not be able to access.
We found that, contrary to belief in the security community, the IOMMUs in modern systems were not designed to protect against attacks from malicious peripherals, but to allow virtual machines direct access to real hardware.
We discovered that use of the IOMMU is patchy even in modern operating systems.
Windows effectively does not use the IOMMU at all; macOS opens windows that are shared by all devices; Linux and FreeBSD map windows into host memory separately for each device, but only if poorly documented boot flags are used.
These OSs make no effort to ensure that only data that should be visible to the devices is in the mapped windows.
We created novel attacks that subverted control flow and read private data against systems running macOS, Linux and FreeBSD with the highest level of relevant protection enabled.
These represent the first use of the relevant exploits in each case.
In the final part of this thesis, we evaluate the suitability of a number of proposed general purpose and specific mitigations against DMA attacks, and make a number of recommendations about future directions in IOMMU software and hardware.EPSRC and ARM iCASE Awar
Improving low latency applications for reconfigurable devices
This thesis seeks to improve low latency application performance via architectural improvements in reconfigurable devices. This is achieved by improving resource utilisation and access, and by exploiting the different environments within which reconfigurable devices are deployed.
Our first contribution leverages devices deployed at the network level to enable the low latency processing of financial market data feeds. Financial exchanges transmit messages via two identical data feeds to reduce the chance of message loss. We present an approach to arbitrate these redundant feeds at the network level using a Field-Programmable Gate Array (FPGA). With support for any messaging protocol, we evaluate our design using the NASDAQ TotalView-ITCH, OPRA, and ARCA data feed protocols, and provide two simultaneous outputs: one prioritising low latency, and one prioritising high reliability with three dynamically configurable windowing methods.
Our second contribution is a new ring-based architecture for low latency, parallel access to FPGA memory. Traditional FPGA memory is formed by grouping block memories (BRAMs) together and accessing them as a single device. Our architecture accesses these BRAMs independently and in parallel. Targeting memory-based computing, which stores pre-computed function results in memory, we benefit low latency applications that rely on: highly-complex functions; iterative computation; or many parallel accesses to a shared resource. We assess square root, power, trigonometric, and hyperbolic functions within the FPGA, and provide a tool to convert Python functions to our new architecture.
Our third contribution extends the ring-based architecture to support any FPGA processing element. We unify E heterogeneous processing elements within compute pools, with each element implementing the same function, and the pool serving D parallel function calls. Our implementation-agnostic approach supports processing elements with different latencies, implementations, and pipeline lengths, as well as non-deterministic latencies. Compute pools evenly balance access to processing elements across the entire application, and are evaluated by implementing eight different neural network activation functions within an FPGA.Open Acces
Modeling Data-Plane Power Consumption of Future Internet Architectures
With current efforts to design Future Internet Architectures (FIAs), the
evaluation and comparison of different proposals is an interesting research
challenge. Previously, metrics such as bandwidth or latency have commonly been
used to compare FIAs to IP networks. We suggest the use of power consumption as
a metric to compare FIAs. While low power consumption is an important goal in
its own right (as lower energy use translates to smaller environmental impact
as well as lower operating costs), power consumption can also serve as a proxy
for other metrics such as bandwidth and processor load.
Lacking power consumption statistics about either commodity FIA routers or
widely deployed FIA testbeds, we propose models for power consumption of FIA
routers. Based on our models, we simulate scenarios for measuring power
consumption of content delivery in different FIAs. Specifically, we address two
questions: 1) which of the proposed FIA candidates achieves the lowest energy
footprint; and 2) which set of design choices yields a power-efficient network
architecture? Although the lack of real-world data makes numerous assumptions
necessary for our analysis, we explore the uncertainty of our calculations
through sensitivity analysis of input parameters
Recommended from our members
Securing Network Processors with Hardware Monitors
As an essential part of modern society, the Internet has fundamentally changed our lives during the last decade. Novel applications and technologies, such as online shopping, social networking, cloud computing, mobile networking, etc, have sprung up at an astonishing pace. These technologies not only influence modern life styles but also impact Internet infrastructure. Numerous new network applications and services require better programmability and flexibility for network devices, such as routers and switches. Since traditional fixed function network routers based on application specific integrated circuits (ASICs) have difficulty keeping pace with the growing demands of next-generation Internet applications, there is an ongoing shift in the industry toward implementing network devices using programmable network processors (NPs).
While network processors offer great benefits in terms of flexibility, their reprogrammable nature exposes potential security risks. Similar to network end-systems, such as general-purpose computers, software-based network processors have security vulnerabilities that can be attacked remotely. Recent research has shown that a new type of data plane attack is able to modify the functionality of a network processor and cause a denial-of-service (DoS) attack by sending a single malformed UDP packet. Since this attack relies solely on data plane access and does not need access to the control plane, it can be particularly difficult to control.
Hardware security monitors have been introduced to identify and eliminate these malicious packets before they can propagate and cause devastating effects in the network. However, previous work on hardware monitors only focus on single core systems with static (or very slowly changing) workloads. In network processors that use up to hundreds of parallel processor cores and have processing workloads that can change dynamically based on the network traffic, the realization of a complete multicore hardware monitoring system remains a critical challenge. Our research work in this thesis provides a comprehensive solution to this problem.
Our first contribution is the design and prototype implementation of a Scalable Hardware Monitoring Grid (SHMG). This scalable architecture balances area cost and performance overhead by using a clustered approach for multicore NP systems. In order to adapt to dynamically changing network traffic, a resource reallocation algorithm is designed to reassign the processing resources in SHMG to different network applications at runtime. An evaluation of the prototype SHMG on an Altera DE4 board demonstrates low resource and performance overheads. The functionality and performance of a runtime resource reallocation algorithm are tested using a simulation environment.
A second significant contribution of this work is a network system-level security solution for multicore network processors with hardware monitors. It addresses two key problems: (1) how to securely manage and reprogram processor cores and monitors in a deployed router in the network, and (2) how to prevent the large number of identical router devices in the network from an attack that can circumvent one specific monitoring system and lead to Internet-scale failures. A Secure Dynamic Multicore Hardware Monitoring System (SDMMon) is designed based on cryptographic principles and suitable key management to ensure the secure installation of processor binaries and monitor graphs. We present a Merkle tree based parameterizable high performance hash function that can be configured to perform a variety of functions in different devices via a 32-bit configuration parameter. A prototype system composed of both the SDMMon and the parameterizable hash is implemented and evaluated on an Altera DE4 board.
Finally, a fully-functional, comprehensive Multicore NP Security Platform, which integrates both the SHMG and the SDMMon security features, has been implemented on an Altera DE5 board
NetFPGA: status, uses, developments, challenges, and evaluation
The constant growth of the Internet, driven by the demand for timely access to data center networks; has meant
that the technological platforms necessary to achieve this purpose are outside the current budgets. In this order to make and
validate relevant, timely and relevant contributions; it is necessary that a wider community, access to evaluation,
experimentation and demonstration environments with specifications that can be compared with existing networking
solutions. This article introduces the NetFPGA, which is a platform to develop network hardware for reconfigurable and
rapid prototyping. It’s introduces the application areas in high-performance networks, advantages for traffic analysis,
packet flow, hardware acceleration, power consumption and parallel processing in real time. Likewise, it presents the
advantages of the platform for research, education, innovation, and future trends of this platform. Finally, we present a
performance evaluation of the tool called OSNT (Open-Source Network Tester) and shows that OSNT has 95% accuracy
of timestamp with resolution of 10ns for the generation of TCP traffic, and 90% efficiency capturing packets at 10Gbps of
full line-rate
Techniques for Processing TCP/IP Flow Content in Network Switches at Gigabit Line Rates
The growth of the Internet has enabled it to become a critical component used by businesses, governments and individuals. While most of the traffic on the Internet is legitimate, a proportion of the traffic includes worms, computer viruses, network intrusions, computer espionage, security breaches and illegal behavior. This rogue traffic causes computer and network outages, reduces network throughput, and costs governments and companies billions of dollars each year. This dissertation investigates the problems associated with TCP stream processing in high-speed networks. It describes an architecture that simplifies the processing of TCP data streams in these environments and presents a hardware circuit capable of TCP stream processing on multi-gigabit networks for millions of simultaneous network connections. Live Internet traffic is analyzed using this new TCP processing circuit
A High-Throughput Hardware Implementation of NAT Traversal For IPSEC VPN
In this paper, we present a high-throughput FPGA implementation of IPSec core. The core supports both NAT and non-NAT mode and can be used in high speed security gateway devices. Although IPSec ESP is very computing intensive for its cryptography process, our implementation shows that it can achieve high throughput and low lantency. The system is realized on the Zynq XC7Z045 from Xilinx and was verified and tested in practice. Results show that the design can gives a peak throughput of 5.721 Gbps for the IPSec ESP tunnel mode in NAT mode and 7.753 Gbps in non-NAT mode using one single AES encrypt core. We also compare the performance of the core when running in other mode of encryption
- …