11 research outputs found
Techniques for Processing TCP/IP Flow Content in Network Switches at Gigabit Line Rates
The growth of the Internet has enabled it to become a critical component used by businesses, governments and individuals. While most of the traffic on the Internet is legitimate, a proportion of the traffic includes worms, computer viruses, network intrusions, computer espionage, security breaches and illegal behavior. This rogue traffic causes computer and network outages, reduces network throughput, and costs governments and companies billions of dollars each year. This dissertation investigates the problems associated with TCP stream processing in high-speed networks. It describes an architecture that simplifies the processing of TCP data streams in these environments and presents a hardware circuit capable of TCP stream processing on multi-gigabit networks for millions of simultaneous network connections. Live Internet traffic is analyzed using this new TCP processing circuit
Recommended from our members
Exploitation from Malicious PCI Express Peripherals
The thesis of this dissertation is that, despite widespread belief in the security community, systems are still vulnerable to attacks from malicious peripherals delivered over the PCI Express (PCIe) protocol.
Malicious peripherals can be plugged directly into internal PCIe slots, or connected via an external Thunderbolt connection.
To prove this thesis, we designed and built a new PCIe attack platform.
We discovered that a simple platform was insufficient to carry out complex attacks, so created the first PCIe attack platform that runs a full, conventional OS.
To allows us to conduct attacks against higher-level OS functionality built on PCIe, we made the attack platform emulate in detail the behaviour of an Intel 82574L Network Interface Controller (NIC), by using a device model extracted from the QEMU emulator.
We discovered a number of vulnerabilities in the PCIe protocol itself, and with the way that the defence mechanisms it provides are used by modern OSs.
The principal defence mechanism provided is the Input/Output Memory Management Unit (IOMMU).
The remaps the address space used by peripherals in 4KiB chunks, and can prevent access to areas of address space that a peripheral should not be able to access.
We found that, contrary to belief in the security community, the IOMMUs in modern systems were not designed to protect against attacks from malicious peripherals, but to allow virtual machines direct access to real hardware.
We discovered that use of the IOMMU is patchy even in modern operating systems.
Windows effectively does not use the IOMMU at all; macOS opens windows that are shared by all devices; Linux and FreeBSD map windows into host memory separately for each device, but only if poorly documented boot flags are used.
These OSs make no effort to ensure that only data that should be visible to the devices is in the mapped windows.
We created novel attacks that subverted control flow and read private data against systems running macOS, Linux and FreeBSD with the highest level of relevant protection enabled.
These represent the first use of the relevant exploits in each case.
In the final part of this thesis, we evaluate the suitability of a number of proposed general purpose and specific mitigations against DMA attacks, and make a number of recommendations about future directions in IOMMU software and hardware.EPSRC and ARM iCASE Awar
A scalable packetised radio astronomy imager
Includes bibliographical referencesModern radio astronomy telescopes the world over require digital back-ends. The complexity of these systems depends on many site-specific factors, including the number of antennas, beams and frequency channels and the bandwidth to be processed. With the increasing popularity for ever larger interferometric arrays, the processing requirements for these back-ends have increased significantly. While the techniques for building these back-ends are well understood, every installation typically still takes many years to develop as the instruments use highly specialised, custom hardware in order to cope with the demanding engineering requirements. Modern technology has enabled reprogrammable FPGA-based processing boards, together with packet-based switching techniques, to perform all the digital signal processing requirements of a modern radio telescope array. The various instruments used by radio telescopes are functionally very different, but the component operations remain remarkably similar and many share core functionalities. Generic processing platforms are thus able to share signal processing libraries and can acquire different personalities to perform different functions simply by reprogramming them and rerouting the data appropriately. Furthermore, Ethernet-based packet-switched networks are highly flexible and scalable, enabling the same instrument design to be scaled to larger installations simply by adding additional processing nodes and larger network switches. The ability of a packetised network to transfer data to arbitrary processing nodes, along with these nodes' reconfigurability, allows for unrestrained partitioning of designs and resource allocation. This thesis describes the design and construction of the first working radio astronomy imaging instrument hosted on Ethernet-interconnected re- programmable FPGA hardware. I attempt to establish an optimal packetised architecture for the most popular instruments with particular attention to the core array functions of correlation and beamforming. Emphasis is placed on requirements for South Africa's MeerKAT array. A demonstration system is constructed and deployed on the KAT-7 array, MeerKAT's prototype. This research promises reduced instrument development time, lower costs, improved reliability and closer collaboration between telescope design teams
Internet of Things From Hype to Reality
The Internet of Things (IoT) has gained significant mindshare, let alone attention, in academia and the industry especially over the past few years. The reasons behind this interest are the potential capabilities that IoT promises to offer. On the personal level, it paints a picture of a future world where all the things in our ambient environment are connected to the Internet and seamlessly communicate with each other to operate intelligently. The ultimate goal is to enable objects around us to efficiently sense our surroundings, inexpensively communicate, and ultimately create a better environment for us: one where everyday objects act based on what we need and like without explicit instructions
Recommended from our members
Optimising data centre operation by removing the transport bottleneck
Data centres lie at the heart of almost every service on the Internet. Data centres are used to provide search results, to power social media, to store and index email, to host “cloud” applications, for online retail and to provide a myriad of other web services. Consequently the more efficient they can be made the better for all of us. The power of modern data centres is in combining commodity off-the-shelf server hardware and network equipment to provide what Google’s Barrosso and Ho ̈lzle describe as “warehouse scale” computers.
Data centres rely on TCP, a transport protocol that was originally designed for use in the Internet. Like other such protocols, TCP has been optimised to maximise throughput, usually by filling up queues at the bottleneck. However, for most applications within a data centre network latency is more critical than throughput. Consequently the choice of transport protocol becomes a bottleneck for performance. My thesis is that the solution to this is to move away from the use of one-size-fits-all transport protocols towards ones that have been designed to reduce latency across the data centre and which can dynamically respond to the needs of the applications.
This dissertation focuses on optimising the transport layer in data centre networks. In particular I address the question of whether any single transport mechanism can be flexible enough to cater to the needs of all data centre traffic. I show that one leading protocol (DCTCP) has been heavily optimised for certain network conditions. I then explore approaches that seek to minimise latency for applications that care about it while still allowing throughput-intensive applications to receive a good level of service. My key contributions to this are Silo and Trevi.
Trevi is a novel transport system for storage traffic that utilises fountain coding to max- imise throughput and minimise latency while being agnostic to drop, thus allowing storage traffic to be pushed out of the way when latency sensitive traffic is present in the network. Silo is an admission control system that is designed to give tenants of a multi-tenant data centre guaranteed low latency network performance. Both of these were developed in collaboration with others
Novel Architectures for Offloading and Accelerating Computations in Artificial Intelligence and Big Data
Due to the end of Moore's Law and Dennard Scaling, performance gains in general-purpose architectures have significantly slowed in recent years. While raising the number of cores has been a viable approach for further performance increases, Amdahl's Law and its implications on parallelization also limit further performance gains. Consequently, research has shifted towards different approaches, including domain-specific custom architectures tailored to specific workloads.
This has led to a new golden age for computer architecture, as noted in the Turing Award Lecture by Hennessy and Patterson, which has spawned several new architectures and architectural advances specifically targeted at highly current workloads, including Machine Learning. This thesis introduces a hierarchy of architectural improvements ranging from minor incremental changes, such as High-Bandwidth Memory, to more complex architectural extensions that offload workloads from the general-purpose CPU towards more specialized accelerators. Finally, we introduce novel architectural paradigms, namely Near-Data or In-Network Processing, as the most complex architectural improvements.
This cumulative dissertation then investigates several architectural improvements to accelerate Sum-Product Networks, a novel Machine Learning approach from the class of Probabilistic Graphical Models. Furthermore, we use these improvements as case studies to discuss the impact of novel architectures, showing that minor and major architectural changes can significantly increase performance in Machine Learning applications.
In addition, this thesis presents recent works on Near-Data Processing, which introduces Smart Storage Devices as a novel architectural paradigm that is especially interesting in the context of Big Data. We discuss how Near-Data Processing can be applied to improve performance in different database settings by offloading database operations to smart storage devices. Offloading data-reductive operations, such as selections, reduces the amount of data transferred, thus improving performance and alleviating bandwidth-related bottlenecks.
Using Near-Data Processing as a use-case, we also discuss how Machine Learning approaches, like Sum-Product Networks, can improve novel architectures. Specifically, we introduce an approach for offloading Cardinality Estimation using Sum-Product Networks that could enable more intelligent decision-making in smart storage devices. Overall, we show that Machine Learning can benefit from developing novel architectures while also showing that Machine Learning can be applied to improve the applications of novel architectures
Consensus protocols exploiting network programmability
Services rely on replication mechanisms to be available at all time. The service demanding high availability is replicated on a set of machines called replicas. To maintain the consistency of replicas, a consensus protocol such as Paxos or Raft is used to synchronize the replicas' state. As a result, failures of a minority of replicas will not affect the service as other non-faulty replicas continue serving requests. A consensus protocol is a procedure to achieve an agreement among processors in a distributed system involving unreliable processors. Unfortunately, achieving such an agreement involves extra processing on every request, imposing a substantial performance degradation. Consequently, performance has long been a concern for consensus protocols. Although many efforts have been made to improve consensus performance, it continues to be an important problem for researchers. This dissertation presents a novel approach to improving consensus performance. Essentially, it exploits the programmability of a new breed of network devices to accelerate consensus protocols that traditionally run on commodity servers. The benefits of using programmable network devices to run consensus protocols are twofold: The network switches process packets faster than commodity servers and consensus messages travel fewer hops in the network. It means that the system throughput is increased and the latency of requests is reduced. The evaluation of our network-accelerated consensus approach shows promising results. Individual components of our FPGA- based and switch-based consensus implementations can process 10 million and 2.5 billion consensus messages per second, respectively. Our FPGA-based system as a whole delivers 4.3 times performance of a traditional software consensus implementation. The latency is also better for our system and is only one third of the latency of the software consensus implementation when both systems are under half of their maximum throughputs. In order to drive even higher performance, we apply a partition mechanism to our switch-based system, leading to 11 times better throughput and 5 times better latency. By dynamically switching between software-based and network-based implementations, our consensus systems not only improve performance but also use energy more efficiently. Encouraged by those benefits, we developed a fault-tolerant non-volatile memory system. A prototype using software memory controller demonstrated reasonable overhead over local memory access, showing great promise as scalable main memory. Our network-based consensus approach would have a great impact in data centers. It not only improves performance of replication mechanisms which relied on consensus, but also enhances performance of services built on top of those replication mechanisms. Our approach also motivates others to move new functionalities into the network, such as, key-value store and stream processing. We expect that in the near future, applications that typically run on traditional servers will be folded into networks for performance
Evaluating Techniques for Wireless Interconnected 3D Processor Arrays
In this thesis the viability of a wireless interconnect network for a highly parallel computer is investigated. The main theme of this thesis is to project the performance of a wireless network used to connect the processors in a parallel machine of such design. This thesis is going to investigate new design opportunities a wireless interconnect network can offer for parallel computing.
A simulation environment is designed and implemented to carry out the tests. The results have shown that if the available radio spectrum is shared effectively between building blocks of the parallel machine, there are substantial chances to achieve high processor utilisation. The results show that some factors play a major role in the performance of such a machine. The size of the machine, the size of the problem and the communication and computation capabilities of each element of the machine are among those factors. The results show these factors set a limit on the number of nodes engaged in some classes of tasks. They have shown promising potential for further expansion and evolution of our idea to new architectural opportunities, which is discussed by the end of this thesis.
To build a real machine of this type the architects would need to solve a number of challenging problems including heat dissipation, delivering electric power and Chip/board design; however, these issues are not part of this thesis and will be tackled in future