Securing Network Processors with Hardware Monitors
As an essential part of modern society, the Internet has fundamentally changed our lives during the last decade. Novel applications and technologies, such as online shopping, social networking, cloud computing, and mobile networking, have sprung up at an astonishing pace. These technologies not only influence modern lifestyles but also affect Internet infrastructure. Numerous new network applications and services require better programmability and flexibility from network devices such as routers and switches. Since traditional fixed-function network routers based on application-specific integrated circuits (ASICs) have difficulty keeping pace with the growing demands of next-generation Internet applications, the industry is shifting toward implementing network devices with programmable network processors (NPs).
While network processors offer great benefits in terms of flexibility, their reprogrammable nature exposes potential security risks. Similar to network end-systems, such as general-purpose computers, software-based network processors have security vulnerabilities that can be attacked remotely. Recent research has shown that a new type of data plane attack is able to modify the functionality of a network processor and cause a denial-of-service (DoS) attack by sending a single malformed UDP packet. Since this attack relies solely on data plane access and does not need access to the control plane, it can be particularly difficult to control.
Hardware security monitors have been introduced to identify and eliminate these malicious packets before they can propagate and cause devastating effects in the network. However, previous work on hardware monitors has focused only on single-core systems with static (or very slowly changing) workloads. In network processors that use up to hundreds of parallel processor cores and whose processing workloads can change dynamically with network traffic, realizing a complete multicore hardware monitoring system remains a critical challenge. This thesis provides a comprehensive solution to this problem.
Our first contribution is the design and prototype implementation of a Scalable Hardware Monitoring Grid (SHMG). This scalable architecture balances area cost and performance overhead by using a clustered approach for multicore NP systems. To adapt to dynamically changing network traffic, a resource reallocation algorithm reassigns the processing resources in SHMG to different network applications at runtime. An evaluation of the prototype SHMG on an Altera DE4 board demonstrates low resource and performance overheads. The functionality and performance of the runtime resource reallocation algorithm are evaluated in a simulation environment.
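The abstract does not describe how the reallocation algorithm decides the new assignment. As a purely hypothetical illustration, the sketch below assumes a policy in which each active application keeps at least one monitor cluster and the remaining clusters are split in proportion to observed traffic using the largest-remainder method. The function name and the application labels are invented for this example.

```python
from math import floor


def reallocate_clusters(traffic: dict[str, float], num_clusters: int) -> dict[str, int]:
    """Hypothetical policy: assign monitor clusters to applications by traffic share.

    Every application with nonzero traffic keeps one cluster; the spare clusters
    are divided proportionally, with leftovers going to the largest remainders.
    """
    apps = [a for a, load in traffic.items() if load > 0]
    if num_clusters < len(apps):
        raise ValueError("need at least one cluster per active application")
    alloc = {a: 1 for a in apps}          # guaranteed minimum per application
    spare = num_clusters - len(apps)
    total = sum(traffic[a] for a in apps)
    quotas = {a: spare * traffic[a] / total for a in apps}
    for a in apps:
        alloc[a] += floor(quotas[a])      # integer part of each proportional share
    leftover = num_clusters - sum(alloc.values())
    # Give the remaining clusters to the largest fractional remainders.
    by_remainder = sorted(apps, key=lambda a: quotas[a] - floor(quotas[a]), reverse=True)
    for a in by_remainder[:leftover]:
        alloc[a] += 1
    return alloc
```

For example, `reallocate_clusters({"ipv4": 60, "ipsec": 30, "qos": 10}, 8)` yields 4, 3, and 1 clusters. A real runtime policy would also have to account for the cost of reprogramming monitors when clusters move between applications.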
A second significant contribution of this work is a network system-level security solution for multicore network processors with hardware monitors. It addresses two key problems: (1) how to securely manage and reprogram processor cores and monitors in a router deployed in the network, and (2) how to protect the large number of identical router devices in the network from an attack that, by circumventing one specific monitoring system, could compromise all of them and lead to Internet-scale failures. A Secure Dynamic Multicore Hardware Monitoring System (SDMMon) is designed based on cryptographic principles and suitable key management to ensure the secure installation of processor binaries and monitor graphs. We present a Merkle tree-based, parameterizable, high-performance hash function that can be configured to compute different functions in different devices via a 32-bit configuration parameter. A prototype system composed of both the SDMMon and the parameterizable hash is implemented and evaluated on an Altera DE4 board.
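The abstract does not specify how the 32-bit parameter enters the hash. A minimal sketch, assuming SHA-256 as the underlying compression step and the parameter simply prefixed to every node input as a per-device domain separator, shows how one Merkle tree construction can yield a different function in each device. The function name, the SHA-256 choice, and the 0x00/0x01 leaf and node tags are all assumptions, not the thesis design.

```python
import hashlib


def merkle_root(blocks: list[bytes], config: int) -> bytes:
    """Hypothetical parameterized Merkle hash over a list of data blocks."""
    if not 0 <= config < 2**32:
        raise ValueError("config must fit in 32 bits")
    if not blocks:
        raise ValueError("need at least one block")
    param = config.to_bytes(4, "big")
    # Leaf level: H(0x00 || param || block); the tag separates leaves from nodes.
    level = [hashlib.sha256(b"\x00" + param + blk).digest() for blk in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd-sized levels
        # Internal nodes: H(0x01 || param || left || right)
        level = [
            hashlib.sha256(b"\x01" + param + level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Because leaves are hashed independently, a hardware implementation could process them in parallel, and two devices with different `config` values produce unrelated digests for the same binary.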
Finally, a fully functional, comprehensive Multicore NP Security Platform, which integrates both the SHMG and the SDMMon security features, has been implemented on an Altera DE5 board.
Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip
The sustained demand for faster, more powerful chips has been met by the
availability of chip manufacturing processes allowing for the integration of increasing
numbers of computation units onto a single die. The resulting outcome,
especially in the embedded domain, has often been called a System-on-Chip
(SoC) or Multi-Processor System-on-Chip (MPSoC).
MPSoC design brings to the foreground a large number of challenges, one of
the most prominent of which is the design of the on-chip interconnect. With the
number of on-chip blocks presently in the tens, and quickly approaching
the hundreds, the question of how best to provide on-chip communication
resources has become pressing.
Networks-on-Chip (NoCs) are the most comprehensive and scalable
answer to this design concern. By bringing large-scale networking concepts to
the on-chip domain, they provide a structured answer to present and future
communication requirements. The point-to-point connection and packet-switching
paradigms they involve also help minimize wiring overhead
and physical routing issues. However, as with any recently introduced technology,
NoC design is still an evolving discipline. Several main areas
require deep investigation for NoCs to become viable solutions:
• The design of the NoC architecture needs to strike the best tradeoff among
performance, features, and the tight area and power constraints of the on-chip
domain.
• Simulation and verification infrastructure must be put in place to explore,
validate and optimize the NoC performance.
• NoCs offer a huge design space, thanks to their extreme customizability in
terms of topology and architectural parameters. Design tools are needed
to prune this space and pick the best solutions.
• Given their global, distributed nature, it is especially important to study
the physical implementation of NoCs to assess their suitability for
next-generation designs and their area and power costs.
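The abstract does not name a routing algorithm. To make the packet-switching idea concrete, the following sketch assumes a 2D mesh topology and dimension-order (XY) routing, a common NoC choice that is not necessarily one of the configurations explored in this dissertation. Routers are identified by hypothetical (x, y) coordinates.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Routers visited by a packet under XY routing in a 2D mesh (assumed topology)."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:  # first travel along X until the column matches
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:  # then travel along Y to the destination row
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

For example, `xy_route((0, 0), (2, 1))` returns `[(0, 0), (1, 0), (2, 0), (2, 1)]`. Because packets never turn from Y back to X, the channel dependency graph has no cycles, which makes XY routing deadlock-free in a mesh at the cost of no path diversity.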
This dissertation performs a design space exploration of network-on-chip architectures
to point out the trade-offs associated with the design of
each individual network building block and with the design of the network topology
as a whole. The design space exploration is preceded by a comparative analysis
of state-of-the-art interconnect fabrics against each other and against early network-on-chip
prototypes. The ultimate objective is to point out the key advantages
that NoC realizations provide with respect to state-of-the-art communication
infrastructures and to identify the challenges that lie ahead in making
this new interconnect technology a reality. Among the latter, technology-related
challenges are emerging that call for dedicated design techniques at all
levels of the design hierarchy, in particular leakage power dissipation and the containment
of process variations and their effects. These objectives were achieved
by means of a NoC simulation environment for cycle-accurate
modelling and simulation and a back-end facility for the
study of NoC physical implementation effects. All the results provided
by this work have been validated on actual silicon layout.
On the Energy Efficiency and Performance of Irregular Application Executions on Multicore, NUMA and Manycore Platforms
Until the last decade, the performance of HPC architectures was almost exclusively quantified by their processing power. However, energy efficiency has recently come to be considered as important as raw performance and has become a critical aspect in the development of scalable systems. These strict energy constraints guided the development of a new class of so-called lightweight manycore processors. This study evaluates the computing and energy performance of two well-known irregular NP-hard problems, the Traveling Salesman Problem (TSP) and K-Means clustering, and a numerical seismic wave propagation simulation kernel, Ondes3D, on multicore, NUMA, and manycore platforms. First, we concentrate on the nontrivial task of adapting these applications to a manycore processor, specifically the novel MPPA-256. Then, we analyze their performance and energy consumption on these different machines. Our results show that applications able to fully use the resources of a manycore can achieve better performance and may consume 3.8x to 13x less energy than low-power and general-purpose multicore processors, respectively.
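The abstract does not state how energy was measured. Assuming the usual energy-to-solution metric, energy is average power times execution time, so the energy ratio between two platforms factors into a power ratio and a time ratio:

```latex
E = \bar{P}\,t,
\qquad
\frac{E_{\text{multicore}}}{E_{\text{manycore}}}
  = \frac{\bar{P}_{\text{multicore}}}{\bar{P}_{\text{manycore}}}
    \cdot \frac{t_{\text{multicore}}}{t_{\text{manycore}}}
```

With purely hypothetical numbers, a manycore drawing 10 W for 2 s uses 20 J, while a multicore drawing 100 W for 1 s uses 100 J: the manycore consumes 5x less energy despite being twice as slow. This is why fully using the manycore's resources matters, since it shrinks the time factor while the power advantage remains.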
Analysis and optimization of a debug post-silicon hardware architecture
The goal of this thesis is to analyze the post-silicon validation hardware infrastructure implemented on multicore systems, taking as an example the Esperanto Technologies SoC, which has thousands of RISC-V processors and targets specific software applications. Based on the conclusions of this analysis, the project then proposes a new post-silicon debug architecture that can fit on any System-on-Chip, regardless of its target application or complexity, and that improves on the options available on the market for multicore systems.
Experimental Evaluation and Comparison of Time-Multiplexed Multi-FPGA Routing Architectures
Emulating large, complex designs requires multi-FPGA systems (MFS). However, inter-FPGA communication suffers from a lack of interconnect capacity due to the limited number of FPGA input/output (I/O) pins. Serializing parallel signals onto a single trace effectively addresses this pin limitation. Besides the multiplexing scheme and the multiplexing ratio (number of inter-FPGA signals per trace), the choice of MFS routing architecture also affects the critical path latency. The routing architecture of an MFS is the interconnection pattern of FPGAs, fixed wires, and/or programmable interconnect chips. The performance of existing MFS routing architectures is also limited by the choice of off-chip interface. In this dissertation, we proposed novel 2D and 3D latency-optimized time-multiplexed MFS routing architectures. We used a rigorous experimental approach and real sequential benchmark circuits to evaluate and compare the proposed and existing MFS routing architectures. This research provides new insight into the benefits of using an off-chip optical interface and three-dimensional MFS routing architectures. Vertical stacking results in shorter off-chip links, improving overall system frequency with the additional advantage of a smaller footprint. The proposed 3D architectures employed serialized interconnect between intra-plane and inter-plane FPGAs to address the pin limitation problem. Additionally, all off-chip links were replaced by optical fibers, which improved latency and resulted in a faster MFS. Results indicated that exploiting the third dimension provided latency and area improvements compared to a 2D MFS. We also proposed latency-optimized planar 2D MFS architectures in which electrical interconnections are replaced by an optical interface with the same spatial distribution.
Performance evaluation and comparison showed that the proposed architectures reduce critical path delay and improve system frequency compared to conventional MFS. We also experimentally evaluated and compared the system performance of three inter-FPGA communication schemes, i.e., logic multiplexing, SERDES, and MGT, in conjunction with two routing architectures, i.e., Completely Connected Graph (CCG) and TORUS. Experimental results showed that SERDES attained a higher maximum frequency than the other two schemes. However, for very high multiplexing ratios, the performance of SERDES and MGT became comparable.
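The trade-off between pin count and multiplexing ratio can be sketched with simple arithmetic. This is a first-order model under stated assumptions, not the dissertation's evaluation method: each system clock cycle must move `ratio` signals serially over one trace at a faster transfer clock, plus a hypothetical fixed number of overhead cycles, and wire and SERDES latencies are ignored. All function names are illustrative.

```python
def min_mux_ratio(signals: int, pins: int) -> int:
    """Smallest multiplexing ratio that fits `signals` onto `pins` traces."""
    return -(-signals // pins)  # integer ceiling division


def traces_needed(signals: int, ratio: int) -> int:
    """Number of traces used when each trace carries `ratio` signals."""
    return -(-signals // ratio)


def max_system_freq_mhz(transfer_clock_mhz: float, ratio: int, overhead_cycles: int = 0) -> float:
    """First-order bound: one system cycle spans ratio + overhead transfer cycles."""
    return transfer_clock_mhz / (ratio + overhead_cycles)
```

For instance, 1000 inter-FPGA signals over 120 available pins need a ratio of at least 9, occupying 112 traces; with a 500 MHz transfer clock and one overhead cycle, the system clock is bounded at 50 MHz. The model makes plain why higher multiplexing ratios lower system frequency, and why faster serial links such as SERDES help.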
A Scalable and Adaptive Network on Chip for Many-Core Architectures
In this work, a scalable network on chip (NoC) for future many-core architectures is proposed and investigated. It supports different QoS mechanisms to ensure predictable communication. Self-optimization is introduced to adapt the energy footprint and performance of the network to the communication requirements. A fault tolerance concept allows the network to deal with permanent errors. Moreover, a template-based automated evaluation and design methodology and a synthesis flow for NoCs are introduced.
Evaluation of advanced techniques for structural FPGA self-test
This thesis presents a comprehensive test generation framework for FPGA logic elements and interconnects. It is based on and extends the current state of the art. The purpose of FPGA testing in this work is to achieve reliable reconfiguration for an FPGA-based runtime-reconfigurable system. A pre-configuration test is performed on a portion of the FPGA before it is reconfigured as part of the system, to ensure that the FPGA fabric is fault-free. The implementation platform is the Xilinx Virtex-5 FPGA family.
Existing literature in FPGA testing is evaluated and reviewed thoroughly. The various approaches are compared against one another qualitatively and the approach most suitable to the target platform is chosen. The array testing method is employed in testing the FPGA logic for its low hardware overhead and optimal test time. All tests are additionally pipelined to reduce test application time and use a high test clock frequency. A hybrid fault model including both structural and functional faults is assumed.
An algorithm for optimizing the number of required FPGA test configurations is developed and implemented in Java using a pseudo-random set-covering heuristic. Optimal solutions are obtained for Virtex-5 logic slices. The algorithm effort is parameterizable through the number of loop iterations, each of which takes approximately one second for a Virtex-5 SLICEL circuit.
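The exact heuristic is not given in the abstract, and the thesis implementation is in Java. A minimal Python sketch of one pseudo-random set-covering approach, assuming each candidate test configuration is represented by the set of faults (or resource modes) it covers: each iteration builds a greedy cover with random tie-breaking, and the smallest cover over all iterations is kept, so more iterations mean more effort and a better chance of finding a minimal cover. The candidate names and fault sets are hypothetical.

```python
import random


def random_greedy_cover(universe, candidates, iterations=100, seed=0):
    """candidates maps a configuration name to the frozenset of items it covers."""
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        uncovered = set(universe)
        chosen = []
        while uncovered:
            gains = {name: len(cov & uncovered) for name, cov in candidates.items()}
            top = max(gains.values())
            if top == 0:
                raise ValueError("the candidates cannot cover the universe")
            # Pseudo-random tie-breaking among the configurations with maximal gain.
            pick = rng.choice([n for n, g in gains.items() if g == top])
            chosen.append(pick)
            uncovered -= candidates[pick]
        if best is None or len(chosen) < len(best):
            best = chosen  # keep the smallest cover found so far
    return best
```

Usage: with items 1 to 6 and candidates `A={1,2,3}`, `B={4,5,6}`, `C={1,4}`, `D={2,5}`, `E={3,6}`, the function returns a two-element cover consisting of `A` and `B`.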
A flexible test architecture for interconnects is developed. Arbitrary wire types can be tested in the same test configuration with no hardware overhead. Furthermore, a routing algorithm is integrated with the test template generation to select the wires under test and route them appropriately.
Nine test configurations are required to achieve full test coverage of the FPGA logic. For interconnect testing, a local router based on depth-first graph traversal is implemented in Java as the basis for creating systematic interconnect test templates. Testing of pent wires is additionally implemented as a proof of concept. The test clock frequency for all tests exceeds 170 MHz, and the hardware overhead is always less than seven CLBs. All implemented tests are parameterizable so that they can be applied to any portion of the FPGA regardless of size or position.
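As a rough illustration of a local router built on depth-first traversal, the sketch below assumes the routing fabric is modelled as a directed graph from each wire or pin to the wires it can drive, with already reserved wires passed in as `blocked`. It is written in Python rather than the thesis's Java, and the node names (`out`, `double0`, `pent0`, `in`) are hypothetical.

```python
def dfs_route(graph, source, sink, blocked=frozenset()):
    """Return one source-to-sink path found by depth-first search, or None."""
    stack = [(source, None)]  # (node, predecessor) pairs
    parent = {}
    while stack:
        node, prev = stack.pop()
        if node in parent or node in blocked:
            continue  # already visited, or wire reserved by another net
        parent[node] = prev
        if node == sink:
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        # Push neighbours in reverse so they are explored in their listed order.
        for nxt in reversed(graph.get(node, [])):
            if nxt not in parent:
                stack.append((nxt, node))
    return None
```

With `graph = {"out": ["double0", "pent0"], "double0": ["in"], "pent0": ["in"]}`, the router first takes `out -> double0 -> in`; blocking `double0` forces the path through `pent0`, which is how a test-template generator could steer a route onto a specific wire type under test.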
Quarc: an architecture for efficient on-chip communication
The exponential downscaling of feature size has forced a paradigm shift from computation-based design to communication-based design in system-on-chip development. Buses, the traditional communication architecture in systems on chip, cannot meet the increasing bandwidth requirements of future large systems.
Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in the system-on-chip realm.
By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, prevents networks on chip from delivering multicast messages as efficiently. Building on extensive research on multicast communication in parallel computers, several network-on-chip architectures offer mechanisms to perform the operation while conforming to the resource constraints of the network-on-chip paradigm. In the majority of these networks on chip, multicast communication is implemented by establishing a connection between the source and all multicast destinations before message transmission commences. Establishing these connections incurs an overhead and is therefore undesirable, particularly in latency-sensitive services such as cache coherence.
To provide high-performance multicast communication, this research presents Quarc, a novel network-on-chip architecture. The Quarc architecture targets an area-efficient, low-power, high-performance implementation. The thesis gives a detailed description of the building blocks of the architecture, including the topology, router, and network interface.
A cost and performance comparison of the Quarc architecture against other network-on-chip architectures shows that Quarc is highly efficient.
Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality-of-service-aware communication.