4,205 research outputs found
Mind the Gap: A Comparison of Software Packet Generators
Network research relies on packet generators to assess performance and correctness of new ideas. Software-based generators in particular are widely used by academic researchers because of their flexibility, affordability, and open-source nature. The rise of new frameworks for fast IO on commodity hardware is making them even more attractive. Longstanding performance differences of software generation versus hardware in terms of throughput are no longer as big of a concern as they used to be few years ago.
This paper investigates the properties of several high-performance software packet generators and the implications on their precision when a given traffic pattern needs to be generated. We believe that the evaluation strategy presented in this paper helps understanding the actual limitations in high-performance software packet generation, thus helping the research community to build better tools
ANCHOR: logically-centralized security for Software-Defined Networks
While the centralization of SDN brought advantages such as a faster pace of
innovation, it also disrupted some of the natural defenses of traditional
architectures against different threats. The literature on SDN has mostly been
concerned with the functional side, despite some specific works concerning
non-functional properties like 'security' or 'dependability'. Though addressing
the latter in an ad-hoc, piecemeal way, may work, it will most likely lead to
efficiency and effectiveness problems. We claim that the enforcement of
non-functional properties as a pillar of SDN robustness calls for a systemic
approach. As a general concept, we propose ANCHOR, a subsystem architecture
that promotes the logical centralization of non-functional properties. To show
the effectiveness of the concept, we focus on 'security' in this paper: we
identify the current security gaps in SDNs and we populate the architecture
middleware with the appropriate security mechanisms, in a global and consistent
manner. Essential security mechanisms provided by anchor include reliable
entropy and resilient pseudo-random generators, and protocols for secure
registration and association of SDN devices. We claim and justify in the paper
that centralizing such mechanisms is key for their effectiveness, by allowing
us to: define and enforce global policies for those properties; reduce the
complexity of controllers and forwarding devices; ensure higher levels of
robustness for critical services; foster interoperability of the non-functional
property enforcement mechanisms; and promote the security and resilience of the
architecture itself. We discuss design and implementation aspects, and we prove
and evaluate our algorithms and mechanisms, including the formalisation of the
main protocols and the verification of their core security properties using the
Tamarin prover.Comment: 42 pages, 4 figures, 3 tables, 5 algorithms, 139 reference
Scalable parallel communications
Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups
A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials
We introduce PVSC-DTM (Parallel Vectorized Stencil Code for Dirac and
Topological Materials), a library and code generator based on a domain-specific
language tailored to implement the specific stencil-like algorithms that can
describe Dirac and topological materials such as graphene and topological
insulators in a matrix-free way. The generated hybrid-parallel (MPI+OpenMP)
code is fully vectorized using Single Instruction Multiple Data (SIMD)
extensions. It is significantly faster than matrix-based approaches on the node
level and performs in accordance with the roofline model. We demonstrate the
chip-level performance and distributed-memory scalability of basic building
blocks such as sparse matrix-(multiple-) vector multiplication on modern
multicore CPUs. As an application example, we use the PVSC-DTM scheme to (i)
explore the scattering of a Dirac wave on an array of gate-defined quantum
dots, to (ii) calculate a bunch of interior eigenvalues for strong topological
insulators, and to (iii) discuss the photoemission spectra of a disordered Weyl
semimetal.Comment: 16 pages, 2 tables, 11 figure
Packet Transactions: High-level Programming for Line-Rate Switches
Many algorithms for congestion control, scheduling, network measurement,
active queue management, security, and load balancing require custom processing
of packets as they traverse the data plane of a network switch. To run at line
rate, these data-plane algorithms must be in hardware. With today's switch
hardware, algorithms cannot be changed, nor new algorithms installed, after a
switch has been built.
This paper shows how to program data-plane algorithms in a high-level
language and compile those programs into low-level microcode that can run on
emerging programmable line-rate switching chipsets. The key challenge is that
these algorithms create and modify algorithmic state. The key idea to achieve
line-rate programmability for stateful algorithms is the notion of a packet
transaction : a sequential code block that is atomic and isolated from other
such code blocks. We have developed this idea in Domino, a C-like imperative
language to express data-plane algorithms. We show with many examples that
Domino provides a convenient and natural way to express sophisticated
data-plane algorithms, and show that these algorithms can be run at line rate
with modest estimated die-area overhead.Comment: 16 page
Performance measurement methodology for integrated services networks
With the emergence of advanced integrated services networks, the need for effective
performance analysis techniques has become extremely important. Further
advancements in these networks can only be possible if the practical performance
issues of the existing networks are clearly understood. This thesis is concerned with
the design and development of a measurement system which has been implemented on
a large experimental network.
The measurement system is based on dedicated traffic generators which have been
designed and implemented on the Project Unison network. The Unison project is a
multisite networking experiment for conducting research into the interconnection and
interworking of local area network based multi-media application systems. The traffic
generators were first developed for the Cambridge Ring based Unison network. Once
their usefulness and effectiveness was proven, high performance traffic generators
using transputer technology were built for the Cambridge Fast Ring based Unison
network. The measurement system is capable of measuring the conventional
performance parameters such as throughput and packet delay, and is able to
characterise the operational performance of network bridging components under
various loading conditions. In particular, the measurement system has been used in a
'measure and tune' fashion in order to improve the performance of a complex bridging
device.
Accurate measurement of packet delay in wide area networks is a recognised problem.
The problem is associated with the synchronisation of the clocks between the distant
machines. A chronological timestamping technique has been introduced in which the
clocks are synchronised using a broadcast synchronisation technique. Rugby time
clock receivers have been interfaced to each generator for the purpose of
synchronisation.
In order to design network applications, an accurate knowledge of the expected
network performance under different loading conditions is essential. Using the
measurement system, this has been achieved by examining the network characteristics
at the network/user interface. Also, the generators are capable of emulating a variety
of application traffic which can be injected into the network along with the traffic
from real applications, thus enabling user oriented performance parameters to be
evaluated in a mixed traffic environment.
A number of performance measurement experiments have been conducted using the
measurement system. Experimental results obtained from the Unison network serve to
emphasise the power and effectiveness of the measurement methodology
FloWatcher-DPDK: lightweight line-rate flow-level monitoring in software
In the last few years, several software-based solutions have been proved to be very efficient for high-speed packet processing, traffic generation and monitoring, and can be considered valid alternatives to expensive and non-flexible hardware-based solutions. In our work, we first benchmark heterogeneous design choices for software-based packet monitoring systems in terms of achievable performance and required resources (i.e., the number of CPU cores). Building on this extensive analysis we design FloWatcher-DPDK, a DPDK-based high-speed software traffic monitor we provide to the community as an open source project. In a nutshell, FloWatcher-DPDK provides tunable fine-grained statistics at packet and flow levels. Experimental results demonstrate that FloWatcher-DPDK sustains per-flow statistics with 5-nines precision at high-speed (e.g., 14.88 Mpps) using a limited amount of resources. Finally, we showcase the usage of FloWatcher-DPDK by configuring it to analyze the performance of two open source prototypes for stateful flow-level end-host and in-network packet processing
- …