11 research outputs found
Optimizing energy-efficiency for multi-core packet processing systems in a compiler framework
Network applications become increasingly computation-intensive and the amount of traffic soars unprecedentedly nowadays. Multi-core and multi-threaded techniques are thus widely employed in packet processing system to meet the changing requirement. However, the processing power cannot be fully utilized without a suitable programming environment. The compilation procedure is decisive for the quality of the code. It can largely determine the overall system performance in terms of packet throughput, individual packet latency, core utilization and energy efficiency.
The thesis investigated compilation issues in networking domain first, particularly on energy consumption. And as a cornerstone for any compiler optimizations, a code analysis module for collecting program dependency is presented and incorporated into a compiler framework. With that dependency information, a strategy based on graph bi-partitioning and mapping is proposed to search for an optimal configuration in a parallel-pipeline fashion. The energy-aware extension is specifically effective in enhancing the energy-efficiency of the whole system. Finally, a generic evaluation framework for simulating the performance and energy consumption of a packet processing system is given. It accepts flexible architectural configuration and is capable of performingarbitrary code mapping. The simulation time is extremely short compared to full-fledged simulators. A set of our optimization results is gathered using the framework
Recommended from our members
Synthesis and Optimization of Pipelined Packet Processors
We consider pipelined architectures of packet processors consisting of a sequence of simple packet-processing modules interconnected by first-in first-out buffers. We propose a new model for describing their function, an automated synthesis technique that generates efficient hardware for them, and an algorithm for computing minimum buffer sizes that allow such pipelines to achieve their maximum throughput. Our functional model provides a level of abstraction familiar to a network protocol designer; in particular, it does not require knowledge of register-transfer-level hardware design. Our synthesis tool implements the specified function in a sequential circuit that processes packet data a word at a time. Finally, our analysis technique computes the maximum throughput possible from the modules and then determines the smallest buffers that can achieve it. Experimental results conducted on industrial-strength examples suggest that our techniques are practical. Our synthesis algorithm can generate circuits that achieve 40 Gb/s on field-programmable gate arrays, equal to state-of-the-art manual implementations, and our buffer-sizing algorithm has a practically short runtime. Together, our techniques make it easier to quickly develop and deploy high-speed network switches
Understanding retargeting compilation techniques for network processors
Mémoire numérisé par la Direction des bibliothÚques de l'Université de Montréal
React++: A Lightweight Actor Framework in C++
Distributed software remains susceptible to data races and poor scalability because of the widespread use of locks and other low-level synchronization primitives. Furthermore, using this programming approach is known to break encapsulation offered by object-oriented programming. Actors present an alternative model of concurrent computation by serving as building blocks with a higher level of abstraction. They encapsulate concurrent logic in their behaviors and rely only on asynchronous exchange of messages for synchronization, preventing a broad range of concurrent issues by eschewing locks. Existing actor frameworks often seem to focus on CPU-bound workloads and lack an actor-oriented I/O infrastructure. The purpose of this thesis is to investigate the scalability of user-space I/O operations carried out by actors. It presents an experimental actor framework named React++, with an M:N runtime for cooperative scheduling of actors and an integrated I/O subsystem. Load distribution is policy-driven and uses a variant of the randomized work-stealing algorithm. The evaluation of the framework is carried out in three stages. First, the efficiency of message delivery, scheduling and load balancing is assessed by a set of micro-benchmarks, where React++ retains a competitive score against several well-known actor frameworks. Next, a web server built on React++ is shown to be on par with its fastest event-driven counterparts in the TechEmpower plaintext benchmark. Finally, the runtime of an existing messaging library (ZeroMQ) is augmented with React++, replacing the backend and delegating all network I/O to actors without incurring any substantial overhead
Parallel and distributed processing in high speed traffic monitoring
This thesis presents a parallel and distributed approach for the purpose of processing network traffic at high speeds. The proposed architecture provides the processing power required to run one or more traffic processing applications at line rates by means of processing full packets at multi-gigabits speeds using a parallel and distributed processing environment. Moreover, the architecture is flexible and scalable to future needs by supporting heterogeneous processing nodes such as different hardware architectures or different generations of the same hardware architecture. In addition to the processing, flexibility, and scalability features, our architecture provides an easy-to-use environment with the help of a new programming language, called FPL, for traffic processing in a distributed environment. The language and its compiler come to hide specific programming details when using heterogeneous systems and a distributed environment.UBL - phd migration 201
Online learning on the programmable dataplane
This thesis makes the case for managing computer networks with datadriven methods automated statistical inference and control based on measurement data and runtime observationsâand argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, which are currently dominated by the use of hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, but their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or -workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the networkâ runtime reprogrammability, precise traffic measurement, and low latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network.
To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switchscale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device local state. I show that data-driven solutions still require great care to correctly design, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation to histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volume. Moving reinforcement learning to the dataplane is shown to offer substantial benefits to stateaction latency and online learning throughput versus host machines; allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasibleâto port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms
Automatic Data Partitioning for the Agere Payload Plus Network Processor
With the ever-increasing pervasiveness of the Internet and its stringent performance requirements, network system designers have begun utilizing specialized chips to increase the performance of network functions. To increase performance, many more advanced functions, such as tra#c shaping and policing, are being implemented at the network interface layer to reduce delays that occur when these functions are handled by a general-purpose CPU. While some designs use ASICs to handle network functions, many system designers have moved toward using programmable network processors due to their increased flexibility and lower design cost
Programming Languages and Systems
This open access book constitutes the proceedings of the 28th European Symposium on Programming, ESOP 2019, which took place in Prague, Czech Republic, in April 2019, held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019