14 research outputs found

    Inference of virtual network functions' state via analysis of the CPU behavior

    The ongoing process of softwarization of IT networks promises to reduce the operational and management costs of network infrastructures by replacing hardware middleboxes with equivalent pieces of code executed on general-purpose servers. Alongside the benefits from the operator’s perspective, new strategies to provide the network’s resources to users are arising. Following the principle of “everything as a service”, multiple tenants can access the required resources – typically CPUs, NICs, or RAM – according to a Service-Level Agreement. However, tenants’ applications may require a complex and expensive measurement infrastructure to continuously monitor the network function’s state. Although the application’s specific behavior is unknown (and often opaque to the infrastructure owner), the software nature of (virtual) network functions (VNFs) may be the key to inferring the behavior of the high-level functions by accessing low-level information, which is still under the control of the operating system and therefore of the infrastructure owner. As such, in the scenario of software VNFs executed on COTS servers, the underlying CPU’s behavior can be used as the sole predictor for the high-level VNF state without explicit in-network measurements: in this paper, we develop a novel methodology to infer high-level characteristics such as throughput or packet loss using CPU data instead of network measurements. Our methodology consists of (i) experimentally analyzing the behavior of a CPU that executes a VNF under different loads, (ii) extracting a correlation between the CPU footprint and the high-level application state, and (iii) using this knowledge to detect the aforementioned network metrics. Our code and datasets are publicly available.
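    A minimal sketch of what the correlation step (ii) and the inference step (iii) could look like, assuming per-second samples of hypothetical CPU features (utilization, instructions per cycle, LLC misses) collected alongside a one-off ground-truth throughput calibration; the feature names, numbers, and the simple linear model are illustrative assumptions, not the paper's actual method or data.

```python
# Illustrative sketch: learn a mapping from CPU footprint to VNF throughput,
# then use it to predict throughput without any in-network measurement.
# Feature names and synthetic values below are assumptions for illustration.
import numpy as np

# Hypothetical training samples, one row per second of observation.
# Columns: [cpu_utilization, instructions_per_cycle, llc_misses_per_sec]
cpu_features = np.array([
    [0.20, 1.1, 2.0e5],
    [0.45, 1.4, 5.5e5],
    [0.70, 1.6, 9.0e5],
    [0.90, 1.7, 1.3e6],
])
measured_throughput_mpps = np.array([0.8, 1.9, 3.1, 4.0])  # one-off ground-truth calibration

# Fit a simple linear model: throughput ~ w . features + b, via least squares.
X = np.hstack([cpu_features, np.ones((len(cpu_features), 1))])
coeffs, *_ = np.linalg.lstsq(X, measured_throughput_mpps, rcond=None)

def predict_throughput(sample):
    """Estimate throughput (Mpps) from a CPU-only sample, no packet counters needed."""
    return float(np.append(sample, 1.0) @ coeffs)

print(predict_throughput([0.60, 1.5, 7.0e5]))  # rough estimate for a mid-load sample
```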

    Performance Benchmarking of State-of-the-Art Software Switches for NFV

    With the ultimate goal of replacing proprietary hardware appliances with Virtual Network Functions (VNFs) implemented in software, Network Function Virtualization (NFV) has been gaining popularity in the past few years. Software switches route traffic between VNFs and physical Network Interface Cards (NICs). It is of paramount importance to compare the performance of different switch designs and architectures. In this paper, we propose a methodology to fairly and comprehensively compare the performance of software switches. We first explore the design spaces of seven state-of-the-art software switches and then compare their performance under four representative test scenarios. Each scenario corresponds to a specific case of routing NFV traffic between NICs and/or VNFs. In our experiments, we evaluate the throughput and latency between VNFs in two of the most popular virtualization environments, namely virtual machines (VMs) and containers. Our experimental results show that no single software switch prevails in all scenarios. It is, therefore, crucial to choose the most suitable solution for the given use case. At the same time, the presented results and analysis provide a deeper insight into the design tradeoffs and identify potential performance bottlenecks that could inspire new designs.
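    A minimal sketch of how per-scenario results might be summarized to pick a switch per use case, in the spirit of the observation that no single switch wins everywhere; the switch names, scenario labels, numbers, and the selection rule are illustrative assumptions, not the paper's benchmark data.

```python
# Illustrative summary of hypothetical benchmark results: for each scenario,
# report the switch with the highest throughput together with its p99 latency.
from statistics import quantiles

# scenario -> switch -> (throughput in Mpps, per-packet latency samples in microseconds)
results = {
    "nic_to_nic": {"switch_a": (14.0, [9, 10, 11, 30]), "switch_b": (11.0, [8, 8, 9, 12])},
    "vnf_to_vnf": {"switch_a": (3.5, [25, 28, 30, 90]), "switch_b": (4.2, [20, 22, 24, 60])},
}

def p99(samples):
    # 99th percentile of the latency samples (inclusive method).
    return quantiles(samples, n=100, method="inclusive")[98]

for scenario, by_switch in results.items():
    # One possible rule: highest throughput, reporting p99 latency alongside it.
    best = max(by_switch, key=lambda s: by_switch[s][0])
    tput, lat = by_switch[best]
    print(f"{scenario}: {best} ({tput} Mpps, p99 latency {p99(lat)} us)")
```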

    Hardware-Aware Algorithm Designs for Efficient Parallel and Distributed Processing

    The introduction and widespread adoption of the Internet of Things, together with emerging new industrial applications, bring new requirements in data processing. Specifically, the need for timely processing of data that arrives at high rates creates a challenge for the traditional cloud computing paradigm, where data collected at various sources is sent to the cloud for processing. As an approach to this challenge, processing algorithms and infrastructure are distributed from the cloud to multiple tiers of computing, closer to the sources of data. This creates a wide range of devices for algorithms to be deployed on and software designs to adapt to. In this thesis, we investigate how hardware-aware algorithm designs on a variety of platforms lead to algorithm implementations that efficiently utilize the underlying resources. We design, implement and evaluate new techniques for representative applications that involve the whole spectrum of devices, from resource-constrained sensors in the field to highly parallel servers. At each tier of processing capability, we identify key architectural features that are relevant for applications and propose designs that make use of these features to achieve high-rate, timely and energy-efficient processing. In the first part of the thesis, we focus on high-end servers and utilize two main approaches to achieve high-throughput processing: vectorization and thread parallelism. We employ vectorization for pattern matching algorithms used in security applications. We show that re-thinking the design of algorithms to better utilize the resources available on the platforms they are deployed on, such as vector processing units, can bring significant speedups in processing throughput. We then show how thread-aware data distribution and proper inter-thread synchronization allow scalability, especially for the problem of high-rate network traffic monitoring. We design a parallelization scheme for sketch-based algorithms that summarize traffic information, which allows them to handle incoming data at high rates and to answer queries on that data efficiently, without overheads. In the second part of the thesis, we target the intermediate tier of computing devices and focus on typical examples of the hardware found there. We show how single-board computers with embedded accelerators can be used to handle the computationally heavy part of applications, and showcase this specifically for pattern matching in security-related processing. We further identify key hardware features that affect the performance of pattern matching algorithms on such devices, present a co-evaluation framework to compare algorithms, and design a new algorithm that efficiently utilizes the hardware features. In the last part of the thesis, we shift the focus to the low-power, resource-constrained tier of processing devices. We target wireless sensor networks and study distributed data processing algorithms where the processing happens on the same devices that generate the data. Specifically, we focus on a continuous monitoring algorithm (geometric monitoring) that aims to minimize communication between nodes. By deploying that algorithm under realistic environments, we demonstrate that the interplay between the network protocol and the application plays an important role in this layer of devices. Based on that observation, we co-design a continuous monitoring application with a modern network stack and augment it further with an in-network aggregation technique. In this way, we show that awareness of the underlying network stack is important to realize the full potential of the continuous monitoring algorithm. The techniques and solutions presented in this thesis contribute to better utilization of hardware characteristics across a wide spectrum of platforms. We employ these techniques on problems that are representative examples of current and upcoming applications, and contribute an outlook of emerging possibilities that can build on the results of the thesis.
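    A minimal sketch of the thread-aware idea behind parallel sketch-based traffic summaries: each worker keeps a private count-min sketch so the update path needs no locking, and queries merge the per-worker arrays. The sketch sizes, hashing, and worker routing below are illustrative assumptions, not the exact scheme from the thesis.

```python
# Illustrative mergeable count-min sketches, one per worker.
import array

WIDTH, DEPTH = 1024, 4

def _hash(row, key):
    return hash((row, key)) % WIDTH

class LocalSketch:
    def __init__(self):
        self.counts = [array.array("Q", [0] * WIDTH) for _ in range(DEPTH)]

    def update(self, key, count=1):
        for row in range(DEPTH):
            self.counts[row][_hash(row, key)] += count

def merged_estimate(sketches, key):
    # Count-min sketches sharing hash functions merge by cell-wise addition;
    # the estimate is the minimum over the merged rows.
    return min(
        sum(s.counts[row][_hash(row, key)] for s in sketches)
        for row in range(DEPTH)
    )

# Usage: two workers each summarize their share of the traffic.
workers = [LocalSketch(), LocalSketch()]
for i, flow in enumerate(["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1"]):
    workers[i % len(workers)].update(flow)   # e.g. round-robin / RSS-style distribution
print(merged_estimate(workers, "10.0.0.1"))  # -> 3 (may overestimate in general)
```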

    Ultra-reliable Low-latency, Energy-efficient and Computing-centric Software Data Plane for Network Softwarization

    Network softwarization plays a key role in the development and deployment of the latest communication systems for 5G and beyond. A more flexible and intelligent network architecture can be enabled to support agile network management and the rapid launch of innovative network services, with significant reductions in Capital Expense (CAPEX) and Operating Expense (OPEX). Despite these benefits, 5G systems also raise unprecedented challenges, as emerging machine-to-machine and human-to-machine communication use cases require Ultra-Reliable Low Latency Communication (URLLC). According to empirical measurements performed by the author of this dissertation on a practical testbed, State of the Art (STOA) technologies and systems are not able to achieve the one-millisecond end-to-end latency requirement of the 5G standard on Commercial Off-The-Shelf (COTS) servers. This dissertation provides a comprehensive introduction to three innovative approaches that can be used to improve different aspects of the current software-driven network data plane. All three approaches are carefully designed, professionally implemented and rigorously evaluated. According to the measurement results, these novel approaches advance the research on the design and implementation of an ultra-reliable low-latency, energy-efficient and computing-first software data plane for the 5G communication system and beyond.

    Private 5G and its Suitability for Industrial Networking

    5G was and is still surrounded by many promises and buzzwords, such as the famous 1 ms, real-time, and Ultra-Reliable and Low-Latency Communications (URLLC). This was partly intended to get the attention of vertical industries to become new customers for mobile networks, which shall be deployed in their factories. With the allowance of federal agencies, companies deployed their own private 5G networks to test new use cases enabled by 5G. But what has been missing, apart from all the marketing, is the knowledge of what 5G can really do. This work has examined in great detail the capabilities of the current 5G Release 15 as a private network, and in particular its suitability for time-critical communications. For that, a testbed was designed to measure One-Way Delays (OWDs) and Round-Trip Times (RTTs) with high accuracy. The measurements were conducted in 5G Non-Standalone (NSA) and Standalone (SA) networks and are the first published results. The evaluation revealed results that were not obvious or identified by previous work. For example, a strong impact of the packet rate on the resulting OWD and RTT was found. It was also found that typically 95% of the SA downlink end-to-end packet delays are in the range of 4 ms to 10 ms, indicating a fairly wide spread of packet delays, with the Inter-Packet Delay Variation (IPDV) between consecutive packets distributed in the millisecond range. Surprisingly, it also seems to matter for the RTT from which direction, i.e. Downlink (DL) or Uplink (UL), a round-trip communication was initiated. The Inter-Arrival Time (IAT) of packets also has a strong influence on the RTT distribution. These examples demonstrate the need to critically examine 5G and its successors in terms of their real-time capabilities. In addition to the end-to-end OWD and RTT, the delays caused by 4G and 5G Core processing have been investigated as well. Current state-of-the-art 4G and 5G Core implementations exhibit long-tailed delay distributions. To overcome such limitations, modern packet processing solutions have been evaluated in terms of their respective tail latency. The hardware-based solution was able to process packets with deterministic delay, but the software-based solutions also achieved soft real-time results. These results allow the selection of the right technology for use cases depending on their tail-latency requirements. In summary, many insights into the suitability of 5G for time-critical communications were gained from the study of the current 5G Release 15. The measurement framework, analysis methods, and results will inform the further development and refinement of private 5G campus networks for industrial use cases.
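    A minimal sketch of how the delay metrics discussed above can be computed from paired send/receive timestamps, assuming synchronized clocks (e.g. via PTP); the timestamps below are made up for illustration and are not the thesis' measurements.

```python
# Illustrative OWD, IPDV and tail-percentile computation from timestamp pairs.
from statistics import quantiles

send_ts = [0.000, 0.010, 0.020, 0.030, 0.040]   # seconds, sender clock
recv_ts = [0.006, 0.015, 0.029, 0.034, 0.049]   # seconds, receiver clock

# One-Way Delay per packet (meaningful only with synchronized clocks).
owd = [r - s for s, r in zip(send_ts, recv_ts)]

# Inter-Packet Delay Variation: difference of consecutive one-way delays (RFC 3393 style).
ipdv = [b - a for a, b in zip(owd, owd[1:])]

# 95th-percentile OWD, the kind of tail statistic reported for the SA downlink.
p95_owd_ms = quantiles(owd, n=100, method="inclusive")[94] * 1000

print([round(d * 1000, 1) for d in owd])    # per-packet OWD in ms
print([round(d * 1000, 1) for d in ipdv])   # IPDV in ms
print(round(p95_owd_ms, 2))                 # p95 OWD in ms
```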

    Cyber Security

    This open access book constitutes the refereed proceedings of the 18th China Annual Conference on Cyber Security, CNCERT 2022, held in Beijing, China, in August 2022. The 17 papers presented were carefully reviewed and selected from 64 submissions. The papers are organized according to the following topical sections: data security; anomaly detection; cryptocurrency; information security; vulnerabilities; mobile internet; threat intelligence; text recognition.

    Efficient Cache Attacks on AES, and Countermeasures

    We describe several software side-channel attacks based on inter-process leakage through the state of the CPU's memory cache. This leakage reveals memory access patterns, which can be used for cryptanalysis of cryptographic primitives that employ data-dependent table lookups. The attacks allow an unprivileged process to attack other processes running in parallel on the same processor, despite partitioning methods such as memory protection, sandboxing, and virtualization. Some of our methods require only the ability to trigger services that perform encryption or MAC using the unknown key, such as encrypted disk partitions or secure network links. Moreover, we demonstrate an extremely strong type of attack, which requires knowledge of neither the specific plaintexts nor ciphertexts and works by merely monitoring the effect of the cryptographic process on the cache. We discuss in detail several attacks on AES and experimentally demonstrate their applicability to real systems, such as OpenSSL and Linux's dm-crypt encrypted partitions (in the latter case, the full key was recovered after just 800 writes to the partition, taking 65 milliseconds). Finally, we discuss a variety of countermeasures which can be used to mitigate such attacks.
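    A toy simulation, not an attack on real hardware, of why data-dependent table lookups leak: in a first-round AES lookup the table index is the XOR of a plaintext byte and a key byte, and with 16 four-byte entries per 64-byte cache line, the cache line that gets touched already reveals the high nibble of that XOR. The entry size, key value, and observation model below are illustrative assumptions.

```python
# Conceptual model: an observer who learns only which cache line was accessed
# can narrow the key byte down to its low nibble, given known plaintext bytes.
import random

ENTRIES_PER_LINE = 16  # 64-byte cache line / 4-byte T-table entry

secret_key_byte = 0x5A  # made-up secret for the simulation

def observed_cache_line(plaintext_byte):
    # What a cache-observing attacker could learn from one encryption:
    # only the line index of the access, not the full table index.
    index = plaintext_byte ^ secret_key_byte
    return index // ENTRIES_PER_LINE

# The observer knows the plaintext bytes it triggered and the lines it observed.
candidates = set(range(256))
for _ in range(20):
    p = random.randrange(256)
    line = observed_cache_line(p)
    # Keep only key bytes consistent with this observation.
    candidates = {k for k in candidates if (p ^ k) // ENTRIES_PER_LINE == line}

print(len(candidates))                                           # -> 16: only the low nibble stays unknown
print(all(k >> 4 == secret_key_byte >> 4 for k in candidates))   # -> True: high nibble recovered
```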

    A novel access pattern-based multi-core memory architecture

    Increasingly, High-Performance Computing (HPC) applications run on heterogeneous multi-core platforms. The basic reason for the growing popularity of these architectures is their low power consumption and their high-throughput-oriented nature. However, this throughput imposes a requirement that data be supplied to the multi-core system in a high-throughput manner. This results in the necessity of efficient management of on-chip and off-chip memory data transfers, which is a significant challenge. Complex regular and irregular memory data transfer patterns are becoming widely dominant for a range of application domains, including scientific, image and signal processing. Data accesses can be arranged in independent patterns that an efficient memory management can exploit. Software-based approaches using general-purpose caches and on-chip memories are beneficial to some extent. However, the task of efficient data management for throughput-oriented devices could be improved by providing hardware mechanisms that exploit the knowledge of access patterns in memory management and in the scheduling of accesses for a heterogeneous multi-core architecture. The focus of this thesis is to present architectural explorations for a novel access pattern-based multi-core memory architecture. In general, the thesis covers four main aspects of the memory system: i) Uni-core Memory System for Regular Data Pattern; ii) Multi-core Memory System for Regular Data Pattern; iii) Uni-core Memory System for Irregular Data Pattern; and iv) Multi-core Memory System for Irregular Data Pattern.
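    A minimal sketch contrasting the two access-pattern classes the thesis targets: a regular (strided) pattern can be described compactly by a (base, stride, count) triple, while an irregular pattern needs an explicit index list that a pattern-aware memory system would have to schedule or coalesce. The descriptors and values below are illustrative assumptions, not the thesis' hardware interface.

```python
# Illustrative regular (strided) vs irregular (index-driven gather) accesses.
import numpy as np

data = np.arange(64, dtype=np.float32)

# Regular pattern: every 4th element starting at 0, fully described by three integers.
base, stride, count = 0, 4, 8
regular = data[base:base + stride * count:stride]

# Irregular pattern: gather driven by an index array (e.g. from a sparse structure).
indices = np.array([3, 17, 2, 45, 9, 31])
irregular = data[indices]

print(regular)    # [ 0.  4.  8. 12. 16. 20. 24. 28.]
print(irregular)  # [ 3. 17.  2. 45.  9. 31.]
```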