1,114 research outputs found

    A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue

    Get PDF
    We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performant queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized and more scalable than CAS (compare-and-set) instruction, on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable. We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and its performance is often comparable to, or exceeding that of state-of-the-art scalable algorithms

    Beltway Buffers: Avoiding the OS Traffic Jam

    Get PDF

    Near real-time network analysis for the identification of malicious activity

    Get PDF
    The evolution of technology and the increasing connectivity between devices lead to an increased risk of cyberattacks. Reliable protection systems, such as Intrusion Detection System (IDS) and Intrusion Prevention System (IPS), are essential to try to prevent, detect and counter most of the attacks. However, the increased creativity and type of attacks raise the need for more resources and processing power for the protection systems which, in turn, requires horizontal scalability to keep up with the massive companies’ network infrastructure and with the complexity of attacks. Technologies like machine learning, show promising results and can be of added value in the detection and prevention of attacks in near real-time. But good algorithms and tools are not enough. They require reliable and solid datasets to be able to effectively train the protection systems. The development of a good dataset requires horizontal-scalable, robust, modular and faulttolerant systems so that the analysis may be done in near real-time. This work describes an architecture design for horizontal-scaling capture, storage and analyses, able to collect packets from multiple sources and analyse them in a parallel fashion. The system depends on multiple modular nodes with specific roles to support different algorithms and tools.A evolução da tecnologia e o aumento da conectividade entre dispositivos, levam a um aumento do risco de ciberataques. Os sistemas de deteção de intrusão são essenciais para tentar prevenir, detetar e conter a maioria dos ataques. No entanto, o aumento da criatividade e do tipo de ataques aumenta a necessidade dos sistemas de proteção possuírem cada vez mais recursos e poder computacional. Por sua vez, requerem escalabilidade horizontal para acompanhar a massiva infraestrutura de rede das empresas e a complexidade dos ataques. Tecnologias como machine learning apresentam resultados promissores e podem ser de grande valor na deteção e prevenção de ataques em tempo útil. No entanto, a utilização dos algoritmos e ferramentas requer sempre um conjunto de dados sólidos e confiáveis para treinar os sistemas de proteção de maneira eficaz. A implementação de um bom conjunto de dados requer sistemas horizontalmente escaláveis, robustos, modulares e tolerantes a falhas para que a análise seja rápida e rigorosa. Este trabalho descreve a arquitetura de um sistema de captura, armazenamento e análise, capaz de capturar pacotes de múltiplas fontes e analisá-los de forma paralela. O sistema depende de vários nós modulares com funções específicas para oferecer suporte a diferentes algoritmos e ferramentas

    CSP for Executable Scientific Workflows

    Get PDF

    An efficient data exchange mechanism for chained network functions

    Get PDF
    Thanks to the increasing success of virtualization technologies and processing capabilities of computing devices, the deployment of virtual network functions is evolving towards a unified approach aiming at concentrating a huge amount of such functions within a limited number of commodity servers. To keep pace with this trend, a key issue to address is the definition of a secure and efficient way to move data between the different virtualized environments hosting the functions and a centralized component that builds the function chains within a single server. This paper proposes an efficient algorithm that realizes this vision and that, by exploiting the peculiarities of this application domain, is more efficient than classical solutions. The algorithm that manages the data exchanges is validated by performing a formal verification of its main safety and security properties, and an extensive functional and performance evaluation is presented

    Low-cost Guaranteed-Throughput dual-ring communication infrastructure for heterogeneous MPSoCs

    Get PDF
    Connection-oriented Guaranteed-Throughput (GT) mesh-based Networks on Chip (NoCs) have been proposed as a replacement for buses in real-time stream processing systems but are currently rarely used as hardware cost tends to be higher than conventional interconnects. Recently an interconnect with a ring topology was introduced as a low-cost alternative for use in medium scale homogeneous Multiple Processor System on Chip (MPSoC) designs. Cost-effective integration of stream processing accelerators would require an extension of this ring interconnect. We present a dual-ring communication infrastructure for heterogeneous MPSoC designs. Data and credits are transferred between tiles using their separate, oppositely directed, rings. The minimum throughput is determined by analysis of a Cyclo-Static Data Flow (CSDF) model for a system with communication between accelerators and processors. The performance benefits and costs are evaluated by integration of our dual ring and an accelerator in a 16 core MPSoC which is mapped on a Virtex6 FPGA. On this MPSoC a real-time PAL video decoder is executed. A performance gain of a factor 3.6 was obtained at an increase in hardware cost of only 8.5%

    RingScalar: A Complexity-Effective Out-of-Order Superscalar Microarchitecture

    Get PDF
    RingScalar is a complexity-effective microarchitecture for out-of-order superscalar processors, that reduces the area, latency, and power of all major structures in the instruction flow. The design divides an N-way superscalar into N columns connected in a unidirectional ring, where each column contains a portion of the instruction window, a bank of the register file, and an ALU. The design exploits the fact that most decoded instructions are waiting on just one operand to use only a single tag per issue window entry, and to restrict instruction wakeup and value bypass to only communicate with the neighboring column. Detailed simulations of four-issue single-threaded machines running SPECint2000 show that RingScalar has IPC only 13% lower than an idealized superscalar, while providing large reductions in area, power, and circuit latency

    LightBox: Full-stack Protected Stateful Middlebox at Lightning Speed

    Full text link
    Running off-site software middleboxes at third-party service providers has been a popular practice. However, routing large volumes of raw traffic, which may carry sensitive information, to a remote site for processing raises severe security concerns. Prior solutions often abstract away important factors pertinent to real-world deployment. In particular, they overlook the significance of metadata protection and stateful processing. Unprotected traffic metadata like low-level headers, size and count, can be exploited to learn supposedly encrypted application contents. Meanwhile, tracking the states of 100,000s of flows concurrently is often indispensable in production-level middleboxes deployed at real networks. We present LightBox, the first system that can drive off-site middleboxes at near-native speed with stateful processing and the most comprehensive protection to date. Built upon commodity trusted hardware, Intel SGX, LightBox is the product of our systematic investigation of how to overcome the inherent limitations of secure enclaves using domain knowledge and customization. First, we introduce an elegant virtual network interface that allows convenient access to fully protected packets at line rate without leaving the enclave, as if from the trusted source network. Second, we provide complete flow state management for efficient stateful processing, by tailoring a set of data structures and algorithms optimized for the highly constrained enclave space. Extensive evaluations demonstrate that LightBox, with all security benefits, can achieve 10Gbps packet I/O, and that with case studies on three stateful middleboxes, it can operate at near-native speed.Comment: Accepted at ACM CCS 201

    Parallel network protocol stacks using replication

    Get PDF
    Computing applications demand good performance from networking systems. This includes high-bandwidth communication using protocols with sophisticated features such as ordering, reliability, and congestion control. Much of this protocol processing occurs in software, both on desktop systems and servers. Multi-processing is a requirement on today\u27s computer architectures because their design does not allow for increased processor frequencies. At the same time, network bandwidths continue to increase. In order to meet application demand for throughput, protocol processing must be parallel to leverage the full capabilities of multi-processor or multi-core systems. Existing parallelization strategies have performance difficulties that limit their scalability and their application to single, high-speed data streams. This dissertation introduces a new approach to parallelizing network protocol processing without the need for locks or for global state. Rather than maintain global states, each processor maintains its own copy of protocol state. Therefore, updates are local and don\u27t require fine-grained locks or explicit synchronization. State management work is replicated, but logically independent work is parallelized. Along with the approach, this dissertation describes Dominoes, a new framework for implementing replicated processing systems. Dominoes organizes the state information into Domains and the communication into Channels. These two abstractions provide a powerful, but flexible model for testing the replication approach. This dissertation uses Dominoes to build a replicated network protocol system. The performance of common protocols, such as TCP/IP, is increased by multiprocessing single connections. On commodity hardware, throughput increases between 15-300% depending on the type of communication. Most gains are possible when communicating with unmodified peer implementations, such as Linux. In addition to quantitative results, protocol behavior is studied as it relates to the replication approach
    • …
    corecore