A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue

Author: Nikolaev Ruslan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Distributed Computing (DISC 2019)
Publication date: 01/01/2019

We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performance queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized instruction that scales better than CAS (compare-and-set), on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable. We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and that its performance is often comparable to, or exceeds, that of state-of-the-art scalable algorithms.
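
The abstract's remark that the queue can serve for object allocation and reclamation, as in data pools, can be illustrated with a small sketch. This is not the SCQ algorithm: `collections.deque` is a sequential stand-in for the lock-free queue, and the `IndexPool` class and its method names are hypothetical, chosen only to show how a FIFO queue of free slot indices can drive a fixed-size pool.

```python
from collections import deque


class IndexPool:
    """Fixed-size object pool driven by a FIFO queue of free slot indices.

    collections.deque is a sequential stand-in for the paper's lock-free
    queue; this sketch shows only the allocation pattern, not SCQ itself.
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity          # preallocated storage
        self.free = deque(range(capacity))      # every slot starts free

    def allocate(self, value):
        """Take a free slot index, store value there, return the index."""
        if not self.free:
            return None                         # pool exhausted (bounded)
        index = self.free.popleft()
        self.slots[index] = value
        return index

    def release(self, index):
        """Clear the slot and hand its index back for reuse."""
        value = self.slots[index]
        self.slots[index] = None
        self.free.append(index)
        return value


pool = IndexPool(capacity=2)
a = pool.allocate("x")
b = pool.allocate("y")
full = pool.allocate("z")    # None: both slots are in use
pool.release(a)
c = pool.allocate("z")       # reuses the slot that was just released
```

Because the pool hands out indices into preallocated storage, it never calls an external allocator after construction, which is the property the abstract highlights.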

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Beltway Buffers: Avoiding the OS Traffic Jam

Author: H. Bos
W. de Bruijn
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

Crossref

Near real-time network analysis for the identification of malicious activity

Author: Oliveira Rafael Cardoso de
Publication date: 01/01/2021

The evolution of technology and the increasing connectivity between devices lead to an increased risk of cyberattacks. Reliable protection systems, such as Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS), are essential to try to prevent, detect and counter most of the attacks. However, the increased creativity and variety of attacks raise the need for more resources and processing power for the protection systems, which in turn requires horizontal scalability to keep up with companies' massive network infrastructure and with the complexity of attacks. Technologies like machine learning show promising results and can be of added value in the detection and prevention of attacks in near real-time. But good algorithms and tools are not enough. They require reliable and solid datasets to be able to effectively train the protection systems. The development of a good dataset requires horizontally scalable, robust, modular and fault-tolerant systems so that the analysis may be done in near real-time. This work describes an architecture design for horizontally scaling capture, storage and analysis, able to collect packets from multiple sources and analyse them in a parallel fashion. The system depends on multiple modular nodes with specific roles to support different algorithms and tools.

Biblioteca Digital do IPB

CSP for Executable Scientific Workflows

Author: Friborg Rune Møllegaard
Publication venue: University of Copenhagen
Publication date: 29/11/2011

Copenhagen University Research Information System

An efficient data exchange mechanism for chained network functions

Author: BONAFIGLIA ROBERTO
CERRATO IVANO
MARCHETTO GUIDO
RISSO FULVIO GIOVANNI OTTAVIO
SISTO Riccardo
VIRGILIO MATTEO
Publication venue: 'Elsevier BV'
Publication date: 01/04/2018

Thanks to the increasing success of virtualization technologies and processing capabilities of computing devices, the deployment of virtual network functions is evolving towards a unified approach aiming at concentrating a huge amount of such functions within a limited number of commodity servers. To keep pace with this trend, a key issue to address is the definition of a secure and efficient way to move data between the different virtualized environments hosting the functions and a centralized component that builds the function chains within a single server. This paper proposes an efficient algorithm that realizes this vision and that, by exploiting the peculiarities of this application domain, is more efficient than classical solutions. The algorithm that manages the data exchanges is validated by performing a formal verification of its main safety and security properties, and an extensive functional and performance evaluation is presented.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Low-cost Guaranteed-Throughput dual-ring communication infrastructure for heterogeneous MPSoCs

Author: Bekooij Marco Jan Gerrit
Dekens B.H.J.
Kurtin Philip Sebastian
Smit Gerardus Johannes Maria
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/10/2014

Connection-oriented Guaranteed-Throughput (GT) mesh-based Networks on Chip (NoCs) have been proposed as a replacement for buses in real-time stream processing systems but are currently rarely used, as their hardware cost tends to be higher than that of conventional interconnects. Recently an interconnect with a ring topology was introduced as a low-cost alternative for use in medium-scale homogeneous Multiple Processor System on Chip (MPSoC) designs. Cost-effective integration of stream processing accelerators would require an extension of this ring interconnect. We present a dual-ring communication infrastructure for heterogeneous MPSoC designs. Data and credits are transferred between tiles using separate, oppositely directed rings. The minimum throughput is determined by analysis of a Cyclo-Static Data Flow (CSDF) model for a system with communication between accelerators and processors. The performance benefits and costs are evaluated by integration of our dual ring and an accelerator in a 16-core MPSoC which is mapped on a Virtex6 FPGA. On this MPSoC a real-time PAL video decoder is executed. A performance gain of a factor of 3.6 was obtained at an increase in hardware cost of only 8.5%.
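
The separation of data and credits described in this abstract corresponds to credit-based flow control, which the following minimal sketch illustrates. It is an assumption-laden model, not the paper's hardware design: the `CreditLink` class, its method names, and the buffer size are hypothetical, and ring hops, latency, and the CSDF analysis are not modeled.

```python
from collections import deque


class CreditLink:
    """Sender/receiver pair with credit-based flow control.

    Data travels one way and credits travel back the other way, mirroring
    the two oppositely directed rings; ring hops and latency are not modeled.
    """

    def __init__(self, buffer_slots):
        self.credits = buffer_slots     # sender's view of free receiver slots
        self.buffer = deque()           # receiver-side buffer

    def send(self, item):
        """Send one item if a credit is available; report whether it was sent."""
        if self.credits == 0:
            return False                # sender stalls, nothing is dropped
        self.credits -= 1
        self.buffer.append(item)
        return True

    def consume(self):
        """Receiver removes one item and returns a credit to the sender."""
        item = self.buffer.popleft()
        self.credits += 1
        return item


link = CreditLink(buffer_slots=2)
sent = [link.send(i) for i in range(3)]   # third send has no credit left
first = link.consume()                    # frees a slot, returns a credit
retry = link.send(2)                      # now succeeds
```

Since the sender never transmits without a credit, the receiver buffer cannot overflow, which is what makes a throughput guarantee analyzable in the first place.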

University of Twente Research Information

RingScalar: A Complexity-Effective Out-of-Order Superscalar Microarchitecture

Author: Asanovic Krste
Tseng Jessica H.
Publication date: 18/09/2006

RingScalar is a complexity-effective microarchitecture for out-of-order superscalar processors that reduces the area, latency, and power of all major structures in the instruction flow. The design divides an N-way superscalar into N columns connected in a unidirectional ring, where each column contains a portion of the instruction window, a bank of the register file, and an ALU. The design exploits the fact that most decoded instructions are waiting on just one operand to use only a single tag per issue window entry, and to restrict instruction wakeup and value bypass to only communicate with the neighboring column. Detailed simulations of four-issue single-threaded machines running SPECint2000 show that RingScalar has IPC only 13% lower than an idealized superscalar, while providing large reductions in area, power, and circuit latency.

DSpace@MIT

LightBox: Full-stack Protected Stateful Middlebox at Lightning Speed

Author: Anati Ittai
Arnautov Sergei
Baumann Andrew
Casado Martin
Choi Byungkwon
Conti Mauro
Costan Victor
Eisenbud Daniel E
Erlingsson Ulfar
Gruss Daniel
Heileman Gregory L
Hunt Tyler
Jamshed Muhammad Asim
Kablan Murad
Khalid Junaid
Lan Chang
Lind Joshua
Mishra P.
Panda Aurojit
Poddar Rishabh
Porter Donald E
Rizzo Luigi
Scott
Sekar Vyas
Walfish Michael
Wang Tao
Wang Tao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

Running off-site software middleboxes at third-party service providers has been a popular practice. However, routing large volumes of raw traffic, which may carry sensitive information, to a remote site for processing raises severe security concerns. Prior solutions often abstract away important factors pertinent to real-world deployment. In particular, they overlook the significance of metadata protection and stateful processing. Unprotected traffic metadata like low-level headers, size and count, can be exploited to learn supposedly encrypted application contents. Meanwhile, tracking the states of 100,000s of flows concurrently is often indispensable in production-level middleboxes deployed at real networks. We present LightBox, the first system that can drive off-site middleboxes at near-native speed with stateful processing and the most comprehensive protection to date. Built upon commodity trusted hardware, Intel SGX, LightBox is the product of our systematic investigation of how to overcome the inherent limitations of secure enclaves using domain knowledge and customization. First, we introduce an elegant virtual network interface that allows convenient access to fully protected packets at line rate without leaving the enclave, as if from the trusted source network. Second, we provide complete flow state management for efficient stateful processing, by tailoring a set of data structures and algorithms optimized for the highly constrained enclave space. Extensive evaluations demonstrate that LightBox, with all security benefits, can achieve 10Gbps packet I/O, and that with case studies on three stateful middleboxes, it can operate at near-native speed.
Comment: Accepted at ACM CCS 201

arXiv.org e-Print Archive

Crossref

Monash University Research Portal

Parallel network protocol stacks using replication

Author: Sizemore Charles Donour
Publication venue: UNM Digital Repository
Publication date: 01/07/2011

Computing applications demand good performance from networking systems. This includes high-bandwidth communication using protocols with sophisticated features such as ordering, reliability, and congestion control. Much of this protocol processing occurs in software, both on desktop systems and servers. Multi-processing is a requirement on today's computer architectures because their design does not allow for increased processor frequencies. At the same time, network bandwidths continue to increase. In order to meet application demand for throughput, protocol processing must be parallel to leverage the full capabilities of multi-processor or multi-core systems. Existing parallelization strategies have performance difficulties that limit their scalability and their application to single, high-speed data streams. This dissertation introduces a new approach to parallelizing network protocol processing without the need for locks or for global state. Rather than maintain global states, each processor maintains its own copy of protocol state. Therefore, updates are local and don't require fine-grained locks or explicit synchronization. State management work is replicated, but logically independent work is parallelized. Along with the approach, this dissertation describes Dominoes, a new framework for implementing replicated processing systems. Dominoes organizes the state information into Domains and the communication into Channels. These two abstractions provide a powerful, but flexible model for testing the replication approach. This dissertation uses Dominoes to build a replicated network protocol system. The performance of common protocols, such as TCP/IP, is increased by multiprocessing single connections. On commodity hardware, throughput increases between 15-300% depending on the type of communication. Most gains are possible when communicating with unmodified peer implementations, such as Linux.
In addition to quantitative results, protocol behavior is studied as it relates to the replication approach.
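
The replication idea in this abstract (every processor keeps its own copy of protocol state, while independent work is split between processors) can be sketched as follows. This is a toy model under stated assumptions, not the Dominoes framework: the `Replica` class, the sequence-number state, the byte-sum standing in for payload processing, and the modulo ownership rule are all hypothetical, and the loop runs sequentially rather than on separate cores.

```python
class Replica:
    """One processor's private copy of the protocol state (hypothetical model)."""

    def __init__(self, replica_id, replica_count):
        self.replica_id = replica_id
        self.replica_count = replica_count
        self.next_seq = 0        # replicated state: next expected sequence number
        self.processed = []      # payload work done by this replica only

    def handle(self, seq, payload):
        # State management is replicated: every replica applies the same update
        # to its own copy, so no lock or shared variable is needed.
        if seq != self.next_seq:
            return               # out-of-order packet, ignored in this sketch
        self.next_seq += 1
        # Logically independent work is partitioned: one replica owns each packet.
        if seq % self.replica_count == self.replica_id:
            self.processed.append((seq, sum(payload) % 256))


replicas = [Replica(i, 2) for i in range(2)]
packets = [(0, b"ab"), (1, b"cd"), (2, b"ef")]
for seq, payload in packets:
    for replica in replicas:     # each replica sees every packet
        replica.handle(seq, payload)
```

All replicas end with identical state (`next_seq`), while the per-packet payload work is divided between them; the cost is that the cheap state update is repeated on every replica.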