
    Faster Compression of Deterministic Finite Automata

    Deterministic finite automata (DFA) are a classic tool for high-throughput matching of regular expressions, both in theory and practice. Due to their high space consumption, extensive research has been devoted to compressed representations of DFAs that still support efficient pattern matching queries. Kumar et al. [SIGCOMM 2006] introduced the delayed deterministic finite automaton (D2FA), which exploits the large redundancy between inter-state transitions in the automaton. They showed it to obtain up to two orders of magnitude compression of real-world DFAs, and their work formed the basis of numerous subsequent results. Their algorithm, as well as later algorithms based on their idea, have an inherent quadratic-time bottleneck, as they consider every pair of states to compute the optimal compression. In this work we present a simple, general framework based on locality-sensitive hashing for speeding up these algorithms to achieve sub-quadratic construction times for D2FAs. We apply the framework to speed up several algorithms to near-linear time, and experimentally evaluate their performance on real-world regular expression sets extracted from modern intrusion detection systems. We find an order of magnitude improvement in compression times, with little or no loss of compression, and in some cases even significantly better compression.
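
    The quadratic bottleneck and the LSH-based remedy can be pictured with a small sketch: hash each state's outgoing-transition vector on a few sampled characters so that states sharing many transitions tend to collide in the same bucket, and only compare states within a bucket. This is an illustrative reconstruction of the idea, not the paper's algorithm; the sampling-based hash and all names below are assumptions.

```python
import random
from collections import defaultdict

ALPHABET = range(256)

def lsh_buckets(transitions, num_samples=8, seed=0):
    """Bucket states by a signature over a few sampled characters.

    transitions[s][c] is the target state of s on character c.
    States with many identical transitions tend to share a signature.
    """
    rng = random.Random(seed)
    sampled_chars = rng.sample(list(ALPHABET), num_samples)
    buckets = defaultdict(list)
    for state, row in enumerate(transitions):
        signature = tuple(row[c] for c in sampled_chars)
        buckets[signature].append(state)
    return buckets

def shared_transitions(row_a, row_b):
    """Number of characters on which two states agree (candidate default-edge weight)."""
    return sum(1 for c in ALPHABET if row_a[c] == row_b[c])

def candidate_pairs(transitions):
    """Compare only states that collide in a bucket, avoiding the all-pairs scan."""
    for states in lsh_buckets(transitions).values():
        for i in range(len(states)):
            for j in range(i + 1, len(states)):
                a, b = states[i], states[j]
                yield a, b, shared_transitions(transitions[a], transitions[b])
```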

    Exploiting the Computational Power of Ternary Content Addressable Memory

    Ternary Content Addressable Memory, or TCAM for short, is a special type of memory that can execute a certain set of operations in parallel on all of its words. Because of its power consumption and relatively small storage capacity, it has only been used in special environments. Over the past few years its cost has been reduced and its storage capacity has increased significantly, and these exponential trends are continuing. Hence it can be used in more general environments for larger problems. In this research we study how to exploit its computational power in order to speed up fundamental problems; needless to say, we have barely scratched the surface. The main problems addressed in our research are Boolean matrix multiplication, approximate subset queries using Bloom filters, fixed-universe priority queues, and network flow classification. For Boolean matrix multiplication, our simple algorithm has a running time of O(d(N^2)/w), where N is the size of the square matrices, w is the number of bits in each word of TCAM, and d is the maximum number of ones in a row of one of the matrices. For the fixed-universe priority queue problem we propose two data structures: one with constant time complexity and space of O((1/ε)n(U^ε)), and the other with linear space and amortized time complexity of O((lg lg U)/(lg lg lg U)), which beats the best possible data structure in the RAM model, namely the y-fast trie. Considering each word of TCAM as a Bloom filter, we modify the hash functions of the Bloom filter and propose a data structure which can use the information capacity of each word of TCAM more efficiently by using the co-occurrence probability of possible members. Finally, in the last chapter we propose a novel technique for network flow classification using TCAM.
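
    The Boolean matrix multiplication bound can be illustrated with a small software emulation: row i of C = A·B over (OR, AND) is the OR of the rows of B selected by the ones in row i of A, and packing rows into machine words gives the d·N²/w flavour of the bound quoted above. In the thesis the word-parallel step is performed by the TCAM itself; here Python integers stand in for it, and the function names are illustrative.

```python
def rows_to_bitmasks(matrix):
    """Pack each 0/1 row into an integer bitmask (bit j = column j)."""
    return [sum(bit << j for j, bit in enumerate(row)) for row in matrix]

def boolean_matmul(a, b):
    """Boolean product: C[i] is the OR of B's rows selected by the ones in A[i]."""
    n = len(a)
    b_masks = rows_to_bitmasks(b)
    c = []
    for i in range(n):
        acc = 0
        for k in range(n):           # at most d of these contribute per row
            if a[i][k]:
                acc |= b_masks[k]    # word-parallel OR, the TCAM-like step
        c.append([(acc >> j) & 1 for j in range(n)])
    return c

# Example: a 3x3 Boolean product.
A = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
B = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(boolean_matmul(A, B))  # [[0, 1, 1], [1, 0, 0], [0, 0, 1]]
```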

    Design and Evaluation of Packet Classification Systems, Doctoral Dissertation, December 2006

    Although many algorithms and architectures have been proposed, the design of efficient packet classification systems remains a challenging problem. The diversity of filter specifications, the scale of filter sets, and the throughput requirements of high-speed networks all contribute to the difficulty. We need to review the algorithms from a high-level point of view in order to advance the study. This level of understanding can lead to significant performance improvements. In this dissertation, we evaluate several existing algorithms and present several new algorithms as well. The previous evaluation results for existing algorithms are not convincing because they have not been obtained in a consistent way. To resolve this issue, an objective evaluation platform needs to be developed. We implement and evaluate several representative algorithms with uniform criteria. The source code and the evaluation results are both published on a website to provide the research community with a benchmark for impartial and thorough algorithm evaluations. We propose several new algorithms to deal with the different variations of the packet classification problem. They are: (1) the Shape Shifting Trie algorithm for longest prefix matching, used in IP lookups or as a building block for general packet classification algorithms; (2) the Fast Hash Table lookup algorithm, used for exact flow matching; (3) the longest prefix matching algorithm using hash tables and tries, used in IP lookups or packet classification algorithms; (4) the 2D coarse-grained tuple-space search algorithm with controlled filter expansion, used for two-dimensional packet classification or as a building block for general packet classification algorithms; (5) the Adaptive Binary Cutting algorithm, used for general multi-dimensional packet classification. In addition to the algorithmic solutions, we also consider the TCAM hardware solution. In particular, we address the TCAM filter update problem for general packet classification and provide an efficient algorithm. Building upon previous work, these algorithms significantly improve the performance of packet classification systems and set a solid foundation for further study.
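
    As a reference point for the prefix-matching algorithms listed above, the sketch below shows a plain binary-trie longest prefix match; the Shape Shifting Trie and the hash-based schemes can be seen as optimizations of this baseline. The code is an assumption-laden illustration, not the dissertation's implementation.

```python
class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]
        self.next_hop = None

def insert(root, prefix_bits, next_hop):
    """Insert a prefix given as a bit string, e.g. "1011"."""
    node = root
    for bit in prefix_bits:
        b = int(bit)
        if node.children[b] is None:
            node.children[b] = TrieNode()
        node = node.children[b]
    node.next_hop = next_hop

def longest_prefix_match(root, addr_bits):
    """Walk the trie along the address bits, remembering the deepest match."""
    node, best = root, None
    for bit in addr_bits:
        b = int(bit)
        if node.children[b] is None:
            break
        node = node.children[b]
        if node.next_hop is not None:
            best = node.next_hop
    return best

root = TrieNode()
insert(root, "10", "A")
insert(root, "1011", "B")    # more specific prefix
print(longest_prefix_match(root, "10110000"))  # "B"
print(longest_prefix_match(root, "10000000"))  # "A"
```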

    Atomic Transfer for Distributed Systems

    Building applications and information systems increasingly means dealing with concurrency and faults stemming from the distribution of system components. Atomic transactions are a well-known method for transferring the responsibility for handling concurrency and faults from developers to the software's execution environment, but they incur considerable execution overhead. This dissertation investigates methods that shift some of the burden of concurrency control into the network layer, to reduce response times and increase throughput. It anticipates future programmable network devices, enabling customized high-performance network protocols. We propose Atomic Transfer (AT), a distributed algorithm to prevent race conditions due to messages crossing on a path of network switches. Switches check request messages for conflicts with response messages traveling in the opposite direction. Conflicting requests are dropped, relieving the request's receiving host of the need to detect and handle the conflict. AT is designed to perform well under high data contention, as the concurrency control effort is balanced across a network instead of being handled by the contended endpoint hosts themselves. We use AT as the basis for a new optimistic transactional cache consistency algorithm, supporting execution of atomic applications caching shared data. We then present a scalable refinement, allowing hierarchical consistent caches with predictable performance despite high data update rates. We give detailed I/O Automata models of our algorithms along with correctness proofs. We begin with a simplified model, assuming static network paths and no message loss, and then refine it to support dynamic network paths and safe handling of message loss. We present a trie-based data structure for accelerating conflict checking on switches, with benchmarks suggesting the feasibility of our approach from a performance standpoint.
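
    A minimal sketch of the switch-side check may help: the switch remembers the newest object versions announced by responses flowing in one direction and drops any request, flowing in the other direction, that is still based on a stale version. This is an informal illustration of the crossing-messages idea, not the I/O Automata model from the dissertation; class and field names are assumptions.

```python
class ATSwitch:
    """Illustrative switch state for conflict checking between crossing messages."""

    def __init__(self):
        self.latest_version = {}          # object id -> newest version seen in responses

    def on_response(self, obj_id, new_version):
        """A response travelling 'down' announces a newer committed version."""
        current = self.latest_version.get(obj_id, -1)
        self.latest_version[obj_id] = max(current, new_version)

    def on_request(self, obj_id, read_version):
        """A request travelling 'up' is forwarded only if its read is still current."""
        if read_version < self.latest_version.get(obj_id, -1):
            return "DROP"                 # conflict: the request crossed a newer response
        return "FORWARD"

sw = ATSwitch()
sw.on_response("x", 3)
print(sw.on_request("x", 2))  # DROP
print(sw.on_request("x", 3))  # FORWARD
```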

    Fast Packet Processing on High Performance Architectures

    The rapid growth of the Internet and the fast emergence of new network applications have brought great challenges and complex issues in deploying high-speed, QoS-guaranteed IP networks. For this reason, packet classification and network intrusion detection have assumed a key role in modern communication networks in order to provide QoS and security. In this thesis we describe a number of the most advanced solutions to these tasks. We introduce NetFPGA and Network Processors as reference platforms both for the design and the implementation of the solutions and algorithms described in this thesis. The rise in link capacity reduces the time available to network devices for packet processing. For this reason, we show different solutions which, either by heuristics and randomization or by smart construction of state machines, allow IP lookup, packet classification and deep packet inspection to be fast in real devices based on high-speed platforms such as NetFPGA or Network Processors.

    Branch Prediction For Network Processors

    Originally designed to favour flexibility over packet processing performance, the programmable network processor now faces the challenge of meeting both increasing line rates and the demand for additional processing capabilities. To meet these requirements, trends within networking research have tended to focus on techniques such as offloading computation-intensive tasks to dedicated hardware logic or increasing parallelism. While parallelism retains flexibility, challenges such as load balancing limit its scope. On the other hand, hardware offloading allows complex algorithms to be implemented at high speed but sacrifices flexibility. To this end, the work in this thesis is focused on a more fundamental aspect of a network processor, the data-plane processing engine. Performing both system modelling and analysis of packet processing functions, the goal of this thesis is to identify and extract salient information regarding the performance of multi-processor workloads. Following on from a traditional software-based analysis of program workloads, we develop a method of modelling and analysing hardware accelerators when applied to network processors. Using this quantitative information, this thesis proposes an architecture which allows deeply pipelined micro-architectures to be implemented on the data plane while reducing the branch penalty associated with these architectures.
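
    As background for the branch-penalty discussion above, the sketch below shows the classic two-bit saturating-counter predictor, the textbook mechanism for reducing branch penalties in deeply pipelined engines. It is only an illustration of the underlying mechanism, not the thesis's architecture; table size and indexing are arbitrary choices here.

```python
class TwoBitPredictor:
    """Per-branch two-bit saturating counters indexed by (hashed) PC bits."""

    def __init__(self, table_bits=10):
        self.mask = (1 << table_bits) - 1
        self.counters = [1] * (1 << table_bits)    # start "weakly not taken"

    def predict(self, pc):
        return self.counters[pc & self.mask] >= 2  # True = predict taken

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)

bp = TwoBitPredictor()
outcomes = [True, True, False, True, True]         # a loop-like branch history
hits = 0
for outcome in outcomes:
    hits += bp.predict(0x40) == outcome
    bp.update(0x40, outcome)
print(f"correct predictions: {hits}/{len(outcomes)}")
```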

    Network Processors and Next Generation Networks: Design, Applications, and Perspectives

    Network Processors (NPs) are hardware platforms born as appealing solutions for packet processing devices in networking applications. Nowadays, a plethora of solutions exists, with no agreement on a common architecture. Each vendor has proposed its own specific solution, and no official standard exists yet. The common features of all proposals are a hierarchy of processors, with a general-purpose processor and several units specialized for packet processing, a series of memory devices with different sizes and latencies, and low-level programmability. The target is a platform for networking applications with low time to market and high time in market, thanks to high flexibility and programmability simpler than that of ASICs, for example. After about ten years since the "birth" of network processors, this research activity aims to make an analytical balance of their development and usage. Many authoritative opinions suggest that NPs have been "outdated" by multicore or manycore systems, which provide general-purpose environments and some specialized cores. The main reasons for these negative opinions are the hard programmability of NPs, which often requires knowledge of private microcode, and their excessive architectural limits, such as reduced memories and minimal instruction store. Our research shows that Network Processors can be appealing for different applications in the networking area, and many interesting solutions can be obtained which present very high performance, outperforming current solutions. However, the issues of hard programming and remarkable limits exist, and they could be alleviated only by providing an almost comprehensive programming environment and a proper design in terms of processing and memory resources. More efficient solutions can surely be provided, but the experience of network processors has produced an important legacy in developing packet processing engines. In this work, we have realized many devices for networking purposes based on NP platforms, in order to understand the complexity of programming, the flexibility of design, the complexity of tasks that can be implemented, the maximum depth of packet processing, the performance of such devices, and the real usefulness of NPs in network devices. All these features have been accurately analyzed and will be illustrated in this thesis. Many remarkable results have been obtained, which confirm Network Processors as appealing solutions for network devices. Moreover, the research on NPs has led us to analyze and solve more general issues, related for instance to multiprocessor systems or to processors with little available memory. In particular, the latter issue led us to design many interesting data structures for set representation and membership query, which are based on randomized techniques and allow for big memory savings.
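
    The randomized set-representation and membership-query structures mentioned above build on the Bloom filter idea, sketched below as a baseline; the thesis's structures are variations tuned to the small memories of NPs, and this code makes no claim about their exact design.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter: k hashed bit positions per item, false positives possible."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
bf.add("10.0.0.1")
print(bf.might_contain("10.0.0.1"))   # True
print(bf.might_contain("10.0.0.2"))   # almost certainly False
```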

    Traffic Re-engineering: Extending Resource Pooling Through the Application of Re-feedback

    Parallelism pervades the Internet, yet efficiently pooling this increasing path diversity has remained elusive. With no holistic solution for resource pooling, each layer of the Internet architecture attempts to balance traffic according to its own needs, potentially at the expense of others. From the edges, traffic is implicitly pooled over multiple paths by retrieving content from different sources. Within the network, traffic is explicitly balanced across multiple links through the use of traffic engineering. This work explores how the current architecture can be realigned to facilitate resource pooling at both network and transport layers, where tension between stakeholders is strongest. The central theme of this thesis is that traffic engineering can be performed more efficiently, flexibly and robustly through the use of re-feedback. A cross-layer architecture is proposed for sharing the responsibility for resource pooling across both hosts and network. Building on this framework, two novel forms of traffic management are evaluated. Efficient pooling of traffic across paths is achieved through the development of an in-network congestion balancer, which can function in the absence of multipath transport. Network and transport mechanisms are then designed and implemented to facilitate path fail-over, greatly improving resilience without requiring receiver-side cooperation. These contributions are framed by a longitudinal measurement study which provides evidence for many of the design choices taken. A methodology for scalably recovering flow metrics from passive traces is developed, which is in turn systematically applied to over five years of interdomain traffic data. The resulting findings challenge traditional assumptions on the preponderance of congestion control in resource sharing, with over half of all traffic being constrained by limits other than network capacity. All of the above represent concerted attempts to rethink and reassert traffic engineering in an Internet where competing solutions for resource pooling proliferate. By delegating responsibilities currently overloading the routing architecture towards hosts and re-engineering traffic management around the core strengths of the network, the proposed architectural changes allow the tussle surrounding resource pooling to be drawn out without compromising the scalability and evolvability of the Internet.
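
    The flow-metric recovery step can be illustrated with a minimal aggregation sketch: packets from a passive trace are keyed by their 5-tuple and per-flow packet and byte counts are accumulated. The record format and function names are assumptions for illustration, not the methodology developed in the thesis.

```python
from collections import defaultdict

def flow_metrics(packets):
    """packets: iterable of (src, dst, sport, dport, proto, length_bytes) records."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, sport, dport, proto, length in packets:
        key = (src, dst, sport, dport, proto)   # the flow's 5-tuple
        flows[key]["packets"] += 1
        flows[key]["bytes"] += length
    return flows

trace = [
    ("10.0.0.1", "10.0.0.2", 5000, 80, "tcp", 1500),
    ("10.0.0.1", "10.0.0.2", 5000, 80, "tcp", 40),
    ("10.0.0.3", "10.0.0.2", 6000, 80, "tcp", 1500),
]
for key, stats in flow_metrics(trace).items():
    print(key, stats)
```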