342 research outputs found
Scalable NIDS via Negative Pattern Matching and Exclusive Pattern Matching
In this paper, we identify the unique challenges in deploying parallelism on TCAM-based pattern matching for Network Intrusion Detection Systems (NIDSes). We resolve two critical issues when designing scalable parallelism specifically for pattern matching modules: 1) how to enable fine-grained parallelism in pursuit of effective load balancing and desirable speedup simultaneously; and 2) how to reconcile the tension between parallel processing speedup and prohibitive TCAM power consumption. To this end, we first propose the novel concept of Negative Pattern Matching to partition flows, by which the number of TCAM lookups can be significantly reduced, and the resulting (fine-grained) flow segments can be inspected in parallel without incurring false negatives. Then we propose the notion of Exclusive Pattern Matching to divide the entire pattern set into multiple subsets which can later be matched against selectively and independently without affecting the correctness. We show that Exclusive Pattern Matching enables the adoption of smaller and faster TCAM blocks and improves both the pattern matching speed and scalability. Finally, our theoretical and experimental results validate that the above two concepts are inherently complementary, enabling our integrated scheme to provide performance gain in any scenario (with either clean or dirty traffic).Department of ComputingRefereed conference pape
Ternary content addressable memory for longest prefix matching based on random access memory on field programmable gate array
Conventional ternary content addressable memory (TCAM) provides access to stored data, which consists of '0', '1' and ‘don't care’, and outputs the matched address. Content lookup in TCAM can be done in a single cycle, which makes it very important in applications such as address lookup and deep-packet inspection. This paper proposes an improved TCAM architecture with fast update functionality. To support longest prefix matching (LPM), LPM logic are needed to the proposed TCAM. The latency of the proposed LPM logic is dependent on the number of matching addresses in address prefix comparison. In order to improve the throughput, parallel LPM logic is added to improve the throughput by 10× compared to the one without. Although with resource overhead, the cost of throughput per bit is less as compared to the one without parallel LPM logic
High-Performance Packet Processing Engines Using Set-Associative Memory Architectures
The emergence of new optical transmission technologies has led to ultra-high Giga bits per second (Gbps) link speeds.
In addition, the switch from 32-bit long IPv4 addresses to the 128-bit long IPv6 addresses is currently progressing.
Both factors make it hard for new Internet routers and firewalls to keep up with wire-speed packet-processing.
By packet-processing we mean three applications: packet forwarding, packet classification and deep packet inspection.
In packet forwarding (PF), the router has to match the incoming packet's IP address against the forwarding table.
It then directs each packet to its next hop toward its final destination.
A packet classification (PC) engine examines a packet header by matching it against a database of rules, or filters, to obtain the best matching rule.
Rules are associated with either an ``action'' (e.g., firewall) or a ``flow ID'' (e.g., quality of service or QoS).
The last application is deep packet inspection (DPI) where the firewall has to inspect the actual packet payload for malware or network attacks.
In this case, the payload is scanned against a database of rules, where each rule is either a plain text string or a regular expression.
In this thesis, we introduce a family of hardware solutions that combine the above requirements.
These solutions rely on a set-associative memory architecture that is called CA-RAM (Content Addressable-Random Access Memory).
CA-RAM is a hardware implementation of hash tables with the property that each bucket of a hash table can be searched in one memory cycle.
However, the classic hashing downsides have to be dealt with, such as collisions that lead to overflow and worst-case memory access time.
The two standard solutions to the overflow problem are either to use some predefined probing (e.g., linear or quadratic) or to use multiple hash functions.
We present new hash schemes that extend both aforementioned solutions to tackle the overflow problem efficiently.
We show by experimenting with real IP lookup tables, synthetic packet classification rule sets and real DPI databases that our schemes outperform other previously proposed schemes
TCA<i>m</i>M<sup>CogniGron</sup>::Energy Efficient Memristor-Based TCAM for Match-Action Processing
The Internet relies heavily on programmable match-action processors for matching network packets against locally available network rules and taking actions, such as forwarding and modification of network packets. This match-action process must be performed at high speed, i.e., commonly within one clock cycle, using a specialized memory unit called Ternary Content Addressable Memory (TCAM). Building on transistor-based CMOS designs, state-of-the-art TCAM architectures have high energy consumption and lack resilient designs for incorporating novel technologies for performing appropriate actions. In this article, we motivate the use of a novel fundamental component, the ‘Memristor’, for the development of TCAM architecture for match-action processing. Memristors can provide energy efficiency, non-volatility and better resource density as compared to transistors. We have proposed a novel memristor-based TCAM architecture called TCAmMCogniGron, built upon the voltage divider principle and requiring only two memristors and five transistors for storage and search operations compared to sixteen transistors in the traditional TCAM architecture. We analyzed its performance over an experimental data set of Nb-doped SrTiO3-based memristor. The analysis of TCAmMCogniGron showed promising power consumption statistics of 16 uW and 1 uW for match and mismatch operations along with twice the improvement in resources density as compared to the traditional architectures
- …