6 research outputs found

    OpenFlow compatible key-based routing protocol: adapting SDN networks to content/service-centric paradigm

    Get PDF
    The host-to-host/content/service communication instead of the host-to-host communication offered by traditional Internet Protocol (IP) routing solutions has been demanded in the last few years. Nowadays, getting this type of communication directly at network level is an increasing demand in the framework of new networking scenarios, such as Internet of Things and data center scenarios. Inspired by Key-Based Routing (KBR) solutions which, in conjunction with Distributed Hash Tables, have offered a way of providing content-sharing solutions in overlay networks on the top of the Internet for years now, we propose OFC-KBR (OpenFlow Compatible Key-Based Routing) solution. OFC-KBR is a key-based routing solution directly implemented at network layer that makes use of the potential of Software Defined Networking. In this solution, end-points are identified by virtual identifiers. These virtual identifiers are obtained from a descriptive textual name, whose format is not fixed and can be defined depending on the requirements of the service that is going to use the proposed OFC-KBR solution. OFC-KBR is totally compatible with the current OpenFlow standard and can co-exist with other L2/L3 protocols. The proposal has been implemented and evaluated by simulation considering real topologies.This research has been supported by the AEI/FEDER, UE Project Grant TEC2016-76465-C2-1-R (AIM). Adrian Flores de la Cruz also thanks the Spanish Ministry of Economy, Industry and Competitiveness for the FPI (BES-2014-069097) pre-doctoral fellowship

    A scalable architecture for ordered parallelism

    Get PDF
    We present Swarm, a novel architecture that exploits ordered irregular parallelism, which is abundant but hard to mine with current software and hardware techniques. In this architecture, programs consist of short tasks with programmer-specified timestamps. Swarm executes tasks speculatively and out of order, and efficiently speculates thousands of tasks ahead of the earliest active task to uncover ordered parallelism. Swarm builds on prior TLS and HTM schemes, and contributes several new techniques that allow it to scale to large core counts and speculation windows, including a new execution model, speculation-aware hardware task management, selective aborts, and scalable ordered commits. We evaluate Swarm on graph analytics, simulation, and database benchmarks. At 64 cores, Swarm achieves 51--122× speedups over a single-core system, and out-performs software-only parallel algorithms by 3--18×.National Science Foundation (U.S.) (Award CAREER-145299

    Exploiting the Computational Power of Ternary Content Addressable Memory

    Get PDF
    Ternary Content Addressable Memory or in short TCAM is a special type of memory that can execute a certain set of operations in parallel on all of its words. Because of power consumption and relatively small storage capacity, it has only been used in special environments. Over the past few years its cost has been reduced and its storage capacity has increased signifi cantly and these exponential trends are continuing. Hence it can be used in more general environments for larger problems. In this research we study how to exploit its computational power in order to speed up fundamental problems and needless to say that we barely scratched the surface. The main problems that has been addressed in our research are namely Boolean matrix multiplication, approximate subset queries using bloom filters, Fixed universe priority queues and network flow classi cation. For Boolean matrix multiplication our simple algorithm has a run time of O (d(N^2)/w) where N is the size of the square matrices, w is the number of bits in each word of TCAM and d is the maximum number of ones in a row of one of the matrices. For the Fixed universe priority queue problems we propose two data structures one with constant time complexity and space of O((1/ε)n(U^ε)) and the other one in linear space and amortized time complexity of O((lg lg U)/(lg lg lg U)) which beats the best possible data structure in the RAM model namely Y-fast trees. Considering each word of TCAM as a bloom filter, we modify the hash functions of the bloom filter and propose a data structure which can use the information capacity of each word of TCAM more efi ciently by using the co-occurrence probability of possible members. And finally in the last chapter we propose a novel technique for network flow classi fication using TCAM

    Representation of Classification Functions by Head-Tail Expressions

    Get PDF
    九州工業大学博士学位論文 学位記番号:情工博甲第291号 学位授与年月日:平成26年3月25日1 Introduction||2 Preliminary||3 GeneratingPrefixSum-of-ProductsExpressionsforIntervalFunctions||4 Derivation ofHead-TailExpressions for Interval Functions||5 Head-Tail Expressions for Single-Field Classification Functions||6 Head-TailExpressions forMulti-FieldClassificationFunctions||7 Conclusion and Future Work||Acknowledgements||List of PublicationsPacket classification is used in various network applications such as firewalls, access control lists, and network address translators. This technology uses ternary content addressable memories (TCAMs) to perform high speed packet forwarding. However, TCAMs dissipate high power and their cost are high. Thus, reduction of TCAMs is crucial. First, this thesis derives the prefix sum-of-products expression (PreSOP) and the number of products in a PreSOP for an interval function. Second, it derives Ψ(n,τ p), the number of n-variable interval functions that can be represented with τp products. Finally, it shows that more than 99.9% of the n-variable interval functions can be represented with ?32 n ? 1? products when n is sufficiently large. These results are useful for fast PreSOP generator and for estimating the size of Ternary Content Addressable Memories (TCAMs) for packet classification. Second, this thesis shows a method to represent interval functions by using head-tail expressions. The head-tail expressions represent greater-than GT(n : A) functions, lessthan LT(n : B) functions, and interval functions IN0(n : A,B) more efficiently than sum-of-products expressions, where n denotes the number of bits to represent the largest value in the interval (A,B). This paper proves that a head-tail expression (HT) represents an interval function with at most n words in a ternary content addressable memory (TCAM) realization. It also shows the average numbers of factors to represent interval functions by HTs for up to n = 16, which were obtained by a computer simulation. It also conjectures that, for sufficiently large n, the average number of factors to represent n-variable interval functions by HTs is at most 23 n ? 59. Experimental results also show that, for n ? 10, to represent interval functions, HTs require at least 20% fewer factors than MSOPs, on the average. Third, this thesis presents a method to generate head-tail expressions for single-field classification functions. First, it introduces a fast prefix sum-of-product (PreSOP) generator (FP) which generates products using the bit patterns of the endpoints. Next, it shows a direct head-tail expression generator (DHT). Experimental results show that DHT generates much smaller TCAM than FP. The proposed algorithm is useful for simplified TCAM generator for packet classification. Finally, this thesis shows methods to simplify rules in TCAMs for packet classification. First method, it partitions the rules into groups so that each group has the same source address, destination address and protocol. After that, it implifies rules in each group by removing redundant rules. A computer program was developed to simplify rules among groups. Experimental results show that this method reduces the size of rules up to 57% of the original specification for ACL5 rules, 73% for ACL3 rules, and 87% for overall rules. This algorithm is useful to reduce TCAMs for packet classification. In the second method, we reduce the number of words in TCAM for multi-field classification functions by using head-tail expressions. It presents MFHT, an O(r2)-algorithm to generate simplified TCAMs for two-field classification functions, where r is the number of rules. Experimental results show that MFHT achieves a 58% reduction of words for random rules and a 52% reduction of words for ACL and FW rules. Moreover, MFHT is fast. The methods are useful for simplifying TCAM for packet classification

    Low-Power High-Performance Ternary Content Addressable Memory Circuits

    Get PDF
    Ternary content addressable memories (TCAMs) are hardware-based parallel lookup tables with bit-level masking capability. They are attractive for applications such as packet forwarding and classification in network routers. Despite the attractive features of TCAMs, high power consumption is one of the most critical challenges faced by TCAM designers. This work proposes circuit techniques for reducing TCAM power consumption. The main contribution of this work is divided in two parts: (i) reduction in match line (ML) sensing energy, and (ii) static-power reduction techniques. The ML sensing energy is reduced by employing (i) positive-feedback ML sense amplifiers (MLSAs), (ii) low-capacitance comparison logic, and (iii) low-power ML-segmentation techniques. The positive-feedback MLSAs include both resistive and active feedback to reduce the ML sensing energy. A body-bias technique can further improve the feedback action at the expense of additional area and ML capacitance. The measurement results of the active-feedback MLSA show 50-56% reduction in ML sensing energy. The measurement results of the proposed low-capacitance comparison logic show 25% and 42% reductions in ML sensing energy and time, respectively, which can further be improved by careful layout. The low-power ML-segmentation techniques include dual ML TCAM and charge-shared ML. Simulation results of the dual ML TCAM that connects two sides of the comparison logic to two ML segments for sequential sensing show 43% power savings for a small (4%) trade-off in the search speed. The charge-shared ML scheme achieves power savings by partial recycling of the charge stored in the first ML segment. Chip measurement results show that the charge-shared ML scheme results in 11% and 9% reductions in ML sensing time and energy, respectively, which can be improved to 19-25% by using a digitally controlled charge sharing time-window and a slightly modified MLSA. The static power reduction is achieved by a dual-VDD technique and low-leakage TCAM cells. The dual-VDD technique trades-off the excess noise margin of MLSA for smaller cell leakage by applying a smaller VDD to TCAM cells and a larger VDD to the peripheral circuits. The low-leakage TCAM cells trade off the speed of READ and WRITE operations for smaller cell area and leakage. Finally, design and testing of a complete TCAM chip are presented, and compared with other published designs

    MMP

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.Includes bibliographical references (p. 129-135).Reliability and security are quickly becoming users' biggest concern due to the increasing reliance on computers in all areas of society. Hardware-enforced, fine-grained memory protection can increase the reliability and security of computer systems, but will be adopted only if the protection mechanism does not compromise performance, and if the hardware mechanism can be used easily by existing software. Mondriaan memory protection (MMP) provides fine-grained memory protection for a linear address space, while supporting an efficient hardware implementation. MMP's use of linear addressing makes it compatible with current software programming models and program binaries, and it is also backwards compatible with current operating systems and instruction sets. MMP can be implemented efficiently because it separates protection information from program data, allowing protection information to be compressed and cached efficiently. This organization is similar to paging hardware, where the translation information for a page of data bytes is compressed to a single translation value and cached in the TLB. MMP stores protection information in tables in protected system memory, just as paging hardware stores translation information in page tables. MMP is well suited to improve the robustness of modern software. Modern software development favors modules (or plugins) as a way to structure and provide extensibility for large systems, like operating systems, web servers and web clients. Protection between modules written in unsafe languages is currently provided only by programmer convention, reducing system stability.(cont.) Device drivers, which are implemented as loadable modules, are now the most frequent source of operating system crashes (e.g., 85% of Windows XP crashes in one study [SBL03]). MMP provides a mechanism to enforce module boundaries, increasing system robustness by isolating modules from each other and making all memory sharing explicit. We implement the MMP hardware in a simulator and modify a version of the Linux 2.4.19 operating system to use it. Linux loads its device drivers as kernel module extensions, and MMP enforces the module boundaries, only allowing the device drivers access to the memory they need to function. The memory isolation provided by MMP increases Linux's resistance to programmer error, and exposed two kernel bugs in common, heavily-tested drivers. Experiments with several benchmarks where MMP was used extensively indicate the space taken by the MMP data structures is less than 11% of the memory used by the kernel, and the kernel's runtime, according to a simple performance model, increases less than 12% (relative to an unmodified kernel).by Emmett Jethro Witchel.Ph.D
    corecore