1,113 research outputs found

    GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams

    Full text link
    We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or have to deal with very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins contain much smaller number of tuples than the tuples contained in the sliding windows. Therefore, a stream buffer management policy is needed in that case. We show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ) an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales), which we found to be prevalent in many real data streams. We note that our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted to other recently proposed techniques

    LightBox: Full-stack Protected Stateful Middlebox at Lightning Speed

    Full text link
    Running off-site software middleboxes at third-party service providers has been a popular practice. However, routing large volumes of raw traffic, which may carry sensitive information, to a remote site for processing raises severe security concerns. Prior solutions often abstract away important factors pertinent to real-world deployment. In particular, they overlook the significance of metadata protection and stateful processing. Unprotected traffic metadata like low-level headers, size and count, can be exploited to learn supposedly encrypted application contents. Meanwhile, tracking the states of 100,000s of flows concurrently is often indispensable in production-level middleboxes deployed at real networks. We present LightBox, the first system that can drive off-site middleboxes at near-native speed with stateful processing and the most comprehensive protection to date. Built upon commodity trusted hardware, Intel SGX, LightBox is the product of our systematic investigation of how to overcome the inherent limitations of secure enclaves using domain knowledge and customization. First, we introduce an elegant virtual network interface that allows convenient access to fully protected packets at line rate without leaving the enclave, as if from the trusted source network. Second, we provide complete flow state management for efficient stateful processing, by tailoring a set of data structures and algorithms optimized for the highly constrained enclave space. Extensive evaluations demonstrate that LightBox, with all security benefits, can achieve 10Gbps packet I/O, and that with case studies on three stateful middleboxes, it can operate at near-native speed.Comment: Accepted at ACM CCS 201

    Cache Equalizer: A Cache Pressure Aware Block Placement Scheme for Large-Scale Chip Multiprocessors

    Get PDF
    This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large scale chip multiprocessors (CMPs). Our work is motivated by large asymmetry in cache sets usages. CE decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Temporal pressure at the on-chip last-level cache, is continuously collected at a group (comprised of cache sets) granularity, and periodically recorded at the memory controller to guide the placement process. An incoming block is consequently placed at a cache group that exhibits the minimum pressure. CE provides Quality of Service (QoS) by robustly offering better performance than the baseline shared NUCA cache. Simulation results using a full-system simulator demonstrate that CE outperforms shared NUCA caches by an average of 15.5% and by as much as 28.5% for the benchmark programs we examined. Furthermore, evaluations manifested the outperformance of CE versus related CMP cache designs

    LRU-PEA: A smart replacement policy for non-uniform cache architectures on chip multiprocessors

    Get PDF
    The increasing speed-gap between processor and memory and the limited memory bandwidth make last-level cache performance crucial for CMP architectures. Non Uniform Cache Architectures (NUCA) has been introduced to deal with this problem. This memory organization divides the whole memory space into smaller pieces or banks allowing nearer banks to have better access latencies than further banks.Moreover, an adaptive replacement policy that efficiently reduces misses in the last-level cache could boost performance, particularly if set associativity is assumed. Unfortunately, traditional replacement policies do not behave properly as they were assumed for single-processors. This paper focuses on Bank Replacement. This policy involves three key decisions when there is a miss: where to place a data within the cache set, which data to evict from the cache set and finally, where to place the evicted data. We propose a novel replacement technique that enables more intelligent replacement decisions to be taken, based on the observation that some type of data are less commonly accessed depending of the bank where they reside. We call this technique as LRU-PEA (Least Recently Used with a Priority Eviction Approach). We show that the proposed technique significantly reduces the requests to the off-chip memory by increasing the hit ratio in the NUCA cache. This translates into an average IPC improvement of 8% and into an Energy per Instruction (EPI) reduction of 5%.Preprin
    • …
    corecore