2,113 research outputs found

    Shared memory with hidden latency on a family of mesh-like networks

    Get PDF

    HPP : a high performance PRAM

    Get PDF
    We present a fast shared memory multiprocessor with uniform memory access time. A first prototype (SB-PRAM) is running with 4 processors, a 128 processor version is under construction. A second implementation (HPP) using latest VLSI technology and optical links shall run at a speed of 96 MHz. To achieve this speed, we first investigate the re-design of ASICs and network links. We then balance processor speed and memory bandwidth by investigating the relation between local computation and global memory access in several benchmark applications. On numerical codes such as linpack, 2 and 8 GFlop/s shall be possible with 128 and 512 processors, respectively, thus approaching processor performance of Intel Paragon XPS. As non-numerical codes we consider circuit simulation and raytracing. We achieve speedups over a one processor SGI challenge of 35 and 81 for 128 processors and 140 and 325 for 512 processors

    Amorphous Placement and Informed Diffusion for Timely Monitoring by Autonomous, Resource-Constrained, Mobile Sensors

    Full text link
    Personal communication devices are increasingly equipped with sensors for passive monitoring of encounters and surroundings. We envision the emergence of services that enable a community of mobile users carrying such resource-limited devices to query such information at remote locations in the field in which they collectively roam. One approach to implement such a service is directed placement and retrieval (DPR), whereby readings/queries about a specific location are routed to a node responsible for that location. In a mobile, potentially sparse setting, where end-to-end paths are unavailable, DPR is not an attractive solution as it would require the use of delay-tolerant (flooding-based store-carry-forward) routing of both readings and queries, which is inappropriate for applications with data freshness constraints, and which is incompatible with stringent device power/memory constraints. Alternatively, we propose the use of amorphous placement and retrieval (APR), in which routing and field monitoring are integrated through the use of a cache management scheme coupled with an informed exchange of cached samples to diffuse sensory data throughout the network, in such a way that a query answer is likely to be found close to the query origin. We argue that knowledge of the distribution of query targets could be used effectively by an informed cache management policy to maximize the utility of collective storage of all devices. Using a simple analytical model, we show that the use of informed cache management is particularly important when the mobility model results in a non-uniform distribution of users over the field. We present results from extensive simulations which show that in sparsely-connected networks, APR is more cost-effective than DPR, that it provides extra resilience to node failure and packet losses, and that its use of informed cache management yields superior performance

    An Efficient Scheme to Provide Real-time Memory Integrity Protection

    Get PDF
    Memory integrity protection has been a longstanding issue in trusted system design. Most viruses and malware attack the system by modifying data that they are not authorized to access. With the development of the Internet, viruses and malware spread much faster than ever before. In this setting, protecting the memory becomes increasingly important. However, it is a hard problem to protect the dynamic memory. The data in the memory changes from time to time so that the schemes have to be fast enough to provide real-time protection while in the same time the schemes have to use slow crytographical functions to keep the security level. In this thesis, we propose a new fast authentication scheme for memory. As in previous proposals the scheme uses a Merkle tree to guarantee dynamic protection of memory. We use the universal hash function family NH for speed and couple it with an AES encryption in order to achieve a high level of security. The proposed scheme is much faster compared to similar schemes achieved by cryptographic hash functions such as SHA-1 due to the finer grain incremental hashing ability provided by NH. With a modified version of the proposed scheme, the system can access the data in memory without checking the integrity all the time and still keeps the same security level. This feature is mainly due to the incremental nature of NH. Moreover, we show that combining with caches and parallelism, we can achieve fast and simple software implementation

    Simulating Uniform Hashing in Constant Time and Optimal Space

    Get PDF
    Many algorithms and data structures employing hashing have been analyzed under the uniform hashing assumption, i.e., the assumption that hash functions behave like truly random functions. In this paper it is shown how to implement hash functions that can be evaluated on a RAM in constant time, and behave like truly random functions on any set of n inputs, with high probability. The space needed to represent a function is O(n) words, which is the best possible (and a polynomial improvement compared to previous fast hash functions). As a consequence, a broad class of hashing schemes can be implemented to meet, with high probability, the performance guarantees of their uniform hashing analysis

    Amorphous Placement and Retrieval of Sensory Data in Sparse Mobile Ad-Hoc Networks

    Full text link
    Abstract—Personal communication devices are increasingly being equipped with sensors that are able to passively collect information from their surroundings – information that could be stored in fairly small local caches. We envision a system in which users of such devices use their collective sensing, storage, and communication resources to query the state of (possibly remote) neighborhoods. The goal of such a system is to achieve the highest query success ratio using the least communication overhead (power). We show that the use of Data Centric Storage (DCS), or directed placement, is a viable approach for achieving this goal, but only when the underlying network is well connected. Alternatively, we propose, amorphous placement, in which sensory samples are cached locally and informed exchanges of cached samples is used to diffuse the sensory data throughout the whole network. In handling queries, the local cache is searched first for potential answers. If unsuccessful, the query is forwarded to one or more direct neighbors for answers. This technique leverages node mobility and caching capabilities to avoid the multi-hop communication overhead of directed placement. Using a simplified mobility model, we provide analytical lower and upper bounds on the ability of amorphous placement to achieve uniform field coverage in one and two dimensions. We show that combining informed shuffling of cached samples upon an encounter between two nodes, with the querying of direct neighbors could lead to significant performance improvements. For instance, under realistic mobility models, our simulation experiments show that amorphous placement achieves 10% to 40% better query answering ratio at a 25% to 35% savings in consumed power over directed placement.National Science Foundation (CNS Cybertrust 0524477, CNS NeTS 0520166, CNS ITR 0205294, EIA RI 0202067

    The ZCache: Decoupling Ways and Associativity

    Full text link
    Abstract—The ever-increasing importance of main memory latency and bandwidth is pushing CMPs towards caches with higher capacity and associativity. Associativity is typically im-proved by increasing the number of ways. This reduces conflict misses, but increases hit latency and energy, placing a stringent trade-off on cache design. We present the zcache, a cache design that allows much higher associativity than the number of physical ways (e.g. a 64-associative cache with 4 ways). The zcache draws on previous research on skew-associative caches and cuckoo hashing. Hits, the common case, require a single lookup, incurring the latency and energy costs of a cache with a very low number of ways. On a miss, additional tag lookups happen off the critical path, yielding an arbitrarily large number of replacement candidates for the incoming block. Unlike conventional designs, the zcache provides associativity by increasing the number of replacement candidates, but not the number of cache ways. To understand the implications of this approach, we develop a general analysis framework that allows to compare associativity across different cache designs (e.g. a set-associative cache and a zcache) by representing associativity as a probability distribution. We use this framework to show that for zcaches, associativity depends only on the number of replacement candidates, and is independent of other factors (such as the number of cache ways or the workload). We also show that, for the same number of replacement candidates, the associativity of a zcache is superior than that of a set-associative cache for most workloads. Finally, we perform detailed simulations of multithreaded and multiprogrammed workloads on a large-scale CMP with zcache as the last-level cache. We show that zcaches provide higher performance and better energy efficiency than conventional caches without incurring the overheads of designs with a large number of ways. I
    corecore