16,562 research outputs found

    Near-Memory Address Translation

    Full text link
    Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations as contemporary memories exceed the reach of TLBs, making expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that while limiting the associativity of the virtual-to-physical mapping incurs no penalty, it can break the translate-then-fetch serialization if combined with careful data placement in the MPU's memory, allowing for translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages respectively.Comment: 15 pages, 9 figure

    Causal Induction from Continuous Event Streams: Evidence for Delay-Induced Attribution Shifts

    Get PDF
    Contemporary theories of Human Causal Induction assume that causal knowledge is inferred from observable contingencies. While this assumption is well supported by empirical results, it fails to consider an important problem-solving aspect of causal induction in real time: In the absence of well structured learning trials, it is not clear whether the effect of interest occurred because of the cause under investigation, or on its own accord. Attributing the effect to either the cause of interest or alternative background causes is an important precursor to induction. We present a new paradigm based on the presentation of continuous event streams, and use it to test the Attribution-Shift Hypothesis (Shanks & Dickinson, 1987), according to which temporal delays sever the attributional link between cause and effect. Delays generally impaired attribution to the candidate, and increased attribution to the constant background of alternative causes. In line with earlier research (Buehner & May, 2002, 2003, 2004) prior knowledge and experience mediated this effect. Pre-exposure to a causally ineffective background context was found to facilitate the discovery of delayed causal relationships by reducing the tendency for attributional shifts to occur. However, longer exposure to a delayed causal relationship did not improve discovery. This complex pattern of results is problematic for associative learning theories, but supports the Attribution-Shift Hypothesi

    Extensible sparse functional arrays with circuit parallelism

    Get PDF
    A longstanding open question in algorithms and data structures is the time and space complexity of pure functional arrays. Imperative arrays provide update and lookup operations that require constant time in the RAM theoretical model, but it is conjectured that there does not exist a RAM algorithm that achieves the same complexity for functional arrays, unless restrictions are placed on the operations. The main result of this paper is an algorithm that does achieve optimal unit time and space complexity for update and lookup on functional arrays. This algorithm does not run on a RAM, but instead it exploits the massive parallelism inherent in digital circuits. The algorithm also provides unit time operations that support storage management, as well as sparse and extensible arrays. The main idea behind the algorithm is to replace a RAM memory by a tree circuit that is more powerful than the RAM yet has the same asymptotic complexity in time (gate delays) and size (number of components). The algorithm uses an array representation that allows elements to be shared between many arrays with only a small constant factor penalty in space and time. This system exemplifies circuit parallelism, which exploits very large numbers of transistors per chip in order to speed up key algorithms. Extensible Sparse Functional Arrays (ESFA) can be used with both functional and imperative programming languages. The system comprises a set of algorithms and a circuit specification, and it has been implemented on a GPGPU with good performance

    AT-GIS: highly parallel spatial query processing with associative transducers

    Get PDF
    Users in many domains, including urban planning, transportation, and environmental science want to execute analytical queries over continuously updated spatial datasets. Current solutions for largescale spatial query processing either rely on extensions to RDBMS, which entails expensive loading and indexing phases when the data changes, or distributed map/reduce frameworks, running on resource-hungry compute clusters. Both solutions struggle with the sequential bottleneck of parsing complex, hierarchical spatial data formats, which frequently dominates query execution time. Our goal is to fully exploit the parallelism offered by modern multicore CPUs for parsing and query execution, thus providing the performance of a cluster with the resources of a single machine. We describe AT-GIS, a highly-parallel spatial query processing system that scales linearly to a large number of CPU cores. ATGIS integrates the parsing and querying of spatial data using a new computational abstraction called associative transducers(ATs). ATs can form a single data-parallel pipeline for computation without requiring the spatial input data to be split into logically independent blocks. Using ATs, AT-GIS can execute, in parallel, spatial query operators on the raw input data in multiple formats, without any pre-processing. On a single 64-core machine, AT-GIS provides 3× the performance of an 8-node Hadoop cluster with 192 cores for containment queries, and 10× for aggregation queries

    Augmenting human memory using personal lifelogs

    Get PDF
    Memory is a key human facility to support life activities, including social interactions, life management and problem solving. Unfortunately, our memory is not perfect. Normal individuals will have occasional memory problems which can be frustrating, while those with memory impairments can often experience a greatly reduced quality of life. Augmenting memory has the potential to make normal individuals more effective, and those with significant memory problems to have a higher general quality of life. Current technologies are now making it possible to automatically capture and store daily life experiences over an extended period, potentially even over a lifetime. This type of data collection, often referred to as a personal life log (PLL), can include data such as continuously captured pictures or videos from a first person perspective, scanned copies of archival material such as books, electronic documents read or created, and emails and SMS messages sent and received, along with context data of time of capture and access and location via GPS sensors. PLLs offer the potential for memory augmentation. Existing work on PLLs has focused on the technologies of data capture and retrieval, but little work has been done to explore how these captured data and retrieval techniques can be applied to actual use by normal people in supporting their memory. In this paper, we explore the needs for augmenting human memory from normal people based on the psychology literature on mechanisms about memory problems, and discuss the possible functions that PLLs can provide to support these memory augmentation needs. Based on this, we also suggest guidelines for data for capture, retrieval needs and computer-based interface design. Finally we introduce our work-in-process prototype PLL search system in the iCLIPS project to give an example of augmenting human memory with PLLs and computer based interfaces
    corecore