
    MEMORY-VQ: Compression for Tractable Internet-Scale Memory

    Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up inference. However, storing those pre-computed representations greatly increases storage requirements. We propose MEMORY-VQ, a new method to reduce the storage requirements of memory-augmented models without sacrificing performance. Our method uses a vector quantization variational autoencoder (VQ-VAE) to compress token representations. We apply MEMORY-VQ to the LUMEN model to obtain LUMEN-VQ, a memory model that achieves a 16x compression rate with comparable performance on the KILT benchmark. LUMEN-VQ enables practical retrieval augmentation even for extremely large retrieval corpora.
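
    The compression step at the heart of this approach is standard vector quantization: each token representation is replaced by the index of its nearest codebook entry, so only a small integer code (plus one shared codebook) has to be stored, and an approximate vector is looked back up at inference time. A minimal NumPy sketch of that idea follows; the codebook size, dimensionality, and corpus size are illustrative assumptions, not the paper's settings (LUMEN-VQ additionally trains a VQ-VAE with product quantization, which is not reproduced here).

        import numpy as np

        rng = np.random.default_rng(0)

        # Illustrative sizes, not the paper's settings.
        num_tokens, dim, codebook_size = 10_000, 64, 256
        tokens = rng.standard_normal((num_tokens, dim)).astype(np.float32)
        codebook = rng.standard_normal((codebook_size, dim)).astype(np.float32)

        # Squared distances via ||t - c||^2 = ||t||^2 - 2 t.c + ||c||^2.
        dists = ((tokens ** 2).sum(1, keepdims=True)
                 - 2.0 * tokens @ codebook.T
                 + (codebook ** 2).sum(1))

        # Store one byte per token instead of 64 float32 values.
        codes = dists.argmin(axis=1).astype(np.uint8)

        # Reconstruct approximate representations at inference time.
        reconstructed = codebook[codes]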

    Efficient On-the-fly Category Retrieval using ConvNets and GPUs

    We investigate the gains in precision and speed that can be obtained by using Convolutional Networks (ConvNets) for on-the-fly retrieval, where classifiers are learnt at run time for a textual query from downloaded images and used to rank large image or video datasets. We make three contributions: (i) we present an evaluation of state-of-the-art image representations for object category retrieval over standard benchmark datasets containing 1M+ images; (ii) we show that ConvNets yield features that are highly performant yet much lower dimensional than previous state-of-the-art image representations, and that their dimensionality can be reduced further, without loss in performance, by compression using product quantization or binarization. Consequently, features with state-of-the-art performance on large-scale datasets of millions of images can fit in the memory of even a commodity GPU card; (iii) we show that an SVM classifier can be learnt within a ConvNet framework on a GPU in parallel with downloading the new training images, allowing for continuous refinement of the model as more images become available, and simultaneous training and ranking. The outcome is an on-the-fly system that significantly outperforms its predecessors in precision of retrieval, memory requirements, and speed, facilitating accurate on-the-fly learning and ranking in under a second on a single GPU. Comment: Published in proceedings of ACCV 201
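
    Of the two compression schemes mentioned, binarization is the simplest to illustrate: thresholding each feature dimension yields one bit per dimension, and candidates can then be ranked by Hamming distance to the binarized query. A toy sketch under assumed sizes (the feature dimensionality, dataset size, and zero threshold are all illustrative, not the paper's configuration):

        import numpy as np

        rng = np.random.default_rng(1)
        db = rng.standard_normal((100_000, 128)).astype(np.float32)  # dataset features
        query = rng.standard_normal(128).astype(np.float32)

        # Binarize: one bit per dimension, packed into 16 bytes per image.
        db_bits = np.packbits(db > 0, axis=1)
        q_bits = np.packbits(query > 0)

        # Rank by Hamming distance: popcount of the XOR of the bit vectors.
        hamming = np.unpackbits(db_bits ^ q_bits, axis=1).sum(axis=1)
        top10 = np.argsort(hamming)[:10]
        print(top10)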

    A Morphological Associative Memory Employing A Stored Pattern Independent Kernel Image and Its Hardware Model

    An associative memory provides a convenient way to retrieve and restore patterns, which is important for handling data distorted by noise. As an effective associative memory, we focus on the morphological associative memory (MAM) proposed by Ritter. The model is superior to ordinary associative memory models in terms of computational cost, memory capacity, and perfect recall rate. In general, however, kernel design becomes difficult as the number of stored patterns increases, because the kernel uses a part of each stored pattern. In this paper, we propose a stored-pattern-independent kernel design method for the MAM and implement the MAM with the proposed kernel in standard digital hardware with a parallel architecture for acceleration. We confirm the validity of the proposed kernel design method through auto- and hetero-association experiments and investigate the efficiency of the hardware acceleration. The custom hardware achieves high-speed operation, more than 150 times faster than software execution. The proposed model can serve as an intelligent pre-processor for Brain-Inspired Systems (Brain-IS) operating in the real world.
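
    For readers unfamiliar with Ritter-style morphological memories: recall replaces the usual multiply-accumulate with (max, +) or (min, +) operations, which is also what makes the model computationally cheap. A minimal auto-associative sketch in NumPy with toy patterns; the paper's actual contribution, the stored-pattern-independent kernel for hetero-association, is not reproduced here.

        import numpy as np

        # Toy stored patterns as columns of X: shape (dim, n_patterns).
        X = np.array([[1, 0, 1, 0],
                      [0, 1, 1, 0],
                      [1, 1, 0, 1]], dtype=float).T

        # Min-memory: w_ij = min over stored patterns k of (x_i^k - x_j^k).
        W = (X[:, None, :] - X[None, :, :]).min(axis=2)

        def recall(x):
            # Morphological (max, +) product: y_i = max_j (w_ij + x_j).
            return (W + x[None, :]).max(axis=1)

        # Every stored pattern is recalled perfectly by the min-memory.
        print(all(np.array_equal(recall(X[:, k]), X[:, k])
                  for k in range(X.shape[1])))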

    Individual Differences in Processing Speed and Working Memory Speed as Assessed with the Sternberg Memory Scanning Task

    The Sternberg Memory Scanning (SMS) task provides a measure of processing speed (PS) and working memory retrieval speed (WMS). In this task, participants are presented with sets of stimuli that vary in size. After a delay, one item is presented, and participants indicate whether or not the item was part of the set. Performance is assessed by speed and accuracy for both the positive (item is part of the set) and the negative trials (item is not part of the set). To examine the causes of variation in PS and WMS, 623 adult twins and their siblings completed the SMS task. A non-linear growth curve (nLGC) model best described the increase in reaction time with increasing set size. Genetic analyses showed that WMS (modeled as the Slope in the nLGC model) has a relatively small variance, which is not due to genetic variation, while PS (modeled as the Intercept in the nLGC model) showed large individual differences, part of which could be attributed to additive genetic factors. Heritability was 38% for positive and 32% for negative trials. Additional multivariate analyses showed that the genetic effects on PS for positive and negative trials were completely shared. We conclude that genetic influences on working memory performance are more likely to act upon basic processing speed and (pre)motoric processes than on the speed with which an item is retrieved from short-term memory.
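
    The abstract does not spell out the functional form of the nLGC model, but the general recipe, fitting a non-linear curve of reaction time against set size and treating the intercept as PS and the slope parameter as WMS, can be sketched with SciPy. The saturating-exponential form and all numbers below are illustrative assumptions, not the study's model or data.

        import numpy as np
        from scipy.optimize import curve_fit

        # Hypothetical mean reaction times (seconds) per memory set size.
        set_size = np.array([1, 2, 3, 4, 5], dtype=float)
        mean_rt = np.array([0.45, 0.52, 0.57, 0.60, 0.62])

        def nlgc(s, intercept, slope, rate):
            # Illustrative growth curve: RT rises with set size and saturates.
            return intercept + slope * (1.0 - np.exp(-rate * s))

        (intercept, slope, rate), _ = curve_fit(nlgc, set_size, mean_rt,
                                                p0=(0.4, 0.3, 0.5))
        print(f"intercept (PS proxy) = {intercept:.3f}s, "
              f"slope (WMS proxy) = {slope:.3f}s")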

    FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference

    Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state of the art on many knowledge-intensive NLP tasks. However, the architecture used for FiD was chosen by making minimal modifications to a standard T5 model, which our analysis shows to be highly suboptimal for a retrieval-augmented model. In particular, FiD allocates the bulk of FLOPs to the encoder, while the majority of inference time results from memory bandwidth constraints in the decoder. We propose two simple changes to the FiD architecture that alleviate memory bandwidth constraints and speed up inference by 7x. This allows us to use a much larger decoder at modest cost. We denote FiD with the above modifications as FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL performs faster inference than FiD-Base and achieves better performance than FiD-Large. Comment: ACL Findings 202
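
    The bandwidth argument is easy to see with a back-of-envelope calculation: each autoregressive decoding step re-reads the decoder weights from memory but performs little arithmetic per byte read, so step time is set by bandwidth rather than FLOPs. The figures below are illustrative assumptions, not measurements from the paper.

        # Rough step-time estimates for an autoregressive decoder.
        decoder_params = 400e6      # parameters streamed per step (assumed)
        bytes_per_param = 2         # bf16 weights
        hbm_bandwidth = 900e9       # bytes/s, a rough accelerator figure
        peak_flops = 100e12         # flop/s, likewise rough

        t_memory = decoder_params * bytes_per_param / hbm_bandwidth
        t_compute = 2 * decoder_params / peak_flops   # ~2 FLOPs per parameter

        print(f"memory-bound step time : {t_memory * 1e6:.0f} us")
        print(f"compute-bound step time: {t_compute * 1e6:.0f} us")
        # Memory traffic dominates by ~100x: the constraint FiDO targets.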

    A Dynamic Approach to Recognition Memory

    Thesis (Ph.D.) - Indiana University, Psychological and Brain Sciences/Cognitive Science, 2015. We argue that taking a dynamic approach to the understanding of memory will lead to advances that are not possible via other routes. To that end, we present a model of recognition memory that specifies how memory retrieval and recognition decisions jointly evolve over time, and show that it is able to jointly predict accuracy, response time, and speed-accuracy trade-off functions. The model affords insights into the effects of study time, list length, and instructions. The model leads to a novel qualitative and quantitative test of the source of word frequency effects in recognition, showing that the relatively high distinctiveness of the features of low-frequency words provides the best account. We also show how the dynamic model can be extended to account for paradigms like associative recognition and list discrimination, leading to another novel test of the presence of recall-like processes. Associative recognition, list discrimination, recognition of similar foils, and source exclusion are all better explained by the formation of a compound cue rather than recall, although source memory is found to be better modeled by a recall process.
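
    The abstract does not give the model's equations, but the kind of dynamic account described, where retrieval and the recognition decision evolve jointly over time until a decision is reached, can be illustrated with a generic random-walk evidence-accumulation simulation, which naturally produces accuracy, response time, and a speed-accuracy trade-off. This is a stand-in sketch, not the thesis's model; all parameter values are made up.

        import numpy as np

        rng = np.random.default_rng(2)

        def simulate_trial(drift, boundary=1.0, dt=0.01, noise=1.0):
            """Accumulate noisy familiarity evidence until a boundary is
            crossed; positive drift for studied items, negative for foils."""
            evidence, t = 0.0, 0.0
            while abs(evidence) < boundary:
                evidence += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
                t += dt
            return ("old" if evidence > 0 else "new"), t

        trials = [simulate_trial(drift=0.8) for _ in range(1000)]
        hit_rate = np.mean([resp == "old" for resp, _ in trials])
        mean_rt = np.mean([t for _, t in trials])
        print(f"hit rate = {hit_rate:.2f}, mean RT = {mean_rt:.2f}s")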

    The structure-sensitivity of memory access: evidence from Mandarin Chinese

    The present study examined the processing of the Mandarin Chinese long-distance reflexive ziji to evaluate the role that syntactic structure plays in the memory retrieval operations that support sentence comprehension. Using the multiple-response speed-accuracy tradeoff (MR-SAT) paradigm, we measured the speed with which comprehenders retrieve an antecedent for ziji. Our experimental materials contrasted sentences where ziji's antecedent was in the local clause with sentences where ziji's antecedent was in a distant clause. Time course results from MR-SAT suggest that ziji dependencies with syntactically distant antecedents are slower to process than syntactically local dependencies. To aid in interpreting the SAT data, we present a formal model of the antecedent retrieval process and derive quantitative predictions about the time course of antecedent retrieval. The modeling results support the Local Search hypothesis: during syntactic retrieval, comprehenders initially limit memory search to the local syntactic domain. We argue that the Local Search hypothesis has important implications for theories of locality effects in sentence comprehension. In particular, our results suggest that not all locality effects may be reduced to the effects of temporal decay and retrieval interference.
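
    SAT data of this kind are conventionally summarized with a shifted-exponential curve in which accuracy (d') rises from zero at an intercept toward an asymptote; slower retrieval shows up as a later intercept or a shallower rate. A small sketch with illustrative parameter values (not the study's estimates):

        import numpy as np

        def sat_curve(t, asymptote, rate, intercept):
            """Shifted-exponential speed-accuracy tradeoff:
            d'(t) = asymptote * (1 - exp(-rate * (t - intercept))), t > intercept."""
            return np.where(t > intercept,
                            asymptote * (1.0 - np.exp(-rate * (t - intercept))),
                            0.0)

        t = np.linspace(0.0, 3.0, 7)
        # Hypothetical: a distant antecedent with a later intercept than a local one.
        local = sat_curve(t, asymptote=2.5, rate=2.0, intercept=0.35)
        distant = sat_curve(t, asymptote=2.5, rate=2.0, intercept=0.50)
        print(np.round(local - distant, 2))   # the distant condition lags early on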

    Caching in Multidimensional Databases

    One application of multidimensional databases is On-line Analytical Processing (OLAP). Applications in this area are designed to make the analysis of shared multidimensional information fast [9]. On one hand, speed can be achieved by specially devised data structures and algorithms. On the other hand, the analytical process is cyclic: the user of the OLAP application runs queries one after the other, so the output of the latest query may already be present, at least partly, in one of the previous results. Therefore caching also plays an important role in the operation of these systems. However, caching itself may not be enough to ensure acceptable performance. Size matters: the more memory is available, the more we gain by loading and keeping information in it. Oftentimes the cache size is fixed, which limits the performance of the multidimensional database unless we compress the data so that a greater proportion of it fits into memory. Caching combined with proper compression methods therefore promises further performance improvements. In this paper, we investigate how caching influences the speed of OLAP systems. Different physical representations (multidimensional and table) are evaluated, and models are proposed for a thorough comparison. We draw conclusions based on these models and verify the conclusions with empirical data. Comment: 14 pages, 5 figures, 8 tables. Paper presented at the Fifth Conference of PhD Students in Computer Science, Szeged, Hungary, 27 - 30 June 2006. For further details, please refer to http://www.inf.u-szeged.hu/~szepkuti/papers.html#cachin
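
    The caching-plus-compression idea combines two standard ingredients: an LRU eviction policy over query results, and a general-purpose compressor so that a fixed memory budget holds more results. A toy sketch in Python; this is not the paper's implementation, and the byte budget and compressor choice are arbitrary.

        import pickle
        import zlib
        from collections import OrderedDict

        class CompressedLRUCache:
            """LRU cache for query results that stores values compressed,
            so more of them fit in a fixed memory budget."""

            def __init__(self, max_bytes=1 << 20):
                self.max_bytes = max_bytes
                self.used = 0
                self.items = OrderedDict()   # query -> compressed result

            def get(self, query):
                if query not in self.items:
                    return None
                self.items.move_to_end(query)        # mark as recently used
                return pickle.loads(zlib.decompress(self.items[query]))

            def put(self, query, result):
                blob = zlib.compress(pickle.dumps(result))
                self.used += len(blob) - len(self.items.pop(query, b""))
                self.items[query] = blob
                while self.used > self.max_bytes and len(self.items) > 1:
                    _, evicted = self.items.popitem(last=False)  # drop LRU entry
                    self.used -= len(evicted)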

    Exploring cognitive issues in visual information retrieval

    A study was conducted that compared user performance across a range of search tasks supported by both a textual and a visual information retrieval interface (VIRI). Test scores representing seven distinct cognitive abilities were examined in relation to user performance. Results indicate that, when using VIRIs, visual-perceptual abilities account for significant amounts of within-subjects variance, particularly when the relevance criteria were highly specific. Visualisation ability also seemed to be a critical factor when users were required to change topical perspective within the visualisation. Suggestions are made for navigational cues that may help to reduce the effects of these individual differences.