
    Birth of a Transformer: A Memory Viewpoint

    Large language models based on transformers have achieved great empirical success. However, as they are deployed more widely, there is a growing need to understand their internal mechanisms in order to make them more reliable. These models appear to store vast amounts of knowledge from their training data, and to adapt quickly to new information provided in their context or prompt. We study how transformers balance these two types of knowledge by considering a synthetic setup where tokens are generated from either global or context-specific bigram distributions. Through a careful empirical analysis of the training process on a simplified two-layer transformer, we illustrate the fast learning of global bigrams and the slower development of an "induction head" mechanism for the in-context bigrams. We highlight the role of weight matrices as associative memories, provide theoretical insights on how gradients enable their learning during training, and study the role of data-distributional properties.
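
    The synthetic setup is easy to reproduce in miniature. Below is a minimal sketch (not the paper's code; names such as sample_sequence and n_overrides are illustrative): a global bigram table shared across all sequences, plus a few per-sequence bigram overrides that can only be predicted by reading the context.

        import numpy as np

        def sample_sequence(vocab_size=16, length=64, n_overrides=4, seed=0):
            """Toy generator: tokens follow a global bigram table, except
            for a few (previous -> next) pairs redrawn per sequence, which
            a model can only predict by attending to the context."""
            rng = np.random.default_rng(seed)
            # Global bigram distributions, one row per previous token.
            global_bigrams = rng.dirichlet(np.ones(vocab_size), size=vocab_size)
            # Context-specific bigrams: fixed successors for a few tokens.
            prev = rng.choice(vocab_size, size=n_overrides, replace=False)
            succ = rng.choice(vocab_size, size=n_overrides)
            overrides = {int(p): int(s) for p, s in zip(prev, succ)}

            seq = [int(rng.integers(vocab_size))]
            for _ in range(length - 1):
                last = seq[-1]
                if last in overrides:   # in-context bigram
                    seq.append(overrides[last])
                else:                   # global bigram
                    seq.append(int(rng.choice(vocab_size, p=global_bigrams[last])))
            return seq, overrides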

    A New Class of Codes for Robust Compression of Heterogeneous Data: Multiplexed Codes

    Compression systems for real signals (images, video, audio) generate sources of information with different priority levels, which are then encoded with variable-length codes (VLCs). This paper addresses the robust transmission of such VLC-encoded heterogeneous sources over error-prone channels. VLCs are very sensitive to channel noise: when some bits are altered, synchronization losses can occur at the receiver. This paper describes a new family of codes, called multiplexed codes, that confine the desynchronization phenomenon to low-priority data while asymptotically reaching the entropy bound for both (low- and high-priority) sources. The idea is to create fixed-length codes for the high-priority information and to use their inherent redundancy to describe the low-priority data, hence the name multiplexed codes. Theoretical and simulation results reveal very high error resilience at almost no cost in compression efficiency.
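
    As a toy illustration (a minimal sketch, not the paper's construction: the symbol names, codeword assignments, and power-of-two set sizes are assumptions made for simplicity), each high-priority symbol below owns a set of 3-bit fixed-length codewords, and low-priority bits select the codeword within that set. Any corrupted 3-bit word still decodes to some symbol, so errors stay confined to the low-priority stream.

        # Toy multiplexed code: 3-bit fixed-length codewords partitioned
        # among high-priority (HP) symbols; the choice of codeword within
        # a symbol's set carries low-priority (LP) bits.
        CODEBOOK = {            # HP symbol -> its fixed-length codewords
            "a": [0, 1, 2, 3],  # frequent symbol: 4 codewords -> 2 LP bits
            "b": [4, 5],        # 2 codewords -> 1 LP bit
            "c": [6],           # 1 codeword  -> 0 LP bits
            "d": [7],
        }
        REVERSE = {cw: (sym, i) for sym, cws in CODEBOOK.items()
                   for i, cw in enumerate(cws)}

        def encode(hp_symbols, lp_bits):
            """One fixed-length codeword per HP symbol; LP bits pick the
            codeword index within that symbol's set."""
            out, pos = [], 0
            for sym in hp_symbols:
                cws = CODEBOOK[sym]
                k = len(cws).bit_length() - 1   # LP bits carried here
                idx = int(lp_bits[pos:pos + k] or "0", 2)
                pos += k
                out.append(cws[idx])
            return out

        def decode(codewords):
            """Every 3-bit value maps to some HP symbol, so the HP stream
            stays synchronized even if a codeword is corrupted."""
            hp, lp = [], ""
            for cw in codewords:
                sym, idx = REVERSE[cw]
                k = len(CODEBOOK[sym]).bit_length() - 1
                hp.append(sym)
                lp += format(idx, "0%db" % k) if k else ""
            return hp, lp

        print(decode(encode(list("ab"), "110")))  # (['a', 'b'], '110')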

    Abstract

    In this paper we present two contributions to improve the accuracy and speed of an image search system based on bag-of-features: a contextual dissimilarity measure (CDM) and an efficient search structure for visual word vectors. Our measure (CDM) takes into account the local distribution of the vectors and iteratively estimates distance-correcting terms. These terms are subsequently used to update an existing distance, thereby modifying the neighborhood structure. Experimental results on the Nistér-Stewénius dataset show that our approach significantly outperforms the state of the art in terms of accuracy. Our efficient search structure for visual word vectors is a two-level scheme using inverted files. The first level partitions the image set into clusters of images. At query time, only a subset of the second-level clusters has to be searched. This method allows fast querying in large sets of images. We evaluate the gain in speed and the loss in accuracy on large datasets (up to 1 million images).
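
    A minimal sketch of the CDM-style iteration (assumptions: Euclidean base distance, one multiplicative correcting term per vector, and an update driven by the mean distance to the k nearest neighbors; the paper's exact update rule and regularization may differ):

        import numpy as np

        def contextual_dissimilarity(X, k=10, alpha=0.5, n_iter=3):
            """Iteratively estimate per-vector correcting terms so that no
            point is systematically favored as a neighbor, then use them
            to rescale the base pairwise distances."""
            # Base pairwise Euclidean distances (X has shape (n, d), n > k).
            D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
            delta = np.ones(len(X))
            for _ in range(n_iter):
                Dc = D * np.sqrt(np.outer(delta, delta))  # corrected distances
                # Mean corrected distance to the k nearest neighbors
                # (column 0 is the zero self-distance, so it is skipped).
                r = np.sort(Dc, axis=1)[:, 1:k + 1].mean(axis=1)
                delta *= (r.mean() / r) ** alpha          # damped update
            return D * np.sqrt(np.outer(delta, delta))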

    Interferences in Match Kernels


    INRIA-LEAR's video copy detection system

    1 Copyright detection task
    2 High-level feature extraction task