23 research outputs found
Birth of a Transformer: A Memory Viewpoint
Large language models based on transformers have achieved great empirical
successes. However, as they are deployed more widely, there is a growing need
to better understand their internal mechanisms in order to make them more
reliable. These models appear to store vast amounts of knowledge from their
training data, and to adapt quickly to new information provided in their
context or prompt. We study how transformers balance these two types of
knowledge by considering a synthetic setup where tokens are generated from
either global or context-specific bigram distributions. By a careful empirical
analysis of the training process on a simplified two-layer transformer, we
illustrate the fast learning of global bigrams and the slower development of an
"induction head" mechanism for the in-context bigrams. We highlight the role of
weight matrices as associative memories, provide theoretical insights on how
gradients enable their learning during training, and study the role of
data-distributional properties.
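The synthetic setup described above can be sketched as follows. This is a minimal illustration, not the paper's exact data pipeline: the vocabulary size, sequence length, and the names `global_bigrams` and `sample_sequence` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 8   # vocabulary size (toy value)
T = 16  # sequence length (toy value)

# Global bigram model: one transition matrix shared across all sequences.
global_bigrams = rng.dirichlet(np.ones(V), size=V)

def sample_sequence(n_context_pairs=2):
    """Sample a sequence from the global bigrams, except that a few 'trigger'
    tokens get a successor fixed per sequence -- the context-specific bigrams
    an induction head can learn to copy."""
    triggers = rng.choice(V, size=n_context_pairs, replace=False)
    partners = rng.choice(V, size=n_context_pairs)
    ctx = dict(zip(triggers.tolist(), partners.tolist()))
    seq = [int(rng.integers(V))]
    for _ in range(T - 1):
        prev = seq[-1]
        if prev in ctx:  # context-specific bigram: fixed within this sequence only
            seq.append(ctx[prev])
        else:            # global bigram: shared across the whole dataset
            seq.append(int(rng.choice(V, p=global_bigrams[prev])))
    return seq, ctx

seq, ctx = sample_sequence()
```

A model can predict the successor of a trigger token only by attending to its earlier occurrence in the same sequence, which is what makes the induction-head mechanism necessary for the in-context bigrams.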
A New Class of Codes for Robust Compression of Heterogeneous Data: Multiplexed Codes
Compression systems for real signals (images, video, audio) generate sources of information with different levels of priority, which are then encoded with variable-length codes (VLCs). This paper addresses the issue of robust transmission of such VLC-encoded heterogeneous sources over error-prone channels. VLCs are very sensitive to channel noise: when some bits are altered, synchronization losses can occur at the receiver. This paper describes a new family of codes, called multiplexed codes, that confine the desynchronization phenomenon to low-priority data while asymptotically reaching the entropy bound for both (low- and high-priority) sources. The idea consists in creating fixed-length codes for the high-priority information and using the inherent redundancy to describe the low-priority data, hence the name multiplexed codes. Theoretical and simulation results reveal very high error resilience at almost no cost in compression efficiency.
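The core idea can be sketched in a few lines. This is a hedged toy illustration, not the paper's construction: the block sizes in `blocks` are hand-chosen rather than derived from source statistics, and the names `encode`/`decode` are assumptions.

```python
# Each high-priority symbol gets a contiguous block of fixed-length (C-bit)
# codewords, with block size roughly proportional to its probability.
# The choice of codeword *within* the block carries low-priority bits.
C = 4
blocks = {"a": (0, 8), "b": (8, 4), "c": (12, 4)}  # symbol -> (start, size)

def encode(sym, lp_value):
    """Pack one high-priority symbol plus log2(size) low-priority bits
    into a single C-bit codeword."""
    start, size = blocks[sym]
    assert 0 <= lp_value < size
    return start + lp_value

def decode(codeword):
    """Recover the symbol from the block the codeword falls in,
    and the low-priority bits from its offset within that block."""
    for sym, (start, size) in blocks.items():
        if start <= codeword < start + size:
            return sym, codeword - start
    raise ValueError("invalid codeword")

cw = encode("b", 3)  # carries symbol 'b' plus 2 low-priority bits (value 3)
```

Because every C-bit pattern is a valid codeword, a bit error maps one codeword to another rather than desynchronizing the stream: the decoder never loses its framing, so only the decoded values (and primarily the low-priority offset) can be corrupted.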
Abstract
In this paper we present two contributions that improve the accuracy and speed of an image search system based on bag-of-features: a contextual dissimilarity measure (CDM) and an efficient search structure for visual word vectors. Our measure (CDM) takes into account the local distribution of the vectors and iteratively estimates distance-correcting terms. These terms are subsequently used to update an existing distance, thereby modifying the neighborhood structure. Experimental results on the Nistér-Stewénius dataset show that our approach significantly outperforms the state of the art in terms of accuracy. Our efficient search structure for visual word vectors is a two-level scheme using inverted files. The first level partitions the image set into clusters of images. At query time, only a subset of clusters of the second level has to be searched. This method allows fast querying in large sets of images. We evaluate the gain in speed and the loss in accuracy on large datasets (up to 1 million images).
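The iterative distance correction can be sketched as follows. This is a simplified stand-in for the CDM, not the paper's exact update rule: the function name `cdm_update` and the rescaling by each point's mean neighborhood distance are assumptions chosen to illustrate the idea of correcting distances by the local vector distribution.

```python
import numpy as np

def cdm_update(D, k=3, n_iter=5):
    """Simplified sketch of a contextual dissimilarity measure: iteratively
    rescale pairwise distances by each point's mean distance to its k nearest
    neighbors, so dense and sparse neighborhoods become comparable."""
    D = D.astype(float).copy()
    n = D.shape[0]
    for _ in range(n_iter):
        Dm = D.copy()
        np.fill_diagonal(Dm, np.inf)            # exclude self-distances
        delta = np.sort(Dm, axis=1)[:, :k].mean(axis=1)  # local neighborhood scale
        geo = np.sqrt(np.outer(delta, delta))   # geometric mean of the two scales
        D = D / geo * geo.mean()                # correct, keep overall scale fixed
        np.fill_diagonal(D, 0.0)
    return D

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)  # Euclidean distance matrix
D2 = cdm_update(D)
```

Points sitting in dense regions have small `delta`, so their distances are inflated relative to points in sparse regions; this changes the neighborhood structure exactly as the abstract describes, without recomputing the underlying descriptors.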
INRIA-LEAR's video copy detection system
1 Copyright detection task
2 High-level feature extraction task