Co-training Submodels for Visual Recognition
We introduce submodel co-training, a regularization method related to
co-training, self-distillation and stochastic depth. Given a neural network to
be trained, for each sample we implicitly instantiate two altered networks,
"submodels", with stochastic depth: we activate only a subset of the layers.
Each network serves as a soft teacher to the other by providing a loss that
complements the regular loss provided by the one-hot label. Our approach,
dubbed cosub, uses a single set of weights, and does not involve a pre-trained
external model or temporal averaging.
Experimentally, we show that submodel co-training is effective for training
backbones for recognition tasks such as image classification and semantic
segmentation. Our approach is compatible with multiple architectures, including
RegNet, ViT, PiT, XCiT, Swin and ConvNeXt, and our training strategy improves
their results in comparable settings. For instance, a ViT-B pretrained with
cosub on ImageNet-21k obtains 87.4% top-1 accuracy at resolution 448 on
ImageNet-val.
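
The mechanism described above can be summarized in a short sketch. This is an
illustration only, not the authors' code: the layer_mask argument, the 0.5 drop
rate and the KL-based soft-teacher loss below are assumptions made for the
example.

    # Illustrative cosub-style training step (sketch, not the authors' code).
    import torch
    import torch.nn.functional as F

    def cosub_step(model, x, y, drop_rate=0.5, lam=1.0):
        # Two random layer subsets define two implicit "submodels" that share
        # a single set of weights (stochastic depth).
        n_layers = len(model.blocks)            # assumes a ViT-style block list
        mask_a = torch.rand(n_layers) > drop_rate
        mask_b = torch.rand(n_layers) > drop_rate
        logits_a = model(x, layer_mask=mask_a)  # hypothetical layer_mask argument
        logits_b = model(x, layer_mask=mask_b)
        # Regular loss from the one-hot labels, for both submodels.
        ce = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)
        # Each submodel is a soft teacher for the other (teacher side detached).
        kl_ab = F.kl_div(F.log_softmax(logits_a, dim=-1),
                         F.softmax(logits_b.detach(), dim=-1),
                         reduction="batchmean")
        kl_ba = F.kl_div(F.log_softmax(logits_b, dim=-1),
                         F.softmax(logits_a.detach(), dim=-1),
                         reduction="batchmean")
        return ce + lam * (kl_ab + kl_ba)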
Code Llama: Open Foundation Models for Code
We release Code Llama, a family of large language models for code based on
Llama 2, providing state-of-the-art performance among open models, infilling
capabilities, support for large input contexts, and zero-shot
instruction-following ability for programming tasks. We provide multiple
flavors to cover a
wide range of applications: foundation models (Code Llama), Python
specializations (Code Llama - Python), and instruction-following models (Code
Llama - Instruct) with 7B, 13B and 34B parameters each. All models are trained
on sequences of 16k tokens and show improvements on inputs with up to 100k
tokens. The 7B and 13B Code Llama and Code Llama - Instruct variants support
infilling based on surrounding content. Code Llama reaches state-of-the-art
performance among open models on several code benchmarks, with scores of up to
53% and 55% on HumanEval and MBPP, respectively. Notably, Code Llama - Python
7B outperforms Llama 2 70B on HumanEval and MBPP, and all our models outperform
every other publicly available model on MultiPL-E. We release Code Llama under
a permissive license that allows for both research and commercial use.
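
As a usage illustration of the infilling capability, the sketch below assumes
the Hugging Face transformers library, the codellama/CodeLlama-7b-hf checkpoint
and its <FILL_ME> infilling convention; these specifics are assumptions of the
example, not part of the abstract.

    # Hedged sketch: fill-in-the-middle with a 7B Code Llama checkpoint.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")
    model = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")

    # The model completes the gap between the surrounding prefix and suffix;
    # the <FILL_ME> marker is assumed to be expanded by the tokenizer.
    prompt = 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result\n'
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))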
Are Large-scale Datasets Necessary for Self-Supervised Pre-training?
Pre-training models on large-scale datasets, like ImageNet, is a standard
practice in computer vision. This paradigm is especially effective for tasks
with small training sets, for which high-capacity models tend to overfit. In
this work, we consider a self-supervised pre-training scenario that only
leverages the target task data. We consider datasets, like Stanford Cars,
Sketch or COCO, that are one or more orders of magnitude smaller than ImageNet.
Our study shows that denoising autoencoders, such as BEiT or a variant that we
introduce in this paper, are more robust to the type and size of the
pre-training data than popular self-supervised methods trained by comparing
image embeddings. We obtain competitive performance compared to ImageNet
pre-training on a variety of classification datasets from different domains. On
COCO, when pre-training solely using COCO images, the detection and instance
segmentation performance surpasses that of supervised ImageNet pre-training in
a comparable setting.
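
The denoising-autoencoder pre-training discussed above can be sketched as a
masked-patch reconstruction objective. The 75% masking ratio, the generic
encoder and decoder callables and the pixel-regression target below are
illustrative assumptions; BEiT itself predicts discrete visual tokens rather
than raw pixels.

    # Minimal sketch of a denoising-autoencoder (masked-patch) pre-training loss.
    import torch
    import torch.nn.functional as F

    def masked_reconstruction_loss(encoder, decoder, patches, mask_ratio=0.75):
        # patches: (batch, num_patches, patch_dim), e.g. flattened 16x16 crops.
        b, n, d = patches.shape
        mask = torch.rand(b, n, device=patches.device) < mask_ratio
        corrupted = patches.masked_fill(mask.unsqueeze(-1), 0.0)  # corrupt input
        reconstruction = decoder(encoder(corrupted))  # predict all patches
        # The loss is computed only on the masked (corrupted) positions.
        return F.mse_loss(reconstruction[mask], patches[mask])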