A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Much effort has been devoted to evaluating whether multi-task learning can be
leveraged to learn rich representations that can be used in various Natural
Language Processing (NLP) downstream applications. However, there is still a
lack of understanding of the settings in which multi-task learning has a
significant effect. In this work, we introduce a hierarchical model trained in
a multi-task learning setup on a set of carefully selected semantic tasks. The
model is trained in a hierarchical fashion to introduce an inductive bias by
supervising a set of low-level tasks at the bottom layers of the model and more
complex tasks at its top layers. This model achieves
state-of-the-art results on a number of tasks, namely Named Entity Recognition,
Entity Mention Detection, and Relation Extraction, without hand-engineered
features or external NLP tools like syntactic parsers. The hierarchical
training supervision induces a set of shared semantic representations at lower
layers of the model. We show that as we move from the bottom to the top layers
of the model, the hidden states of the layers tend to represent more complex
semantic information.
Comment: 8 pages, 1 figure, to appear in Proceedings of AAAI 2019.
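A minimal sketch of the hierarchical supervision idea, not the authors' exact architecture: each task head reads the hidden states of a different encoder depth, so the bottom layers are shaped by simpler tagging tasks and the top layer by relation extraction. The layer types, sizes, and tag counts below are illustrative assumptions.

```python
# Illustrative sketch of hierarchical multi-task supervision: low-level tasks
# supervise lower encoder layers, a more complex task supervises the top layer.
# Layer choices (BiLSTMs) and dimensions are assumptions, not the paper's model.
import torch.nn as nn

class HierarchicalMTL(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=128,
                 n_ner_tags=9, n_mention_tags=5, n_relations=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Bottom layer: supervised with Named Entity Recognition.
        self.ner_encoder = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.ner_head = nn.Linear(2 * hidden, n_ner_tags)
        # Middle layer: supervised with Entity Mention Detection.
        self.emd_encoder = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.emd_head = nn.Linear(2 * hidden, n_mention_tags)
        # Top layer: supervised with Relation Extraction (sentence-level for brevity).
        self.re_encoder = nn.LSTM(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.re_head = nn.Linear(2 * hidden, n_relations)

    def forward(self, token_ids):
        x = self.embed(token_ids)            # (batch, seq, emb)
        h_ner, _ = self.ner_encoder(x)       # shared low-level representations
        h_emd, _ = self.emd_encoder(h_ner)   # built on top of the lower layer
        h_re, _ = self.re_encoder(h_emd)
        return {
            "ner_logits": self.ner_head(h_ner),           # per-token NER tags
            "emd_logits": self.emd_head(h_emd),           # per-token mention tags
            "re_logits": self.re_head(h_re.mean(dim=1)),  # pooled relation prediction
        }

# Training would sum a cross-entropy loss over each head, one per task.
```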
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset
Using vision-language models (VLMs) in web development presents a promising
strategy to increase efficiency and unblock no-code solutions: by providing a
screenshot or a sketch of a UI, a VLM could generate the code to reproduce it,
for instance in a language like HTML. Despite the advancements in VLMs for
various tasks, the specific challenge of converting a screenshot into the
corresponding HTML code has been minimally explored. We posit that this is mainly
due to the absence of a suitable, high-quality dataset. This work introduces
WebSight, a synthetic dataset consisting of 2 million pairs of HTML codes and
their corresponding screenshots. We fine-tune a foundational VLM on our dataset
and demonstrate its proficiency in converting webpage screenshots to functional HTML code.
To accelerate research in this area, we open-source WebSight.
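A hedged sketch of how the released screenshot/HTML pairs could be streamed for fine-tuning; the dataset id "HuggingFaceM4/WebSight" and the column names "image" and "text" are assumptions and may differ from the actual release.

```python
# Hypothetical sketch of iterating over WebSight pairs for VLM fine-tuning.
# Dataset id and column names are assumptions, not confirmed by the abstract.
from datasets import load_dataset

ds = load_dataset("HuggingFaceM4/WebSight", split="train", streaming=True)

for example in ds.take(3):
    screenshot = example["image"]  # rendered page as a PIL image (assumed column)
    html_code = example["text"]    # corresponding HTML source (assumed column)
    # A typical instruction-tuning pair for a screenshot-to-code VLM:
    prompt = "Convert this webpage screenshot into HTML code."
    target = html_code
    print(screenshot.size, len(target))
```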
What matters when building vision-language models?
The growing interest in vision-language models (VLMs) has been driven by
improvements in large language models and vision transformers. Despite the
abundance of literature on this subject, we observe that critical decisions
regarding the design of VLMs are often not justified. We argue that these
unsupported decisions impede progress in the field by making it difficult to
identify which choices improve model performance. To address this issue, we
conduct extensive experiments around pre-trained models, architecture choice,
data, and training methods. Our consolidation of findings includes the
development of Idefics2, an efficient foundational VLM of 8 billion parameters.
Idefics2 achieves state-of-the-art performance within its size category across
various multimodal benchmarks, and is often on par with models four times its
size. We release the model (base, instructed, and chat) along with the datasets
created for its training.
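A hedged usage sketch for loading a released checkpoint with the transformers library; the repo id "HuggingFaceM4/idefics2-8b", the chat-template call, and the local image path are assumptions based on typical VLM usage, not an official quickstart.

```python
# Sketch of querying an Idefics2 checkpoint; repo id and prompt format assumed.
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceM4/idefics2-8b"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id)

image = Image.open("screenshot.png")    # any local image
messages = [{"role": "user",
             "content": [{"type": "image"},
                         {"type": "text", "text": "Describe this image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```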
What Language Model to Train if You Have One Million GPU Hours?
The crystallization of modeling methods around the Transformer architecture
has been a boon for practitioners. Simple, well-motivated architectural
variations can transfer across tasks and scale, increasing the impact of
modeling research. However, with the emergence of state-of-the-art 100B+
parameter models, large language models are increasingly expensive to
accurately design and train. Notably, it can be difficult to evaluate how
modeling decisions may impact emergent capabilities, given that these
capabilities arise mainly from sheer scale alone. In the process of building
BLOOM--the BigScience Large Open-science Open-access Multilingual language
model--our goal is to identify an architecture and training setup that makes
the best use of our 1,000,000 A100-GPU-hours budget. Specifically, we perform
an ablation study at the billion-parameter scale comparing different modeling
practices and their impact on zero-shot generalization. In addition, we study
the impact of various popular pre-training corpora on zero-shot generalization.
We also study the performance of a multilingual model and how it compares to
the English-only one. Finally, we consider the scaling behaviour of
Transformers to choose the target model size, shape, and training setup. All
our models and code are open-sourced at https://huggingface.co/bigscience.
Comment: Findings of EMNLP 2022.
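As a back-of-the-envelope illustration of how such a budget constrains model size, the sketch below uses the common C ≈ 6ND approximation for training FLOPs; the hardware-utilization figure and the candidate parameter counts are assumptions for illustration, not numbers taken from the paper.

```python
# Rough compute budgeting for 1,000,000 A100-GPU-hours using C ~= 6 * N * D.
# Utilization and the candidate model sizes are assumed, not from the paper.
A100_PEAK_FLOPS = 312e12   # A100 BF16 tensor-core peak, ~312 TFLOP/s
UTILIZATION = 0.40         # assumed fraction of peak actually achieved
GPU_HOURS = 1_000_000

total_flops = GPU_HOURS * 3600 * A100_PEAK_FLOPS * UTILIZATION  # ~4.5e23 FLOPs

# Fixing a parameter count N, the budget implies a token count D = C / (6N):
for n_params in (1e9, 13e9, 176e9):
    tokens = total_flops / (6 * n_params)
    print(f"{n_params / 1e9:>6.0f}B params -> ~{tokens / 1e9:,.0f}B tokens")
```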
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
PromptSource is a system for creating, sharing, and using natural language
prompts. Prompts are functions that map an example from a dataset to a natural
language input and target output. Using prompts to train and query language
models is an emerging area in NLP that requires new tools that let users
develop and refine these prompts collaboratively. PromptSource addresses the
emergent challenges in this new setting with (1) a templating language for
defining data-linked prompts, (2) an interface that lets users quickly iterate
on prompt development by observing outputs of their prompts on many examples,
and (3) a community-driven set of guidelines for contributing new prompts to a
common pool. Over 2,000 prompts for roughly 170 datasets are already available
in PromptSource. PromptSource is available at
https://github.com/bigscience-workshop/promptsource.
Comment: ACL 2022 Demo.
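To illustrate the core abstraction (a prompt as a function mapping a dataset example to a natural language input and a target output), here is a minimal Jinja2-based sketch. PromptSource's templating language is Jinja-based, but the exact syntax and this NLI example are illustrative rather than taken from the repository.

```python
# Conceptual sketch of a data-linked prompt: two templates turn one dataset
# example into (input text, target text). Illustrative, not PromptSource syntax.
from jinja2 import Template

input_template = Template(
    "Premise: {{ premise }}\nHypothesis: {{ hypothesis }}\n"
    "Does the premise entail the hypothesis? Yes, no, or maybe?"
)
target_template = Template("{{ ['Yes', 'Maybe', 'No'][label] }}")

example = {"premise": "A dog is running in the park.",
           "hypothesis": "An animal is outside.",
           "label": 0}

prompt_input = input_template.render(**example)
prompt_target = target_template.render(**example)
print(prompt_input)
print(prompt_target)  # "Yes"
```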