Binary and Ternary Natural Language Generation
Ternary and binary neural networks enable multiplication-free computation and
promise multiple orders of magnitude efficiency gains over full-precision
networks if implemented on specialized hardware. However, since both the
parameter and the output space are highly discretized, such networks have
proven very difficult to optimize. The difficulties are compounded for the
class of transformer text generation models due to the sensitivity of the
attention operation to quantization and the noise-compounding effects of
autoregressive decoding in the high-cardinality output space. We approach the
problem with a mix of statistics-based quantization for the weights and elastic
quantization of the activations and demonstrate the first ternary and binary
transformer models on the downstream tasks of summarization and machine
translation. Our ternary BART base achieves an R1 score of 41 on the
CNN/DailyMail benchmark, which is merely 3.9 points behind the full model while
being 16x more efficient. Our binary model, while less accurate, achieves a
highly non-trivial score of 35.6. For machine translation, we achieve BLEU
scores of 21.7 and 17.6 on the WMT16 En-Ro benchmark, compared with a full
precision mBART model score of 26.8. We also compare our approach in the 8-bit
activation setting, where our ternary and even binary weight models can match
or outperform the best existing 8-bit weight models in the literature. Our code
and models are available at:
https://github.com/facebookresearch/Ternary_Binary_Transformer
Comment: ACL 2023 Oral
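For intuition, here is a minimal sketch of one common statistics-based ternarization rule, in the style of Ternary Weight Networks rather than this paper's exact recipe: weights below a threshold derived from the mean absolute value are zeroed, and the survivors snap to a shared scale. The 0.7 threshold factor and the straight-through estimator are standard assumptions, not details taken from the abstract.

    import torch

    def ternarize(w: torch.Tensor) -> torch.Tensor:
        # Layer-wise threshold from weight statistics (TWN-style heuristic).
        delta = 0.7 * w.abs().mean()
        mask = (w.abs() > delta).float()      # 1 where the weight survives
        # Shared scale: mean magnitude of the surviving weights.
        alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
        return alpha * torch.sign(w) * mask   # values in {-alpha, 0, +alpha}

    def ternarize_ste(w: torch.Tensor) -> torch.Tensor:
        # Straight-through estimator: quantized forward pass, identity
        # backward pass, so gradients update the latent fp32 weights.
        return w + (ternarize(w) - w).detach()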
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task
The study explores the effectiveness of the Chain-of-Thought approach, known
for its proficiency in language tasks by breaking them down into sub-tasks and
intermediate steps, in improving vision-language tasks that demand
sophisticated perception and reasoning. We present the "Description then
Decision" strategy, which is inspired by how humans process signals. This
strategy significantly improves probing task performance by 50%, establishing
the groundwork for future research on reasoning paradigms in complex
vision-language tasks.
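A rough sketch of how such a two-stage strategy can be wired around any vision-language model follows; the vlm callable and the prompt wording are illustrative assumptions, not the paper's exact prompts.

    def description_then_decision(vlm, image, question: str) -> str:
        # Stage 1: elicit a focused textual description of the visual
        # evidence relevant to the question.
        description = vlm(image, f"Describe the parts of the image "
                                 f"relevant to: {question}")
        # Stage 2: condition the final decision on that intermediate text.
        return vlm(image, f"Description: {description}\n"
                          f"Using it, answer: {question}")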
UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering
We study open-domain question answering with structured, unstructured and
semi-structured knowledge sources, including text, tables, lists and knowledge
bases. Departing from prior work, we propose a unifying approach that
homogenizes all sources by reducing them to text and applies the
retriever-reader model which has so far been limited to text sources only. Our
approach greatly improves results on knowledge-base QA tasks by 11 points
compared to the latest graph-based methods. More importantly, we demonstrate
that our unified knowledge (UniK-QA) model is a simple yet effective way to
combine heterogeneous sources of knowledge, advancing the state-of-the-art
results on two popular question answering benchmarks, NaturalQuestions and
WebQuestions, by 3.5 and 2.6 points, respectively.
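The key move is verbalizing every source into plain text that an ordinary retriever-reader can consume. A minimal sketch of such flattening is below; the exact templates are assumptions, not the ones used by UniK-QA.

    def triple_to_text(subj: str, rel: str, obj: str) -> str:
        # Verbalize a knowledge-base triple as a short sentence.
        return f"{subj} {rel.replace('_', ' ')} {obj}."

    def table_row_to_text(title: str, headers: list, row: list) -> str:
        # Flatten one table row into "header is cell" clauses.
        cells = ", ".join(f"{h} is {c}" for h, c in zip(headers, row))
        return f"{title}: {cells}."

    # Example: table_row_to_text("2010 Census", ["City", "Population"],
    #                            ["Honolulu", "337,256"])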
A Study on the Efficiency and Generalization of Light Hybrid Retrievers
Existing hybrid retrievers, which integrate sparse and dense retrievers, are
indexing-heavy, limiting their applicability in real-world on-device settings.
We ask the question "Is it possible to reduce the indexing memory of hybrid
retrievers without sacrificing performance?" Driven by this question, we
leverage an indexing-efficient dense retriever (i.e. DrBoost) to obtain a light
hybrid retriever. Moreover, to further reduce the memory, we introduce a
lighter dense retriever (LITE), which is jointly trained with contrastive
learning and knowledge distillation from DrBoost. Compared to previous heavy
hybrid retrievers, our Hybrid-LITE retriever uses 13x less memory while
maintaining 98.0% of the performance.
In addition, we study the generalization of light hybrid retrievers along two
dimensions, out-of-domain (OOD) generalization and robustness against
adversarial attacks. We evaluate models on two existing OOD benchmarks and
create six adversarial attack sets for robustness evaluation. Experiments show
that our light hybrid retrievers achieve better robustness performance than
both sparse and dense retrievers. Nevertheless, there is still large room to
improve the robustness of retrievers, and our datasets can aid future research.
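For reference, the usual way a hybrid retriever combines its two components is a simple score interpolation over the union of candidates. The sketch below assumes per-document score dictionaries and a tunable weight, which are generic choices rather than details from the abstract.

    def hybrid_scores(dense: dict, sparse: dict, alpha: float = 0.5) -> dict:
        # Linear interpolation of dense and sparse relevance scores;
        # documents missing from one ranked list contribute zero there.
        docs = set(dense) | set(sparse)
        return {d: alpha * dense.get(d, 0.0)
                   + (1 - alpha) * sparse.get(d, 0.0)
                for d in docs}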
How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval
Various techniques have been developed in recent years to improve dense
retrieval (DR), such as unsupervised contrastive learning and pseudo-query
generation. Existing DRs, however, often suffer from effectiveness tradeoffs
between supervised and zero-shot retrieval, which some attribute to limited
model capacity. We challenge this hypothesis and show that a
generalizable DR can be trained to achieve high accuracy in both supervised and
zero-shot retrieval without increasing model size. In particular, we
systematically examine the contrastive learning of DRs, under the framework of
Data Augmentation (DA). Our study shows that common DA practices, such as query
augmentation with generative models and pseudo-relevance label creation using a
cross-encoder, are often inefficient and sub-optimal. We hence propose a new DA
approach with diverse queries and sources of supervision to progressively train
a generalizable DR. As a result, DRAGON, our dense retriever trained with
diverse augmentation, is the first BERT-base-sized DR to achieve
state-of-the-art effectiveness in both supervised and zero-shot evaluations and
even competes with models using more complex late interaction (ColBERTv2 and
SPLADE++).
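The training signal underneath all of these DA variants is a standard contrastive (InfoNCE) objective over query-passage pairs; what varies is where the queries and relevance labels come from. A minimal single-query sketch, with an assumed temperature value:

    import torch
    import torch.nn.functional as F

    def infonce_loss(q, pos, negs, tau: float = 0.05):
        # q: [d] query embedding, pos: [d] relevant passage, negs: [n, d].
        logits = torch.cat([(q @ pos).unsqueeze(0), negs @ q]) / tau
        # The positive passage sits at index 0 of the candidate list.
        return F.cross_entropy(logits.unsqueeze(0),
                               torch.zeros(1, dtype=torch.long))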
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Several post-training quantization methods have been applied to large
language models (LLMs) and have been shown to perform well down to 8 bits. We
find that these methods break down at lower bit precision, and investigate
quantization aware training for LLMs (LLM-QAT) to push quantization levels even
further. We propose a data-free distillation method that leverages generations
produced by the pre-trained model, which better preserves the original output
distribution and allows quantizing any generative model independent of its
training data, similar to post-training quantization methods. In addition to
quantizing weights and activations, we also quantize the KV cache, which is
critical for increasing throughput and supporting long-sequence dependencies at
current model sizes. We experiment with LLaMA models of sizes 7B, 13B, and 30B,
at quantization levels down to 4 bits. We observe large improvements over
training-free methods, especially in low-bit settings.
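Quantization-aware training typically inserts "fake quantization" into the forward pass while letting gradients flow through unchanged. Below is a minimal per-tensor symmetric sketch; LLM-QAT itself makes more careful per-channel and per-token choices, so treat this as an illustration of the mechanism only.

    import torch

    def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
        # Symmetric per-tensor quantization: scale set by the max magnitude.
        qmax = 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax
        q = torch.round(x / scale).clamp(-qmax - 1, qmax) * scale
        # Straight-through estimator: quantized forward, identity backward.
        return x + (q - x).detach()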
Effective Long-Context Scaling of Foundation Models
We present a series of long-context LLMs that support effective context
windows of up to 32,768 tokens. Our model series is built through continual
pretraining from Llama 2 with longer training sequences and on a dataset where
long texts are upsampled. We perform extensive evaluation on language modeling,
synthetic context probing tasks, and a wide range of research benchmarks. On
research benchmarks, our models achieve consistent improvements on most regular
tasks and significant improvements on long-context tasks over Llama 2. Notably,
with a cost-effective instruction tuning procedure that does not require
human-annotated long instruction data, the 70B variant can already surpass
gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks.
Alongside these results, we provide an in-depth analysis on the individual
components of our method. We delve into Llama's position encodings and discuss
their limitations in modeling long dependencies. We also examine the impact of
various design choices in the pretraining process, including the data mix and
the training curriculum of sequence lengths. Our ablation experiments suggest
that having abundant long texts in the pretraining dataset is not the key to
achieving strong performance, and we empirically verify that long-context
continual pretraining is more efficient than, and similarly effective to,
pretraining from scratch with long sequences.
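One concrete knob this kind of analysis points at is the rotary position encoding: raising RoPE's base frequency slows the per-dimension rotation so that distant tokens remain distinguishable. A sketch of computing the rotation angles with an adjustable base; the specific base value shown is illustrative, not a figure taken from the abstract.

    import torch

    def rope_angles(seq_len: int, head_dim: int, base: float = 500000.0):
        # Per-dimension inverse frequencies; a larger base means slower
        # rotation and therefore longer resolvable dependencies.
        inv_freq = base ** (-torch.arange(0, head_dim, 2).float() / head_dim)
        positions = torch.arange(seq_len).float()
        return torch.outer(positions, inv_freq)  # [seq_len, head_dim // 2]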