15 research outputs found

MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

Author: Bayazit Deniz
Bonnet Antoine
Bosselut Antoine
Cano Alejandro Hernández
Chen Zeming
Fan Simin
Hartley Mary-Anne
Jaggi Martin
Krawczuk Igor
Köpf Andreas
Marmet Axel
Matoba Kyle
Mohtashami Amirkeivan
Montariol Syrielle
Pagliardini Matteo
Romanou Angelika
Sakhaeirad Alireza
Sallinen Alexandre
Salvi Francesco
Swamy Vinitra
Publication date: 27/11/2023

Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer) and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.

arXiv.org e-Print Archive

Exact Preimages of Neural Network Aircraft Collision Avoidance Systems

Author: Fleuret Francois
Matoba Kyle
Publication date: 13/04/2021

A common pattern of progress in engineering has seen deep neural networks displacing human-designed logic. While there are many advantages to this approach, divorcing decision-making from human oversight and intuition has costs as well. One is that deep neural networks can map similar inputs to very different outputs in a way that makes their application to safety-critical problems problematic. We present a method to check that the decisions of a deep neural network are as intended by constructing the exact preimage of its predictions. Preimages generalize verification in the sense that they can be used to verify a wide class of properties, and answer much richer questions besides. We examine the functioning of an aircraft collision avoidance system, and show how exact preimages reduce undue conservatism when examining dynamic safety. Our method iterates backwards through the layers of piecewise linear deep neural networks. Uniquely, we compute all intermediate values that correspond to a prediction, propagating this calculation through layers using analytical formulae for layer preimages.

Infoscience - École polytechnique fédérale de Lausanne

Growth accounting, potential output, and the current recession

Author: John Fernald
Kyle Matoba

Total factor productivity - a measure of the efficiency with which labor and capital are used - has fallen during the current recession. But, after adjustment for lower utilization of labor and capital, such productivity has risen strongly over the past two years. These growth-accounting measures suggest that efficiency gains have continued during the recession, boding well for long-term economic growth.

Productivity; Labor productivity; Capital investments

Research Papers in Economics

Challenges for Using Impact Regularizers to Avoid Negative Side Effects

Author: Lindner David
Matoba Kyle
Meulemans Alexander
Publication venue: n.n.
Publication date: 01/01/2022

Designing reward functions for reinforcement learning is difficult: besides specifying which behavior is rewarded for a task, the reward also has to discourage undesired outcomes. Misspecified reward functions can lead to unintended negative side effects and overall unsafe behavior. To overcome this problem, recent work proposed augmenting the specified reward function with an impact regularizer that discourages behavior that has a big impact on the environment. Although initial results with impact regularizers seem promising in mitigating some types of side effects, important challenges remain. In this paper, we examine the main current challenges of impact regularizers and relate them to fundamental design decisions. We discuss in detail which challenges recent approaches address and which remain unsolved. Finally, we explore promising directions to overcome the unsolved challenges in preventing negative side effects with impact regularizers.

Repository for Publications and Research Data

Challenges for Using Impact Regularizers to Avoid Negative Side Effects

Author: Lindner David
Matoba Kyle
Meulemans Alexander
Publication venue: IDIAP
Publication date: 08/02/2021

arXiv.org e-Print Archive

ZORA

Flatten the Curve: Efficiently Training Low-Curvature Neural Networks

Author: Fleuret Francois
Lakkaraju Himabindu
Matoba Kyle
Srinivas Suraj
Publication date: 14/06/2022

The highly non-linear nature of deep neural networks makes them susceptible to adversarial examples and gives them unstable gradients, which hinders interpretability. However, existing methods to solve these issues, such as adversarial training, are expensive and often sacrifice predictive accuracy. In this work, we consider curvature, a mathematical quantity that encodes the degree of non-linearity. Using this, we demonstrate low-curvature neural networks (LCNNs) that obtain drastically lower curvature than standard models while exhibiting similar predictive performance, which leads to improved robustness and stable gradients, with only a marginally increased training time. To achieve this, we minimize a data-independent upper bound on the curvature of a neural network, which decomposes overall curvature in terms of curvatures and slopes of its constituent layers. To efficiently minimize this bound, we introduce two novel architectural components: first, a non-linearity called centered-softplus that is a stable variant of the softplus non-linearity, and second, a Lipschitz-constrained batch normalization layer. Our experiments show that LCNNs have lower curvature, more stable gradients and increased off-the-shelf adversarial robustness when compared to their standard high-curvature counterparts, all without affecting predictive performance. Our approach is easy to use and can be readily incorporated into existing neural network models.

arXiv.org e-Print Archive

Monoclonal gammopathy of “ocular” significance

Author: Aragona
Garibaldi
Graichen
Green
Kyle
Leung
Lisch
Lisch
Matoba
Milman
Ormerod
Rajkumar
Steuhl
Willrich
Publication venue: 'Elsevier BV'

Crossref