48 research outputs found

How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds.

Author: Ammanabrolu P
Li M
Rocktäschel T
Szlam A
Urbanek J
Weston J
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/06/2021

We seek to create agents that both act and communicate with other agents in pursuit of a goal. Towards this end, we extend LIGHT (Urbanek et al. 2019), a large-scale crowd-sourced fantasy text-game, with a dataset of quests. These contain natural language motivations paired with in-game goals and human demonstrations; completing a quest might require dialogue or actions (or both). We introduce a reinforcement learning system that (1) incorporates large-scale language modeling-based and commonsense reasoning-based pre-training to imbue the agent with relevant priors; and (2) leverages a factorized action space of action commands and dialogue, balancing between the two. We conduct zero-shot evaluations using held-out human expert demonstrations, showing that our agents are able to act consistently and talk naturally with respect to their motivations.

UCL Discovery

Diffusion Interpretation of Nonlocal Neighborhood Filters for Signal Denoising

Author: Amit Singer
Boaz Nadler
Szlam A. D.
Yoel Shkolnisky
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'

Crossref

Haemodilution-induced profibrinolytic state is mitigated by fresh-frozen plasma: implications for early haemostatic intervention in massive haemorrhage

Author: Bolliger D.
Levy J. H.
Molinaro R. J.
Szlam F.
Tanaka K. A.
Publication date: 02/08/2017

Background: Fibrinolysis contributes to coagulopathy after major trauma and surgery. We hypothesized that progressive haemodilution is responsible, at least in part, for increased fibrinolytic tendency of blood clot. Methods: The study was performed in two parts. First, whole blood (WB) samples collected from six healthy, consented volunteers were diluted in vitro with either saline or fresh-frozen plasma (FFP) to 40% and 15% of baseline. We quantified factor levels related to coagulation and fibrinolysis, and measured endogenous thrombin generation in undiluted control plasma samples and in samples diluted with saline or FFP. Additionally, thromboelastometry was used to assess susceptibility to fibrinolysis after adding tissue plasminogen activator in undiluted WB samples and in samples diluted with saline before and after substitution of fibrinogen or FFP. Secondly, as a model of in vivo haemodilution, we evaluated the same parameters before and after operation in nine consented patients undergoing off-pump coronary artery bypass surgery. Results: The dilution with saline caused dose-dependent decreases in plasma levels of coagulation and antifibrinolytic factors, and in thrombin generation. In FFP-supplemented samples, factor levels and thrombin generation were maintained within normal ranges. Fibrinolytic tendency was significantly higher after haemodilution with saline independent of fibrinogen substitution compared with FFP. Similarly, increased tendency for fibrinolysis was also observed in the in vivo haemodilution. Conclusions: We demonstrated in vitro and in vivo that progressive haemodilution decreases endogenous antifibrinolytic proteins including α2-antiplasmin and thrombin-activatable fibrinolysis inhibitor, resulting in increased fibrinolytic tendency. Therefore, early fluid replacement therapy with FFP might be advantageous after massive haemorrhage.

RERO DOC Digital Library

Diffusion methods for wind power ramp detection

Author: A. Bossavy
A. Singer
A. Szlam
B. Greaves
H. Zheng
M. Belkin
P. Baldi
R. Coifman
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-38679-4_9

Proceedings of 12th International Work-Conference on Artificial Neural Networks, IWANN 2013, Puerto de la Cruz, Tenerife, Spain, June 12-14, 2013, Part I

The prediction and management of wind power ramps is currently receiving large attention as it is a crucial issue for both system operators and wind farm managers. However, this is still an issue far from being solved and in this work we will address it as a classification problem working with delay vectors of the wind power time series and applying local Mahalanobis K-NN search with metrics derived from Anisotropic Diffusion methods. The resulting procedures clearly outperform a random baseline method and yield good sensitivity but more work is needed to improve on specificity and, hence, precision.

With partial support from Spain's grant TIN2010-21575-C02-01 and the UAM-ADIC Chair for Machine Learning. The first author is also supported by an FPI-UAM grant and kindly thanks the Applied Mathematics Department of Yale University for receiving her during her visits. The second author is supported by the FPU-MEC grant AP2008-00167.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

Multiclass Semi-Supervised Learning on Graphs using Ginzburg-Landau Functional Minimization

Author: A Bertozzi
A Subramanya
AD Szlam
AL Bertozzi
D Zhou
EL Allwein
G Gilboa
GE Hinton
JA Dobrosotskaya
L Zelnik-Manor
RR Coifman
RV Kohn
TG Dietterich
Y LeCun
Y Li
YM Jung
Publication date: 06/06/2013

We present a graph-based variational algorithm for classification of high-dimensional data, generalizing the binary diffuse interface model to the case of multiple classes. Motivated by total variation techniques, the method involves minimizing an energy functional made up of three terms. The first two terms promote a stepwise continuous classification function with sharp transitions between classes, while preserving symmetry among the class labels. The third term is a data fidelity term, allowing us to incorporate prior information into the model in a semi-supervised framework. The performance of the algorithm on synthetic data, as well as on the COIL and MNIST benchmark datasets, is competitive with state-of-the-art graph-based multiclass segmentation methods. Comment: 16 pages, to appear in Springer's Lecture Notes in Computer Science volume "Pattern Recognition Applications and Methods 2013", part of series on Advances in Intelligent and Soft Computing

arXiv.org e-Print Archive

Crossref

Asynchronous Local-SGD Training for Language Modeling

Author: Chhaparia Rachita
Douillard Arthur
Kale Satyen
Liu Bo
Ranzato Marc'Aurelio
Rusu Andrei A.
Shen Jiajun
Szlam Arthur
Publication date: 17/01/2024

Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication. This work presents an empirical study of asynchronous Local-SGD for training language models; that is, each worker updates the global parameters as soon as it has finished its SGD steps. We conduct a comprehensive investigation by examining how worker hardware heterogeneity, model size, number of workers, and optimizer could impact the learning performance. We find that with naive implementations, asynchronous Local-SGD takes more iterations to converge than its synchronous counterpart despite updating the (global) model parameters more frequently. We identify momentum acceleration on the global parameters when worker gradients are stale as a key challenge. We propose a novel method that utilizes a delayed Nesterov momentum update and adjusts the workers' local training steps based on their computation speed. This approach, evaluated with models up to 150M parameters on the C4 dataset, matches the performance of synchronous Local-SGD in terms of perplexity per update step, and significantly surpasses it in terms of wall clock time.

arXiv.org e-Print Archive

DiLoCo: Distributed Low-Communication Training of Language Models

Author: Chhaparia Rachita
Donchev Yani
Douillard Arthur
Feng Qixuan
Kuncoro Adhiguna
Ranzato Marc'Aurelio
Rusu Andrei A.
Shen Jiajun
Szlam Arthur
Publication date: 02/12/2023

Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected accelerators, with devices exchanging gradients and other intermediate states at each optimization step. While it is difficult to build and maintain a single computing cluster hosting many accelerators, it might be easier to find several computing clusters each hosting a smaller number of devices. In this work, we propose a distributed optimization algorithm, Distributed Low-Communication (DiLoCo), that enables training of language models on islands of devices that are poorly connected. The approach is a variant of federated averaging, where the number of inner steps is large, the inner optimizer is AdamW, and the outer optimizer is Nesterov momentum. On the widely used C4 dataset, we show that DiLoCo on 8 workers performs as well as fully synchronous optimization while communicating 500 times less. DiLoCo exhibits great robustness to the data distribution of each worker. It is also robust to resources becoming unavailable over time, and vice versa, it can seamlessly leverage resources that become available during training.
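
The inner/outer structure described in this abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names (`adamw_steps`, `diloco_like`), the hyperparameter values, the toy gradient functions, and the choice to reset the inner optimizer state every round are all assumptions made for illustration. Each worker runs several AdamW steps from the shared parameters, the averaged parameter change is treated as an "outer gradient", and the shared parameters are updated with one common formulation of Nesterov momentum.

```python
import numpy as np

def adamw_steps(theta, grad_fn, steps, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, wd=0.0):
    """Run `steps` AdamW updates from `theta` and return the new parameters."""
    theta = theta.copy()
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = b1 * m + (1 - b1) * g            # first-moment estimate
        v = b2 * v + (1 - b2) * g * g        # second-moment estimate
        m_hat = m / (1 - b1 ** t)            # bias correction
        v_hat = v / (1 - b2 ** t)
        # decoupled weight decay is applied alongside the Adam step
        theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * theta)
    return theta

def diloco_like(theta0, grad_fns, rounds, inner_steps, outer_lr=0.7, mu=0.9):
    """Hypothetical outer loop: one gradient function per worker."""
    theta = theta0.copy()
    buf = np.zeros_like(theta)
    for _ in range(rounds):
        # every worker starts from the shared parameters and trains locally
        # (assumption: inner optimizer state is reset each round)
        deltas = [theta - adamw_steps(theta, g, inner_steps) for g in grad_fns]
        outer_grad = np.mean(deltas, axis=0)  # averaged parameter change
        buf = mu * buf + outer_grad           # momentum buffer
        theta = theta - outer_lr * (outer_grad + mu * buf)  # Nesterov-style step
    return theta

if __name__ == "__main__":
    # toy problem: worker i minimizes 0.5 * ||theta - c_i||^2
    centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    grads = [lambda th, c=c: th - c for c in centers]
    print(diloco_like(np.zeros(2), grads, rounds=20, inner_steps=10))
```

Workers communicate only once per round, after `inner_steps` local updates, which is where the reduction in communication described in the abstract would come from.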

arXiv.org e-Print Archive

Regularized Linear Inversion with Randomized Singular Value Decomposition

Author: A Frieze
A Neubauer
A Szlam
B Jin
E Somersalo
GW Stewart
J Wang
L Eldén
L Zhang
M Griebel
M Gu
N Halko
R Witten
U Tautenhahn
Y Wei
Publication date: 04/09/2019

In this work, we develop efficient solvers for linear inverse problems based on randomized singular value decomposition (RSVD). This is achieved by combining RSVD with classical regularization methods, e.g., truncated singular value decomposition, Tikhonov regularization, and general Tikhonov regularization with a smoothness penalty. One distinct feature of the proposed approach is that it explicitly preserves the structure of the regularized solution in the sense that it always lies in the range of a certain adjoint operator. We provide error estimates between the approximation and the exact solution under canonical source condition, and interpret the approach in the lens of convex duality. Extensive numerical experiments are provided to illustrate the efficiency and accuracy of the approach. Comment: 20 pages, 4 figures
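
As a rough illustration of combining RSVD with Tikhonov regularization, the NumPy sketch below uses a generic randomized range finder followed by the standard Tikhonov filter factors s / (s² + α). It is a minimal sketch, not the solver developed in the paper: the function names (`rsvd`, `tikhonov_rsvd`), the oversampling parameter, and the test problem are assumptions.

```python
import numpy as np

def rsvd(A, rank, oversample=10, seed=0):
    """Approximate rank-`rank` SVD of A via a random range finder."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, min(m, n))
    omega = rng.standard_normal((n, k))       # random test matrix
    Q, _ = np.linalg.qr(A @ omega)            # orthonormal basis for the sampled range
    B = Q.T @ A                               # small k x n projected matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                           # lift left singular vectors back
    return U[:, :rank], s[:rank], Vt[:rank]

def tikhonov_rsvd(A, b, alpha, rank):
    """Approximately minimize ||A x - b||^2 + alpha * ||x||^2 using the RSVD factors."""
    U, s, Vt = rsvd(A, rank)
    filt = s / (s ** 2 + alpha)               # Tikhonov filter factors
    return Vt.T @ (filt * (U.T @ b))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 30))  # exactly rank 5
    b = rng.standard_normal(50)
    x = tikhonov_rsvd(A, b, alpha=0.1, rank=5)
    print(x.shape)
```

In this sketch the returned vector is a combination of the rows of `Vt`, which are themselves combinations of the rows of `A`, so it lies in the range of the transpose of `A`.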

arXiv.org e-Print Archive

Crossref

UCL Discovery

Target Detection Performance Bounds in Compressive Imaging

Author: A Szlam
A Wagadarikar
AO Hero
C Chang
C Stellman
D Achlioptas
D Brady
D Donoho
D Manolakis
D Stein
D Takhar
E Arias-Castro
E Kelly
EJ Candès
F Krahmer
F Woolfe
FA Kruse
G Healey
G Wei
H Kwon
I Reed
I Steinwart
J Fowler
J Han
J Haupt
J Miller
J Storey
JO Berger
JW Boardman
K Krishnamurthy
K Zuzak
KR Davidson
L Scharf
L Wasserman
LL Scharf
M Davenport
M Gehm
M Martin
M Parmar
MF Duarte
R Baraniuk
R Lin
R Willett
RA DeVerse
S Aeron
S Kraut
T Tao
W Johnson
X Jin
Y Benjamini
Z Guo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

This paper describes computationally efficient approaches and associated theoretical performance guarantees for the detection of known targets and anomalies from few projection measurements of the underlying signals. The proposed approaches accommodate signals of different strengths contaminated by a colored Gaussian background, and perform detection without reconstructing the underlying signals from the observations. The theoretical performance bounds of the target detector highlight fundamental tradeoffs among the number of measurements collected, amount of background signal present, signal-to-noise ratio, and similarity among potential targets coming from a known dictionary. The anomaly detector is designed to control the number of false discoveries. The proposed approach does not depend on a known sparse representation of targets; rather, the theoretical performance bounds exploit the structure of a known dictionary of targets and the distance preservation property of the measurement matrix. Simulation experiments illustrate the practicality and effectiveness of the proposed approaches. Comment: Submitted to the EURASIP Journal on Advances in Signal Processing

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Differential Contributions of Intrinsic and Extrinsic Pathways to Thrombin Generation in Adult, Maternal and Cord Plasma Samples.

Author: Arthur D Szlam
Fania Szlam
Jeffrey D Varner
Kenichi A Tanaka
Nicklaus T Rice
Peter S Bernstein
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 19/05/2016

BACKGROUND: Thrombin generation (TG) is a pivotal process in achieving hemostasis. Coagulation profiles during pregnancy and early neonatal period are different from that of normal (non-pregnant) adults. In this ex vivo study, the differences in TG in maternal and cord plasma relative to normal adult plasma were studied. METHODS: Twenty consented pregnant women and ten consented healthy adults were included in the study. Maternal and cord blood samples were collected at the time of delivery. Platelet-poor plasma was isolated for the measurement of TG. In some samples, anti-FIXa aptamer, RB006, or a TFPI inhibitor, BAX499, were added to elucidate the contribution of intrinsic and extrinsic pathway to TG. Additionally, procoagulant and inhibitor levels were measured in maternal and cord plasma, and these values were used to mathematically simulate TG. RESULTS: Peak TG was increased in maternal plasma (393.6±57.9 nM) compared to adult and cord samples (323.2±38.9 nM and 209.9±29.5 nM, respectively). Inhibitory effects of RB006 on TG were less robust in maternal or cord plasma (52% vs. 12% respectively) than in adult plasma (81%). Likewise, the effectiveness of BAX499 as represented by the increase in peak TG was much greater in adult (21%) than in maternal (10%) or cord plasma (12%). Further, BAX499 was more effective in reversing RB006 in adult plasma than in maternal or cord plasma. Ex vivo data were reproducible with the results of the mathematical simulation of TG. CONCLUSION: Normal parturient plasma shows a large intrinsic pathway reserve for TG compared to adult and cord plasma, while TG in cord plasma is sustained by extrinsic pathway, and low levels of TFPI and AT.

Directory of Open Access Journals

FigShare