48 research outputs found

Gradient-based Inference for Networks with Output Constraints

Author: Carbonell Jaime
Lee Jay Yoon
Mehta Sanket Vaibhav
Tristan Jean-Baptiste
Wick Michael
Publication date: 22/04/2019

Practitioners apply neural networks to increasingly complex problems in natural language processing, such as syntactic parsing and semantic role labeling that have rich output structures. Many such structured-prediction problems require deterministic constraints on the output values; for example, in sequence-to-sequence syntactic parsing, we require that the sequential outputs encode valid trees. While hidden units might capture such properties, the network is not always able to learn such constraints from the training data alone, and practitioners must then resort to post-processing. In this paper, we present an inference method for neural networks that enforces deterministic constraints on outputs without performing rule-based post-processing or expensive discrete search. Instead, in the spirit of gradient-based training, we enforce constraints with gradient-based inference (GBI): for each input at test time, we nudge continuous model weights until the network's unconstrained inference procedure generates an output that satisfies the constraints. We study the efficacy of GBI on three tasks with hard constraints: semantic role labeling, syntactic parsing, and sequence transduction. In each case, the algorithm not only satisfies constraints but improves accuracy, even when the underlying network is state-of-the-art.

Comment: AAAI 201
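The idea of nudging weights at test time until the decoded output satisfies a constraint can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the one-feature logistic tagger, the "at most `max_ones` positive labels" constraint, the helper names, and the particular update rule (lowering the log-probability of the currently positive labels, scaled by the size of the violation).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode(w, b, xs):
    """Unconstrained inference: label a position 1 when its probability exceeds 0.5."""
    return [1 if sigmoid(w * x + b) > 0.5 else 0 for x in xs]

def violation(ys, max_ones):
    """Number of positive labels beyond the allowed budget (0 means the constraint holds)."""
    return max(0, sum(ys) - max_ones)

def gbi(w, b, xs, max_ones, lr=0.1, max_steps=200):
    """Nudge per-input copies of the weights until decoding satisfies the constraint."""
    for _ in range(max_steps):
        ys = decode(w, b, xs)
        v = violation(ys, max_ones)
        if v == 0:
            break
        # Gradient of the log-probability of the currently positive labels,
        # using d/dz log sigmoid(z) = 1 - sigmoid(z).
        grad_w = sum((1.0 - sigmoid(w * x + b)) * x for x, y in zip(xs, ys) if y == 1)
        grad_b = sum((1.0 - sigmoid(w * x + b)) for x, y in zip(xs, ys) if y == 1)
        # Step downhill on that log-probability, scaled by the violation size.
        w -= lr * v * grad_w
        b -= lr * v * grad_b
    return w, b, decode(w, b, xs)

xs = [0.2, 1.5, 0.9, 2.0]       # hypothetical input features
w0, b0 = 1.0, -0.5              # hypothetical "trained" weights
before = decode(w0, b0, xs)     # violates the budget of one positive label
w1, b1, after = gbi(w0, b0, xs, max_ones=1)
```

Because `gbi` only modifies its local copies of `w` and `b`, the trained weights `w0` and `b0` are untouched and can be reused for the next input.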

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Towards Semi-Supervised Learning for Deep Semantic Role Labeling

Author: Carbonell Jaime
Lee Jay Yoon
Mehta Sanket Vaibhav
Publication date: 01/01/2018

Neural models have achieved state-of-the-art performance on Semantic Role Labeling (SRL). However, they require an immense amount of semantic-role corpora and are thus not well suited to low-resource languages or domains. The paper proposes a semi-supervised semantic role labeling method that outperforms the state of the art when SRL training corpora are limited. The method explicitly enforces syntactic constraints by augmenting the training objective with a syntactic-inconsistency loss component, and it uses SRL-unlabeled instances to train a joint-objective LSTM. On the CoNLL-2012 English section, the proposed semi-supervised training with 1% and 10% SRL-labeled data and varying amounts of SRL-unlabeled data achieves +1.58 and +0.78 F1, respectively, over pre-trained models that were trained on a SOTA architecture with ELMo on the same SRL-labeled data. Additionally, by using the syntactic-inconsistency loss at inference time, the proposed model achieves +3.67 and +2.1 F1 over the pre-trained model on 1% and 10% SRL-labeled data, respectively.

Comment: EMNLP 201

arXiv.org e-Print Archive

Crossref

Assessment of clinical and functional outcomes after single dose injection of autologous platelet rich plasma in patients with chronic lateral epicondylitis: a prospective and brief follow up study

Author: Bajaj Sanket
Garg Rohit N.
Mehta Nirali
Patil Hrishikesh
Publication venue: Medip Academy
Publication date: 26/10/2023

Background: Lateral epicondylitis is a chronic, painful, and debilitating elbow condition. The introduction of platelet-rich plasma (PRP) as an adjunct to conservative and operative treatment has revolutionized research on this topic. PRP is considered an ideal autologous blood-derived biological product that helps regenerate degenerated tissue rather than just repair it, relieving pain and improving function. Methods: In this prospective study, 40 patients diagnosed with tennis elbow who had failed other conservative treatment modalities were enrolled, treated with a single-dose injection of autologous PRP, and evaluated for clinical and functional outcomes using the visual analogue scale (VAS) and Disabilities of the Arm, Shoulder, and Hand (DASH) scores at follow-up visits. Results: Of the 40 patients enrolled, 15 were male and 25 were female. The mean age of the population was 45.88±8.87 years. All patients showed statistically significant improvements in mean VAS and DASH scores (p<0.001) at each follow-up compared with baseline, with VAS and DASH score improvements of more than 77% and 65%, respectively, at final follow-up. Conclusion: A single local injection of autologous PRP appears to be a promising and safe treatment modality for lateral epicondylitis, improving pain as well as clinical and functional outcomes.

International Journal of Research in Orthopaedics

An Introduction to Lifelong Supervised Learning

Author: Abdelsalam Mohamed
Chandar Sarath
Faramarzi Mojtaba
Janarthanan Janarthanan
Malviya Pranshu
Mehta Sanket Vaibhav
Sodhani Shagun
Publication date: 12/07/2022

This primer is an attempt to provide a detailed summary of the different facets of lifelong learning. We start with Chapter 2, which provides a high-level overview of lifelong learning systems. In this chapter, we discuss prominent scenarios in lifelong learning (Section 2.4), provide a high-level organization of different lifelong learning approaches (Section 2.5), enumerate the desiderata for an ideal lifelong learning system (Section 2.6), discuss how lifelong learning is related to other learning paradigms (Section 2.7), and describe common metrics used to evaluate lifelong learning systems (Section 2.8). This chapter is more useful for readers who are new to lifelong learning and want to get introduced to the field without focusing on specific approaches or benchmarks. The remaining chapters focus on specific aspects (either learning algorithms or benchmarks) and are more useful for readers who are looking for specific approaches or benchmarks. Chapter 3 focuses on regularization-based approaches that do not assume access to any data from previous tasks. Chapter 4 discusses memory-based approaches that typically use a replay buffer or an episodic memory to save a subset of data across different tasks. Chapter 5 focuses on different architecture families (and their instantiations) that have been proposed for training lifelong learning systems. Following these different classes of learning algorithms, we discuss the commonly used evaluation benchmarks and metrics for lifelong learning (Chapter 6) and wrap up with a discussion of future challenges and important research directions in Chapter 7.

Comment: Lifelong Learning Prime
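The replay buffer mentioned for memory-based approaches can be sketched as a fixed-capacity store that keeps a subset of examples from all tasks seen so far. The abstract does not say how the subset is chosen; this sketch assumes reservoir sampling, one common choice, and the class name, capacity, and `(task_id, index)` examples are hypothetical.

```python
import random

class ReplayBuffer:
    """Fixed-capacity episodic memory filled by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0                      # total examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        """Store the example so every example seen so far is equally likely to be kept."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a random slot with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        """Draw up to k stored examples to mix into the current task's batch."""
        return self.rng.sample(self.items, min(k, len(self.items)))

buffer = ReplayBuffer(capacity=5)
for task_id in range(3):                   # three tasks arriving in sequence
    for i in range(20):
        buffer.add((task_id, i))
replayed = buffer.sample(3)
```

In a training loop, `replayed` would typically be concatenated with the current task's batch so that earlier tasks keep contributing to the loss.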

arXiv.org e-Print Archive

Making Scalable Meta Learning Practical

Author: Ahn Hwijeen
Choe Sang Keun
Mehta Sanket Vaibhav
Neiswanger Willie
Strubell Emma
Xie Pengtao
Xing Eric
Publication date: 23/10/2023

Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA showcases up to a 1.7/4.8x increase in throughput and a 2.0/3.8x decrease in memory consumption, respectively, on single-/multi-GPU setups compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.

arXiv.org e-Print Archive

DSI++: Updating Transformer Memory with New Documents

Author: Dehghani Mostafa
Gupta Jai
Mehta Sanket Vaibhav
Metzler Donald
Najork Marc
Rao Jinfeng
Strubell Emma
Tay Yi
Tran Vinh Q.
Publication date: 08/12/2023

Differentiable Search Indices (DSIs) encode a corpus of documents in model parameters and use the same model to answer user queries directly. Despite the strong performance of DSI models, deploying them in situations where the corpus changes over time is computationally expensive because reindexing the corpus requires re-training the model. In this work, we introduce DSI++, a continual learning challenge for DSI to incrementally index new documents while being able to answer queries related to both previously and newly indexed documents. Across different model scales and document identifier representations, we show that continual indexing of new documents leads to considerable forgetting of previously indexed documents. We also hypothesize and verify that the model experiences forgetting events during training, leading to unstable learning. To mitigate these issues, we investigate two approaches. The first focuses on modifying the training dynamics. Flatter minima implicitly alleviate forgetting, so we optimize for flatter loss basins and show that the model stably memorizes more documents (+12%). Next, we introduce a generative memory to sample pseudo-queries for documents and supplement them during continual indexing to prevent forgetting for the retrieval task. Extensive experiments on novel continual indexing benchmarks based on Natural Questions (NQ) and MS MARCO demonstrate that our proposed solution mitigates forgetting significantly. Concretely, it improves the average Hits@10 by +21.1% over competitive baselines for NQ and requires 6 times fewer model updates compared to re-training the DSI model for incrementally indexing five corpora in a sequence.

Comment: Accepted at EMNLP 2023 main conferenc
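For readers unfamiliar with the Hits@10 metric reported above, the following minimal sketch uses the common definition: the fraction of queries whose gold document identifier appears among the top k ranked identifiers. The paper's exact evaluation protocol is not given in the abstract, and the function name and the tiny document identifiers are hypothetical.

```python
def hits_at_k(ranked_ids, gold_ids, k=10):
    """Fraction of queries whose gold document id appears in the top-k ranking."""
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)

# One ranked list of document ids per query, best match first.
ranked_ids = [
    ["d3", "d1", "d7"],
    ["d2", "d9", "d4"],
    ["d5", "d6", "d8"],
]
gold_ids = ["d1", "d4", "d0"]   # the correct document for each query

hits_at_2 = hits_at_k(ranked_ids, gold_ids, k=2)
hits_at_3 = hits_at_k(ranked_ids, gold_ids, k=3)
```

Here only the first query has its gold document in the top 2, while the first two queries have it in the top 3, so the score rises as k grows.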

arXiv.org e-Print Archive