
    Data Selection for Generalization in Unimodal & Multimodal Models

    In this thesis, I present research on improving datasets for deep learning models with automated data transformation methods. The immediate goal of this work is to maximize the in-domain performance, out-of-domain generalization, and robustness of models trained on the transformed datasets. The broader goal of this research is to expand our understanding of how training data impacts deep learning models. The data transformation methods discussed in this work can be classified into data augmentation, data ordering, and data subset selection.
    First, I present work on data augmentation methods that improve the robustness of deep learning models. I demonstrate the vulnerability of reading comprehension models to a series of novel adversarial attacks and present a policy search method for adding optimized proportions of these adversarial attacks to the training data, which improves the in-domain, cross-domain, and cross-lingual generalization of the model. Then, I expose the phenomenon of cross-task inconsistency in multi-task multimodal models and show that automatically generated contrast sets can be used to make the model consistent.
    Second, I explore the efficacy of curriculum learning for fine-tuning language models on commonsense reasoning tasks. I experiment with paced curriculum strategies using a variety of scoring functions for quantifying the difficulty of a sample and find that a hard-to-easy curriculum promotes out-of-domain generalization in such models.
    Third, I discuss the importance of jointly considering diversity and sample difficulty for data subset selection in the pretraining, fine-tuning, and continual learning paradigms. I propose a scalable, state-of-the-art graph-based algorithm for combining the two factors during the pruning of pretraining and fine-tuning datasets across data modalities. Further, I propose a multi-way pruning algorithm for selecting training data that contains a balanced mixture of seen vs. unseen tasks and frequent vs. rare tasks at each time step during continual instruction tuning of multimodal large language models.
    In summary, I present several automated data transformation methods spanning augmentation, ordering, and selection for improving the performance of models trained on the transformed datasets along various axes.
    Doctor of Philosophy
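    The curriculum-learning experiments above order fine-tuning samples by a difficulty score before training. Below is a minimal sketch of such hard-to-easy ordering; the length-based scoring function is a hypothetical stand-in for the scoring functions studied in the thesis, not the thesis code.

```python
# Minimal sketch of a hard-to-easy curriculum over fine-tuning data.
# The scoring function below (question length) is a hypothetical proxy for
# the difficulty metrics explored in the thesis (e.g., model-based scores).
from typing import Callable, Dict, List


def order_by_difficulty(samples: List[Dict], score_fn: Callable[[Dict], float],
                        hard_first: bool = True) -> List[Dict]:
    """Sort training samples by a difficulty score; hard-to-easy by default."""
    return sorted(samples, key=score_fn, reverse=hard_first)


def length_score(sample: Dict) -> float:
    # Hypothetical proxy: longer questions are treated as harder.
    return len(sample["question"].split())


if __name__ == "__main__":
    data = [
        {"question": "Is ice cold?", "answer": "yes"},
        {"question": "Why do people typically carry umbrellas on rainy days?",
         "answer": "to stay dry"},
    ]
    for ex in order_by_difficulty(data, length_score):
        print(ex["question"])
```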

    Debiasing Multimodal Models via Causal Information Minimization

    Most existing debiasing methods for multimodal models, including causal intervention and inference methods, utilize approximate heuristics to represent the biases, such as shallow features from early stages of training or unimodal features for multimodal tasks like VQA, etc., which may not be accurate. In this paper, we study bias arising from confounders in a causal graph for multimodal data and examine a novel approach that leverages causally-motivated information minimization to learn the confounder representations. Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data. Hence, minimizing the information content of features obtained from a pretrained biased model helps learn the simplest predictive features that capture the underlying data distribution. We treat these features as confounder representations and use them via methods motivated by causal theory to remove bias from models. We find that the learned confounder representations indeed capture dataset biases, and the proposed debiasing methods improve out-of-distribution (OOD) performance on multiple multimodal datasets without sacrificing in-distribution performance. Additionally, we introduce a novel metric to quantify the sufficiency of spurious features in models' predictions that further demonstrates the effectiveness of our proposed methods. Our code is available at: https://github.com/Vaidehi99/CausalInfoMin
    Comment: EMNLP 2023 Findings (16 pages)
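    The information-minimization step can be pictured as compressing features from a biased encoder through a variational bottleneck so that only the simplest predictive information survives. The sketch below uses a standard KL-to-prior penalty as an illustration; the module names, dimensions, and loss weighting are assumptions, not the paper's implementation.

```python
# Sketch of variational information minimization over features from a
# pretrained biased encoder (illustrative; not the paper's exact objective).
import torch
import torch.nn as nn


class InfoMinBottleneck(nn.Module):
    def __init__(self, feat_dim: int = 768, z_dim: int = 64, n_classes: int = 10):
        super().__init__()
        self.mu = nn.Linear(feat_dim, z_dim)
        self.logvar = nn.Linear(feat_dim, z_dim)
        self.classifier = nn.Linear(z_dim, n_classes)

    def forward(self, feats: torch.Tensor):
        mu, logvar = self.mu(feats), self.logvar(feats)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        # KL(q(z|x) || N(0, I)) penalizes the information kept about the input.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(dim=-1).mean()
        return self.classifier(z), kl


if __name__ == "__main__":
    model = InfoMinBottleneck()
    feats = torch.randn(8, 768)            # stand-in for biased-model features
    labels = torch.randint(0, 10, (8,))
    logits, kl = model(feats)
    loss = nn.functional.cross_entropy(logits, labels) + 1e-3 * kl
    loss.backward()
    print(float(loss))
```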

    Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models

    As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support. Inconsistent AI models are considered brittle and untrustworthy by human users and are more challenging to incorporate into larger systems that take dependencies on their outputs. Measuring consistency between very heterogeneous tasks that might include outputs in different modalities is challenging since it is difficult to determine if the predictions are consistent with one another. As a solution, we introduce a benchmark dataset, COCOCON, where we use contrast sets created by modifying test instances for multiple tasks in small but semantically meaningful ways to change the gold label, and outline metrics for measuring if a model is consistent by ranking the original and perturbed instances across tasks. We find that state-of-the-art systems suffer from a surprisingly high degree of inconsistent behavior across tasks, especially for more heterogeneous tasks. Finally, we propose using a rank correlation-based auxiliary objective computed over large automatically created cross-task contrast sets to improve the multi-task consistency of large unified models, while retaining their original accuracy on downstream tasks. Project website available at https://adymaharana.github.io/cococon/
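    The consistency evaluation hinges on whether different task heads rank an original instance and its contrast-set perturbation the same way. A rough sketch of such a pairwise agreement check is given below; the per-task scores are assumed to be precomputed, and this is not the COCOCON codebase.

```python
# Sketch: a model is "consistent" on an (original, perturbed) pair if every task
# ranks the original above the perturbed instance, or every task ranks it below.
from typing import Dict


def pair_is_consistent(orig_scores: Dict[str, float],
                       pert_scores: Dict[str, float]) -> bool:
    prefers_orig = [orig_scores[t] > pert_scores[t] for t in orig_scores]
    return all(prefers_orig) or not any(prefers_orig)


if __name__ == "__main__":
    # e.g., likelihood-style scores for VQA and captioning on the same image
    orig = {"vqa": -1.2, "captioning": -3.4}
    pert = {"vqa": -2.5, "captioning": -2.9}  # captioning prefers the perturbation
    print(pair_is_consistent(orig, pert))     # False -> cross-task inconsistency
```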

    Evaluating Very Long-Term Conversational Memory of LLM Agents

    Existing works on long-term open-domain dialogues focus on evaluating model responses within contexts spanning no more than five chat sessions. Despite advancements in long-context large language models (LLMs) and retrieval augmented generation (RAG) techniques, their efficacy in very long-term dialogues remains unexplored. To address this research gap, we introduce a machine-human pipeline to generate high-quality, very long-term dialogues by leveraging LLM-based agent architectures and grounding their dialogues on personas and temporal event graphs. Moreover, we equip each agent with the capability of sharing and reacting to images. The generated conversations are verified and edited by human annotators for long-range consistency and grounding to the event graphs. Using this pipeline, we collect LoCoMo, a dataset of very long-term conversations, each encompassing 300 turns and 9K tokens on average, over up to 35 sessions. Based on LoCoMo, we present a comprehensive evaluation benchmark to measure long-term memory in models, encompassing question answering, event summarization, and multi-modal dialogue generation tasks. Our experimental results indicate that LLMs exhibit challenges in understanding lengthy conversations and comprehending long-range temporal and causal dynamics within dialogues. Employing strategies like long-context LLMs or RAG can offer improvements, but these models still substantially lag behind human performance.
    Comment: 19 pages; Project page: https://snap-research.github.io/locomo
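    One of the evaluated strategies is retrieval-augmented generation over the accumulated session history. The sketch below retrieves candidate past turns for a memory question using a TF-IDF retriever as a stand-in; the benchmark's actual retrievers, prompts, and data are not reproduced here.

```python
# Sketch: retrieve the most relevant past turns from a long multi-session
# conversation before answering a memory question (TF-IDF stand-in retriever).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def retrieve_turns(history, question, k=3):
    vec = TfidfVectorizer().fit(history + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(history))[0]
    top = sims.argsort()[::-1][:k]
    return [history[i] for i in top]


if __name__ == "__main__":
    history = [
        "Session 1: I adopted a puppy named Miso last spring.",
        "Session 7: Work has been stressful, so I started evening runs.",
        "Session 21: Miso finally finished obedience training!",
    ]
    print(retrieve_turns(history, "What is the name of the speaker's dog?", k=2))
```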

    Extraction of Clinical Timeline from Discharge Summaries using Neural Networks

    Thesis (Master's)--University of Washington, 2017-12
    Discharge summaries are a concise representation of the most important bits of information about a patient’s time in the hospital. Converting the free text into a clinical timeline can facilitate accurate assimilation of information by physicians, and the structured data can be used to populate knowledge bases, in clinical decision support systems, and elsewhere. Conventional methods for temporal evaluation of discharge summaries employ structured inference and extensive feature engineering. However, they also run the risk of overfitting to the training domain and thus may not be effective in deployment. Novel methods of natural language processing leverage semantics from large corpora and produce results with minimal feature engineering. This work explores the use of neural network architectures in clinical entity recognition and temporal evaluation. Recurrent neural networks are found to perform on par with conditional random field systems in clinical entity recognition, scoring 94.04% on the i2b2 2012 dataset. Moreover, they perform better for under-represented entity classes like ‘Occurrence’, ‘Evidential’ and ‘Clinical Department’ in a skewed dataset. The out-of-domain evaluation of conditional random fields and neural networks yields favorable results on a corpus of ER visit, progress, consult and ICU notes from various medical centers. Neural networks are more amenable to domain adaptation. This work also explores the use of convolutional neural networks for extracting within-sentence temporal relations. Preliminary results show that convolutional networks might not be well suited to the task.
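    The recurrent models referenced here are sequence taggers over clinical text. A compact, generic BiLSTM tagger sketch in PyTorch follows; the vocabulary size, label set, and dimensions are placeholders rather than the thesis configuration.

```python
# Generic BiLSTM sequence tagger sketch for clinical entity recognition
# (BIO-style labels); hyperparameters and vocabulary are placeholders.
import torch
import torch.nn as nn


class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=100, hidden=128, n_tags=13):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, token_ids):
        h, _ = self.lstm(self.emb(token_ids))
        return self.out(h)  # per-token tag logits


if __name__ == "__main__":
    tagger = BiLSTMTagger()
    tokens = torch.randint(0, 5000, (2, 20))  # batch of 2 sentences, 20 tokens each
    print(tagger(tokens).shape)               # torch.Size([2, 20, 13])
```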

    Mapping disparities in homicide trends across Brazil: 2000–2014

    Background: Homicides are a major problem in Brazil. Drug trafficking, arms trafficking, and land conflicts are three of the many factors driving homicide rates in Brazil. Understanding long-term spatiotemporal trends and the social structural factors associated with homicides in Brazil would be useful for designing policies aimed at reducing homicide rates.
    Methods: We obtained data from 2000 to 2014 from the Brazil Ministry of Health (MOH) Mortality Information System and sociodemographic data from the Brazil Institute of Geography and Statistics (IBGE). First, we quantified the rate of change in homicides at the municipality and state levels. Second, we used principal component regression and k-medoids clustering to examine differences in temporal trends across municipalities. Lastly, we used Bayesian hierarchical space-time models to describe spatiotemporal patterns and to assess the contribution of structural factors.
    Results: There were significant variations in homicide rates across states and municipalities. We noted the largest decrease in homicide rates in the western and southeastern states of Sao Paulo, Rio de Janeiro and Espirito Santo, which coincided with an increase in homicide rates in the northeastern states of Ceará, Alagoas, Paraiba, Rio Grande do Norte, Sergipe and Bahia during the fifteen-year period. The decrease in homicides in municipalities with populations of at least 250,000 coincided with an increase in municipalities of 25,000 people or fewer. Structural factors that predicted municipality-level homicide rates included gross domestic product, urbanization, sharing a border with neighboring countries, and the proportion of the population aged fifteen to twenty-nine.
    Conclusions: Our findings support both a dissemination hypothesis and an interiorization hypothesis. These findings should be considered when designing interventions to curb homicide rates.
    http://deepblue.lib.umich.edu/bitstream/2027.42/174011/1/40621_2020_Article_273.pd
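    The trend-analysis step can be illustrated by reducing each municipality's yearly homicide-rate series with PCA and clustering in the reduced space. The sketch below uses synthetic data and KMeans as a stand-in for the paper's k-medoids clustering; it is not the study's analysis code.

```python
# Sketch: cluster municipality-level homicide-rate trajectories (2000-2014)
# after PCA reduction. Synthetic data; KMeans stands in for k-medoids.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_municipalities, n_years = 200, 15
rates = rng.gamma(shape=2.0, scale=10.0, size=(n_municipalities, n_years))  # fake rates

components = PCA(n_components=3).fit_transform(rates)   # summarize each trajectory
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)
print(np.bincount(labels))  # cluster sizes, i.e., groups with similar temporal trends
```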