Data Selection for Generalization in Unimodal & Multimodal Models
In this thesis, I present research on improving datasets for deep learning models with automated data transformation methods. The immediate goal of this work is to maximize the in-domain performance, out-of-domain generalization, and robustness of models trained on the transformed datasets. The broader goal of this research is to expand our understanding of how training data impacts deep learning models. The data transformation methods discussed in this work can be classified into data augmentation, data ordering, and data subset selection.
First, I present work on data augmentation methods that improve the robustness of deep learning models. I demonstrate the vulnerability of reading comprehension models to a series of novel adversarial attacks and present a policy search method to add optimized proportions of these adversarial attacks to the training data, which improves the in-domain, cross-domain, and cross-lingual generalization of the model. Then, I expose the phenomenon of cross-task inconsistency in multi-task multimodal models and show that automatically generated contrast sets can be used to make the model consistent.
Second, I explore the efficacy of curriculum learning for fine-tuning language models on commonsense reasoning tasks. I experiment with paced curriculum strategies using a variety of scoring functions for quantifying the difficulty of a sample and find that a hard-to-easy curriculum promotes out-of-domain generalization in such models.
Third, I discuss the importance of jointly considering diversity and sample difficulty for data subset selection in the pretraining, fine-tuning, and continual learning paradigms. I propose a scalable state-of-the-art graph-based algorithm for combining the two factors during the pruning of pretraining and fine-tuning datasets across data modalities. Further, I propose a multi-way pruning algorithm for selecting training data that contains a balanced mixture of seen vs. unseen tasks and frequent vs. rare tasks at each time step during continual instruction tuning of multimodal large language models.
In summary, I present several automated data transformation methods spanning augmentation, ordering, and selection for improving the performance of models trained on the transformed datasets along various axes.
Doctor of Philosophy
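The hard-to-easy curriculum described above can be sketched in a few lines: samples are ranked by a difficulty score and released to the trainer in paced stages, hardest first. The scoring function (`len` on toy strings) and the stage-splitting scheme below are illustrative assumptions, not the thesis's actual scoring functions or pacing schedule.

```python
# Minimal sketch of a paced, hard-to-easy curriculum (assumed scheme).
def hard_to_easy_curriculum(samples, difficulty, num_stages=2):
    """Order samples from hardest to easiest and split them into pacing stages.

    samples    : list of training examples
    difficulty : callable mapping a sample to a difficulty score
    num_stages : number of curriculum stages released to the trainer in order
    """
    ordered = sorted(samples, key=difficulty, reverse=True)  # hardest first
    stage_size = -(-len(ordered) // num_stages)              # ceil division
    return [ordered[i:i + stage_size] for i in range(0, len(ordered), stage_size)]

# Toy difficulty score: longer inputs are treated as harder.
data = ["a", "abcd", "ab", "abcdef", "abc"]
stages = hard_to_easy_curriculum(data, difficulty=len, num_stages=2)
# stages[0] holds the hardest samples; training proceeds stage by stage.
```

A real pacing function would typically gate stages by training progress (e.g. steps or epochs) rather than releasing fixed-size chunks.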
Debiasing Multimodal Models via Causal Information Minimization
Most existing debiasing methods for multimodal models, including causal
intervention and inference methods, utilize approximate heuristics to represent
the biases, such as shallow features from early stages of training or unimodal
features for multimodal tasks like VQA, etc., which may not be accurate. In
this paper, we study bias arising from confounders in a causal graph for
multimodal data and examine a novel approach that leverages causally-motivated
information minimization to learn the confounder representations. Robust
predictive features contain diverse information that helps a model generalize
to out-of-distribution data. Hence, minimizing the information content of
features obtained from a pretrained biased model helps learn the simplest
predictive features that capture the underlying data distribution. We treat
these features as confounder representations and use them via methods motivated
by causal theory to remove bias from models. We find that the learned
confounder representations indeed capture dataset biases, and the proposed
debiasing methods improve out-of-distribution (OOD) performance on multiple
multimodal datasets without sacrificing in-distribution performance.
Additionally, we introduce a novel metric to quantify the sufficiency of
spurious features in models' predictions that further demonstrates the
effectiveness of our proposed methods. Our code is available at:
https://github.com/Vaidehi99/CausalInfoMin
Comment: EMNLP 2023 Findings (16 pages)
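One common way to use a learned bias (confounder) representation at prediction time is to subtract a low-capacity bias model's scores from the main model's scores. The sketch below shows that generic logit-subtraction idea; the paper's actual causal-theory-based methods are more involved, and the `lam` weight and toy logits are assumptions for illustration only.

```python
# Hypothetical sketch of logit-level debiasing: a low-capacity "confounder"
# model's scores are subtracted from the main model's, so the final
# prediction relies less on spurious shortcuts.
def debias_logits(main_logits, confounder_logits, lam=0.5):
    """Remove a weighted confounder contribution from the main model's logits."""
    return [m - lam * c for m, c in zip(main_logits, confounder_logits)]

main = [2.0, 1.0, 0.5]        # main model favors class 0
confounder = [3.0, 0.0, 0.0]  # bias model strongly favors class 0
adjusted = debias_logits(main, confounder, lam=1.0)
# After removing the bias contribution, class 1 has the highest score.
```

The interesting case is exactly the one above: when the main model's preference is driven largely by the shortcut the confounder captures, the adjusted prediction flips.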
Exposing and Addressing Cross-Task Inconsistency in Unified Vision-Language Models
As general purpose vision models get increasingly effective at a wide set of
tasks, it is imperative that they be consistent across the tasks they support.
Inconsistent AI models are considered brittle and untrustworthy by human users
and are more challenging to incorporate into larger systems that take
dependencies on their outputs. Measuring consistency between very heterogeneous
tasks that might include outputs in different modalities is challenging since
it is difficult to determine if the predictions are consistent with one
another. As a solution, we introduce a benchmark dataset, COCOCON, where we use
contrast sets created by modifying test instances for multiple tasks in small
but semantically meaningful ways to change the gold label, and outline metrics
for measuring if a model is consistent by ranking the original and perturbed
instances across tasks. We find that state-of-the-art systems suffer from a
surprisingly high degree of inconsistent behavior across tasks, especially for
more heterogeneous tasks. Finally, we propose using a rank correlation-based
auxiliary objective computed over large automatically created cross-task
contrast sets to improve the multi-task consistency of large unified models,
while retaining their original accuracy on downstream tasks. Project website
available at https://adymaharana.github.io/cococon/
Comment: Project Website: https://adymaharana.github.io/cococon
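The rank-correlation idea above can be made concrete: each task head scores the original instance and its perturbations from the contrast set, and a Spearman correlation between the two tasks' rankings measures whether they agree on which instances fit the input. The helper functions and toy scores below are an illustrative assumption, not the COCOCON metric verbatim.

```python
# Sketch: cross-task consistency as rank agreement over a contrast set.
def rankdata(scores):
    """Assign ranks (1 = lowest score); ties are not handled, for brevity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation via the sum-of-squared-rank-differences formula."""
    n = len(xs)
    rx, ry = rankdata(xs), rankdata(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Scores for one original instance followed by three perturbations,
# from two different task heads (toy numbers).
captioning_scores = [0.9, 0.4, 0.3, 0.2]
vqa_scores        = [0.8, 0.5, 0.4, 0.1]
rho = spearman(captioning_scores, vqa_scores)  # 1.0: the tasks rank identically
```

A consistent model should yield a correlation near 1 across tasks; a differentiable surrogate of this quantity could serve as the auxiliary training objective.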
Evaluating Very Long-Term Conversational Memory of LLM Agents
Existing works on long-term open-domain dialogues focus on evaluating model
responses within contexts spanning no more than five chat sessions. Despite
advancements in long-context large language models (LLMs) and retrieval
augmented generation (RAG) techniques, their efficacy in very long-term
dialogues remains unexplored. To address this research gap, we introduce a
machine-human pipeline to generate high-quality, very long-term dialogues by
leveraging LLM-based agent architectures and grounding their dialogues on
personas and temporal event graphs. Moreover, we equip each agent with the
capability of sharing and reacting to images. The generated conversations are
verified and edited by human annotators for long-range consistency and
grounding to the event graphs. Using this pipeline, we collect LoCoMo, a
dataset of very long-term conversations, each encompassing an average of 300
turns and 9K tokens over up to 35 sessions. Based on LoCoMo, we present a
comprehensive evaluation benchmark to measure long-term memory in models,
encompassing question answering, event summarization, and multi-modal dialogue
generation tasks. Our experimental results indicate that LLMs exhibit
challenges in understanding lengthy conversations and comprehending long-range
temporal and causal dynamics within dialogues. Employing strategies like
long-context LLMs or RAG can offer improvements, but these models still
substantially lag behind human performance.
Comment: 19 pages; Project page: https://snap-research.github.io/locomo
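A retrieval-augmented setup for such long conversations can be sketched simply: past turns are scored against the question and the top-k are placed in the prompt. Real systems use dense embeddings rather than the bag-of-words overlap below; the scorer and toy dialogue are assumptions for illustration.

```python
# Minimal RAG-style retrieval over a long dialogue history (assumed scheme).
def retrieve_turns(history, question, k=2):
    """Return the k past turns sharing the most words with the question."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(turn.lower().split())), i, turn)
              for i, turn in enumerate(history)]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best overlap first, stable by turn
    return [turn for _, _, turn in scored[:k]]

history = [
    "I adopted a dog named Milo last spring",
    "Work has been busy lately",
    "Milo the dog loves the park near my house",
]
context = retrieve_turns(history, "name of the dog", k=2)
# The retrieved turns would be prepended to the LLM prompt as memory.
```

Even with perfect retrieval, the model must still reason over the temporal and causal links between retrieved turns, which is where the benchmark finds models lagging.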
Extraction of Clinical Timeline from Discharge Summaries using Neural Networks
Thesis (Master's)--University of Washington, 2017-12
Discharge summaries are a concise representation of the most important information about a patient's time in the hospital. Converting the free text into a clinical timeline can facilitate accurate assimilation of information by physicians, and the structured data can be used to populate knowledge bases, support clinical decision support systems, and more. Conventional methods for temporal evaluation of discharge summaries employ structured inference and extensive feature engineering; however, they run the risk of overfitting to the training domain and thus performing poorly in deployment. Novel natural language processing methods leverage semantics learned from large corpora and produce results with minimal feature engineering.
This work explores the use of neural network architectures in clinical entity recognition and temporal evaluation. Recurrent neural networks are found to perform on par with conditional random field systems in clinical entity recognition, scoring 94.04% on the i2b2 2012 dataset. Moreover, they perform better for under-represented entity classes like 'Occurrence', 'Evidential' and 'Clinical Department' in a skewed dataset. Out-of-domain evaluation of conditional random fields and neural networks yields favorable results on a corpus of ER visit, progress, consult, and ICU notes from various medical centers, with neural networks proving more amenable to domain adaptation. This work also explores the use of convolutional neural networks for the extraction of within-sentence temporal relations; preliminary results suggest that convolutional networks may not be well suited to this task.
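Whichever sequence labeler is used (RNN or CRF), the entity-recognition step ends by collapsing per-token BIO tags into typed spans for the timeline. The decoder below is a generic sketch of that post-processing; the tag names and the toy sequence are assumptions, not taken from the i2b2 2012 data.

```python
# Sketch: convert BIO tags from a sequence labeler into (label, start, end) spans.
def bio_to_spans(tags):
    """Collapse a BIO tag sequence into (label, start, end) spans, end exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:          # close any open span
                spans.append((label, start, i))
            start, label = i, tag[2:]      # open a new span
        elif tag.startswith("I-") and label == tag[2:]:
            continue                       # extend the current span
        else:
            if start is not None:
                spans.append((label, start, i))
            start, label = None, None
    if start is not None:                  # close a span ending at the sequence end
        spans.append((label, start, len(tags)))
    return spans

tags = ["O", "B-Occurrence", "I-Occurrence", "O", "B-ClinicalDept"]
spans = bio_to_spans(tags)
```

The resulting spans, paired with extracted temporal relations, are what would populate the structured clinical timeline.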
Detecting reports of unsafe foods in consumer product reviews.
Objectives
Access to safe and nutritious food is essential for good health. However, food can become unsafe due to contamination with pathogens, chemicals, or toxins, or mislabeling of allergens. Illness resulting from the consumption of unsafe foods is a global health problem. Here, we develop a machine learning approach for detecting reports of unsafe food products in consumer product reviews from Amazon.com.
Materials and methods
We linked Amazon.com food product reviews to Food and Drug Administration (FDA) food recalls from 2012 to 2014 using text matching approaches in a PostgreSQL relational database. We applied machine learning methods, along with over- and under-sampling methods, to the linked data to automate the detection of reports of unsafe food products.
Results
Our data consisted of 1 297 156 product reviews from Amazon.com. Only 5149 (0.4%) were linked to recalled food products. Bidirectional Encoder Representations from Transformers (BERT) performed best in identifying unsafe food reviews, achieving an F1 score, precision, and recall of 0.74, 0.78, and 0.71, respectively. We also identified synonyms for terms associated with FDA recalls in more than 20 000 reviews, most of which were associated with nonrecalled products. This might suggest that many more products should have been recalled or investigated.
Discussion and conclusion
Challenges to improving food safety include urbanization, which has led to a longer food chain, underreporting of illness, and difficulty in linking contaminated food to illness. Our approach can improve food safety by enabling early identification of unsafe foods, which can lead to timely recalls, thereby limiting the health and economic impact on the public.
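With only 0.4% of reviews linked to recalls, the over-sampling step mentioned above matters. A minimal version duplicates minority-class examples until the classes balance; the fixed seed and toy labels below are assumptions for illustration.

```python
# Sketch of random over-sampling for a highly imbalanced review dataset.
import random

def oversample(examples, labels, minority=1, seed=0):
    """Duplicate minority-class examples until both classes are equal-sized."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    # Sample (with replacement) enough extra minority examples to balance.
    extra = [rng.choice(minority_idx)
             for _ in range(len(majority_idx) - len(minority_idx))]
    idx = majority_idx + minority_idx + extra
    return [examples[i] for i in idx], [labels[i] for i in idx]

reviews = ["fine", "ok", "great", "made me sick"]
labels  = [0, 0, 0, 1]
x, y = oversample(reviews, labels)  # 3 negatives, 3 (duplicated) positives
```

Under-sampling the majority class is the complementary option; in practice both are tuned against precision and recall on held-out data.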
Mapping disparities in homicide trends across Brazil: 2000–2014
Abstract
Background
Homicides are a major problem in Brazil. Drugs and arms trafficking, and land conflicts are three of the many factors driving homicide rates in Brazil. Understanding long-term spatiotemporal trends and social structural factors associated with homicides in Brazil would be useful for designing policies aimed at reducing homicide rates.
Methods
We obtained data from 2000 to 2014 from the Brazil Ministry of Health (MOH) Mortality Information System and sociodemographic data from the Brazil Institute of Geography and Statistics (IBGE). First, we quantified the rate of change in homicides at the municipality and state levels. Second, we used principal component regression and k-medoids clustering to examine differences in temporal trends across municipalities. Lastly, we used Bayesian hierarchical space-time models to describe spatio-temporal patterns and to assess the contribution of structural factors.
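The k-medoids step in the methods above groups municipality time series so that each cluster's center is an actual observed series (a medoid), which makes the cluster prototypes interpretable trends. Below is a simple alternating-update sketch with toy data; the initial medoids, the squared-Euclidean distance, and the tiny series are assumptions for illustration.

```python
# Illustrative k-medoids clustering of homicide-rate time series (assumed scheme).
def dist(a, b):
    """Squared Euclidean distance between two equal-length time series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_medoids(series, medoid_idx, iters=10):
    """Alternate cluster assignment and medoid update until medoids stabilize."""
    for _ in range(iters):
        clusters = {m: [] for m in medoid_idx}
        for i, s in enumerate(series):
            nearest = min(medoid_idx, key=lambda m: dist(s, series[m]))
            clusters[nearest].append(i)
        new_medoids = []
        for members in clusters.values():
            # New medoid: the member minimizing total distance within its cluster.
            best = min(members, key=lambda c: sum(dist(series[c], series[o])
                                                  for o in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoid_idx):
            break
        medoid_idx = new_medoids
    return clusters

# Toy trends: two rising series and two falling series.
trends = [[1, 2, 3], [1, 2, 4], [5, 3, 1], [6, 3, 0]]
clusters = k_medoids(trends, medoid_idx=[0, 2])
# Rising municipalities cluster around medoid 0, falling ones around medoid 2.
```

Using medoids rather than means keeps each cluster summarized by a real municipality's trajectory, which is convenient when reporting representative trends.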
Results
There were significant variations in homicide rates across states and municipalities. We noted the largest decrease in homicide rates in the western and southeastern states of Sao Paulo, Rio de Janeiro and Espirito Santo, which coincided with an increase in homicide rates in the northeastern states of Ceará, Alagoas, Paraiba, Rio Grande Norte, Sergipe and Bahia during the fifteen-year period. The decrease in homicides in municipalities with populations of at least 250,000 coincided with an increase in municipalities with 25,000 people or less. Structural factors that predicted municipality-level homicide rates included gross domestic product, urbanization, border with neighboring countries and proportion of population aged fifteen to twenty-nine.
Conclusions
Our findings support both a dissemination hypothesis and an interiorization hypothesis. These findings should be considered when designing interventions to curb homicide rates.
http://deepblue.lib.umich.edu/bitstream/2027.42/174011/1/40621_2020_Article_273.pd