Transformer Based Multi-Source Domain Adaptation
In practical machine learning settings, the data on which a model must make
predictions often come from a different distribution than the data it was
trained on. Here, we investigate the problem of unsupervised multi-source
domain adaptation, where a model is trained on labelled data from multiple
source domains and must make predictions on a domain for which no labelled data
has been seen. Prior work with CNNs and RNNs has demonstrated the benefit of
mixture of experts, where the predictions of multiple domain expert classifiers
are combined; as well as domain adversarial training, to induce a domain
agnostic representation space. Inspired by this, we investigate how such
methods can be effectively applied to large pretrained transformer models. We
find that domain adversarial training has an effect on the learned
representations of these models while having little effect on their
performance, suggesting that large transformer-based models are already
relatively robust across domains. Additionally, we show that mixture of experts
leads to significant performance improvements by comparing several variants of
mixing functions, including one novel mixture based on attention. Finally, we
demonstrate that the predictions of large pretrained transformer based domain
experts are highly homogeneous, making it challenging to learn effective
functions for mixing their predictions. Comment: 12 pages, 3 figures, 5 tables
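As a rough illustration of what an attention-based mixing function over domain experts could look like, the sketch below weights each expert's class probabilities by a scaled dot-product score between an input representation and a per-expert key vector. The abstract does not specify the paper's actual mixing function; the names attention_mixture, query, expert_keys and expert_probs are hypothetical, and scaled dot-product attention is an assumption made for the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_mixture(query, expert_keys, expert_probs):
    """Mix per-expert class probabilities with attention weights.

    query        -- representation of the input (hypothetical)
    expert_keys  -- one key vector per domain expert (hypothetical)
    expert_probs -- one class-probability list per domain expert
    Returns (weights, mixed_probs).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, key) / scale for key in expert_keys])
    n_classes = len(expert_probs[0])
    mixed = [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]
    return weights, mixed


if __name__ == "__main__":
    weights, mixed = attention_mixture(
        query=[1.0, 0.0],
        expert_keys=[[1.0, 0.0], [0.0, 1.0]],
        expert_probs=[[0.9, 0.1], [0.2, 0.8]],
    )
    print(weights, mixed)
```

When the experts' predictions are nearly identical, as the abstract reports, any choice of weights yields almost the same mixture, which is one way to see why learning a mixing function is hard in that setting.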
Adverse Drug Event Detection, Causality Inference, Patient Communication and Translational Research
Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all the adverse events that occur in hospitalized patients. ADEs have been shown to increase the cost of health care and the length of hospital stays. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce the cost in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and the FDA Adverse Event Reporting System reports. The ADEtector system employs novel natural language processing approaches for ADE detection and provides a user interface to display ADE information. The ADEtector employs machine learning techniques to automatically process the narrative text and identify the adverse event (AE) and medication entities that appear in that narrative text. The system will analyze the entities recognized to infer the causal relation that exists between AEs and medications by automating the elements of the Naranjo score using knowledge- and rule-based approaches. The Naranjo Adverse Drug Reaction Probability Scale is a validated tool for finding the causality of a drug-induced adverse event or ADE. The scale calculates the likelihood of an adverse event related to drugs based on a list of weighted questions. The ADEtector also presents the user with evidence for ADEs by extracting figures that contain ADE-related information from biomedical literature. A brief summary is generated for each extracted figure to help users better comprehend it and the ADE information it conveys.
The ADEtector also helps patients better understand the narrative text by recognizing complex medical jargon and abbreviations that appear in the text and providing definitions and explanations for them from external knowledge resources. This system could help clinicians and researchers in discovering novel ADEs and drug relations and in hypothesizing new research questions within the ADE domain.
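The weighted-question scoring mentioned above can be sketched as follows. This is a minimal illustration, not ADEtector's implementation: the weights and category cut-offs are those commonly published for the ten-question Naranjo scale and should be checked against the original instrument before any real use, and the function names naranjo_score and naranjo_category are hypothetical.

```python
# (weight if "yes", weight if "no") for questions 1-10; "unknown" scores 0.
# Weights as commonly published for the Naranjo scale (an assumption here).
NARANJO_WEIGHTS = [
    (1, 0),    # 1. previous conclusive reports of this reaction
    (2, -1),   # 2. event appeared after the suspected drug was given
    (1, 0),    # 3. improved on discontinuation or specific antagonist
    (2, -1),   # 4. reappeared on readministration
    (-1, 2),   # 5. alternative causes could explain the reaction
    (-1, 1),   # 6. reappeared when a placebo was given
    (1, 0),    # 7. drug detected at concentrations known to be toxic
    (1, 0),    # 8. severity changed with dose
    (1, 0),    # 9. similar reaction on previous exposure
    (1, 0),    # 10. confirmed by objective evidence
]


def naranjo_score(answers):
    """Sum the weights for ten answers, each 'yes', 'no' or 'unknown'."""
    if len(answers) != len(NARANJO_WEIGHTS):
        raise ValueError("expected exactly 10 answers")
    total = 0
    for (yes_weight, no_weight), answer in zip(NARANJO_WEIGHTS, answers):
        if answer == "yes":
            total += yes_weight
        elif answer == "no":
            total += no_weight
        elif answer != "unknown":
            raise ValueError(f"unexpected answer: {answer!r}")
    return total


def naranjo_category(score):
    """Map a total score to the usual probability category."""
    if score >= 9:
        return "definite"
    if score >= 5:
        return "probable"
    if score >= 1:
        return "possible"
    return "doubtful"


if __name__ == "__main__":
    answers = ["yes", "yes", "yes", "unknown", "no",
               "unknown", "unknown", "unknown", "no", "yes"]
    score = naranjo_score(answers)
    print(score, naranjo_category(score))
```

A system that automates the scale, as the abstract describes, would have to derive each answer from the recognized entities and external knowledge; the sketch only covers the final arithmetic.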
Spanish named entity recognition in the biomedical domain
Named Entity Recognition in the clinical domain and in languages other than English is hampered by the absence of complete dictionaries, the informality of texts, the polysemy of terms, disagreement over entity boundaries, and the scarcity of corpora and other resources. We present a Named Entity Recognition method for poorly resourced languages. The method was tested with Spanish radiology reports and compared with a conditional random fields system. Peer Reviewed. Postprint (author's final draft)
CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
Large, labeled datasets have driven deep learning methods to achieve
expert-level performance on a variety of medical imaging tasks. We present
CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240
patients. We design a labeler to automatically detect the presence of 14
observations in radiology reports, capturing uncertainties inherent in
radiograph interpretation. We investigate different approaches to using the
uncertainty labels for training convolutional neural networks that output the
probability of these observations given the available frontal and lateral
radiographs. On a validation set of 200 chest radiographic studies which were
manually annotated by 3 board-certified radiologists, we find that different
uncertainty approaches are useful for different pathologies. We then evaluate
our best model on a test set composed of 500 chest radiographic studies
annotated by a consensus of 5 board-certified radiologists, and compare the
performance of our model to that of 3 additional radiologists in the detection
of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the
model ROC and PR curves lie above all 3 radiologist operating points. We
release the dataset to the public as a standard benchmark to evaluate
performance of chest radiograph interpretation models.
The dataset is freely available at
https://stanfordmlgroup.github.io/competitions/chexpert. Comment: Published in AAAI 201
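To make the idea of "different approaches to using the uncertainty labels" concrete, the sketch below shows three simple policies for turning an uncertain label into a training target: treat it as negative, treat it as positive, or exclude it from the loss. The abstract does not list the approaches that were compared, so the policy names, the encoding of an uncertain label as -1, and the function apply_uncertainty_policy are all assumptions made for illustration.

```python
UNCERTAIN = -1  # assumed encoding for an uncertain observation


def apply_uncertainty_policy(labels, policy):
    """Convert labels in {1, 0, -1} into (targets, loss_mask).

    policy -- "zeros":  uncertain labels become negatives
              "ones":   uncertain labels become positives
              "ignore": uncertain labels are masked out of the loss
    A mask value of 0 means the position should not contribute to the loss.
    """
    if policy not in ("zeros", "ones", "ignore"):
        raise ValueError(f"unknown policy: {policy!r}")
    targets, mask = [], []
    for value in labels:
        if value != UNCERTAIN:
            targets.append(value)
            mask.append(1)
        elif policy == "zeros":
            targets.append(0)
            mask.append(1)
        elif policy == "ones":
            targets.append(1)
            mask.append(1)
        else:  # "ignore": placeholder target, excluded by the mask
            targets.append(0)
            mask.append(0)
    return targets, mask


if __name__ == "__main__":
    for policy in ("zeros", "ones", "ignore"):
        print(policy, apply_uncertainty_policy([1, UNCERTAIN, 0], policy))
```

Because the abstract reports that different approaches work best for different pathologies, a policy like this would plausibly be chosen per observation rather than once for the whole label vector.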
Generalizing through Forgetting -- Domain Generalization for Symptom Event Extraction in Clinical Notes
Symptom information is primarily documented in free-text clinical notes and
is not directly accessible for downstream applications. To address this
challenge, information extraction approaches that can handle clinical language
variation across different institutions and specialties are needed. In this
paper, we present domain generalization for symptom extraction using
pretraining and fine-tuning data that differs from the target domain in terms
of institution and/or specialty and patient population. We extract symptom
events using a transformer-based joint entity and relation extraction method.
To reduce reliance on domain-specific features, we propose a domain
generalization method that dynamically masks frequent symptom words in the
source domain. Additionally, we pretrain the transformer language model (LM) on
task-related unlabeled texts for better representation. Our experiments
indicate that masking and adaptive pretraining methods can significantly
improve performance when the source domain is more distant from the target
domain.
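The masking idea can be sketched roughly as follows: collect the most frequent words inside annotated symptom spans in the source domain, then replace those words with a mask token at random each time a training example is drawn. The abstract gives no details on the frequency threshold or masking schedule, so top_k, mask_prob, the "[MASK]" token and both function names are hypothetical choices for this example.

```python
import random
from collections import Counter


def frequent_symptom_words(symptom_spans, top_k):
    """Return the top_k most frequent lowercase words across symptom spans."""
    counts = Counter(
        word.lower() for span in symptom_spans for word in span.split()
    )
    return {word for word, _ in counts.most_common(top_k)}


def dynamic_mask(tokens, frequent, mask_prob, rng, mask_token="[MASK]"):
    """Mask each frequent token independently with probability mask_prob.

    Calling this again for every epoch yields a different masking pattern,
    which is what makes the masking dynamic rather than fixed.
    """
    return [
        mask_token
        if token.lower() in frequent and rng.random() < mask_prob
        else token
        for token in tokens
    ]


if __name__ == "__main__":
    spans = ["chest pain", "back pain", "pain", "nausea"]
    frequent = frequent_symptom_words(spans, top_k=1)
    rng = random.Random(0)
    sentence = "Patient reports severe chest pain and nausea".split()
    print(dynamic_mask(sentence, frequent, mask_prob=0.5, rng=rng))
```

Masking the most common surface forms pushes the model to rely on context instead of memorized source-domain vocabulary, which is the stated motivation for reducing reliance on domain-specific features.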
Understanding and Measuring Psychological Stress using Social Media
A body of literature has demonstrated that users' mental health conditions,
such as depression and anxiety, can be predicted from their social media
language. There is still a gap in the scientific understanding of how
psychological stress is expressed on social media. Stress is one of the primary
underlying causes and correlates of chronic physical illnesses and mental
health conditions. In this paper, we explore the language of psychological
stress with a dataset of 601 social media users, who answered the Perceived
Stress Scale questionnaire and also consented to share their Facebook and
Twitter data. Firstly, we find that stressed users post about exhaustion,
losing control, increased self-focus and physical pain as compared to posts
about breakfast, family-time, and travel by users who are not stressed.
Secondly, we find that Facebook language is more predictive of stress than
Twitter language. Thirdly, we demonstrate how the language-based models thus
developed can be adapted and scaled to measure county-level trends. Since
county-level language is easily available on Twitter using the Streaming API,
we explore multiple domain adaptation algorithms to adapt user-level Facebook
models to Twitter language. We find that domain-adapted and scaled social
media-based measurements of stress outperform sociodemographic variables (age,
gender, race, education, and income), against ground-truth survey-based stress
measurements, both at the user- and the county-level in the U.S. Twitter
language that scores higher in stress is also predictive of poorer health, less
access to facilities and lower socioeconomic status in counties. We conclude
with a discussion of the implications of using social media as a new tool for
monitoring stress levels of both individuals and counties. Comment: Accepted for publication in the proceedings of ICWSM 201
Seven properties of self-organization in the human brain
The principle of self-organization has acquired a fundamental significance in the newly emerging field of computational philosophy. Self-organizing systems have been described in various domains in science and philosophy including physics, neuroscience, biology and medicine, ecology, and sociology. While the architecture and general purpose of such systems may depend on domain-specific concepts and definitions, there are (at least) seven key properties of self-organization clearly identified in brain systems: 1) modular connectivity, 2) unsupervised learning, 3) adaptive ability, 4) functional resiliency, 5) functional plasticity, 6) from-local-to-global functional organization, and 7) dynamic system growth. These are defined here in the light of insight from neurobiology, cognitive neuroscience and Adaptive Resonance Theory (ART), and physics to show that self-organization achieves stability and functional plasticity while minimizing structural system complexity. A specific example informed by empirical research is discussed to illustrate how modularity, adaptive learning, and dynamic network growth enable stable yet plastic somatosensory representation for human grip force control. Implications for the design of “strong” artificial intelligence in robotics are brought forward.
Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019)
Measuring Semantic Textual Similarity and Automatic Answer Assessment in Dialogue Based Tutoring Systems
This dissertation presents methods and resources proposed to improve on measuring semantic textual similarity and their applications in student response understanding in dialogue based Intelligent Tutoring Systems. In order to predict the extent of similarity between a given pair of sentences, we have proposed machine learning models using dozens of features, such as the scores calculated using optimal multi-level alignment, vector based compositional semantics, and machine translation evaluation methods. Furthermore, we have proposed models towards adding an interpretation layer on top of similarity measurement systems. Our models on predicting and interpreting the semantic similarity have been the top performing systems in SemEval (a premier venue for the semantic evaluation) for the last three years. The correlations between our models' predictions and the human judgments were above 0.80 for several datasets, while our models were more robust than many other top performing systems. Moreover, we have proposed Bayesian models to adapt similarity models across domains. We have also proposed a novel Neural Network based word representation mapping approach which allows us to map the vector based representation of a word found in one model to another model where the word representation is missing, effectively pooling together the vocabularies and corresponding representations across models. Our experiments show that the model coverage increased by a few to several times depending on which model's vocabulary is taken as a reference. Also, the transformed representations were well correlated to the native target model vectors, showing that the mapped representations can be used with confidence to substitute the missing word representations in the target model.
Furthermore, we have proposed methods to improve open-ended answer assessment in dialogue based tutoring systems, which is very challenging because of the variations in student answers, which often are not self contained and need contextual information (e.g., dialogue history) in order to better assess their correctness. To that end, we have proposed Probabilistic Soft Logic (PSL) models augmenting semantic similarity information with other knowledge. To detect intra- and inter-sentential negation scope and focus in tutorial dialogs, we have developed Conditional Random Fields (CRF) models. The results indicate that our approach is very effective in detecting negation scope and focus in tutorial dialogue context and can be further developed to augment the natural language understanding systems. Additionally, we created resources (datasets, models, and tools) for fostering research in semantic similarity and student response understanding in conversational tutoring systems.
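The reported correlations between model predictions and human judgments can be computed along the following lines. The abstract does not say which correlation coefficient was used; Pearson's r is assumed here because it is a common choice for semantic similarity evaluation, and the function name and the sample scores are hypothetical.

```python
import math


def pearson(predicted, gold):
    """Pearson correlation between two equal-length lists of scores."""
    if len(predicted) != len(gold) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    if var_p == 0 or var_g == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_g)


if __name__ == "__main__":
    # Hypothetical similarity scores on a 0-5 scale, for illustration only.
    model_scores = [4.6, 1.2, 3.3, 0.4, 2.9]
    human_scores = [5.0, 1.0, 3.0, 0.0, 3.5]
    print(round(pearson(model_scores, human_scores), 3))
```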