Interpretable Subgroup Discovery in Treatment Effect Estimation with Application to Opioid Prescribing Guidelines
The dearth of prescribing guidelines for physicians is one key driver of the
current opioid epidemic in the United States. In this work, we analyze medical
and pharmaceutical claims data to draw insights on characteristics of patients
who are more prone to adverse outcomes after an initial synthetic opioid
prescription. Toward this end, we propose a generative model that enables the
discovery, from observational data, of subgroups that exhibit an enhanced or
diminished causal effect of treatment. Our approach models these
sub-populations as a mixture distribution, using sparsity to enhance
interpretability, while jointly learning nonlinear predictors of the potential
outcomes to better adjust for confounding. The approach leads to
human-interpretable insights on discovered subgroups, improving the practical
utility for decision support.
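As an illustration of the general approach, the sketch below (PyTorch, not the authors' code; all names, sizes, and the loss are assumptions) combines a sparse, softmax-gated mixture over candidate subgroups with nonlinear heads for the two potential outcomes, so a per-subgroup treatment effect can be read off while a sparsity penalty keeps the gating interpretable.

```python
# Minimal sketch (not the authors' implementation) of subgroup discovery for
# treatment effect estimation: a sparse linear gate assigns units to subgroups,
# while nonlinear heads predict the potential outcomes Y(0) and Y(1).
import torch
import torch.nn as nn

class SubgroupCATE(nn.Module):
    def __init__(self, n_features: int, n_subgroups: int = 3, hidden: int = 32):
        super().__init__()
        # Linear gating kept sparse (via the L1 penalty below) for interpretability.
        self.gate = nn.Linear(n_features, n_subgroups)
        # Nonlinear outcome predictors, one column per subgroup, to adjust for confounding.
        self.y0 = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                nn.Linear(hidden, n_subgroups))
        self.y1 = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                nn.Linear(hidden, n_subgroups))

    def forward(self, x, t):
        w = torch.softmax(self.gate(x), dim=-1)  # soft subgroup membership
        y_hat = t * (w * self.y1(x)).sum(-1) + (1 - t) * (w * self.y0(x)).sum(-1)
        return y_hat, w

    def sparsity_penalty(self):
        return self.gate.weight.abs().mean()

# Illustrative training step on synthetic data.
model = SubgroupCATE(n_features=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, t, y = torch.randn(64, 10), torch.randint(0, 2, (64,)).float(), torch.randn(64)
y_hat, w = model(x, t)
loss = nn.functional.mse_loss(y_hat, y) + 1e-2 * model.sparsity_penalty()
loss.backward(); opt.step()
```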
Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering
Visual Question Answering (VQA) has emerged as one of the most challenging
tasks in artificial intelligence due to its multi-modal nature. However, most
existing VQA methods are incapable of handling Knowledge-based Visual Question
Answering (KB-VQA), which requires external knowledge beyond visible contents
to answer questions about a given image. To address this issue, we propose a
novel framework that endows the model with the capability to answer more
general questions and to better exploit external knowledge through generating
Multiple Clues for Reasoning with Memory Neural Networks (MCR-MemNN).
Specifically, a well-defined detector is adopted to predict
image-question-related relation phrases, each of which delivers two
complementary clues for retrieving supporting facts from an external knowledge
base (KB); these facts are then encoded into a continuous embedding space using
a content-addressable memory. Afterwards, mutual interactions between the
visual-semantic representation and the supporting facts stored in memory are
captured to distill the most relevant information in three modalities (i.e.,
image, question, and KB). Finally, the optimal answer is predicted by choosing
the supporting fact with the highest score. We conduct extensive experiments on
two widely-used benchmarks. The experimental results demonstrate the
effectiveness of MCR-MemNN, as well as its superiority over other KB-VQA
methods.
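A minimal sketch of the central scoring step, assuming a key-value style content-addressable memory (PyTorch; the encoders, dimensions, and the name FactMemoryScorer are placeholders, not the paper's implementation): a joint image-question query attends over supporting-fact embeddings, a read vector distills the relevant information, and the answer corresponds to the fact with the highest score.

```python
# Sketch of scoring supporting facts held in a content-addressable memory.
import torch
import torch.nn as nn

class FactMemoryScorer(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        self.fuse = nn.Linear(2 * dim, dim)   # fuses image and question features
        self.key = nn.Linear(dim, dim)        # addresses memory slots
        self.value = nn.Linear(dim, dim)      # contents read back from memory

    def forward(self, img_feat, q_feat, fact_embs):
        # img_feat, q_feat: (B, dim); fact_embs: (B, N_facts, dim)
        query = torch.tanh(self.fuse(torch.cat([img_feat, q_feat], dim=-1)))
        keys = self.key(fact_embs)
        attn = torch.softmax((keys @ query.unsqueeze(-1)).squeeze(-1), dim=-1)
        # Read step distills the most relevant cross-modal information...
        read = (attn.unsqueeze(-1) * self.value(fact_embs)).sum(dim=1)
        # ...and the refined query re-scores facts; argmax picks the answer fact.
        scores = (keys @ (query + read).unsqueeze(-1)).squeeze(-1)
        return scores, scores.argmax(dim=-1)

scorer = FactMemoryScorer()
scores, best = scorer(torch.randn(2, 256), torch.randn(2, 256), torch.randn(2, 20, 256))
```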
Cross-Modal Reasoning with Event Correlation for Video Question Answering
Video Question Answering (VideoQA) is an attractive and challenging research
direction that aims to understand the complex semantics of heterogeneous data
from two domains, i.e., the spatio-temporal video content and the word sequence
of the question. Although various attention mechanisms have been utilized to build
contextualized representations by modeling intra- and inter-modal relationships
of the two modalities, one limitation of the predominant VideoQA methods is the
lack of reasoning with event correlation, that is, sensing and analyzing
relationships among abundant and informative events contained in the video. In
this paper, we introduce the dense caption modality as a new auxiliary and
distill event-correlated information from it to infer the correct answer. To
this end, we propose a novel end-to-end trainable model, Event-Correlated Graph
Neural Networks (EC-GNNs), to perform cross-modal reasoning over information
from the three modalities (i.e., caption, video, and question). Beyond
exploiting this new modality, we employ cross-modal reasoning modules
for explicitly modeling inter-modal relationships and aggregating relevant
information across different modalities, and we propose a question-guided
self-adaptive multi-modal fusion module to collect the question-oriented and
event-correlated evidence through multi-step reasoning. We evaluate our model
on two widely-used benchmark datasets and conduct an ablation study to justify
the effectiveness of each proposed component.
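A minimal sketch of a question-guided self-adaptive fusion step, under the assumption that it reduces to question-conditioned weighting of the three modality representations (PyTorch; names and dimensions are illustrative, and the graph-reasoning modules are omitted):

```python
# Sketch: the question decides how much weight the caption, video, and question
# representations each receive before answer prediction.
import torch
import torch.nn as nn

class QuestionGuidedFusion(nn.Module):
    def __init__(self, dim: int = 512, n_modalities: int = 3):
        super().__init__()
        self.gate = nn.Linear(dim, n_modalities)  # question -> per-modality weights
        self.proj = nn.Linear(dim, dim)

    def forward(self, caption_feat, video_feat, question_feat):
        feats = torch.stack([caption_feat, video_feat, question_feat], dim=1)  # (B, 3, dim)
        weights = torch.softmax(self.gate(question_feat), dim=-1)              # (B, 3)
        fused = (weights.unsqueeze(-1) * feats).sum(dim=1)                     # (B, dim)
        return torch.relu(self.proj(fused))

fusion = QuestionGuidedFusion()
out = fusion(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512))
```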
Human Pose Transfer with Augmented Disentangled Feature Consistency
Deep generative models have made great progress in synthesizing images with
arbitrary human poses and in transferring the pose of one person to another.
Though many different methods have been proposed to generate images with high
visual fidelity, two fundamental issues remain: pose ambiguity and appearance
inconsistency. To alleviate these
limitations and improve the quality of the synthesized images, we propose a
pose transfer network with augmented Disentangled Feature Consistency (DFC-Net)
to facilitate human pose transfer. Given a pair of images containing the source
and target person, DFC-Net extracts pose and static information from the source
and target respectively, then synthesizes an image of the target person with
the desired pose from the source. Moreover, DFC-Net leverages disentangled
feature consistency losses in the adversarial training to strengthen the
transfer coherence and integrates a keypoint amplifier to enhance the pose
feature extraction. With the help of the disentangled feature consistency
losses, we further propose a novel data augmentation scheme that introduces
unpaired support data with the augmented consistency constraints to improve the
generality and robustness of DFC-Net. Extensive experimental results on
Mixamo-Pose and EDN-10k demonstrate that DFC-Net achieves state-of-the-art
performance on pose transfer.
Comment: 22 pages, 6 figures
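A minimal sketch of a disentangled feature consistency loss of the kind described above (PyTorch; pose_enc and app_enc stand in for the disentangled pose and appearance encoders and are assumptions, not the paper's architecture): pose features of the synthesized image are pulled toward the source pose, while appearance features are pulled toward the target identity.

```python
# Sketch of a feature consistency loss over disentangled pose/appearance encoders.
import torch
import torch.nn.functional as F

def feature_consistency_loss(pose_enc, app_enc, generated, source, target):
    # Pose of the generated image should match the source person's pose,
    # while its appearance should match the target person's identity.
    pose_loss = F.l1_loss(pose_enc(generated), pose_enc(source).detach())
    app_loss = F.l1_loss(app_enc(generated), app_enc(target).detach())
    return pose_loss + app_loss

# Illustrative usage with placeholder convolutional encoders.
pose_enc = torch.nn.Conv2d(3, 16, 3, padding=1)
app_enc = torch.nn.Conv2d(3, 16, 3, padding=1)
loss = feature_consistency_loss(pose_enc, app_enc,
                                torch.randn(1, 3, 64, 64),
                                torch.randn(1, 3, 64, 64),
                                torch.randn(1, 3, 64, 64))
```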
