Prototype as Query for Few Shot Semantic Segmentation
Few-shot Semantic Segmentation (FSS) was proposed to segment unseen classes
in a query image, referring to only a few annotated examples named support
images. One characteristic of FSS is spatial inconsistency between
query and support targets, e.g., in texture or appearance. This greatly challenges
the generalization ability of FSS methods, which must effectively
exploit the dependency between the query image and the support examples. Most
existing methods abstracted support features into prototype vectors and
implemented the interaction with query features using cosine similarity or
feature concatenation. However, this simple interaction may not capture spatial
details in query features. To alleviate this limitation, a few methods utilized
all pixel-wise support information via computing the pixel-wise correlations
between paired query and support features implemented with the attention
mechanism of Transformer. These approaches suffer from heavy computation on the
dot-product attention between all pixels of support and query features. In this
paper, we propose a simple yet effective Transformer-based framework,
termed ProtoFormer, to fully capture spatial details in query features. It
views the abstracted prototype of the target class in support features as Query
and the query features as Key and Value embeddings, which are input to the
Transformer decoder. In this way, spatial details are better captured
and attention is focused on the semantic features of the target class in the query image.
The output of the Transformer-based module can be viewed as semantic-aware
dynamic kernels to filter out the segmentation mask from the enriched query
features. Extensive experiments on PASCAL-5^i and COCO-20^i show that
our ProtoFormer significantly advances the state of the art. Comment: under review
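
The prototype-as-Query idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head, omits the learned projections, feed-forward layers, and training that a real Transformer decoder would have, and treats each image as a flat list of per-pixel feature vectors. All function names (`masked_average_pool`, `prototype_cross_attention`, `predict_mask`) are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def masked_average_pool(features, mask):
    """Abstract the support features into one prototype vector by
    averaging the feature vectors of foreground (mask == 1) pixels."""
    dim = len(features[0])
    total = [0.0] * dim
    count = 0
    for vec, m in zip(features, mask):
        if m:
            count += 1
            for d in range(dim):
                total[d] += vec[d]
    return [t / max(count, 1) for t in total]

def prototype_cross_attention(prototype, query_feats):
    """Single-head attention with the prototype as Query and the
    query-image pixel features as both Key and Value (assumption:
    no learned projection matrices)."""
    scale = math.sqrt(len(prototype))
    weights = softmax([dot(prototype, key) / scale for key in query_feats])
    dim = len(prototype)
    return [sum(w * v[d] for w, v in zip(weights, query_feats))
            for d in range(dim)]

def predict_mask(kernel, query_feats, threshold=0.5):
    """Apply the attended vector as a 1x1 dynamic kernel to every
    query pixel and threshold the sigmoid response."""
    mask = []
    for feat in query_feats:
        prob = 1.0 / (1.0 + math.exp(-dot(kernel, feat)))
        mask.append(1 if prob > threshold else 0)
    return mask

# Toy data: two feature channels, foreground pixels look like [1, -1].
support_feats = [[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
support_mask = [1, 1, 0]
query_feats = [[1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]

prototype = masked_average_pool(support_feats, support_mask)
kernel = prototype_cross_attention(prototype, query_feats)
mask = predict_mask(kernel, query_feats)
```

Because only one prototype attends to the query pixels, the attention cost in this sketch grows linearly with the number of query pixels, in contrast to the pixel-to-pixel attention between all support and query pixels that the abstract describes as computationally heavy.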
Multidimensional Uncertainty-Aware Evidential Neural Networks
Traditional deep neural networks (NNs) have significantly contributed to the
state-of-the-art performance in the task of classification under various
application domains. However, NNs have not considered inherent uncertainty in
data associated with the class probabilities where misclassification under
uncertainty may easily introduce high risk in decision making in real-world
contexts (e.g., misclassification of objects in roads leads to serious
accidents). Unlike Bayesian NNs, which infer uncertainty indirectly through weight
uncertainties, evidential NNs (ENNs) have recently been proposed to explicitly
model the uncertainty of class probabilities and use it for classification
tasks. An ENN formulates the predictions of an NN as subjective
opinions and uses a deterministic NN to learn, from data, a function that
collects the evidence forming those opinions. However, the ENN
is trained as a black box without explicitly considering inherent uncertainty
in data with their different root causes, such as vacuity (i.e., uncertainty
due to a lack of evidence) or dissonance (i.e., uncertainty due to conflicting
evidence). By considering the multidimensional uncertainty, we proposed a novel
uncertainty-aware evidential NN called WGAN-ENN (WENN) for solving an
out-of-distribution (OOD) detection problem. We took a hybrid approach that
combines Wasserstein Generative Adversarial Network (WGAN) with ENNs to jointly
train a model with prior knowledge of a certain class, which has high vacuity
for OOD samples. Via extensive empirical experiments based on both synthetic
and real-world datasets, we demonstrated that the estimation of uncertainty by
WENN can significantly help distinguish OOD samples from boundary samples. WENN
outperformed competitive counterparts in OOD detection. Comment: AAAI 202
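
The two uncertainty types named in this abstract, vacuity and dissonance, can be made concrete with a small sketch. The formulas below are an assumption: they follow the common subjective-logic formulation (Dirichlet parameters equal to evidence plus one, vacuity equal to the number of classes divided by the Dirichlet strength, dissonance built from a pairwise balance term) and are not taken from the abstract itself. The function names are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to belief masses and vacuity.
    Assumed formulation: alpha_k = e_k + 1, S = sum(alpha),
    b_k = e_k / S, vacuity = K / S, so sum(b) + vacuity == 1."""
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    belief = [e / strength for e in evidence]
    vacuity = num_classes / strength
    return belief, vacuity

def balance(b_j, b_k):
    """Relative balance of two belief masses: 1 when equal, 0 when one is zero."""
    if b_j + b_k == 0:
        return 0.0
    return 1.0 - abs(b_j - b_k) / (b_j + b_k)

def dissonance(belief):
    """Uncertainty due to conflicting evidence: high when several
    classes hold similar, non-trivial belief mass."""
    total = 0.0
    for k, b_k in enumerate(belief):
        others = [b_j for j, b_j in enumerate(belief) if j != k]
        denom = sum(others)
        if denom > 0:
            total += b_k * sum(b_j * balance(b_j, b_k) for b_j in others) / denom
    return total

# No evidence at all: maximal vacuity, no dissonance.
belief_none, vacuity_none = opinion_from_evidence([0.0, 0.0, 0.0])
# Strong but evenly split evidence: low vacuity, high dissonance.
belief_split, vacuity_split = opinion_from_evidence([10.0, 10.0, 10.0])
# Strong evidence for a single class: low vacuity, no dissonance.
belief_one, vacuity_one = opinion_from_evidence([30.0, 0.0, 0.0])
```

Under these assumed definitions, an input with no supporting evidence has high vacuity, while an input near a class boundary can have low vacuity but high dissonance, which matches the distinction between OOD samples and boundary samples that the abstract draws.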
Better to Ask in English: Cross-Lingual Evaluation of Large Language Models for Healthcare Queries
Large language models (LLMs) are transforming the ways the general public
accesses and consumes information. Their influence is particularly pronounced
in pivotal sectors like healthcare, where lay individuals are increasingly
appropriating LLMs as conversational agents for everyday queries. While LLMs
demonstrate impressive language understanding and generation proficiencies,
concerns regarding their safety remain paramount in these high-stakes domains.
Moreover, the development of LLMs is disproportionately focused on English. It
remains unclear how these LLMs perform in the context of non-English languages,
a gap that is critical for ensuring equity in the real-world use of these
systems. This paper provides a framework to investigate the effectiveness of
LLMs as multi-lingual dialogue systems for healthcare queries. Our
empirically-derived framework XlingEval focuses on three fundamental criteria
for evaluating LLM responses to naturalistic human-authored health-related
questions: correctness, consistency, and verifiability. Through extensive
experiments on four major global languages (English, Spanish,
Chinese, and Hindi), spanning three expert-annotated large health Q&A datasets,
and through an amalgamation of algorithmic and human-evaluation strategies, we
found a pronounced disparity in LLM responses across these languages,
indicating a need for enhanced cross-lingual capabilities. We further propose
XlingHealth, a cross-lingual benchmark for examining the multilingual
capabilities of LLMs in the healthcare context. Our findings underscore the
pressing need to bolster the cross-lingual capacities of these models, and to
provide an equitable information ecosystem accessible to all. Comment: 18 pages, 7 figures