Joint chest X-ray diagnosis and clinical visual attention prediction with multi-stage cooperative learning: enhancing interpretability
As deep learning has become the state-of-the-art for computer-assisted
diagnosis, interpretability of the automatic decisions is crucial for clinical
deployment. While various methods were proposed in this domain, visual
attention maps of clinicians during radiological screening offer a unique asset
to provide important insights and can potentially enhance the quality of
computer-assisted diagnosis. With this paper, we introduce a novel
deep-learning framework for joint disease diagnosis and prediction of
corresponding visual saliency maps for chest X-ray scans. Specifically, we
designed a novel dual-encoder multi-task UNet, which leverages both a
DenseNet201 backbone and a Residual and Squeeze-and-Excitation block-based
encoder to extract diverse features for saliency map prediction, and a
multi-scale feature-fusion classifier to perform disease classification. To
tackle the issue of asynchronous training schedules of individual tasks in
multi-task learning, we proposed a multi-stage cooperative learning strategy,
with contrastive learning for feature encoder pretraining to boost performance.
Experiments show that our proposed method outperformed existing techniques for
chest X-ray diagnosis and the quality of visual saliency map prediction
Revisiting the Knowledge Injection Frameworks
In recent years, large language models (LLMs), such as GPTs, have attained
great impact worldwide. However, how to adapt these LLMs to better suit
vertical, domain-specific tasks by utilizing external knowledge remains not
completely solved. Indeed, a few works have emerged along this line, most of
which rely on an alignment heuristic that is built to inject the
corresponding knowledge tuple into the associated text sample.
However, despite the promise, we identify a pivotal problem that is ubiquitous
in this line of work. Simply put, we find that injecting unaligned (i.e.,
random) knowledge tuples into the LLMs achieves results comparable to (and
sometimes better than) injecting the aligned knowledge. We therefore conduct a thorough
investigation of this frustrating finding on a variety of related prior work
and further provide a chain of potential interpretations for the phenomenon.
Based on all that, we offer a simple remedial technique. Briefly, the core of
this technique is rooted in an emphasis on pruning and purifying
the external knowledge base to be injected into LLMs. Finally,
we show that by integrating this technique into most (if not all) knowledge
injection frameworks and recent LLMs, it manages to overcome the aforementioned
sanity problem and further pushes the boundary of the performance of the
domain-adaptive LLMs.
Comment: 9 pages, 6 figures, accepted by EMNLP 2023 Mai
Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification
Retrieval augmentation, which enhances downstream models by a knowledge
retriever and an external corpus instead of by merely increasing the number of
model parameters, has been successfully applied to many natural language
processing (NLP) tasks such as text classification and question answering.
However, existing methods train the retriever and the downstream model
separately or asynchronously, mainly because of the non-differentiability
between the two parts, which usually leads to degraded performance compared to
end-to-end joint training. In this paper, we propose Differentiable Retrieval
Augmentation via Generative lANguage modeling (Dragan) to address this problem
with a novel
differentiable reformulation. We demonstrate the effectiveness of our proposed
method on a challenging NLP task in e-commerce search, namely query intent
classification. Both the experimental results and ablation study show that the
proposed method significantly and reasonably improves over the state-of-the-art
baselines in both offline evaluation and online A/B tests.
Comment: 5 pages, 2 figures; accepted by CIKM202
A Multi-Granularity Matching Attention Network for Query Intent Classification in E-commerce Retrieval
Query intent classification, which aims to help customers find
desired products, has become an essential component of e-commerce search.
Existing query intent classification models either design more elaborate models
to enhance the representation learning of queries or explore label graphs and
multi-task learning to help models learn external information. However, these
models cannot capture multi-granularity matching features from queries and
categories, which makes it hard for them to bridge the gap in expression between
informal queries and categories.
This paper proposes a Multi-granularity Matching Attention Network (MMAN),
which contains three modules: a self-matching module, a char-level matching
module, and a semantic-level matching module to comprehensively extract
features from the query and a query-category interaction matrix. In this way,
the model can eliminate the difference in expression between queries and
categories for query intent classification. We conduct extensive offline and
online A/B experiments, and the results show that MMAN significantly
outperforms strong baselines, demonstrating its superiority and effectiveness.
MMAN has been deployed in production and brings great commercial value
for our company.
Comment: Accepted by WWW 202
Incommensurate itinerant antiferromagnetic excitations and spin resonance in the FeTeSe superconductor
We report on inelastic neutron scattering measurements that find
incommensurate itinerant-like magnetic excitations in the normal state of
superconducting FeTeSe ($T_c = 14$ K) at an incommensurate wave-vector
with an incommensurability of 0.09(1). In
the superconducting state only the lower energy part of the spectrum shows
significant changes by the formation of a gap and a magnetic resonance that
follows the dispersion of the normal state excitations. We use a four-band
model to describe the Fermi surface topology of iron-based superconductors with
the extended s± symmetry and find that it qualitatively captures the
salient features of these data.
Comment: 7 pages and 5 figure
CRAVES: Controlling Robotic Arm with a Vision-based Economic System
Training a robotic arm to accomplish real-world tasks has been attracting
increasing attention in both academia and industry. This work discusses the
role of computer vision algorithms in this field. We focus on low-cost arms
that are equipped with no sensors, so all decisions are based on visual
recognition, e.g., real-time 3D pose estimation. This requires annotating a lot
of training data, which is not only time-consuming but also laborious.
In this paper, we present an alternative solution, which uses a 3D model to
create a large amount of synthetic data, trains a vision model in this virtual
domain, and applies it to real-world images after domain adaptation. To this
end, we design a semi-supervised approach, which fully leverages the geometric
constraints among keypoints. We apply an iterative algorithm for optimization.
Without any annotations on real images, our algorithm generalizes well and
produces satisfying results on 3D pose estimation, which is evaluated on two
real-world datasets. We also construct a vision-based control system for task
accomplishment, for which we train a reinforcement learning agent in a virtual
environment and apply it to the real world. Moreover, our approach, with merely
a 3D model being required, has the potential to generalize to other types of
multi-rigid-body dynamic systems.
Comment: 10 pages, 6 figure