107,204 research outputs found
Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment
Medical phrase grounding (MPG) aims to locate the most relevant region in a
medical image, given a phrase query describing certain medical findings, which
is an important task for medical image analysis and radiological diagnosis.
However, existing visual grounding methods rely on general visual features for
identifying objects in natural images and are not capable of capturing the
subtle and specialized features of medical findings, leading to sub-optimal
performance in MPG. In this paper, we propose MedRPG, an end-to-end approach
for MPG. MedRPG is built on a lightweight vision-language transformer encoder
and directly predicts the box coordinates of mentioned medical findings, which
can be trained with limited medical data, making it a valuable tool in medical
image analysis. To enable MedRPG to locate nuanced medical findings with better
region-phrase correspondences, we further propose Tri-attention Context
contrastive alignment (TaCo). TaCo seeks context alignment to pull both the
features and attention outputs of relevant region-phrase pairs close together
while pushing those of irrelevant regions far away. This ensures that the final
box prediction depends more on its finding-specific regions and phrases.
Experimental results on three MPG datasets demonstrate that our MedRPG
outperforms state-of-the-art visual grounding approaches by a large margin.
Additionally, the proposed TaCo strategy is effective in enhancing finding
localization ability and reducing spurious region-phrase correlations
Target-Tailored Source-Transformation for Scene Graph Generation
Scene graph generation aims to provide a semantic and structural description
of an image, denoting the objects (with nodes) and their relationships (with
edges). The best performing works to date are based on exploiting the context
surrounding objects or relations,e.g., by passing information among objects. In
these approaches, to transform the representation of source objects is a
critical process for extracting information for the use by target objects. In
this work, we argue that a source object should give what tar-get object needs
and give different objects different information rather than contributing
common information to all targets. To achieve this goal, we propose a
Target-TailoredSource-Transformation (TTST) method to efficiently propagate
information among object proposals and relations. Particularly, for a source
object proposal which will contribute information to other target objects, we
transform the source object feature to the target object feature domain by
simultaneously taking both the source and target into account. We further
explore more powerful representations by integrating language prior with the
visual context in the transformation for the scene graph generation. By doing
so the target object is able to extract target-specific information from the
source object and source relation accordingly to refine its representation. Our
framework is validated on the Visual Genome bench-mark and demonstrated its
state-of-the-art performance for the scene graph generation. The experimental
results show that the performance of object detection and visual relation-ship
detection are promoted mutually by our method
Recommended from our members
Listening comprehension and strategy use: a longitudinal exploration
This paper examines the development of strategy use over 6 months in two lower-intermediate learners of L2 French in secondary schools in England. These learners were selected from a larger sample on the basis of their scores on a recall protocol completed after listening to short passages at two time points: one was consistently a high scorer; the other one, a low scorer. Qualitative data on these two learnersâ strategic behaviour were gathered at the two time points from verbal reports made by learners while they were completing a multiple-choice listening task. Our results show a high degree of stability of strategy use over the time period, with pre-existing differences between the high and low scorer persisting. The theoretical and pedagogical implications of these findings are discussed
Understanding Chat Messages for Sticker Recommendation in Messaging Apps
Stickers are popularly used in messaging apps such as Hike to visually
express a nuanced range of thoughts and utterances to convey exaggerated
emotions. However, discovering the right sticker from a large and ever
expanding pool of stickers while chatting can be cumbersome. In this paper, we
describe a system for recommending stickers in real time as the user is typing
based on the context of the conversation. We decompose the sticker
recommendation (SR) problem into two steps. First, we predict the message that
the user is likely to send in the chat. Second, we substitute the predicted
message with an appropriate sticker. Majority of Hike's messages are in the
form of text which is transliterated from users' native language to the Roman
script. This leads to numerous orthographic variations of the same message and
makes accurate message prediction challenging. To address this issue, we learn
dense representations of chat messages employing character level convolution
network in an unsupervised manner. We use them to cluster the messages that
have the same meaning. In the subsequent steps, we predict the message cluster
instead of the message. Our approach does not depend on human labelled data
(except for validation), leading to fully automatic updation and tuning
pipeline for the underlying models. We also propose a novel hybrid message
prediction model, which can run with low latency on low-end phones that have
severe computational limitations. Our described system has been deployed for
more than months and is being used by millions of users along with hundreds
of thousands of expressive stickers
Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search
Despite substantial interest in applications of neural networks to
information retrieval, neural ranking models have only been applied to standard
ad hoc retrieval tasks over web pages and newswire documents. This paper
proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network)
a novel neural ranking model specifically designed for ranking short social
media posts. We identify document length, informal language, and heterogeneous
relevance signals as features that distinguish documents in our domain, and
present a model specifically designed with these characteristics in mind. Our
model uses hierarchical convolutional layers to learn latent semantic
soft-match relevance signals at the character, word, and phrase levels. A
pooling-based similarity measurement layer integrates evidence from multiple
types of matches between the query, the social media post, as well as URLs
contained in the post. Extensive experiments using Twitter data from the TREC
Microblog Tracks 2011--2014 show that our model significantly outperforms prior
feature-based as well and existing neural ranking models. To our best
knowledge, this paper presents the first substantial work tackling search over
social media posts using neural ranking models.Comment: AAAI 2019, 10 page
Coleridge: A computer tool for assisting musical reflection and selfâexplanation
This paper examines some of the problems involved when learning how to compose music. A prototype computerâbased music tool called Coleridge is described. Coleridge was used in a study that investigated the dialogues that took place when a mentor attempted to encourage creative reflection in students. Results of dialogue analysis suggested that because learners seem unable to make accurate predictions about how a musical phrase will sound, there is a real need for a computerâbased learning assistant. Finally, the paper reports on how these findings were used to motivate the design of a mentor's assistant in a new version of Coleridge
- âŠ