Search CORE

Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment

Authors: Chen Zhihao; Cheng Lionel; Fu Huazhu; Liu Yong; Ooi Gideon; Thng Choon Hua; Tran Anh; Wan Liang; Xu Xinxing; Zhao Junting; Zhou Yang
Publication date: 13/03/2023

Medical phrase grounding (MPG) aims to locate the most relevant region in a medical image, given a phrase query describing certain medical findings, which is an important task for medical image analysis and radiological diagnosis. However, existing visual grounding methods rely on general visual features for identifying objects in natural images and are not capable of capturing the subtle and specialized features of medical findings, leading to sub-optimal performance in MPG. In this paper, we propose MedRPG, an end-to-end approach for MPG. MedRPG is built on a lightweight vision-language transformer encoder and directly predicts the box coordinates of mentioned medical findings, which can be trained with limited medical data, making it a valuable tool in medical image analysis. To enable MedRPG to locate nuanced medical findings with better region-phrase correspondences, we further propose Tri-attention Context contrastive alignment (TaCo). TaCo seeks context alignment to pull both the features and attention outputs of relevant region-phrase pairs close together while pushing those of irrelevant regions far away. This ensures that the final box prediction depends more on its finding-specific regions and phrases. Experimental results on three MPG datasets demonstrate that our MedRPG outperforms state-of-the-art visual grounding approaches by a large margin. Additionally, the proposed TaCo strategy is effective in enhancing finding localization ability and reducing spurious region-phrase correlations.

arXiv.org e-Print Archive
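
The MedRPG abstract only summarizes TaCo; its exact loss, including how the attention outputs are aligned, is not given there. As a minimal sketch, assuming a generic InfoNCE-style formulation over hypothetical region and phrase embeddings, the idea of pulling a relevant region-phrase pair together while pushing irrelevant regions away can be written as a softmax cross-entropy over cosine similarities. The function names, toy vectors, and temperature of 0.1 are illustrative assumptions, not the authors' settings.

```python
import math

# Hypothetical sketch: a generic InfoNCE-style region-phrase contrastive loss,
# not the exact TaCo formulation (which also aligns attention outputs).

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def region_phrase_contrastive_loss(phrase, regions, positive_idx, temperature=0.1):
    """Cross-entropy of picking the relevant region among all candidate regions.

    Low when `phrase` is most similar to regions[positive_idx] (pulled together)
    and dissimilar to the other, irrelevant regions (pushed apart).
    """
    logits = [cosine(phrase, r) / temperature for r in regions]
    # Numerically stable log-sum-exp over all candidate regions.
    peak = max(logits)
    log_partition = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_partition - logits[positive_idx]

# Toy example with made-up 2-D embeddings.
phrase = [1.0, 0.0]
regions = [[0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
loss_relevant = region_phrase_contrastive_loss(phrase, regions, positive_idx=0)
loss_irrelevant = region_phrase_contrastive_loss(phrase, regions, positive_idx=2)
print(f"{loss_relevant:.4f} {loss_irrelevant:.4f}")
```

With these toy vectors the loss is close to zero when the phrase is paired with its most similar region and large when it is paired with the opposite-facing one; a lower temperature sharpens that contrast.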

Target-Tailored Source-Transformation for Scene Graph Generation

Authors: Lan Cuiling; Liao Wentong; Rosenhahn Bodo; Yang Michael Ying; Zeng Wenjun
Publication date: 27/05/2020

Scene graph generation aims to provide a semantic and structural description of an image, denoting the objects (with nodes) and their relationships (with edges). The best-performing works to date are based on exploiting the context surrounding objects or relations, e.g., by passing information among objects. In these approaches, transforming the representation of source objects is a critical step for extracting information for use by target objects. In this work, we argue that a source object should give what a target object needs, and give different objects different information rather than contributing common information to all targets. To achieve this goal, we propose a Target-Tailored Source-Transformation (TTST) method to efficiently propagate information among object proposals and relations. In particular, for a source object proposal that contributes information to other target objects, we transform the source object feature to the target object feature domain by simultaneously taking both the source and the target into account. We further explore more powerful representations by integrating a language prior with the visual context in the transformation for scene graph generation. By doing so, the target object is able to extract target-specific information from the source object and source relation accordingly to refine its representation. Our framework is validated on the Visual Genome benchmark and demonstrates state-of-the-art performance for scene graph generation. The experimental results show that our method mutually improves the performance of object detection and visual relationship detection.

arXiv.org e-Print Archive

University of Twente Research Information
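
The TTST abstract states that a source feature is transformed into the target's feature domain while considering both source and target, but it does not give the transform. The following is a minimal sketch under stated assumptions: the additive tanh transform, the hypothetical weight matrices `W_src` and `W_tgt`, and the mean aggregation are all illustrative choices, and the language prior mentioned in the abstract is omitted.

```python
import math

# Hypothetical sketch of target-tailored message passing. The additive tanh
# transform, the weights, and mean aggregation are assumptions; the language
# prior described in the abstract is omitted.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def tailored_message(f_src, f_tgt, W_src, W_tgt):
    """Map a source feature into the target's feature domain, conditioned on both."""
    from_src = matvec(W_src, f_src)
    from_tgt = matvec(W_tgt, f_tgt)
    return [math.tanh(a + b) for a, b in zip(from_src, from_tgt)]

def refine_target(f_tgt, sources, W_src, W_tgt):
    """Residual update: add the mean of the target-tailored messages from all sources."""
    messages = [tailored_message(f_s, f_tgt, W_src, W_tgt) for f_s in sources]
    mean_msg = [sum(col) / len(messages) for col in zip(*messages)]
    return [t + m for t, m in zip(f_tgt, mean_msg)]

W_src = [[1.0, 0.0], [0.0, 1.0]]
W_tgt = [[0.5, 0.0], [0.0, 0.5]]
source = [1.0, -1.0]
target_a, target_b = [1.0, 0.0], [0.0, 1.0]

# The same source sends different information to different targets.
msg_a = tailored_message(source, target_a, W_src, W_tgt)
msg_b = tailored_message(source, target_b, W_src, W_tgt)
refined_a = refine_target(target_a, [source, [0.0, 2.0]], W_src, W_tgt)
print(msg_a, msg_b, refined_a)
```

A target-agnostic baseline would compute `tanh(W_src · f_src)` once and send the same vector to every target; conditioning on the target feature is what lets `msg_a` and `msg_b` differ.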

Listening comprehension and strategy use: a longitudinal exploration

Authors: Graham S.; Santos Denise; Vanderplank R.
Publication venue: Elsevier BV
Publication date: 01/03/2008

This paper examines the development of strategy use over 6 months in two lower-intermediate learners of L2 French in secondary schools in England. These learners were selected from a larger sample on the basis of their scores on a recall protocol completed after listening to short passages at two time points: one was consistently a high scorer; the other, a low scorer. Qualitative data on these two learners' strategic behaviour were gathered at the two time points from verbal reports made by the learners while they were completing a multiple-choice listening task. Our results show a high degree of stability of strategy use over the time period, with pre-existing differences between the high and low scorer persisting. The theoretical and pedagogical implications of these findings are discussed.

Central Archive at the University of Reading

Warwick Research Archives Portal Repository

Understanding Chat Messages for Sticker Recommendation in Messaging Apps

Authors: Hanoosh Mohamed; Laddha Abhishek; Mukherjee Debdoot; Narang Ankur; Patwa Parth
Publication date: 24/11/2019

Stickers are widely used in messaging apps such as Hike to visually express a nuanced range of thoughts and utterances and to convey exaggerated emotions. However, discovering the right sticker from a large and ever-expanding pool of stickers while chatting can be cumbersome. In this paper, we describe a system for recommending stickers in real time, as the user is typing, based on the context of the conversation. We decompose the sticker recommendation (SR) problem into two steps. First, we predict the message that the user is likely to send in the chat. Second, we substitute the predicted message with an appropriate sticker. The majority of Hike's messages are in the form of text transliterated from users' native languages into the Roman script. This leads to numerous orthographic variations of the same message and makes accurate message prediction challenging. To address this issue, we learn dense representations of chat messages in an unsupervised manner using a character-level convolutional network. We use these representations to cluster messages that have the same meaning, and in the subsequent steps we predict the message cluster instead of the message. Our approach does not depend on human-labelled data (except for validation), leading to a fully automatic pipeline for updating and tuning the underlying models. We also propose a novel hybrid message prediction model that can run with low latency on low-end phones with severe computational limitations. Our system has been deployed for more than 6 months and is being used by millions of users, along with hundreds of thousands of expressive stickers.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
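
The sticker-recommendation abstract clusters orthographic variants of the same transliterated message and then predicts the cluster rather than the exact message. As a minimal sketch of that clustering step, the code below swaps in bag-of-character-trigram vectors as a stand-in for the learned character-level CNN embeddings, and uses greedy threshold clustering; both substitutions, the threshold of 0.4, and the example messages are assumptions for illustration only.

```python
import math
from collections import Counter

# Stand-in for learned embeddings: bag of character trigrams (an assumption,
# not the character-level CNN described in the abstract).
def char_ngrams(text, n=3):
    padded = f"#{text.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_messages(messages, threshold=0.4):
    """Greedy clustering: join the first cluster whose seed message is similar enough."""
    clusters = []  # list of (seed vector, member messages)
    for msg in messages:
        vec = char_ngrams(msg)
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((vec, [msg]))
    return [members for _, members in clusters]

# Hypothetical transliterated variants of the same message.
chats = ["kaise ho", "kese ho", "good night", "kaise hoo"]
print(cluster_messages(chats))
```

Downstream, a message predictor would target the cluster rather than any single spelling, so all three spellings of "kaise ho" could map to the same sticker.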

Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search

Authors: Lin Jimmy; Rao Jinfeng; Ture Ferhan; Yang Wei; Zhang Yuhao
Publication date: 21/06/2019

Despite substantial interest in applications of neural networks to information retrieval, neural ranking models have only been applied to standard ad hoc retrieval tasks over web pages and newswire documents. This paper proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network), a novel neural ranking model specifically designed for ranking short social media posts. We identify document length, informal language, and heterogeneous relevance signals as features that distinguish documents in our domain, and present a model designed with these characteristics in mind. Our model uses hierarchical convolutional layers to learn latent semantic soft-match relevance signals at the character, word, and phrase levels. A pooling-based similarity measurement layer integrates evidence from multiple types of matches between the query, the social media post, and URLs contained in the post. Extensive experiments using Twitter data from the TREC Microblog Tracks 2011-2014 show that our model significantly outperforms prior feature-based as well as existing neural ranking models. To the best of our knowledge, this paper presents the first substantial work tackling search over social media posts using neural ranking models.

Comment: AAAI 2019, 10 pages

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
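
The MP-HCNN abstract mentions a pooling-based similarity measurement layer without specifying it. As a minimal sketch, assuming cosine similarity between hypothetical query-term and post-term embeddings followed by max-pooling and mean-pooling over the post terms, such a layer could produce features as below. The paper applies the idea at several granularities (character, word, phrase) and to URLs; this sketch shows a single granularity, and the toy vectors are made up.

```python
import math

# Hypothetical sketch of similarity pooling at one granularity; the choice of
# cosine similarity and of max/mean pooling is an assumption.

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def pooled_similarity_features(query_vecs, doc_vecs):
    """For each query term, max- and mean-pool its similarity over all post terms."""
    features = []
    for q in query_vecs:
        sims = [cosine(q, d) for d in doc_vecs]
        features.append(max(sims))              # strongest single soft match
        features.append(sum(sims) / len(sims))  # average soft-match strength
    return features

query = [[1.0, 0.0], [0.0, 1.0]]
post = [[1.0, 0.0], [1.0, 0.0]]
features = pooled_similarity_features(query, post)
print(features)
```

Here the first query term matches every post term exactly while the second matches none, so its max- and mean-pooled features are both zero; in a full model, features like these would feed a downstream scoring layer.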

Coleridge: A computer tool for assisting musical reflection and self‐explanation

Authors: Cook John; Morgan Nigel
Publication venue: Informa UK Limited
Publication date: 01/01/1998

This paper examines some of the problems involved in learning how to compose music. A prototype computer-based music tool called Coleridge is described. Coleridge was used in a study that investigated the dialogues that took place when a mentor attempted to encourage creative reflection in students. Results of dialogue analysis suggested that, because learners seem unable to make accurate predictions about how a musical phrase will sound, there is a real need for a computer-based learning assistant. Finally, the paper reports on how these findings were used to motivate the design of a mentor's assistant in a new version of Coleridge.

Crossref

ALT Open Access Repository

Directory of Open Access Journals