PRIOR: Prototype Representation Joint Learning from Medical Images and Reports

Authors: Cheng Pujin, Huang Yijin, Lin Li, Luo Wenhan, Lyu Junyan, Tang Xiaoying
Publication date: 24/07/2023

Contrastive-learning-based vision-language joint pre-training has emerged as a successful representation learning strategy. In this paper, we present a prototype representation learning framework incorporating both global and local alignment between medical images and reports. In contrast to standard global multi-modality alignment methods, we employ a local alignment module for fine-grained representation. Furthermore, a cross-modality conditional reconstruction module is designed to interchange information across modalities in the training phase by reconstructing masked images and reports. For reconstructing long reports, a sentence-wise prototype memory bank is constructed, enabling the network to focus on low-level localized visual and high-level clinical linguistic features. Additionally, a non-auto-regressive generation paradigm is proposed for reconstructing non-sequential reports. Experimental results on five downstream tasks, including supervised classification, zero-shot classification, image-to-text retrieval, semantic segmentation, and object detection, show that the proposed method outperforms other state-of-the-art methods across multiple datasets and under different dataset size settings. The code is available at https://github.com/QtacierP/PRIOR.
Comment: Accepted by ICCV 202
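
To illustrate the global image-report alignment mentioned in the abstract, the following is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over a batch of paired embeddings. It is not taken from the PRIOR code base: the function name, the temperature value, and the use of NumPy are assumptions made for illustration, and the local alignment, conditional reconstruction, and prototype memory bank components are not covered.

```python
import numpy as np

def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss; row i of each array is assumed to be a matched pair."""
    # L2-normalise so that dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature  # shape (batch, batch)

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row; the target is the diagonal entry.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))
```

Under these assumptions, a batch whose image and report embeddings are correctly paired yields a lower loss than the same batch with the pairing scrambled.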

arXiv.org e-Print Archive

Non-convex image reconstruction via Expectation Propagation

Authors: Braunstein Alfredo, Castillo Isaac Pérez, Muntoni Anna Paola, Pagnani Andrea, Rojas Rafael Díaz Hernández
Publication venue: American Physical Society (APS)
Publication date: 01/01/2018

Tomographic image reconstruction can be mapped to the problem of finding solutions to a large system of linear equations that maximize a function encoding a priori knowledge about features of typical images, such as smoothness or sharpness. This maximization can be performed with standard local optimization tools when the function is concave, but it is generally intractable for realistic priors, which are non-concave. We introduce a new method to reconstruct images obtained from Radon projections by using Expectation Propagation, which allows us to reframe the problem from a Bayesian inference perspective. We show, by means of extensive simulations, that, compared to state-of-the-art algorithms for this task, Expectation Propagation paired with very simple but non-log-concave priors is often able to reconstruct images up to a smaller error while using a lower amount of information per pixel. We provide estimates for the critical rate of information per pixel above which recovery is error-free by means of simulations on ensembles of phantom and real images.
Comment: 12 pages, 6 figures
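
The mapping from tomography to a linear system can be sketched on a toy example. The code below is an illustration only and does not implement Expectation Propagation: it assumes just two projection directions (row sums and column sums of a small image) and a simple concave Gaussian prior, for which the constrained maximization reduces to the minimum-norm solution of `A x = y`. The function names and the choice of projections are hypothetical.

```python
import numpy as np

def projection_matrix(n):
    """Linear operator mapping a flattened n x n image to its row sums and column sums."""
    row_sums = np.kron(np.eye(n), np.ones((1, n)))  # shape (n, n*n)
    col_sums = np.kron(np.ones((1, n)), np.eye(n))  # shape (n, n*n)
    return np.vstack([row_sums, col_sums])

def reconstruct_min_norm(A, y):
    """Minimum-norm solution of A x = y.

    Equivalent to maximizing the concave log-prior -||x||^2 / 2 subject to A x = y;
    realistic non-concave priors, as discussed in the abstract, are out of scope here.
    """
    x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    return x_hat
```

With far fewer measurements than pixels the system is underdetermined, so the reconstruction matches the projections without necessarily matching the original image; this is the regime in which the choice of prior matters.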

arXiv.org e-Print Archive

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Archivio della ricerca- Università di Roma La Sapienza

Hal-Diderot

Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval

Authors: Shao Ling, Wang Yang, Wu Lin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 28/10/2018

In this paper, we propose a novel deep generative approach to cross-modal retrieval that learns hash functions in the absence of paired training samples through a cycle consistency loss. Our proposed approach employs an adversarial training scheme to learn a pair of hash functions enabling translation between modalities while assuming the underlying semantic relationship. To imbue the hash codes with the semantics of each input-output pair, a cycle consistency loss is further imposed on top of the adversarial training to strengthen the correlations between inputs and corresponding outputs. Our approach is generative: it learns hash functions such that the learned hash codes maximally correlate each input-output correspondence and can also regenerate the inputs, so as to minimize information loss. The learning-to-hash embedding thus jointly optimizes the parameters of the hash functions across modalities as well as the associated generative models. Extensive experiments on a variety of large-scale cross-modal data sets demonstrate that our proposed method achieves better retrieval results than the state of the art.
Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1703.10593 by other authors
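
The cycle consistency idea named in the abstract can be sketched generically as follows. This is not the authors' implementation: the two mappings are passed in as plain callables standing in for the learned cross-modal translators, the L1 distance is an assumed choice, and the adversarial training and hash-code binarization are omitted.

```python
import numpy as np

def cycle_consistency_loss(x, y, forward, backward):
    """L1 cycle loss for mappings forward: X -> Y and backward: Y -> X.

    x and y need not be paired; each sample is only compared with its own round trip.
    """
    x_cycle = backward(forward(x))  # X -> Y -> X
    y_cycle = forward(backward(y))  # Y -> X -> Y
    return np.mean(np.abs(x_cycle - x)) + np.mean(np.abs(y_cycle - y))
```

The loss is zero when the two mappings invert each other and grows as the round trip drifts away from the input, which is what ties each input to its translated output without paired supervision.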

arXiv.org e-Print Archive

University of Queensland eSpace