PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
Contrastive learning based vision-language joint pre-training has emerged as
a successful representation learning strategy. In this paper, we present a
prototype representation learning framework incorporating both global and local
alignment between medical images and reports. In contrast to standard global
multi-modality alignment methods, we employ a local alignment module for
fine-grained representation. Furthermore, a cross-modality conditional
reconstruction module is designed to interchange information across modalities
in the training phase by reconstructing masked images and reports. For
reconstructing long reports, a sentence-wise prototype memory bank is
constructed, enabling the network to focus on low-level localized visual and
high-level clinical linguistic features. Additionally, a non-auto-regressive
generation paradigm is proposed for reconstructing non-sequential reports.
Experimental results on five downstream tasks, including supervised
classification, zero-shot classification, image-to-text retrieval, semantic
segmentation, and object detection, show the proposed method outperforms other
state-of-the-art methods across multiple datasets and under different dataset
size settings. The code is available at https://github.com/QtacierP/PRIOR.
Comment: Accepted by ICCV 202
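The global image-report alignment mentioned in the abstract above is usually trained with a symmetric contrastive objective. The following is a minimal sketch of such a loss, not the PRIOR implementation: the function names, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration, and the local alignment and reconstruction modules are not covered.

```python
import math

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)

def global_contrastive_loss(image_emb, report_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Hypothetical helper: pair i of image_emb and report_emb is the positive,
    every other combination in the batch is treated as a negative.
    """
    img = [normalize(v) for v in image_emb]
    txt = [normalize(v) for v in report_emb]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txt]
              for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # report-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))

images = [[1.0, 0.0], [0.0, 1.0]]
aligned = global_contrastive_loss(images, [[0.9, 0.1], [0.1, 0.9]])
shuffled = global_contrastive_loss(images, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

In this toy batch the loss is lower when each report embedding points in the same direction as its paired image embedding than when the pairs are swapped, which is the behaviour a global alignment objective is meant to reward.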
Non-convex image reconstruction via Expectation Propagation
Tomographic image reconstruction can be mapped to a problem of finding
solutions to a large system of linear equations which maximize a function that
includes \textit{a priori} knowledge regarding features of typical images such
as smoothness or sharpness. This maximization can be performed with standard
local optimization tools when the function is concave, but it is generally
intractable for realistic priors, which are non-concave. We introduce a new
method to reconstruct images obtained from Radon projections by using
Expectation Propagation, which allows us to reframe the problem from a
Bayesian inference perspective. We show, by means of extensive simulations,
that, compared to state-of-the-art algorithms for this task, Expectation
Propagation paired with very simple but non-log-concave priors is often able
to reconstruct images with a smaller error while using less
information per pixel. We provide estimates for the critical rate of
information per pixel above which recovery is error-free by means of
simulations on ensembles of phantom and real images.
Comment: 12 pages, 6 figures
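The abstract above frames tomography as a linear system that needs prior knowledge to be solved. A toy sketch of why: with only two projection angles, two different images can produce identical measurements, and a smoothness score is one way to choose between them. This is not Expectation Propagation and not the authors' code; the 2x2 images, the two-angle `project` function, and the total-variation score are assumptions chosen to keep the example small.

```python
def project(image):
    """Toy Radon-like measurement: row sums (0 degrees) then column sums (90 degrees)."""
    rows = [sum(r) for r in image]
    cols = [sum(c) for c in zip(*image)]
    return rows + cols

def total_variation(image):
    """Sum of absolute differences between neighbouring pixels.

    A low value means a smooth image; used here as a stand-in for a prior.
    """
    tv = 0
    height, width = len(image), len(image[0])
    for r in range(height):
        for c in range(width):
            if c + 1 < width:
                tv += abs(image[r][c] - image[r][c + 1])
            if r + 1 < height:
                tv += abs(image[r][c] - image[r + 1][c])
    return tv

image_a = [[1, 1], [1, 1]]  # flat image
image_b = [[2, 0], [0, 2]]  # checkerboard-like image with the same projections

print(project(image_a) == project(image_b))
print(total_variation(image_a), total_variation(image_b))
```

Both images satisfy the same four linear equations, so the measurements alone cannot separate them. A smoothness prior prefers `image_a`; a sharpness-favouring prior could prefer `image_b`, which is why the choice of prior matters.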
Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal
retrieval to learn hash functions in the absence of paired training samples
through the cycle consistency loss. Our proposed approach employs an adversarial
training scheme to learn a pair of hash functions enabling translation between
modalities while assuming the underlying semantic relationship. To endow the
hash codes with the semantics of each input-output pair, a cycle consistency loss
is further imposed on top of the adversarial training to strengthen the correlations
between inputs and corresponding outputs. Our approach is generative: it learns
hash functions such that the learned hash codes maximally correlate each
input-output correspondence and can also regenerate the inputs, so as to
minimize the information loss. The learning-to-hash embedding is thus performed
to jointly optimize the parameters of the hash functions across modalities as
well as the associated generative models. Extensive experiments on a variety of
large-scale cross-modal data sets demonstrate that our proposed method achieves
better retrieval results than state-of-the-art methods.
Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other authors
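The cycle consistency idea in the abstract above can be sketched as a reconstruction penalty: translate a feature to the other modality, translate it back, and measure how far the result is from the input. The sketch below is illustrative only. The 2x2 linear maps stand in for the learned generators, the L1 form of the loss and the sign-based binarization are assumptions, and the adversarial part of the training is omitted.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def hash_code(vector):
    """Binarize a real-valued vector into a +1/-1 hash code (assumed scheme)."""
    return [1 if x >= 0 else -1 for x in vector]

def cycle_consistency_loss(x, forward, backward):
    """Mean absolute error between x and backward(forward(x))."""
    reconstruction = matvec(backward, matvec(forward, x))
    return sum(abs(a - b) for a, b in zip(x, reconstruction)) / len(x)

image_feature = [0.5, -1.0]
to_text = [[0.0, 1.0], [1.0, 0.0]]        # hypothetical image-to-text map
to_image_good = [[0.0, 1.0], [1.0, 0.0]]  # exact inverse of to_text
to_image_bad = [[1.0, 0.0], [0.0, 1.0]]   # identity, does not undo to_text

good = cycle_consistency_loss(image_feature, to_text, to_image_good)
bad = cycle_consistency_loss(image_feature, to_text, to_image_bad)
code = hash_code(matvec(to_text, image_feature))
print(good, bad, code)
```

The loss is zero only when the backward map undoes the forward map, which is the property that lets unpaired data constrain both translation directions at once.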