
    PRIOR: Prototype Representation Joint Learning from Medical Images and Reports

    Contrastive-learning-based vision-language joint pre-training has emerged as a successful representation learning strategy. In this paper, we present a prototype representation learning framework incorporating both global and local alignment between medical images and reports. In contrast to standard global multi-modality alignment methods, we employ a local alignment module for fine-grained representation. Furthermore, a cross-modality conditional reconstruction module is designed to interchange information across modalities in the training phase by reconstructing masked images and reports. For reconstructing long reports, a sentence-wise prototype memory bank is constructed, enabling the network to focus on low-level localized visual and high-level clinical linguistic features. Additionally, a non-auto-regressive generation paradigm is proposed for reconstructing non-sequential reports. Experimental results on five downstream tasks, including supervised classification, zero-shot classification, image-to-text retrieval, semantic segmentation, and object detection, show that the proposed method outperforms other state-of-the-art methods across multiple datasets and under different dataset size settings. The code is available at https://github.com/QtacierP/PRIOR. Comment: Accepted by ICCV 202
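The global image-report alignment mentioned in this abstract can be illustrated with a generic symmetric contrastive (InfoNCE-style) loss. The sketch below is not taken from the PRIOR code base: the function names, the temperature value, and the use of plain NumPy arrays as embeddings are all assumptions made for illustration.

```python
import numpy as np

def log_softmax(logits, axis):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def global_contrastive_loss(image_emb, report_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: row i of image_emb and row i of report_emb are a matched pair.
    """
    # L2-normalise so that the dot product is a cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = report_emb / np.linalg.norm(report_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    diag = np.arange(logits.shape[0])
    # Image-to-report: each row should put its probability mass on the diagonal.
    loss_i2t = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Report-to-image: the same requirement along columns.
    loss_t2i = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_i2t + loss_t2i)

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned = global_contrastive_loss(emb, emb)           # matched pairs
mismatched = global_contrastive_loss(emb, emb[::-1])  # every pair broken
print(aligned < mismatched)
```

Matched pairs give a lower loss than mismatched ones, which is the property a global alignment objective of this kind relies on. The local alignment and reconstruction modules described in the abstract are not covered by this sketch.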

    Non-convex image reconstruction via Expectation Propagation

    Tomographic image reconstruction can be mapped to a problem of finding solutions to a large system of linear equations which maximize a function that includes a priori knowledge regarding features of typical images such as smoothness or sharpness. This maximization can be performed with standard local optimization tools when the function is concave, but it is generally intractable for realistic priors, which are non-concave. We introduce a new method to reconstruct images obtained from Radon projections by using Expectation Propagation, which allows us to reframe the problem from a Bayesian inference perspective. We show, by means of extensive simulations, that, compared to state-of-the-art algorithms for this task, Expectation Propagation paired with very simple but non-log-concave priors is often able to reconstruct images up to a smaller error while using a lower amount of information per pixel. We provide estimates for the critical rate of information per pixel above which recovery is error-free by means of simulations on ensembles of phantom and real images. Comment: 12 pages, 6 figures
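The optimization problem described in the first sentence of this abstract can be written compactly. The notation here is an assumption and does not come from the paper: $x$ is the vector of pixel values, $A$ the matrix of Radon projections, $y$ the measured projections, and $P(x)$ the prior over images.

```latex
\hat{x} \;=\; \operatorname*{arg\,max}_{x \,:\, A x = y} \; \log P(x)
```

When $\log P$ is concave this is a standard convex program; the abstract's point is that realistic priors make $\log P$ non-concave, which motivates treating the reconstruction as Bayesian inference instead.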

    Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval

    In this paper, we propose a novel deep generative approach to cross-modal retrieval that learns hash functions in the absence of paired training samples through a cycle consistency loss. Our proposed approach employs an adversarial training scheme to learn a couple of hash functions enabling translation between modalities while assuming the underlying semantic relationship. To imbue the hash codes with the semantics of the input-output pair, a cycle consistency loss is further imposed on the adversarial training to strengthen the correlations between inputs and corresponding outputs. Our approach is generative: it learns hash functions such that the learned hash codes maximally correlate each input-output correspondence while also regenerating the inputs so as to minimize information loss. The learning-to-hash embedding thus jointly optimizes the parameters of the hash functions across modalities as well as the associated generative models. Extensive experiments on a variety of large-scale cross-modal data sets demonstrate that our proposed method achieves better retrieval results than the state of the art. Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1703.10593 by other authors
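The cycle consistency idea in this abstract (translate to the other modality, translate back, and penalise the difference from the input) can be sketched as follows. This is a generic illustration, not the paper's hashing model: the L1 penalty, the function name, and the toy linear translators `f` and `g` are assumptions.

```python
import numpy as np

def cycle_consistency_loss(x, y, f, g):
    """L1 cycle loss for hypothetical translators f: X -> Y and g: Y -> X."""
    # Forward cycle: x -> f(x) -> g(f(x)) should come back to x.
    forward = np.abs(g(f(x)) - x).mean()
    # Backward cycle: y -> g(y) -> f(g(y)) should come back to y.
    backward = np.abs(f(g(y)) - y).mean()
    return forward + backward

rng = np.random.default_rng(0)
# Toy translators: an orthogonal map and its transpose (its exact inverse).
q, _ = np.linalg.qr(rng.normal(size=(4, 4)))
x = rng.normal(size=(5, 4))
y = rng.normal(size=(5, 4))

consistent = cycle_consistency_loss(x, y, lambda a: a @ q, lambda b: b @ q.T)
inconsistent = cycle_consistency_loss(x, y, lambda a: a @ q, lambda b: b)
print(consistent < inconsistent)
```

A pair of translators that invert each other drives the loss to essentially zero, while a pair that does not leaves a residual. Because the loss only compares each sample with its own round trip, it needs no paired samples across modalities.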