
    Hyp-UML: Hyperbolic Image Retrieval with Uncertainty-aware Metric Learning

    Metric learning plays a critical role in training image retrieval and classification models. It is also a key algorithm in representation learning, e.g., for feature learning and its alignment in metric space. Hyperbolic embedding has been developed recently; compared to the conventional Euclidean embedding used in most previously developed models, it can represent hierarchical data structures more effectively. In addition, uncertainty estimation is a long-standing challenge in artificial intelligence: successful uncertainty estimation can improve a machine learning model's performance, robustness, and security, and uncertainty measurement in Hyperbolic space is at least as important as in the Euclidean case. In this paper, we develop a Hyperbolic image embedding with uncertainty-aware metric learning for image retrieval. We call our method Hyp-UML: Hyperbolic Uncertainty-aware Metric Learning. Our contributions are threefold: we propose an image embedding algorithm based on Hyperbolic space, with corresponding per-sample uncertainty values; we propose two types of uncertainty-aware metric learning, for popular Contrastive learning and for conventional margin-based metric learning, respectively; and we perform extensive experimental validation showing that the proposed algorithm achieves state-of-the-art results among related methods. A comprehensive ablation study validates the effectiveness of each component of the proposed algorithm.
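
    The abstract does not give the loss formulation, so the following is only a rough, generic sketch of the two ingredients it names: a hyperbolic (Poincare-ball) distance and an uncertainty-weighted margin loss. The function names, the down-weighting scheme, and the toy values are assumptions, not the authors' Hyp-UML method.

    import numpy as np

    def poincare_distance(u, v, eps=1e-7):
        """Geodesic distance between two points inside the unit Poincare ball."""
        uu = np.sum(u * u)
        vv = np.sum(v * v)
        duv = np.sum((u - v) ** 2)
        # Standard Poincare-ball distance formula.
        arg = 1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv) + eps)
        return np.arccosh(arg)

    def uncertainty_weighted_margin_loss(anchor, positive, negative, sigma, margin=1.0):
        """Hypothetical margin loss in which samples with high uncertainty (sigma)
        contribute less to the metric-learning objective (an assumed weighting,
        not the paper's exact formulation)."""
        d_pos = poincare_distance(anchor, positive)
        d_neg = poincare_distance(anchor, negative)
        hinge = max(0.0, d_pos - d_neg + margin)
        return hinge / (1.0 + sigma)

    # Toy usage with points inside the unit ball.
    a = np.array([0.10, 0.20]); p = np.array([0.12, 0.18]); n = np.array([-0.40, 0.30])
    print(uncertainty_weighted_margin_loss(a, p, n, sigma=0.5))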

    Compare More Nuanced:Pairwise Alignment Bilinear Network For Few-shot Fine-grained Learning

    The recognition ability of human beings develops in a progressive way: children usually learn to discriminate objects from coarse to fine-grained with limited supervision. Inspired by this learning process, we propose a simple yet effective model for Few-Shot Fine-Grained (FSFG) recognition, which tackles the challenging fine-grained recognition task using meta-learning. The proposed method, named Pairwise Alignment Bilinear Network (PABN), is an end-to-end deep neural network. Unlike traditional deep bilinear networks for fine-grained classification, which adopt self-bilinear pooling to capture the subtle features of images, the proposed model uses a novel pairwise bilinear pooling to compare the nuanced differences between base images and query images for learning a deep distance metric. In order to match base image features with query image features, we design feature alignment losses before the proposed pairwise bilinear pooling. Experimental results on four fine-grained classification datasets and one generic few-shot dataset demonstrate that the proposed model outperforms both state-of-the-art few-shot fine-grained and general few-shot methods. Comment: ICME 2019 Oral
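
    As a hedged illustration of what pairwise bilinear pooling between a base image and a query image could look like, the sketch below computes cross-image second-order statistics from two feature maps, followed by the usual signed square-root and L2 normalization. Shapes, names, and normalization details are assumptions, not the PABN implementation.

    import numpy as np

    def pairwise_bilinear_pool(base_feat, query_feat):
        """Pairwise bilinear pooling between two aligned feature maps.

        base_feat, query_feat: arrays of shape (C, H, W) from a shared backbone.
        Returns a (C*C,) comparison vector of channel co-activations between
        the base and query images (cross-image, not self-bilinear, pooling).
        """
        c, h, w = base_feat.shape
        b = base_feat.reshape(c, h * w)
        q = query_feat.reshape(c, h * w)
        z = (b @ q.T) / (h * w)                      # (C, C) cross statistics
        z = np.sign(z) * np.sqrt(np.abs(z) + 1e-12)  # signed square-root
        z = z.flatten()
        return z / (np.linalg.norm(z) + 1e-12)       # L2 normalization

    # Toy usage: two random feature maps from a hypothetical backbone.
    rng = np.random.default_rng(0)
    vec = pairwise_bilinear_pool(rng.normal(size=(8, 5, 5)), rng.normal(size=(8, 5, 5)))
    print(vec.shape)  # (64,)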

    Attention Correctness in Neural Image Captioning

    Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision. Despite their popularity, the "correctness" of the implicitly learned attention maps has only been assessed qualitatively, by visualizing a few examples. In this paper we focus on evaluating and improving the correctness of attention in neural image captioning models. Specifically, we propose a quantitative evaluation metric for the consistency between the generated attention maps and human annotations, using recently released datasets with alignment between regions in images and entities in captions. We then propose novel models with different levels of explicit supervision for learning attention maps during training. The supervision can be strong when alignment between regions and caption entities is available, or weak when only object segments and categories are provided. We show on the popular Flickr30k and COCO datasets that introducing supervision of attention maps during training solidly improves both attention correctness and caption quality, showing the promise of making machine perception more human-like. Comment: To appear in AAAI-17. See http://www.cs.jhu.edu/~cxliu/ for supplementary material.
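
    The abstract does not spell out the consistency metric. One plausible reading, sketched below purely as an assumption, scores an attention map by the fraction of its (normalized) attention mass that falls inside the human-annotated region for the captioned entity.

    import numpy as np

    def attention_correctness(attention_map, region_mask):
        """Fraction of normalized attention mass inside the annotated region.

        attention_map: (H, W) non-negative attention weights for one caption word.
        region_mask:   (H, W) binary mask of the human-annotated image region.
        Returns a value in [0, 1]; 1.0 means all attention lies on the region.
        """
        attn = attention_map / (attention_map.sum() + 1e-12)
        return float((attn * region_mask).sum())

    # Toy usage: attention concentrated on the annotated top-left quadrant.
    attn = np.zeros((4, 4)); attn[:2, :2] = 1.0
    mask = np.zeros((4, 4)); mask[:2, :2] = 1.0
    print(attention_correctness(attn, mask))  # 1.0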

    Enhancing precision radiotherapy: image registration with deep learning and image fusion for treatment planning

    Artificial intelligence is advancing in everyday life and supports its users by generating fast results in areas such as communication and image recognition. This thesis aims at exploiting deep-learning techniques for deformable image registration (DIR) to improve image alignment in medicine. An unsupervised registration and fusion workflow is developed and evaluated on 39 head scans produced with computed tomography (CT) and magnetic resonance imaging (MRI). The three-part workflow starts by preprocessing the scans to unify the image formats and to perform affine transformation and rigid registration. Then, a deep-learning model trained for DIR is applied to these images; parameter tuning is required to obtain an appropriate configuration of the model. Evaluation with the mutual-information metric indicates an improvement in image alignment of up to 14 % when using deep-learning-based DIR. Lastly, image fusion combines the registered CT and MRI scans with a wavelet-based method to merge the information of the decomposed images. The workflow is designed for unimodal (e.g., T1- and T2-weighted MRI) and multimodal (e.g., CT and MRI) image pairs. Since medical imaging is an important basis of treatment-planning processes, the registered and fused images obtained from this workflow are expected to enhance precision radiotherapy.
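
    The thesis evaluates alignment with a mutual-information metric. A minimal, generic sketch of estimating mutual information from a joint intensity histogram is given below; the bin count and the lack of any preprocessing are assumptions, not the thesis's exact evaluation setup.

    import numpy as np

    def mutual_information(img_a, img_b, bins=32):
        """Mutual information between two registered images of equal shape,
        estimated from their joint intensity histogram."""
        hist_2d, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
        pxy = hist_2d / hist_2d.sum()          # joint probability
        px = pxy.sum(axis=1, keepdims=True)    # marginal of img_a
        py = pxy.sum(axis=0, keepdims=True)    # marginal of img_b
        nz = pxy > 0                           # avoid log(0)
        return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

    # Toy usage: MI of an image with itself is high; with unrelated noise it drops.
    rng = np.random.default_rng(0)
    ct = rng.random((64, 64))
    print(mutual_information(ct, ct), mutual_information(ct, rng.random((64, 64))))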

    ImageCaptioner^2: Image Captioner for Image Captioning Bias Amplification Assessment

    Most pre-trained learning systems are known to suffer from bias, which typically emerges from the data, the model, or both. Measuring and quantifying bias and its sources is a challenging task and has been extensively studied in image captioning. Despite the significant effort in this direction, we observed that existing metrics lack consistency in the inclusion of the visual signal. In this paper, we introduce a new bias assessment metric for image captioning, dubbed ImageCaptioner^2. Instead of measuring the absolute bias in the model or the data, ImageCaptioner^2 pays more attention to the bias introduced by the model with respect to the data bias, termed bias amplification. Unlike existing methods, which evaluate image captioning algorithms based only on the generated captions, ImageCaptioner^2 incorporates the image while measuring the bias. In addition, we design a formulation for measuring the bias of generated captions as prompt-based image captioning instead of using language classifiers. Finally, we apply our ImageCaptioner^2 metric across 11 different image captioning architectures on three datasets, i.e., the MS-COCO caption dataset, Artemis V1, and Artemis V2, and on three protected attributes, i.e., gender, race, and emotion. We further verify the effectiveness of our ImageCaptioner^2 metric by proposing AnonymousBench, a novel human evaluation paradigm for bias metrics. Our metric shows significant superiority over the recent bias metric, LIC, in terms of human alignment, with correlation scores of 80% and 54% for our metric and LIC, respectively. The code is available at https://eslambakr.github.io/imagecaptioner2.github.io/
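
    The abstract defines bias amplification as the model's bias relative to the data's bias. The sketch below is a simplified, generic version of that idea (attribute leakage from generated captions minus leakage from human captions) and is not the prompt-based ImageCaptioner^2 formulation; the classifier, data, and function names are all assumptions.

    from typing import Callable, Sequence

    def bias_amplification(predict_attr: Callable[[str], str],
                           generated: Sequence[str],
                           human: Sequence[str],
                           attributes: Sequence[str]) -> float:
        """Generic bias-amplification score: protected-attribute leakage from
        model captions minus leakage from ground-truth captions (an assumed
        simplification; the paper replaces the classifier with prompt-based
        image captioning)."""
        def leakage(captions):
            correct = sum(predict_attr(c) == a for c, a in zip(captions, attributes))
            return correct / len(captions)
        return leakage(generated) - leakage(human)

    # Toy usage with a hypothetical keyword-based attribute classifier.
    clf = lambda caption: "female" if "woman" in caption else "male"
    gen = ["a woman cooking", "a man driving"]
    ref = ["a person cooking", "a man driving"]
    print(bias_amplification(clf, gen, ref, ["female", "male"]))  # 0.5: generated captions leak gender more than the references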