Hyp-UML: Hyperbolic Image Retrieval with Uncertainty-aware Metric Learning
Metric learning plays a critical role in training image retrieval and
classification models. It is also a key component of representation
learning, e.g., for feature learning and feature alignment in metric space.
Hyperbolic embedding has been developed recently; compared to the
conventional Euclidean embedding used in most previous models, it can
represent hierarchical data structures more effectively. Meanwhile,
uncertainty estimation is a long-standing challenge in artificial
intelligence: successful uncertainty estimation can improve a machine
learning model's performance, robustness, and security, and in Hyperbolic
space uncertainty measurement is at least as important as in Euclidean
space. In this paper, we develop a Hyperbolic image embedding with
uncertainty-aware metric learning for image retrieval, which we call
Hyp-UML: Hyperbolic Uncertainty-aware Metric Learning. Our contributions are
threefold: we propose an image embedding algorithm based on Hyperbolic
space that produces a corresponding uncertainty value for each embedding;
we propose two types of uncertainty-aware metric learning, for popular
Contrastive learning and for conventional margin-based metric learning,
respectively; and we perform extensive experimental validation showing that
the proposed algorithm achieves state-of-the-art results among related
methods. A comprehensive ablation study validates the effectiveness of each
component of the proposed algorithm.
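The abstract does not include code; below is a minimal sketch of the two
core ingredients it names, a Poincaré-ball (Hyperbolic) embedding and an
uncertainty-weighted pair loss. The function names, the curvature value
`c=1.0`, and the particular uncertainty weighting are illustrative
assumptions, not the authors' implementation:

```python
import torch

def expmap0(v, c=1.0, eps=1e-6):
    # Exponential map at the origin of the Poincare ball: projects
    # Euclidean features v onto the ball of curvature -c.
    sqrt_c = c ** 0.5
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_dist(x, y, c=1.0, eps=1e-6):
    # Geodesic distance on the Poincare ball (closed form for points
    # inside the ball).
    diff = (x - y).pow(2).sum(-1)
    dx = (1 - c * x.pow(2).sum(-1)).clamp_min(eps)
    dy = (1 - c * y.pow(2).sum(-1)).clamp_min(eps)
    return torch.acosh((1 + 2 * c * diff / (dx * dy)).clamp_min(1 + eps))

def uncertainty_pair_loss(anchor, positive, sigma):
    # Assumed heteroscedastic-style weighting: a learned per-sample
    # uncertainty sigma down-weights noisy pairs, and log(sigma)
    # penalizes trivially inflating it.
    d = poincare_dist(expmap0(anchor), expmap0(positive))
    sigma = sigma.clamp_min(1e-3)
    return (d / sigma + torch.log(sigma)).mean()
```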
Compare More Nuanced: Pairwise Alignment Bilinear Network For Few-shot Fine-grained Learning
The recognition ability of human beings is developed in a progressive way.
Usually, children learn to discriminate various objects from coarse to
fine-grained with limited supervision. Inspired by this learning process, we
propose a simple yet effective model for the Few-Shot Fine-Grained (FSFG)
recognition, which tries to tackle the challenging fine-grained recognition
task using meta-learning. The proposed method, named Pairwise Alignment
Bilinear Network (PABN), is an end-to-end deep neural network. Unlike
traditional deep bilinear networks for fine-grained classification, which adopt
the self-bilinear pooling to capture the subtle features of images, the
proposed model uses a novel pairwise bilinear pooling to compare the nuanced
differences between base images and query images for learning a deep distance
metric. In order to match base image features with query image features, we
design feature alignment losses before the proposed pairwise bilinear pooling.
Experimental results on four fine-grained classification datasets and one
generic few-shot dataset demonstrate that the proposed model outperforms both
state-of-the-art few-shot fine-grained and general few-shot methods.
Comment: ICME 2019 Oral
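A sketch of the pairwise bilinear pooling idea described above, which
compares a base feature map against a query feature map instead of a
feature map with itself (as self-bilinear pooling does). The tensor shapes
and the signed-square-root normalization are common bilinear-CNN
conventions, assumed here rather than taken from the paper:

```python
import torch
import torch.nn.functional as F

def pairwise_bilinear_pool(base, query, eps=1e-10):
    """base, query: (B, C, H, W) spatially aligned conv feature maps."""
    B, C, H, W = base.shape
    fb = base.reshape(B, C, H * W)
    fq = query.reshape(B, C, H * W)
    # Cross-image second-order statistics, (B, C, C), rather than the
    # self-bilinear fb @ fb^T of classic bilinear CNNs.
    bilinear = torch.bmm(fb, fq.transpose(1, 2)) / (H * W)
    feat = bilinear.reshape(B, C * C)
    # Standard signed square-root plus L2 normalization.
    feat = torch.sign(feat) * torch.sqrt(feat.abs() + eps)
    return F.normalize(feat, dim=-1)
```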
Attention Correctness in Neural Image Captioning
Attention mechanisms have recently been introduced in deep learning for
various tasks in natural language processing and computer vision. But despite
their popularity, the "correctness" of the implicitly-learned attention maps
has only been assessed qualitatively by visualization of several examples. In
this paper we focus on evaluating and improving the correctness of attention in
neural image captioning models. Specifically, we propose a quantitative
evaluation metric for the consistency between the generated attention maps and
human annotations, using recently released datasets with alignment between
regions in images and entities in captions. We then propose novel models with
different levels of explicit supervision for learning attention maps during
training. The supervision can be strong when alignment between regions and
caption entities is available, or weak when only object segments and
categories are provided. We show on the popular Flickr30k and COCO datasets
that introducing supervision of attention maps during training solidly improves
both attention correctness and caption quality, showing the promise of making
machine perception more human-like.
Comment: To appear in AAAI-17. See http://www.cs.jhu.edu/~cxliu/ for
supplementary material
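In essence, the proposed correctness metric asks how much attention mass
falls inside the human-annotated region for a caption word. A minimal
sketch of that quantity, with array shapes assumed for illustration:

```python
import numpy as np

def attention_correctness(att_map, gt_mask):
    """att_map: (H, W) attention weights for one caption word.
    gt_mask: (H, W) binary mask of the human-annotated region.
    Returns the fraction of attention mass inside the region."""
    att = att_map / (att_map.sum() + 1e-12)  # renormalize defensively
    return float((att * (gt_mask > 0)).sum())
```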
Enhancing precision radiotherapy: image registration with deep learning and image fusion for treatment planning
Artificial intelligence is advancing in everyday life and supports its users
by generating fast results in areas such as communication and image
recognition. This thesis exploits deep-learning techniques for deformable
image registration (DIR) to improve image alignment in medicine. An
unsupervised registration and fusion workflow is developed and evaluated on
39 head scans produced with computed tomography (CT) and magnetic resonance
imaging (MRI). The three-part workflow starts by preprocessing the scans to
unify the image formats and to perform affine transformation and rigid
registration. Then, a deep-learning model trained for DIR is applied to
these images; parameter tuning is required to obtain an appropriate
configuration of the model. Evaluation with the mutual-information metric
indicates an improvement in image alignment of up to 14% when using
deep-learning-based DIR. Lastly, image fusion combines the registered CT and
MRI scans with a wavelet-based method that merges the information of the
decomposed images. The workflow is designed for unimodal image pairs (e.g.,
T1- and T2-weighted MRI scans) and multimodal image pairs (e.g., CT and MRI
scans). Since medical imaging is an important basis of treatment planning,
the registered and fused images obtained from this workflow are expected to
enhance precision radiotherapy.
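The thesis scores alignment with mutual information (MI): the better the
CT/MRI pair is registered, the more the joint intensity histogram
concentrates and the higher the MI. A generic histogram-based version of
that metric; the bin count and implementation details are assumptions, not
taken from the thesis:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=64):
    """Histogram-based MI between two registered scans (higher is
    better aligned)."""
    hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = hist / hist.sum()                     # joint distribution
    px = pxy.sum(axis=1, keepdims=True)         # marginal of img_a
    py = pxy.sum(axis=0, keepdims=True)         # marginal of img_b
    nz = pxy > 0                                # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())
```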
ImageCaptioner2: Image Captioner for Image Captioning Bias Amplification Assessment
Most pre-trained learning systems are known to suffer from bias, which
typically emerges from the data, the model, or both. Measuring and
quantifying bias and its sources is a challenging task and has been
extensively studied in image captioning. Despite the significant effort in
this direction, we observed that existing metrics lack consistency in the
inclusion of the visual signal. In this paper, we introduce a new bias
assessment metric for image captioning, dubbed ImageCaptioner2. Instead of
measuring the absolute bias in the model or the data, ImageCaptioner2 pays
more attention to the bias introduced by the model w.r.t the data bias,
termed bias amplification. Unlike the existing methods, which evaluate image
captioning algorithms based only on the generated captions, ImageCaptioner2
incorporates the image while measuring the bias. In addition, we design a
formulation for measuring the bias of generated captions as prompt-based
image captioning instead of using language classifiers. Finally, we apply
our ImageCaptioner2 metric across 11 different image captioning
architectures on three different datasets, i.e., the MS-COCO caption
dataset, Artemis V1, and Artemis V2, and on three different protected
attributes, i.e., gender, race, and emotions. We further verify the
effectiveness of our ImageCaptioner2 metric by proposing AnonymousBench, a
novel human evaluation paradigm for bias metrics. Our metric shows
significant superiority over the recent bias metric LIC in terms of human
alignment, with correlation scores of 80% for our metric versus 54% for LIC.
The code is available at https://eslambakr.github.io/imagecaptioner2.github.io/
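Bias amplification, as the abstract frames it, contrasts the attribute
leakage of model-generated captions against that of the training captions.
The sketch below shows only that final comparison step and is an assumed
simplification: the paper obtains the attribute probabilities via
prompt-based image captioning, not via the external classifier-style inputs
assumed here, and the function and argument names are hypothetical:

```python
import numpy as np

def bias_amplification(p_model, p_data, y):
    """Illustrative bias-amplification score.
    p_model: P(attribute | sample) inferred from model captions, (N,)
    p_data:  P(attribute | sample) inferred from ground-truth captions, (N,)
    y:       true protected-attribute labels in {0, 1}, (N,)
    Positive values mean the captioner leaks the protected attribute
    more strongly than the data it was trained on."""
    acc_model = ((p_model > 0.5) == y).mean()
    acc_data = ((p_data > 0.5) == y).mean()
    return float(acc_model - acc_data)
```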