11,879 research outputs found
Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval
In this paper, we propose a novel deep generative approach to cross-modal
retrieval to learn hash functions in the absence of paired training samples
through the cycle consistency loss. Our proposed approach employs adversarial
training scheme to lean a couple of hash functions enabling translation between
modalities while assuming the underlying semantic relationship. To induce the
hash codes with semantics to the input-output pair, cycle consistency loss is
further proposed upon the adversarial training to strengthen the correlations
between inputs and corresponding outputs. Our approach is generative to learn
hash functions such that the learned hash codes can maximally correlate each
input-output correspondence, meanwhile can also regenerate the inputs so as to
minimize the information loss. The learning to hash embedding is thus performed
to jointly optimize the parameters of the hash functions across modalities as
well as the associated generative models. Extensive experiments on a variety of
large-scale cross-modal data sets demonstrate that our proposed method achieves
better retrieval results than the state-of-the-arts.Comment: To appeared on IEEE Trans. Image Processing. arXiv admin note: text
overlap with arXiv:1703.10593 by other author
Composite Correlation Quantization for Efficient Multimodal Retrieval
Efficient similarity retrieval from large-scale multimodal database is
pervasive in modern search engines and social networks. To support queries
across content modalities, the system should enable cross-modal correlation and
computation-efficient indexing. While hashing methods have shown great
potential in achieving this goal, current attempts generally fail to learn
isomorphic hash codes in a seamless scheme, that is, they embed multiple
modalities in a continuous isomorphic space and separately threshold embeddings
into binary codes, which incurs substantial loss of retrieval accuracy. In this
paper, we approach seamless multimodal hashing by proposing a novel Composite
Correlation Quantization (CCQ) model. Specifically, CCQ jointly finds
correlation-maximal mappings that transform different modalities into
isomorphic latent space, and learns composite quantizers that convert the
isomorphic latent features into compact binary codes. An optimization framework
is devised to preserve both intra-modal similarity and inter-modal correlation
through minimizing both reconstruction and quantization errors, which can be
trained from both paired and partially paired data in linear time. A
comprehensive set of experiments clearly show the superior effectiveness and
efficiency of CCQ against the state of the art hashing methods for both
unimodal and cross-modal retrieval
Res2Net: A New Multi-scale Backbone Architecture
Representing features at multiple scales is of great importance for numerous
vision tasks. Recent advances in backbone convolutional neural networks (CNNs)
continually demonstrate stronger multi-scale representation ability, leading to
consistent performance gains on a wide range of applications. However, most
existing methods represent the multi-scale features in a layer-wise manner. In
this paper, we propose a novel building block for CNNs, namely Res2Net, by
constructing hierarchical residual-like connections within one single residual
block. The Res2Net represents multi-scale features at a granular level and
increases the range of receptive fields for each network layer. The proposed
Res2Net block can be plugged into the state-of-the-art backbone CNN models,
e.g., ResNet, ResNeXt, and DLA. We evaluate the Res2Net block on all these
models and demonstrate consistent performance gains over baseline models on
widely-used datasets, e.g., CIFAR-100 and ImageNet. Further ablation studies
and experimental results on representative computer vision tasks, i.e., object
detection, class activation mapping, and salient object detection, further
verify the superiority of the Res2Net over the state-of-the-art baseline
methods. The source code and trained models are available on
https://mmcheng.net/res2net/.Comment: 11 pages, 7 figure
EchoFusion: Tracking and Reconstruction of Objects in 4D Freehand Ultrasound Imaging without External Trackers
Ultrasound (US) is the most widely used fetal imaging technique. However, US
images have limited capture range, and suffer from view dependent artefacts
such as acoustic shadows. Compounding of overlapping 3D US acquisitions into a
high-resolution volume can extend the field of view and remove image artefacts,
which is useful for retrospective analysis including population based studies.
However, such volume reconstructions require information about relative
transformations between probe positions from which the individual volumes were
acquired. In prenatal US scans, the fetus can move independently from the
mother, making external trackers such as electromagnetic or optical tracking
unable to track the motion between probe position and the moving fetus. We
provide a novel methodology for image-based tracking and volume reconstruction
by combining recent advances in deep learning and simultaneous localisation and
mapping (SLAM). Tracking semantics are established through the use of a
Residual 3D U-Net and the output is fed to the SLAM algorithm. As a proof of
concept, experiments are conducted on US volumes taken from a whole body fetal
phantom, and from the heads of real fetuses. For the fetal head segmentation,
we also introduce a novel weak annotation approach to minimise the required
manual effort for ground truth annotation. We evaluate our method
qualitatively, and quantitatively with respect to tissue discrimination
accuracy and tracking robustness.Comment: MICCAI Workshop on Perinatal, Preterm and Paediatric Image analysis
(PIPPI), 201
Optimization Beyond the Convolution: Generalizing Spatial Relations with End-to-End Metric Learning
To operate intelligently in domestic environments, robots require the ability
to understand arbitrary spatial relations between objects and to generalize
them to objects of varying sizes and shapes. In this work, we present a novel
end-to-end approach to generalize spatial relations based on distance metric
learning. We train a neural network to transform 3D point clouds of objects to
a metric space that captures the similarity of the depicted spatial relations,
using only geometric models of the objects. Our approach employs gradient-based
optimization to compute object poses in order to imitate an arbitrary target
relation by reducing the distance to it under the learned metric. Our results
based on simulated and real-world experiments show that the proposed method
enables robots to generalize spatial relations to unknown objects over a
continuous spectrum.Comment: Accepted for publication at ICRA2018. Supplementary Video:
http://spatialrelations.cs.uni-freiburg.de
- …