Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound
In medical imaging, manual annotations can be expensive to acquire and
sometimes infeasible to access, making conventional deep learning-based models
difficult to scale. As a result, it would be beneficial if useful
representations could be derived from raw data without the need for manual
annotations. In this paper, we propose to address the problem of
self-supervised representation learning with multi-modal ultrasound
video-speech raw data. For this case, we assume that there is a high
correlation between the ultrasound video and the corresponding narrative speech
audio of the sonographer. In order to learn meaningful representations, the
model needs to identify such correlation and at the same time understand the
underlying anatomical features. We designed a framework to model the
correspondence between video and audio without any kind of human annotations.
Within this framework, we introduce cross-modal contrastive learning and an
affinity-aware self-paced learning scheme to enhance correlation modelling.
Experimental evaluations on multi-modal fetal ultrasound video and audio show
that the proposed approach is able to learn strong representations and
transfers well to downstream tasks of standard plane detection and eye-gaze
prediction.
Comment: MICCAI 2020 (early acceptance)
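The cross-modal contrastive learning described in this abstract can be illustrated with a symmetric InfoNCE-style loss between paired video and audio embeddings. The sketch below is a minimal numpy formulation under assumed names and a chosen temperature, not the paper's actual implementation:

```python
import numpy as np

def info_nce(video_emb, audio_emb, temperature=0.1):
    """Symmetric InfoNCE loss between paired video/audio embeddings.

    video_emb, audio_emb: (N, D) arrays; row i of each is a matched pair.
    The loss is low when matched rows are more similar than mismatched ones.
    """
    # L2-normalize so dot products are cosine similarities.
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = v @ a.T / temperature          # (N, N) similarity matrix
    idx = np.arange(len(v))                 # matched pairs sit on the diagonal

    def xent(l):
        # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)             # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # both directions: video -> audio and audio -> video
    return 0.5 * (xent(logits) + xent(logits.T))
```

Correlated video/audio pairs score a lower loss than shuffled (mismatched) pairs, which is the signal the self-supervised model learns from.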
Problems with Saliency Maps
Despite the popularity that saliency models have gained in the computer vision community, they are most often conceived, exploited and benchmarked without taking heed of a number of problems and subtle issues they bring about. When saliency maps are used as proxies for the likelihood of fixating a location in a viewed scene, one such issue is the temporal dimension of visual attention deployment. Through a simple simulation it is shown how neglecting this dimension leads to results that at best cast shadows on the predictive performance of a model and its assessment via benchmarking procedures.
How to look next? A data-driven approach for scanpath prediction
By and large, current visual attention models mostly rely, when considering static stimuli, on the following procedure. Given an image, a saliency map is computed, which, in turn, might serve the purpose of predicting a sequence of gaze shifts, namely a scanpath instantiating the dynamics of visual attention deployment. The temporal pattern of attention unfolding is thus confined to the scanpath generation stage, whilst salience is conceived as a static map, at best conflating a number of factors (bottom-up information, top-down, spatial biases, etc.). In this note we propose a novel sequential scheme consisting of three processing stages that rely on a center-bias model, a context/layout model, and an object-based model, respectively. Each stage contributes, at different times, to the sequential sampling of the final scanpath. We compare the method against classic scanpath generation that exploits a state-of-the-art static saliency model. Results show that accounting for the structure of the temporal unfolding leads to gaze dynamics close to human gaze behaviour.
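The three-stage sequential sampling described above can be sketched as a time-varying mixture of the three maps: early fixations are dominated by the center bias, later ones by context and object information. The linear mixing schedule and all names below are illustrative assumptions, not the paper's actual scheme:

```python
import numpy as np

def sample_scanpath(center_bias, context_map, object_map, n_fix=5, seed=0):
    """Sample a scanpath by mixing three saliency stages over time.

    Each map is a non-negative (H, W) array. The mixing weight shifts
    linearly from the center bias (t = 0) toward context/object maps.
    """
    rng = np.random.default_rng(seed)
    maps = [m / m.sum() for m in (center_bias, context_map, object_map)]
    h, w = maps[0].shape
    path = []
    for t in range(n_fix):
        alpha = min(t / max(n_fix - 1, 1), 1.0)
        weights = np.array([1 - alpha, alpha / 2, alpha / 2])
        mix = sum(wi * mi for wi, mi in zip(weights, maps))
        # sample one fixation from the mixed distribution
        idx = rng.choice(h * w, p=mix.ravel() / mix.sum())
        path.append((idx // w, idx % w))
    return path
```

With this schedule the first fixation is drawn from the center bias alone, so spatial biases dominate onset while scene content shapes the rest of the scanpath.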
Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics
Dozens of new models on fixation prediction are published every year and
compared on open benchmarks such as MIT300 and LSUN. However, progress in the
field can be difficult to judge because models are compared using a variety of
inconsistent metrics. Here we show that no single saliency map can perform well
under all metrics. Instead, we propose a principled approach to solve the
benchmarking problem by separating the notions of saliency models, maps and
metrics. Inspired by Bayesian decision theory, we define a saliency model to be
a probabilistic model of fixation density prediction and a saliency map to be a
metric-specific prediction derived from the model density which maximizes the
expected performance on that metric given the model density. We derive these
optimal saliency maps for the most commonly used saliency metrics (AUC, sAUC,
NSS, CC, SIM, KL-Div) and show that they can be computed analytically or
approximated with high precision. We show that this leads to consistent
rankings in all metrics and avoids the penalties of using one saliency map for
all metrics. Our method allows researchers to have their model compete on many
different metrics with state-of-the-art in those metrics: "good" models will
perform well in all metrics.
Comment: published at ECCV 2018
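The separation of model, map and metric can be illustrated by deriving several maps from one predicted fixation density. The transforms below are simplified stand-ins, not the paper's closed-form optimal maps: any strictly monotone transform preserves AUC rankings, z-scoring targets NSS-style evaluation, and a probability-normalised map targets SIM/KL-style comparison.

```python
import numpy as np

def saliency_maps_from_density(density):
    """Derive illustrative metric-specific maps from one fixation density.

    The model is the density itself; each "map" is just a different
    read-out of it tailored to a metric family.
    """
    d = np.asarray(density, dtype=float)
    d = d / d.sum()                          # the model: a probability density
    return {
        "AUC": np.log(d + 1e-12),            # monotone transform, same ROC curve
        "NSS": (d - d.mean()) / d.std(),     # zero mean, unit variance
        "SIM": d,                            # stays a distribution
    }
```

One probabilistic model thus yields a consistent family of maps, rather than a single map being penalised by metrics it was never shaped for.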
Velocity tuning of friction with two trapped atoms
Our ability to control friction remains modest, as our understanding of the underlying microscopic processes is incomplete. Atomic force experiments have provided a wealth of results on the dependence of nanofriction on structure, velocity, and temperature, but limitations in the dynamic range, time resolution, and control at the single-atom level have hampered a description from first principles. Here, using an ion-crystal system with single-atom, single-substrate-site spatial and single-slip temporal resolution, we measure the friction force over nearly five orders of magnitude in velocity, and contiguously observe four distinct regimes, while controlling temperature and dissipation. We elucidate the interplay between thermal and structural lubricity for two coupled atoms, and provide a simple explanation in terms of the Peierls–Nabarro potential. This extensive control at the atomic scale enables fundamental studies of the interaction of many-atom surfaces, possibly into the quantum regime.
Unified Image and Video Saliency Modeling
Visual saliency modeling for images and videos is treated as two independent
tasks in recent computer vision literature. While image saliency modeling is a
well-studied problem and progress on benchmarks like SALICON and MIT300 is
slowing, video saliency models have shown rapid gains on the recent DHF1K
benchmark. Here, we take a step back and ask: Can image and video saliency
modeling be approached via a unified model, with mutual benefit? We identify
different sources of domain shift between image and video saliency data and
between different video saliency datasets as a key challenge for effective
joint modelling. To address this we propose four novel domain adaptation
techniques - Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive
Smoothing and Bypass-RNN - in addition to an improved formulation of learned
Gaussian priors. We integrate these techniques into a simple and lightweight
encoder-RNN-decoder-style network, UNISAL, and train it jointly with image and
video saliency data. We evaluate our method on the video saliency datasets
DHF1K, Hollywood-2 and UCF-Sports, and the image saliency datasets SALICON and
MIT300. With one set of parameters, UNISAL achieves state-of-the-art
performance on all video saliency datasets and is on par with the
state-of-the-art for image saliency datasets, despite faster runtime and a 5 to
20-fold smaller model size compared to all competing deep methods. We provide
retrospective analyses and ablation studies which confirm the importance of the
domain shift modeling. The code is available at
https://github.com/rdroste/unisal
Comment: Presented at the European Conference on Computer Vision (ECCV) 2020.
R. Droste and J. Jiao contributed equally to this work. v3: Updated Fig. 5a)
and added new MIT300 benchmark results to supp. material
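Of the techniques listed, the learned Gaussian priors are the easiest to sketch: each dataset (domain) gets its own spatial bias map parameterised by a learnable mean and spread. The function below is an illustrative, non-learned version; the parameter values and dictionary of domains are invented, not taken from UNISAL:

```python
import numpy as np

def gaussian_prior(h, w, mu_x=0.5, mu_y=0.5, sigma_x=0.25, sigma_y=0.25):
    """Separable Gaussian prior map in normalised image coordinates.

    In a UNISAL-style model (mu, sigma) would be trained per domain so
    each dataset learns its own center bias.
    """
    ys = (np.arange(h) + 0.5) / h
    xs = (np.arange(w) + 0.5) / w
    gy = np.exp(-0.5 * ((ys - mu_y) / sigma_y) ** 2)
    gx = np.exp(-0.5 * ((xs - mu_x) / sigma_x) ** 2)
    return np.outer(gy, gx)                  # (h, w) separable Gaussian

# A "domain-adaptive" bank: one prior per dataset (parameter values invented).
priors = {
    "SALICON": gaussian_prior(9, 9, sigma_x=0.30, sigma_y=0.30),
    "DHF1K":   gaussian_prior(9, 9, sigma_x=0.20, sigma_y=0.35),
}
```

At run time the network would select the prior matching the input's dataset, which is one concrete way to absorb per-domain spatial bias without duplicating the whole model.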
Topological Qubits with Majorana Fermions in Trapped Ions
We propose a method of encoding a topologically-protected qubit using
Majorana fermions in a trapped-ion chain. This qubit is protected against major
sources of decoherence, while local operations and measurements can be
realized. Furthermore, we show that an efficient quantum interface and memory
for arbitrary multiqubit photonic states can be built, encoding them into a set
of entangled Majorana-fermion qubits inside cavities.
Comment: 9 pages, 2 figures
Kinks and nanofriction: Structural phases in few-atom chains
The frictional dynamics of interacting surfaces under forced translation are critically dependent on lattice commensurability. The highly nonlinear system of an elastic atomic chain sliding on an incommensurate periodic potential exhibits topological defects, known as kinks, that govern the frictional and translational dynamics. Performing experiments in a trapped-ion friction emulator, we observe two distinct structural and frictional phases: a commensurate high-friction phase where the ions stick-slip simultaneously over the lattice, and an incommensurate low-friction phase where the propagation of a kink breaks that simultaneity. We experimentally track the kink's propagation with atom-by-atom and sublattice site resolution and show that its velocity increases with commensurability. Our results elucidate the commensurate-incommensurate transition and the connection between the appearance of kinks and the reduction of friction in a finite system, with important consequences for controlling friction at nanocontacts.
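The elastic-chain-on-a-periodic-potential picture in this abstract is the textbook Frenkel-Kontorova model, in which kinks arise. A minimal energy function for that model (with illustrative parameters, not the trapped-ion experiment's) is:

```python
import numpy as np

def fk_energy(x, K=1.0, a=1.0, b=1.0):
    """Potential energy of a Frenkel-Kontorova chain.

    x: positions of the atoms along the chain.
    Springs of stiffness K and natural length a couple neighbours; a
    sinusoidal substrate of period b pins each atom. Kinks are localized
    mismatches between the chain spacing a and the substrate period b.
    """
    spring = 0.5 * K * np.sum((np.diff(x) - a) ** 2)
    substrate = np.sum(1 - np.cos(2 * np.pi * x / b))
    return spring + substrate
```

In the commensurate case (a = b) the whole chain can sit in substrate minima at zero energy and must stick-slip collectively; an incommensurate spacing forces some atoms up the potential, and the resulting kink can propagate at low energetic cost, which is the low-friction phase the abstract describes.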
A Crowdsourced Alternative to Eye-tracking for Visualization Understanding