37 research outputs found

    Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound

    In medical imaging, manual annotations can be expensive to acquire and sometimes infeasible to access, making conventional deep learning-based models difficult to scale. As a result, it would be beneficial if useful representations could be derived from raw data without the need for manual annotations. In this paper, we propose to address the problem of self-supervised representation learning with multi-modal ultrasound video-speech raw data. For this case, we assume that there is a high correlation between the ultrasound video and the corresponding narrative speech audio of the sonographer. In order to learn meaningful representations, the model needs to identify such correlation and at the same time understand the underlying anatomical features. We designed a framework to model the correspondence between video and audio without any kind of human annotations. Within this framework, we introduce cross-modal contrastive learning and an affinity-aware self-paced learning scheme to enhance correlation modelling. Experimental evaluations on multi-modal fetal ultrasound video and audio show that the proposed approach is able to learn strong representations and transfers well to downstream tasks of standard plane detection and eye-gaze prediction.
    Comment: MICCAI 2020 (early acceptance)
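The cross-modal contrastive objective described above can be illustrated with a minimal numpy sketch of a symmetric InfoNCE loss over paired video/audio embeddings. This is a generic formulation, not the authors' exact loss; the function name, temperature value, and batch construction are assumptions for illustration.

```python
import numpy as np

def cross_modal_infonce(video_emb, audio_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired video/audio embeddings.

    Matched pairs (row i of each matrix) are pulled together, while every
    other pairing in the batch acts as a negative.
    """
    # L2-normalise so dot products are cosine similarities
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = v @ a.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(logits))         # positives sit on the diagonal

    def ce(lg):
        # row-wise cross-entropy against the diagonal targets
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # average the video->audio and audio->video directions
    return 0.5 * (ce(logits) + ce(logits.T))
```

A well-correlated batch (matched rows most similar) yields a loss near zero, while mismatched pairs drive it up; the affinity-aware self-paced scheme in the paper goes beyond this plain formulation.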

    Problems with Saliency Maps

    Despite the popularity that saliency models have gained in the computer vision community, they are most often conceived, exploited and benchmarked without taking heed of a number of problems and subtle issues they bring about. When saliency maps are used as proxies for the likelihood of fixating a location in a viewed scene, one such issue is the temporal dimension of visual attention deployment. Through a simple simulation it is shown how neglecting this dimension leads to results that at best cast shadows on the predictive performance of a model and its assessment via benchmarking procedures.

    How to look next? A data-driven approach for scanpath prediction

    By and large, when considering static stimuli, current visual attention models mostly rely on the following procedure. Given an image, a saliency map is computed, which, in turn, might serve the purpose of predicting a sequence of gaze shifts, namely a scanpath instantiating the dynamics of visual attention deployment. The temporal pattern of attention unfolding is thus confined to the scanpath generation stage, whilst salience is conceived as a static map, at best conflating a number of factors (bottom-up information, top-down cues, spatial biases, etc.). In this note we propose a novel sequential scheme consisting of three processing stages that rely on a center-bias model, a context/layout model, and an object-based model, respectively. Each stage contributes, at different times, to the sequential sampling of the final scanpath. We compare the method against classic scanpath generation that exploits a state-of-the-art static saliency model. Results show that accounting for the structure of the temporal unfolding leads to gaze dynamics close to human gaze behaviour.
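The staged sampling idea above can be sketched as follows: fixations are drawn sequentially from a priority map whose composition shifts over time from a centre-bias map toward context and then object maps. The linear-in-time blending schedule and all parameter values here are hypothetical illustrations, not the paper's learned weighting.

```python
import numpy as np

def sample_scanpath(center_map, context_map, object_map, n_fix=5, rng=None):
    """Sample a scanpath by blending three priority maps over time.

    Early fixations are dominated by the centre bias, intermediate ones by
    scene context/layout, and late ones by object-level salience.
    """
    rng = np.random.default_rng(rng)
    h, w = center_map.shape
    path = []
    for t in np.linspace(0.0, 1.0, n_fix):
        # quadratic weights shift mass from center -> context -> objects
        wts = np.array([(1 - t) ** 2, 2 * t * (1 - t), t ** 2])
        prio = wts[0] * center_map + wts[1] * context_map + wts[2] * object_map
        p = prio.ravel() / prio.sum()
        idx = rng.choice(h * w, p=p)          # sample one fixation location
        path.append((idx // w, idx % w))
    return path
```

With a one-hot centre map and a one-hot object map, the first fixation lands on the centre-bias peak and the last on the object peak, mimicking the coarse-to-fine temporal structure the abstract argues for.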

    Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics

    Dozens of new models on fixation prediction are published every year and compared on open benchmarks such as MIT300 and LSUN. However, progress in the field can be difficult to judge because models are compared using a variety of inconsistent metrics. Here we show that no single saliency map can perform well under all metrics. Instead, we propose a principled approach to solve the benchmarking problem by separating the notions of saliency models, maps and metrics. Inspired by Bayesian decision theory, we define a saliency model to be a probabilistic model of fixation density prediction and a saliency map to be a metric-specific prediction derived from the model density which maximizes the expected performance on that metric given the model density. We derive these optimal saliency maps for the most commonly used saliency metrics (AUC, sAUC, NSS, CC, SIM, KL-Div) and show that they can be computed analytically or approximated with high precision. We show that this leads to consistent rankings in all metrics and avoids the penalties of using one saliency map for all metrics. Our method allows researchers to have their model compete on many different metrics with the state-of-the-art in those metrics: "good" models will perform well in all metrics.
    Comment: published at ECCV 2018
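To make the metric-dependence concrete, here is a minimal numpy implementation of one of the metrics the abstract lists, NSS (Normalized Scanpath Saliency): the mean z-scored map value at the fixated pixels. Because the map is standardised inside the metric, NSS is invariant to affine rescalings of the map, which is one reason a single map cannot be simultaneously optimal for metrics with different invariances. The fixation format below is an assumption for illustration.

```python
import numpy as np

def nss(saliency_map, fixations):
    """Normalized Scanpath Saliency: mean z-scored map value at fixations.

    `fixations` is a list of (row, col) pixel coordinates.
    """
    s = (saliency_map - saliency_map.mean()) / saliency_map.std()
    return float(np.mean([s[y, x] for (y, x) in fixations]))
```

A map that concentrates mass on the fixated locations scores positively, and any affine transform of the map scores identically.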

    Velocity tuning of friction with two trapped atoms

    Our ability to control friction remains modest, as our understanding of the underlying microscopic processes is incomplete. Atomic force experiments have provided a wealth of results on the dependence of nanofriction on structure, velocity and temperature, but limitations in the dynamic range, time resolution, and control at the single-atom level have hampered a description from first principles. Here, using an ion-crystal system with single-atom, single-substrate-site spatial and single-slip temporal resolution, we measure the friction force over nearly five orders of magnitude in velocity, and contiguously observe four distinct regimes, while controlling temperature and dissipation. We elucidate the interplay between thermal and structural lubricity for two coupled atoms, and provide a simple explanation in terms of the Peierls–Nabarro potential. This extensive control at the atomic scale enables fundamental studies of the interaction of many-atom surfaces, possibly into the quantum regime.

    Unified Image and Video Saliency Modeling

    Visual saliency modeling for images and videos is treated as two independent tasks in recent computer vision literature. While image saliency modeling is a well-studied problem and progress on benchmarks like SALICON and MIT300 is slowing, video saliency models have shown rapid gains on the recent DHF1K benchmark. Here, we take a step back and ask: Can image and video saliency modeling be approached via a unified model, with mutual benefit? We identify different sources of domain shift between image and video saliency data and between different video saliency datasets as a key challenge for effective joint modelling. To address this we propose four novel domain adaptation techniques - Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive Smoothing and Bypass-RNN - in addition to an improved formulation of learned Gaussian priors. We integrate these techniques into a simple and lightweight encoder-RNN-decoder-style network, UNISAL, and train it jointly with image and video saliency data. We evaluate our method on the video saliency datasets DHF1K, Hollywood-2 and UCF-Sports, and the image saliency datasets SALICON and MIT300. With one set of parameters, UNISAL achieves state-of-the-art performance on all video saliency datasets and is on par with the state-of-the-art for image saliency datasets, despite faster runtime and a 5 to 20-fold smaller model size compared to all competing deep methods. We provide retrospective analyses and ablation studies which confirm the importance of the domain shift modeling. The code is available at https://github.com/rdroste/unisal
    Comment: Presented at the European Conference on Computer Vision (ECCV) 2020. R. Droste and J. Jiao contributed equally to this work. v3: Updated Fig. 5a) and added new MIT300 benchmark results to supp. material
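The learned Gaussian priors mentioned above can be illustrated with a small sketch that evaluates one separable 2-D Gaussian prior map on a pixel grid. In UNISAL these parameters are learned per domain; the function name, default mean, and default width below are placeholders, not values from the paper.

```python
import numpy as np

def gaussian_prior_map(h, w, mu=(0.5, 0.5), sigma=(0.25, 0.25)):
    """One 2-D Gaussian prior map over an (h, w) grid.

    Coordinates are normalised to [0, 1]; `mu` and `sigma` would be
    learnable (and domain-specific) in a trained model.
    """
    ys = (np.arange(h) + 0.5) / h
    xs = (np.arange(w) + 0.5) / w
    gy = np.exp(-0.5 * ((ys - mu[0]) / sigma[0]) ** 2)
    gx = np.exp(-0.5 * ((xs - mu[1]) / sigma[1]) ** 2)
    return np.outer(gy, gx)          # separable product, shape (h, w)
```

A bank of such maps, multiplied into the decoder features, lets the network express a domain-adaptive centre bias with only a handful of extra parameters.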

    Topological Qubits with Majorana Fermions in Trapped Ions

    We propose a method of encoding a topologically-protected qubit using Majorana fermions in a trapped-ion chain. This qubit is protected against major sources of decoherence, while local operations and measurements can be realized. Furthermore, we show that an efficient quantum interface and memory for arbitrary multiqubit photonic states can be built, encoding them into a set of entangled Majorana-fermion qubits inside cavities.
    Comment: 9 pages, 2 figures

    Kinks and nanofriction: Structural phases in few-atom chains

    The frictional dynamics of interacting surfaces under forced translation are critically dependent on lattice commensurability. The highly nonlinear system of an elastic atomic chain sliding on an incommensurate periodic potential exhibits topological defects, known as kinks, that govern the frictional and translational dynamics. Performing experiments in a trapped-ion friction emulator, we observe two distinct structural and frictional phases: a commensurate high-friction phase where the ions stick-slip simultaneously over the lattice, and an incommensurate low-friction phase where the propagation of a kink breaks that simultaneity. We experimentally track the kink's propagation with atom-by-atom and sublattice site resolution and show that its velocity increases with commensurability. Our results elucidate the commensurate-incommensurate transition and the connection between the appearance of kinks and the reduction of friction in a finite system, with important consequences for controlling friction at nanocontacts.
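The elastic-chain-on-periodic-potential system described above is the Frenkel–Kontorova model, whose potential energy can be sketched in a few lines. The parameter values are illustrative, not fitted to the ion-trap experiment; in the commensurate case (spring length equal to the substrate period) the chain can sit in every substrate minimum simultaneously, while a mismatch forces kinks that raise some atoms out of their minima.

```python
import numpy as np

def fk_energy(x, K=1.0, a=1.0, V0=1.0, b=1.0):
    """Potential energy of a Frenkel-Kontorova chain at positions `x`.

    Springs of stiffness K and natural length a couple neighbouring atoms;
    a sinusoidal substrate of amplitude V0 and period b acts on each atom.
    """
    spring = 0.5 * K * np.sum((np.diff(x) - a) ** 2)
    substrate = V0 * np.sum(1.0 - np.cos(2.0 * np.pi * x / b))
    return spring + substrate
```

For a = b, placing the chain on integer sites gives zero energy, and rigidly sliding it by half a period costs the full substrate corrugation, which is the stick-slip barrier of the commensurate high-friction phase.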

    A Crowdsourced Alternative to Eye-tracking for Visualization Understanding
