Saliency Prediction for Mobile User Interfaces
We introduce models for saliency prediction for mobile user interfaces. A
mobile interface may include elements such as buttons and text, in addition to
natural images, which enable users to perform a variety of tasks. Saliency in natural
images is a well-studied area. However, given the difference in what
constitutes a mobile interface, and the usage context of these devices, we
postulate that saliency prediction for mobile interface images requires a fresh
approach. Mobile interface design involves operating on elements, the building
blocks of the interface. We first collected eye-gaze data from mobile devices
for a free-viewing task. Using this data, we develop a novel autoencoder-based
multi-scale deep learning model that provides saliency prediction at the mobile
interface element level. Compared to saliency prediction approaches developed
for natural images, we show that our approach performs significantly better on
a range of established metrics.
Comment: Paper accepted at WACV 201
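The abstract does not say how element-level saliency is computed, so the following is only a minimal sketch of one plausible reading: pool a pixel-level saliency map inside each element's bounding box. The function name `element_saliency`, the bounding-box format, and the toy data are all assumptions made for illustration, not the authors' method.

```python
def element_saliency(saliency_map, elements):
    """Average pixel saliency inside each element's bounding box.

    saliency_map: list of rows (H x W) of non-negative floats.
    elements: dict mapping an element name to (top, left, bottom, right),
              with bottom and right exclusive. This format is hypothetical.
    """
    scores = {}
    for name, (top, left, bottom, right) in elements.items():
        values = [
            saliency_map[r][c]
            for r in range(top, bottom)
            for c in range(left, right)
        ]
        # An empty box gets a score of zero rather than raising.
        scores[name] = sum(values) / len(values) if values else 0.0
    return scores


# Toy 4x4 saliency map and four hypothetical interface elements.
saliency_map = [
    [0.0, 0.0, 0.2, 0.2],
    [0.0, 0.0, 0.2, 0.2],
    [0.8, 0.8, 0.1, 0.1],
    [0.8, 0.8, 0.1, 0.1],
]
elements = {
    "title": (0, 0, 2, 2),
    "image": (0, 2, 2, 4),
    "button": (2, 0, 4, 2),
    "footer": (2, 2, 4, 4),
}

scores = element_saliency(saliency_map, elements)
# Element names ordered from most to least salient.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Pooling by the mean keeps large and small elements comparable; a sum or a maximum would be equally defensible under other assumptions.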
Medical Image Segmentation Based on Multi-Modal Convolutional Neural Network: Study on Image Fusion Schemes
Image analysis using more than one modality (i.e. multi-modal) has been
increasingly applied in the field of biomedical imaging. One of the challenges
in performing the multimodal analysis is that there exist multiple schemes for
fusing the information from different modalities, where such schemes are
application-dependent and lack a unified framework to guide their designs. In
this work we first propose a conceptual architecture for the image fusion
schemes in supervised biomedical image analysis: fusing at the feature level,
fusing at the classifier level, and fusing at the decision-making level.
Further, motivated by the recent success in applying deep learning for natural
image analysis, we implement the three image fusion schemes above based on the
Convolutional Neural Network (CNN) with varied structures, and combine them into a
single framework. The proposed image segmentation framework is capable of
analyzing the multi-modality images using different fusing schemes
simultaneously. The framework is applied to detect the presence of soft tissue
sarcoma from the combination of Magnetic Resonance Imaging (MRI), Computed
Tomography (CT) and Positron Emission Tomography (PET) images. It is found from
the results that while all the fusion schemes outperform the single-modality
schemes, fusing at the feature level can generally achieve the best performance
in terms of both accuracy and computational cost, but it also suffers from
decreased robustness in the presence of large errors in any image modality.
Comment: Zhe Guo and Xiang Li contribute equally to this wor
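The three fusion points named in the abstract can be contrasted with a toy sketch. It assumes each modality (MRI, CT, PET) is reduced to a short feature vector and uses simple linear scorers in place of CNNs; all function names, weights, and inputs are hypothetical and only illustrate where the fusion happens, not the framework described in the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(features, weights, bias=0.0):
    return sum(f * w for f, w in zip(features, weights)) + bias


def fuse_feature_level(modalities, weights):
    """Concatenate the per-modality inputs and apply one classifier."""
    stacked = [v for m in modalities for v in m]
    return sigmoid(linear(stacked, weights))


def fuse_classifier_level(modalities, extractor_weights, classifier_weights):
    """Learn one feature per modality, then classify the combined features."""
    learned = [
        math.tanh(linear(m, w)) for m, w in zip(modalities, extractor_weights)
    ]
    return sigmoid(linear(learned, classifier_weights))


def fuse_decision_level(modalities, per_modality_weights, threshold=0.5):
    """Classify each modality independently, then take a majority vote."""
    votes = [
        sigmoid(linear(m, w)) >= threshold
        for m, w in zip(modalities, per_modality_weights)
    ]
    return sum(votes) > len(votes) / 2


# Hypothetical two-value feature vectors for one voxel in each modality.
mri, ct, pet = [1.0, 0.5], [0.2, 0.1], [0.9, 0.8]
modalities = [mri, ct, pet]
unit = [[1.0, 1.0]] * 3

p_feature = fuse_feature_level(modalities, [1.0] * 6)
p_classifier = fuse_classifier_level(modalities, unit, [1.0, 1.0, 1.0])
decision = fuse_decision_level(modalities, unit)

# A large error in one modality is outvoted at the decision level.
corrupted = [[-5.0, -5.0], ct, pet]
decision_corrupted = fuse_decision_level(corrupted, unit)
```

The last two lines mirror the robustness trade-off reported in the abstract: a corrupted modality enters the feature-level classifier directly, whereas a vote can absorb it.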
A Dilated Inception Network for Visual Saliency Prediction
Recently, with the advent of deep convolutional neural networks (DCNN), the
improvements in visual saliency prediction research are impressive. One
possible direction to approach the next improvement is to fully characterize
the multi-scale saliency-influential factors with a computationally-friendly
module in DCNN architectures. In this work, we propose an end-to-end dilated
inception network (DINet) for visual saliency prediction. It captures
multi-scale contextual features effectively with very limited extra parameters.
Instead of utilizing parallel standard convolutions with different kernel sizes
as in the existing inception module, our proposed dilated inception module (DIM)
uses parallel dilated convolutions with different dilation rates, which can
significantly reduce the computation load while enriching the diversity of
receptive fields in feature maps. Moreover, the performance of our saliency
model is further improved by using a set of linear normalization-based
probability distribution distance metrics as loss functions. As such, we can
formulate saliency prediction as a probability distribution prediction task for
global saliency inference instead of a typical pixel-wise regression problem.
Experimental results on several challenging saliency benchmark datasets
demonstrate that our DINet with proposed loss functions can achieve
state-of-the-art performance with shorter inference time.
Comment: Accepted by IEEE Transactions on Multimedia. The source code is
available at https://github.com/ysyscool/DINe
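Two ideas in this abstract lend themselves to a small numeric sketch: dilated 3x3 branches can match the receptive fields of larger standard kernels with fewer weights, and a saliency map divided by its sum can be compared as a probability distribution. The dilation rates (1, 2, 3), the channel count of 64, and the choice of KL divergence below are assumptions for illustration; the abstract does not specify them.

```python
import math


def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Weights in a square 2D convolution, ignoring biases."""
    return kernel_size * kernel_size * in_channels * out_channels


def linear_normalize(saliency_map):
    """Divide a flattened non-negative map by its sum so it sums to one."""
    total = sum(saliency_map)
    if total <= 0:
        raise ValueError("saliency map must have a positive sum")
    return [v / total for v in saliency_map]


def kl_divergence(target, prediction, eps=1e-12):
    """KL divergence between two linearly normalized saliency maps."""
    p = linear_normalize(target)
    q = linear_normalize(prediction)
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q)
    )


channels = 64  # hypothetical channel count

# Standard inception-style branches with 3x3, 5x5, and 7x7 kernels.
standard_weights = sum(
    conv_weight_count(k, channels, channels) for k in (3, 5, 7)
)

# Dilated branches: 3x3 kernels with hypothetical rates 1, 2, and 3.
rates = (1, 2, 3)
dilated_extents = [effective_kernel_size(3, d) for d in rates]
dilated_weights = sum(
    conv_weight_count(3, channels, channels) for _ in rates
)

loss_same = kl_divergence([1.0, 1.0, 2.0], [1.0, 1.0, 2.0])
loss_diff = kl_divergence([1.0, 1.0, 2.0], [2.0, 1.0, 1.0])
```

Under these assumptions the dilated branches cover the same 3, 5, and 7 pixel extents while using 27 rather than 83 weights per input-output channel pair.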
Are all the frames equally important?
In this work, we address the problem of measuring and predicting temporal
video saliency, a metric that defines the importance of a video frame for
human attention. Unlike conventional spatial saliency, which defines the
location of the salient regions within a frame (as is done for still
images), temporal saliency considers the importance of a frame as a whole and may
not exist apart from context. The proposed interface is an interactive
cursor-based algorithm for collecting experimental data about temporal
saliency. We collect the first human responses and analyze them. The
analysis shows that, qualitatively, the produced scores clearly reflect
semantic changes in a frame, while quantitatively they are highly
correlated across all observers. In addition, we show that the
proposed tool can simultaneously collect fixations similar to the ones produced
by an eye tracker in a more affordable way. Further, this approach may be used to
create the first temporal saliency datasets, which will allow training
computational predictive algorithms. The proposed interface does not rely on
any special equipment, which allows it to be run remotely and reach a wide
audience.
Comment: CHI'20 Late Breaking Work
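The claim that scores are highly correlated between observers can be made concrete with a small sketch. It assumes each observer yields one score per frame and measures agreement as the mean pairwise Pearson correlation; the abstract does not state which correlation measure was used, and the observer data below are invented for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length score sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_pairwise_correlation(observers):
    """Average Pearson correlation over all pairs of observers."""
    pairs = list(combinations(observers, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def mean_frame_scores(observers):
    """Per-frame temporal saliency averaged over observers."""
    return [sum(frame) / len(frame) for frame in zip(*observers)]


# Hypothetical per-frame scores from three observers for a 4-frame clip.
observers = [
    [0.1, 0.2, 0.9, 0.3],
    [0.2, 0.4, 1.8, 0.6],
    [0.0, 0.3, 0.8, 0.2],
]

agreement = mean_pairwise_correlation(observers)
frame_scores = mean_frame_scores(observers)
# Index of the frame that observers rated as most important on average.
peak_frame = frame_scores.index(max(frame_scores))
```

Pearson correlation ignores differences in scale between observers, which suits data where each person may move the cursor with a different amplitude.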