Gaze Embeddings for Zero-Shot Image Classification
Zero-shot image classification using auxiliary information, such as
attributes describing discriminative object properties, requires time-consuming
annotation by domain experts. We instead propose a method that relies on human
gaze as auxiliary information, exploiting the fact that even non-expert users have a
natural ability to judge class membership. We present a data collection
paradigm that involves a discrimination task to increase the information
content obtained from gaze data. Our method extracts discriminative descriptors
from the data and learns a compatibility function between image and gaze using
three novel gaze embeddings: Gaze Histograms (GH), Gaze Features with Grid
(GFG) and Gaze Features with Sequence (GFS). We introduce two new
gaze-annotated datasets for fine-grained image classification and show that
human gaze data is indeed class discriminative, provides a competitive
alternative to expert-annotated attributes, and outperforms other baselines for
zero-shot image classification.
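The abstract does not spell out how the gaze embeddings or the compatibility function are computed, so the following is only a minimal sketch under stated assumptions: a gaze histogram is taken to be the normalised count of fixations per grid cell, and the compatibility function is taken to be bilinear, F(x, y) = θ(x)ᵀ W φ(y), as is common in zero-shot learning. The names `gaze_histogram`, `compatibility`, and `predict` are hypothetical and do not come from the paper.

```python
def gaze_histogram(fixations, rows, cols):
    """Assumed embedding: fraction of fixations falling in each grid cell.

    fixations: iterable of (x, y) points with coordinates normalised to [0, 1].
    """
    counts = [0] * (rows * cols)
    for x, y in fixations:
        r = min(int(y * rows), rows - 1)  # clamp points lying exactly on 1.0
        c = min(int(x * cols), cols - 1)
        counts[r * cols + c] += 1
    total = sum(counts)
    return [n / total for n in counts] if total else [0.0] * len(counts)


def compatibility(image_feat, W, class_emb):
    """Assumed bilinear score: image_feat^T W class_emb."""
    return sum(
        image_feat[i] * W[i][j] * class_emb[j]
        for i in range(len(image_feat))
        for j in range(len(class_emb))
    )


def predict(image_feat, W, class_embeddings):
    """Zero-shot prediction: the class whose embedding scores highest."""
    return max(
        class_embeddings,
        key=lambda name: compatibility(image_feat, W, class_embeddings[name]),
    )
```

In this sketch, `W` would be learned on seen classes only; at test time `class_embeddings` holds the gaze embeddings of unseen classes, so no labelled images of those classes are needed.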
One-Shot Fine-Grained Instance Retrieval
Fine-Grained Visual Categorization (FGVC) has achieved significant progress
recently. However, the number of fine-grained species could be huge and
dynamically increasing in real scenarios, making it difficult to recognize
unseen objects under the current FGVC framework. This raises the open problem of
performing large-scale fine-grained identification without a complete training
set. To address this problem, we propose a retrieval task named One-Shot
Fine-Grained Instance Retrieval (OSFGIR). "One-Shot" denotes the ability to
identify unseen objects through a fine-grained retrieval task assisted by
an incomplete auxiliary training set. This paper first presents a detailed
description of the OSFGIR task and our collected OSFGIR-378K dataset. Next, we
propose the Convolutional and Normalization Networks (CN-Nets) learned on the
auxiliary dataset to generate a concise and discriminative representation.
Finally, we present a coarse-to-fine retrieval framework consisting of three
components: coarse retrieval, fine-grained retrieval, and query
expansion. The framework progressively retrieves images with
similar semantics, and performs fine-grained identification. Experiments show
our OSFGIR framework achieves significantly better accuracy and efficiency than
existing FGVC and image retrieval methods, and thus could be a better solution for
large-scale fine-grained object identification.
Comment: Accepted by MM2017, 9 pages, 7 figures
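The three-stage retrieval described in this abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration: cosine similarity as the ranking measure, separate "coarse" and "fine" descriptors per image, and query expansion by averaging the query with its top results. The abstract does not state how CN-Nets features are compared or how expansion is done, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, items, k, key):
    """Ids of the k items most similar to the query under descriptor `key`."""
    ranked = sorted(
        items, key=lambda i: cosine(query[key], items[i][key]), reverse=True
    )
    return ranked[:k]


def retrieve(query, gallery, k_coarse=3, k_fine=2, n_expand=1):
    # Stage 1: coarse retrieval narrows the gallery to semantically similar images.
    coarse_ids = top_k(query, gallery, k_coarse, "coarse")
    candidates = {i: gallery[i] for i in coarse_ids}
    # Stage 2: fine-grained re-ranking inside the candidate set.
    fine_ids = top_k(query, candidates, k_fine, "fine")
    # Stage 3: query expansion, here averaging the query with its top results.
    pool = [query["fine"]] + [gallery[i]["fine"] for i in fine_ids[:n_expand]]
    expanded = {"fine": [sum(v) / len(pool) for v in zip(*pool)]}
    return top_k(expanded, candidates, k_fine, "fine")
```

The point of the coarse stage in this sketch is efficiency: the more expensive fine-grained comparison runs only on the `k_coarse` survivors rather than on the whole gallery.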
MILD-Net: Minimal Information Loss Dilated Network for Gland Instance Segmentation in Colon Histology Images
The analysis of glandular morphology within colon histopathology images is an
important step in determining the grade of colon cancer. Despite the importance
of this task, manual segmentation is laborious, time-consuming and can suffer
from subjectivity among pathologists. The rise of computational pathology has
led to the development of automated methods for gland segmentation that aim to
overcome the challenges of manual segmentation. However, this task is
non-trivial due to the large variability in glandular appearance and the
difficulty in differentiating between certain glandular and non-glandular
histological structures. Furthermore, a measure of uncertainty is essential for
diagnostic decision making. To address these challenges, we propose a fully
convolutional neural network that counters the loss of information caused by
max-pooling by re-introducing the original image at multiple points within the
network. We also use atrous spatial pyramid pooling with varying dilation rates
for preserving the resolution and multi-level aggregation. To incorporate
uncertainty, we introduce random transformations during test time for an
enhanced segmentation result that simultaneously generates an uncertainty map,
highlighting areas of ambiguity. We show that this map can be used to define a
metric for disregarding predictions with high uncertainty. The proposed network
achieves state-of-the-art performance on the GlaS challenge dataset and on a
second independent colorectal adenocarcinoma dataset. In addition, we perform
gland instance segmentation on whole-slide images from two further datasets to
highlight the generalisability of our method. As an extension, we introduce
MILD-Net+ for simultaneous gland and lumen segmentation, to increase the
diagnostic power of the network.
Comment: Initial version published at Medical Imaging with Deep Learning
(MIDL) 201
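The test-time uncertainty idea in this abstract can be sketched as follows. This is a toy illustration, not the MILD-Net implementation: `toy_model` is a hypothetical stand-in for the network, horizontal flips are the only random transformation, per-pixel variance is assumed as the uncertainty measure, and `foreground_uncertainty` is one assumed way to turn the map into a score for rejecting predictions.

```python
import random
from statistics import mean, pvariance


def hflip(img):
    """Mirror a 2D list-of-lists image left to right."""
    return [row[::-1] for row in img]


def toy_model(img):
    """Hypothetical network stand-in returning per-pixel foreground probability.

    It averages each pixel with its left neighbour, so it is deliberately not
    flip-equivariant and augmented passes disagree near object boundaries.
    """
    return [
        [0.5 * v + 0.5 * (row[x - 1] if x > 0 else v) for x, v in enumerate(row)]
        for row in img
    ]


def tta_predict(img, n_passes=8, seed=0):
    """Average predictions over random flips; variance is the uncertainty map."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_passes):
        flip = rng.random() < 0.5
        pred = toy_model(hflip(img) if flip else img)
        preds.append(hflip(pred) if flip else pred)  # undo the transformation
    h, w = len(img), len(img[0])
    mean_map = [[mean(p[y][x] for p in preds) for x in range(w)] for y in range(h)]
    unc_map = [
        [pvariance([p[y][x] for p in preds]) for x in range(w)] for y in range(h)
    ]
    return mean_map, unc_map


def foreground_uncertainty(mean_map, unc_map, threshold=0.5):
    """Assumed rejection score: mean uncertainty over predicted foreground."""
    vals = [
        u
        for mrow, urow in zip(mean_map, unc_map)
        for m, u in zip(mrow, urow)
        if m >= threshold
    ]
    return mean(vals) if vals else 0.0
```

On an image with a vertical edge, the uncertainty in this sketch is zero away from the edge and positive next to it, which mirrors the abstract's description of the map highlighting areas of ambiguity.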
RADNET: Radiologist Level Accuracy using Deep Learning for HEMORRHAGE detection in CT Scans
We describe a deep learning approach for automated brain hemorrhage detection
from computed tomography (CT) scans. Our model emulates the procedure followed
by radiologists to analyse a 3D CT scan in the real world. Like radiologists,
the model sifts through 2D cross-sectional slices while paying close attention
to potential hemorrhagic regions. Further, the model utilizes 3D context from
neighboring slices to improve predictions at each slice and subsequently
aggregates the slice-level predictions to provide a diagnosis at the CT level. We
refer to our proposed approach as Recurrent Attention DenseNet (RADnet) as it
employs the original DenseNet architecture, adding attention components
for slice-level predictions and a recurrent neural network layer for
incorporating 3D context. The real-world performance of RADnet has been
benchmarked against independent analysis performed by three senior radiologists
for 77 brain CTs. RADnet demonstrates 81.82% hemorrhage prediction accuracy at
the CT level, which is comparable to that of the radiologists. Further, RADnet achieves higher
recall than two of the three radiologists, which is remarkable.
Comment: Accepted at IEEE Symposium on Biomedical Imaging (ISBI) 2018 as
conference paper
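As a quick arithmetic check on the reported figure, the sketch below searches for the number of correctly classified scans out of 77 that rounds to 81.82%. The abstract does not state the count, so the result is an inference from the two reported numbers, not a figure from the paper.

```python
n_scans = 77        # brain CTs in the benchmark, as stated in the abstract
reported = 81.82    # CT-level accuracy in percent, as stated in the abstract

# Every count of correct scans whose percentage rounds to the reported value.
candidates = [
    k for k in range(n_scans + 1) if round(100 * k / n_scans, 2) == reported
]
print(candidates)
```

Only one count matches, so the reported accuracy is consistent with 63 of the 77 scans being classified correctly.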
Task Decomposition and Synchronization for Semantic Biomedical Image Segmentation
Semantic segmentation is essential to biomedical image analysis.
Many recent works mainly focus on integrating the Fully Convolutional Network
(FCN) architecture with sophisticated convolution implementation and deep
supervision. In this paper, we propose to decompose the single segmentation
task into three sequential sub-tasks: (1) pixel-wise image
segmentation, (2) prediction of the class labels of the objects within the
image, and (3) classification of the scene to which the image belongs. While these
three sub-tasks are trained to optimize their individual loss functions of
different perceptual levels, we propose to let them interact through a task-task
context ensemble. Moreover, we propose a novel sync-regularization to penalize
the deviation between the outputs of the pixel-wise segmentation and the class
prediction tasks. These regularizations help the FCN use context
information comprehensively and attain accurate semantic segmentation even
when the number of training images is limited, as in many biomedical
applications. We have successfully applied our framework to three diverse 2D/3D
medical image datasets, including Robotic Scene Segmentation Challenge 18
(ROBOT18), Brain Tumor Segmentation Challenge 18 (BRATS18), and Retinal Fundus
Glaucoma Challenge (REFUGE18). We have achieved top-tier performance in all
three challenges.
Comment: IEEE Transactions on Medical Imaging
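The sync-regularization is described only as a penalty on the deviation between the pixel-wise segmentation output and the class prediction output. One possible form is sketched below under assumptions that are not from the paper: class presence implied by the segmentation is taken as the maximum per-class probability over pixels, and the deviation is measured as a mean squared difference. The name `sync_penalty` is hypothetical.

```python
def sync_penalty(seg_probs, cls_probs):
    """Assumed sync term between the segmentation and class-prediction heads.

    seg_probs: list over pixels, each a list of per-class probabilities.
    cls_probs: per-class presence probabilities from the class-prediction head.
    """
    n_classes = len(cls_probs)
    # Presence implied by the segmentation: strongest pixel response per class.
    implied = [max(pixel[c] for pixel in seg_probs) for c in range(n_classes)]
    return sum((i - p) ** 2 for i, p in zip(implied, cls_probs)) / n_classes
```

In this sketch the term is zero when the two heads agree and grows when, for example, the class head reports an object that no pixel is assigned to; it would be added to the individual task losses with a weighting factor.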