Rethinking Knowledge Graph Propagation for Zero-Shot Learning
Graph convolutional neural networks have recently shown great potential for the task of zero-shot learning. These models are highly sample-efficient, as related concepts in the graph structure share statistical strength, allowing generalization to new classes when faced with a lack of data. However, the multi-layer architectures required to propagate knowledge to distant nodes in the graph dilute that knowledge by performing extensive Laplacian smoothing at each layer and consequently decrease performance. To still enjoy the benefits of the graph structure while preventing this dilution of knowledge from distant nodes, we propose a Dense Graph Propagation (DGP) module with carefully designed direct links among distant nodes. DGP exploits the hierarchical structure of the knowledge graph through additional connections, which are added based on a node's relationship to its ancestors and descendants. A weighting scheme further weighs their contribution according to their distance from the node, improving information propagation in the graph. Combined with fine-tuning of the representations in a two-stage training approach, our method outperforms state-of-the-art zero-shot learning approaches.
Comment: The first two authors contributed equally. Code at https://github.com/cyvius96/adgpm. To appear in CVPR 2019.
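As a rough, assumption-heavy sketch of the kind of distance-weighted dense propagation the abstract describes (not the authors' implementation, which is available at the repository linked above), a PyTorch layer might combine per-distance ancestor/descendant adjacency matrices with learnable distance weights:

import torch
import torch.nn as nn

class DensePropagationLayer(nn.Module):
    """Propagates node features over dense ancestor/descendant links,
    weighting each hop distance with a learnable coefficient (illustrative)."""

    def __init__(self, in_dim, out_dim, max_distance):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        # One learnable logit per hop distance, normalized with softmax below.
        self.distance_logits = nn.Parameter(torch.zeros(max_distance))

    def forward(self, x, adj_per_distance):
        # x: (num_nodes, in_dim) semantic features of the class nodes.
        # adj_per_distance: list of (num_nodes, num_nodes) row-normalized
        # adjacency matrices, one per ancestor/descendant hop distance.
        alphas = torch.softmax(self.distance_logits, dim=0)
        h = self.linear(x)
        out = torch.zeros_like(h)
        for alpha, adj in zip(alphas, adj_per_distance):
            out = out + alpha * (adj @ h)   # distance-weighted propagation
        return torch.relu(out)

# Toy usage: 5 class nodes, 300-d features, links grouped into 3 hop distances.
layer = DensePropagationLayer(in_dim=300, out_dim=128, max_distance=3)
x = torch.randn(5, 300)
adjs = [torch.eye(5) for _ in range(3)]    # placeholder adjacency matrices
print(layer(x, adjs).shape)                # torch.Size([5, 128])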
Dilated Temporal Relational Adversarial Network for Generic Video Summarization
The large number of videos appearing every day makes it increasingly critical that the key information within a video can be extracted and understood in a very short time. Video summarization, the task of finding the smallest subset of frames that still conveys the whole story of a given video, is thus of great significance for improving the efficiency of video understanding. We propose a novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to achieve frame-level video summarization. Given a video, it selects the set of key frames that contain the most meaningful and compact information. Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with a three-player loss in an adversarial manner. A new dilated temporal relation (DTR) unit is introduced to enhance the capture of temporal representations. The generator uses this unit to effectively exploit global multi-scale temporal context to select key frames and to complement the commonly used Bi-LSTM. To ensure that summaries capture enough key video representation from a global perspective, rather than a trivial, randomly shortened sequence, we present a discriminator that learns to enforce both the information completeness and the compactness of summaries via a three-player loss. The loss includes the generated summary loss, the random summary loss, and the real summary (ground-truth) loss, which play important roles in better regularizing the learned model to obtain useful summaries. Comprehensive experiments on three public datasets show the effectiveness of the proposed approach.
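One plausible reading of the discriminator side of such a three-player loss, sketched in PyTorch; the discriminator interface, the binary cross-entropy formulation, and the pooled summary representation are assumptions for illustration, not the paper's exact objective:

import torch
import torch.nn.functional as F

def three_player_discriminator_loss(disc, real_summary, generated_summary, random_summary):
    """The discriminator should score the ground-truth summary as real and both
    the generated and the randomly sampled summaries as fake."""
    real_score = disc(real_summary)                  # (batch, 1) logits
    gen_score = disc(generated_summary.detach())     # detach: no generator update here
    rand_score = disc(random_summary)

    ones, zeros = torch.ones_like(real_score), torch.zeros_like(real_score)
    loss_real = F.binary_cross_entropy_with_logits(real_score, ones)
    loss_gen = F.binary_cross_entropy_with_logits(gen_score, zeros)
    loss_rand = F.binary_cross_entropy_with_logits(rand_score, zeros)
    return loss_real + loss_gen + loss_rand

# Toy usage: treat each summary as a pooled (batch, feat) vector and score it linearly.
disc = torch.nn.Linear(256, 1)
real, gen, rand = (torch.randn(4, 256) for _ in range(3))
print(three_player_discriminator_loss(disc, real, gen, rand))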
Unsupervised Domain Adaptation for Automatic Estimation of Cardiothoracic Ratio
The cardiothoracic ratio (CTR), a clinical metric of heart size in chest X-rays (CXRs), is a key indicator of cardiomegaly. Manual measurement of CTR is time-consuming and can be affected by human subjectivity, making it desirable to design computer-aided systems that assist clinicians in the diagnosis process. Automatic CTR estimation through chest organ segmentation, however, requires large amounts of pixel-level annotated data, which are often unavailable. To alleviate this problem, we propose an unsupervised domain adaptation framework based on adversarial networks. The framework learns domain-invariant feature representations from openly available data sources to produce accurate chest organ segmentation for unlabeled datasets. Specifically, we propose a model that enforces our intuition that prediction masks should be domain-independent. Hence, we introduce a discriminator that distinguishes segmentation predictions from ground-truth masks. We evaluate our system's predictions based on the assessment of radiologists and demonstrate its clinical practicability for the diagnosis of cardiomegaly. We finally illustrate on the JSRT dataset that the semi-supervised performance of our model is also very promising.
Comment: Accepted by MICCAI 2018.
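A minimal sketch of a single adversarial training step following the abstract's idea (a discriminator separating predicted masks from ground-truth masks so that predictions become domain-independent); the model definitions, optimizers, mask encodings, and adversarial weight are assumed placeholders rather than the authors' procedure:

import torch
import torch.nn.functional as F

def adaptation_step(segmenter, discriminator, opt_seg, opt_disc,
                    src_images, src_masks, tgt_images, adv_weight=0.01):
    # 1) Supervised segmentation loss on the labeled source domain.
    src_logits = segmenter(src_images)               # (B, C, H, W)
    seg_loss = F.cross_entropy(src_logits, src_masks)

    # 2) Adversarial term: predictions on the unlabeled target domain should be
    #    indistinguishable from ground-truth-style masks.
    tgt_logits = segmenter(tgt_images)
    fool_score = discriminator(torch.softmax(tgt_logits, dim=1))
    adv_loss = F.binary_cross_entropy_with_logits(
        fool_score, torch.ones_like(fool_score))

    opt_seg.zero_grad()
    (seg_loss + adv_weight * adv_loss).backward()
    opt_seg.step()

    # 3) Discriminator: real = one-hot ground-truth masks, fake = soft predictions.
    num_classes = src_logits.shape[1]
    real_masks = F.one_hot(src_masks, num_classes).permute(0, 3, 1, 2).float()
    real_score = discriminator(real_masks)
    fake_score = discriminator(torch.softmax(tgt_logits.detach(), dim=1))
    disc_loss = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score)) +
                 F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    opt_disc.zero_grad()
    disc_loss.backward()
    opt_disc.step()
    return seg_loss.item(), adv_loss.item(), disc_loss.item()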
Towards Robust Partially Supervised Multi-Structure Medical Image Segmentation on Small-Scale Data
The data-driven nature of deep learning (DL) models for semantic segmentation requires a large number of pixel-level annotations. However, large-scale, fully labeled medical datasets are often unavailable for practical tasks. Recently, partially supervised methods have been proposed to utilize images with incomplete labels in the medical domain. To bridge the methodological gaps in partially supervised learning (PSL) under data scarcity, we propose Vicinal Labels Under Uncertainty (VLUU), a simple yet efficient framework that utilizes human structure similarity for partially supervised medical image segmentation. Motivated by multi-task learning and vicinal risk minimization, VLUU transforms the partially supervised problem into a fully supervised one by generating vicinal labels. We systematically evaluate VLUU under the challenges of small-scale data, dataset shift, and class imbalance on two commonly used segmentation datasets, for the tasks of chest organ segmentation and optic disc-and-cup segmentation. The experimental results show that VLUU consistently outperforms previous partially supervised models in these settings. Our results suggest a new research direction in label-efficient deep learning with partial supervision.
Comment: Accepted by Applied Soft Computing.
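The abstract does not specify how vicinal labels are constructed, so the snippet below only illustrates the vicinal-risk-minimization idea it builds on, using a generic mixup-style convex combination of two labeled segmentation samples; the mixing rule and names are illustrative assumptions, not the VLUU algorithm:

import torch

def mixup_vicinal_pair(image_a, mask_a, image_b, mask_b, alpha=0.4):
    """Return a 'vicinal' training pair in the neighbourhood of two labeled
    samples by taking a convex combination of the images and of their one-hot
    label maps (mixup-style vicinal risk minimization; illustrative only)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    image = lam * image_a + (1.0 - lam) * image_b
    mask = lam * mask_a + (1.0 - lam) * mask_b   # soft mask over classes
    return image, mask

# Toy usage with (C, H, W) images and (num_classes, H, W) one-hot label maps.
img_a, img_b = torch.randn(1, 64, 64), torch.randn(1, 64, 64)
msk_a, msk_b = torch.zeros(3, 64, 64), torch.ones(3, 64, 64) / 3
mixed_img, mixed_msk = mixup_vicinal_pair(img_a, msk_a, img_b, msk_b)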
ConnNet: A Long-Range Relation-Aware Pixel-Connectivity Network for Salient Segmentation
Salient segmentation aims to segment out attention-grabbing regions, a critical yet challenging task that is the foundation of many high-level computer vision applications. It requires semantic-aware grouping of pixels into salient regions and benefits from the utilization of global multi-scale contexts to achieve good local reasoning. Previous works often address it as a two-class segmentation problem, utilizing complicated multi-step procedures including refinement networks and complex graphical models. We argue that semantic salient segmentation can instead be effectively resolved by reformulating it as a simple yet intuitive pixel-pair-based connectivity prediction task. Following the intuition that salient objects can be naturally grouped via semantic-aware connectivity between neighboring pixels, we propose a pure Connectivity Net (ConnNet). ConnNet predicts the connectivity probabilities of each pixel with its neighboring pixels by leveraging multi-level cascade contexts embedded in the image and long-range pixel relations. We investigate our approach on two tasks, namely salient object segmentation and salient instance-level segmentation, and illustrate that consistent improvements can be obtained by modeling these tasks as connectivity prediction rather than binary segmentation, for a variety of network architectures. We achieve state-of-the-art performance, outperforming or matching existing approaches while reducing inference time thanks to our less complex approach.
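Casting segmentation as pixel-pair connectivity means predicting, for each pixel, whether it and each of its neighbours belong to the same salient region. The PyTorch sketch below derives 8-neighbour connectivity targets from a binary saliency mask; the neighbourhood definition and target encoding are assumptions for illustration and not necessarily ConnNet's:

import torch
import torch.nn.functional as F

def connectivity_targets(mask):
    """mask: (batch, 1, H, W) binary saliency mask.
    Returns (batch, 8, H, W), where channel k is 1 iff the pixel and its k-th
    neighbour (8-connectivity) are both salient."""
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    padded = F.pad(mask, (1, 1, 1, 1))               # zero-pad the border
    h, w = mask.shape[-2:]
    targets = []
    for dy, dx in offsets:
        neighbour = padded[..., 1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        targets.append(mask * neighbour)             # connected iff both are salient
    return torch.cat(targets, dim=1)

# Toy usage: an 8-channel connectivity map for a random binary mask.
mask = (torch.rand(1, 1, 32, 32) > 0.5).float()
print(connectivity_targets(mask).shape)              # torch.Size([1, 8, 32, 32])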