Multi-Entity Dependence Learning with Rich Context via Conditional Variational Auto-encoder
Multi-Entity Dependence Learning (MEDL) explores conditional correlations
among multiple entities. The availability of rich contextual information
requires a nimble learning scheme that tightly integrates with deep neural
networks and has the ability to capture correlation structures among
exponentially many outcomes. We propose MEDL_CVAE, which encodes a conditional
multivariate distribution as a generating process. As a result, the variational
lower bound of the joint likelihood can be optimized via a conditional
variational auto-encoder and trained end-to-end on GPUs. Our MEDL_CVAE was
motivated by two real-world applications in computational sustainability: one
studies the spatial correlation among multiple bird species using the eBird
data and the other models multi-dimensional landscape composition and human
footprint in the Amazon rainforest with satellite images. We show that
MEDL_CVAE captures rich dependency structures, scales better than previous
methods, and further improves on the joint likelihood by taking advantage of
very large datasets that are beyond the capacity of previous methods.
Comment: The first two authors contributed equally.
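As a rough illustration of the objective the abstract describes, a conditional variational lower bound for binary multi-entity outcomes (e.g., species presence/absence) can be sketched as follows. The diagonal-Gaussian prior/posterior parameterization and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for diagonal Gaussians,
    summed over latent dimensions (closed form)."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def cvae_elbo(y, probs, mu_q, logvar_q, mu_p, logvar_p, eps=1e-9):
    """Single-sample conditional ELBO: Bernoulli log-likelihood of the
    multi-entity outcome vector y under decoder probabilities `probs`,
    minus the KL from the posterior to the conditional prior."""
    recon = np.sum(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps))
    return recon - gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```

Maximizing this bound end-to-end (with the reparameterization trick for the encoder) is what lets a single GPU pass cover a joint distribution over exponentially many outcome combinations.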
Weakly Supervised Medical Image Segmentation With Soft Labels and Noise Robust Loss
Recent advances in deep learning algorithms have led to significant benefits
for solving many medical image analysis problems. Training deep learning models
commonly requires large datasets with expert-labeled annotations. However,
acquiring expert-labeled annotations is not only expensive but also
subjective and error-prone, and inter-/intra-observer variability introduces
noise into the labels. This is particularly a problem when using deep learning models
for segmenting medical images due to the ambiguous anatomical boundaries.
Image-based medical diagnosis tools using deep learning models trained with
incorrect segmentation labels can lead to false diagnoses and treatment
suggestions. Multi-rater annotations might be better suited to train deep
learning models with small training sets compared to single-rater annotations.
The aim of this paper was to develop and evaluate a method to generate
probabilistic labels based on multi-rater annotations and anatomical knowledge
of the lesion features in MRI, and a method to train segmentation models on
probabilistic labels with a normalized active-passive loss as a "noise-tolerant
loss" function. The model was evaluated by comparing it to binary ground truth
for 17 knee MRI scans for clinical segmentation and detection of bone marrow
lesions (BML). The proposed method improved precision by 14%, recall by 22%,
and Dice score by 8% compared to a binary cross-entropy loss function.
Overall, the results of this work suggest that the proposed normalized
active-passive loss using soft labels successfully mitigated the effects of
noisy labels.
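The "noise-tolerant loss" idea can be sketched as an active-passive combination. The exact terms the paper normalizes may differ; here, as an assumed example, a normalized cross-entropy (active term) is paired with mean absolute error (passive term), both accepting soft labels:

```python
import numpy as np

def active_passive_loss(probs, soft_label, alpha=1.0, beta=1.0, eps=1e-7):
    """Illustrative active-passive loss for one pixel/sample.
    Active: cross-entropy normalized by the sum of cross-entropies against
    every class, which bounds the loss under label noise.
    Passive: MAE, which is inherently robust to symmetric label noise."""
    logp = np.log(np.clip(probs, eps, 1.0))
    nce = -np.sum(soft_label * logp) / (-np.sum(logp))   # active, in [0, 1]
    mae = np.mean(np.abs(probs - soft_label))            # passive
    return alpha * nce + beta * mae
```

With soft (probabilistic) labels from multiple raters, the target vector simply stops being one-hot; the formula is unchanged.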
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
Temporal action localization is an important yet challenging problem. Given a
long, untrimmed video consisting of multiple action instances and complex
background contents, we need not only to recognize the action categories, but
also to localize the start time and end time of each instance. Many
state-of-the-art systems use segment-level classifiers to select and rank
proposal segments of pre-determined boundaries. However, a desirable model
should move beyond segment-level and make dense predictions at a fine
granularity in time to determine precise temporal boundaries. To this end, we
design a novel Convolutional-De-Convolutional (CDC) network that places CDC
filters on top of 3D ConvNets, which have been shown to be effective for
abstracting action semantics but reduce the temporal length of the input data.
The proposed CDC filter performs the required temporal upsampling and spatial
downsampling operations simultaneously to predict actions at the frame-level
granularity. It is unique in jointly modeling action semantics in space-time
and fine-grained temporal dynamics. We train the CDC network in an end-to-end
manner efficiently. Our model not only achieves superior performance in
detecting actions in every frame, but also significantly boosts the precision
of localizing temporal boundaries. Finally, the CDC network demonstrates a very
high efficiency with the ability to process 500 frames per second on a single
GPU server. We will update the camera-ready version and publish the source
code online soon.
Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201
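A drastically simplified stand-in for the CDC filter's joint operation, i.e., temporal upsampling (here by a factor of 2) performed together with 2x2 spatial mean pooling, might look like the following; the fixed per-step weights replace what is a learned filter in the actual network:

```python
import numpy as np

def cdc_layer(x, w_up=(1.0, 1.0)):
    """Joint temporal upsampling (x2) and spatial downsampling (2x2 mean
    pool) on a feature volume x of shape (T, H, W), with H and W even.
    Each input frame emits two temporal outputs; in the real CDC filter the
    per-output weights are learned, here they are fixed for illustration."""
    T, H, W = x.shape
    pooled = x.reshape(T, H // 2, 2, W // 2, 2).mean(axis=(2, 4))  # spatial down
    out = np.empty((2 * T, H // 2, W // 2))
    out[0::2] = w_up[0] * pooled   # first temporal output per input frame
    out[1::2] = w_up[1] * pooled   # second temporal output per input frame
    return out
```

Stacking such layers restores the temporal resolution that 3D ConvNets collapse, which is what enables frame-level predictions and hence precise boundary localization.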
A Deeply Supervised Semantic Segmentation Method Based on GAN
In recent years, the field of intelligent transportation has witnessed rapid
advancements, driven by the increasing demand for automation and efficiency in
transportation systems. Traffic safety, one of the tasks integral to
intelligent transport systems, requires accurately identifying and locating
various road elements, such as road cracks, lanes, and traffic signs. Semantic
segmentation plays a pivotal role in achieving this task, as it enables the
partition of images into meaningful regions with accurate boundaries. In this
study, we propose an improved semantic segmentation model that combines the
strengths of adversarial learning with state-of-the-art semantic segmentation
techniques. The proposed model integrates a generative adversarial network
(GAN) framework into the traditional semantic segmentation model, enhancing the
model's performance in capturing complex and subtle features in transportation
images. The effectiveness of our approach is demonstrated by a significant
boost in performance on the road crack dataset compared to an existing
method, i.e., SEGAN. This improvement can be attributed to the
synergistic effect of adversarial learning and semantic segmentation, which
leads to a more refined and accurate representation of road structures and
conditions. The enhanced model not only contributes to better detection of road
cracks but also to a wide range of applications in intelligent transportation,
such as traffic sign recognition, vehicle detection, and lane segmentation.
Comment: 6 pages, 2 figures, ITSC conference
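The adversarial augmentation described above can be sketched as a generator-side objective that combines the usual pixel-wise cross-entropy with a term rewarding segmentation maps the discriminator scores as "real". The weighting and the scalar discriminator interface here are assumptions for illustration:

```python
import numpy as np

def segmentation_gan_loss(pred, target, d_score, lam=0.1, eps=1e-7):
    """Generator-side loss for GAN-assisted segmentation.
    pred:    predicted foreground probabilities, shape (H, W)
    target:  binary ground-truth mask, shape (H, W)
    d_score: discriminator's realness score D(pred), a scalar in (0, 1)
    lam:     weight of the adversarial term (assumed hyperparameter)."""
    p = np.clip(pred, eps, 1.0 - eps)
    bce = -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))
    adv = -np.log(np.clip(d_score, eps, 1.0))  # low when D is fooled
    return bce + lam * adv
```

The adversarial term supplies a global, structure-level training signal that per-pixel cross-entropy alone cannot, which is the claimed source of the sharper crack boundaries.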
A goal-driven unsupervised image segmentation method combining graph-based processing and Markov random fields
Image segmentation is the process of partitioning a digital image into a set of homogeneous regions (according to some homogeneity criterion) to facilitate a subsequent higher-level analysis. In this context,
the present paper proposes an unsupervised and graph-based method of image segmentation, which is
driven by an application goal, namely, the generation of image segments associated with a user-defined
and application-specific goal. A graph, together with a random grid of source elements, is defined on
top of the input image. From each source satisfying a goal-driven predicate, called seed, a propagation
algorithm assigns a cost to each pixel on the basis of similarity and topological connectivity, measuring
the degree of association with the reference seed. Then, the set of most significant regions is automatically extracted and used to estimate a statistical model for each region. Finally, the segmentation problem is expressed in a Bayesian framework in terms of probabilistic Markov random field (MRF) graphical
modeling. An ad hoc energy function is defined based on parametric models, a seed-specific spatial feature, a background-specific potential, and local-contextual information. This energy function is minimized
through graph cuts and, more specifically, the alpha-beta swap algorithm, yielding the final goal-driven
segmentation based on the maximum a posteriori (MAP) decision rule. The proposed method does not
require deep a priori knowledge (e.g., labelled datasets), as it only requires the choice of a goal-driven
predicate and a suitable parametric model for the data. In the experimental validation with both magnetic
resonance (MR) and synthetic aperture radar (SAR) images, the method demonstrates robustness, versatility, and applicability to different domains, thus allowing for further analyses guided by the generated
product.
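The MRF energy described (a unary data term plus local-contextual smoothness) can be evaluated as below. This sketch uses a plain Potts pairwise term on a 4-connected grid and omits the seed-specific and background-specific potentials, as well as the alpha-beta swap minimization itself:

```python
import numpy as np

def mrf_energy(labels, unary, beta=1.0):
    """Energy of a labeling on a 4-connected pixel grid.
    labels: integer label map, shape (H, W)
    unary:  per-class data costs, shape (K, H, W) (e.g., negative
            log-likelihoods under each region's parametric model)
    beta:   Potts smoothness weight (assumed hyperparameter)."""
    H, W = labels.shape
    # pick each pixel's unary cost for its assigned label
    data = unary[labels, np.arange(H)[:, None], np.arange(W)]
    # Potts term: count disagreeing vertical and horizontal neighbor pairs
    potts = (np.sum(labels[1:, :] != labels[:-1, :])
             + np.sum(labels[:, 1:] != labels[:, :-1]))
    return data.sum() + beta * potts
```

Graph-cut moves such as alpha-beta swap then search for the labeling minimizing this energy, which corresponds to the MAP decision under the MRF model.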
Pixel-level semantic understanding of ophthalmic images and beyond
Computer-assisted semantic image understanding constitutes the substrate of applications that range from biomarker detection to intraoperative guidance or street scene understanding for self-driving systems. This PhD thesis is on the development of deep learning-based, pixel-level, semantic segmentation methods for medical and natural images. For vessel segmentation in OCT-A, a method comprising iterative refinement of the extracted vessel maps and an auxiliary loss function that penalizes structural inaccuracies is proposed and tested on data captured under real clinical conditions comprising various pathological cases. Ultimately, the presented method enables the extraction of a detailed vessel map of the retina with potential applications to diagnostics or intraoperative localization. Furthermore, for scene segmentation in cataract surgery, the major challenge of class imbalance is identified among several factors. Subsequently, a method addressing it is proposed, achieving state-of-the-art performance on a challenging public dataset. Accurate semantic segmentation in this domain can be used to monitor interactions between tools and anatomical parts for intraoperative guidance and safety. Finally, this thesis proposes a novel contrastive learning framework for supervised semantic segmentation that aims to improve the discriminative power of features in deep neural networks. The proposed approach leverages a contrastive loss function applied both at multiple model layers and across them. Importantly, the proposed framework is easy to combine with various model architectures and is experimentally shown to significantly improve performance in both natural and medical domains.
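The multi-layer contrastive framework is not specified in detail here, but a generic supervised contrastive loss over pixel embeddings, the building block such a framework extends, can be sketched as follows (the temperature value and function name are assumptions):

```python
import numpy as np

def supervised_contrastive(features, labels, tau=0.1):
    """Supervised contrastive loss over a batch of pixel embeddings:
    features sharing a label are pulled together, all others pushed apart.
    features: (N, D) embeddings; labels: (N,) integer class labels."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T / tau
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    m = sim.max(axis=1, keepdims=True)                # for numerical stability
    log_prob = (sim - m) - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    per_anchor = (-np.where(pos, log_prob, 0.0).sum(axis=1)
                  / np.maximum(pos.sum(axis=1), 1))
    return per_anchor.mean()
```

Applying such a loss at several intermediate layers, and across layers, is one way to make the learned features more discriminative than the segmentation loss alone would.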