226 research outputs found
BiRA-Net: Bilinear Attention Net for Diabetic Retinopathy Grading
Diabetic retinopathy (DR) is a common retinal disease that can lead to
blindness. For diagnosis, DR image grading aims to provide automatic
classification of the DR grade, a task not addressed by conventional
binary DR classification methods. Small objects in eye images, such as
lesions and microaneurysms, are essential to DR grading in medical
imaging, but they are easily confounded by other structures. To address
these challenges, we propose a new deep learning architecture, called
BiRA-Net, which combines an attention model for feature extraction with a
bilinear model for fine-grained classification. Furthermore, considering
the distance between different DR grades, we propose a new loss function,
called grading loss, which improves the training convergence of the
proposed approach. Experimental results are provided to demonstrate the
superior performance of the proposed approach.
Comment: Accepted at ICIP 201
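The abstract does not give the grading loss in closed form. A minimal sketch of one distance-aware formulation, assuming a cross-entropy term plus an expected grade-distance penalty (our illustrative construction, not necessarily the paper's exact loss), is:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grading_loss(logits, target, lam=1.0):
    """Illustrative distance-aware grading loss: standard cross-entropy
    plus the expected absolute distance between the predicted grade
    distribution and the true grade, so far-off grades cost more than
    near-miss grades."""
    p = softmax(np.asarray(logits, dtype=float))
    ce = -np.log(p[target])
    grades = np.arange(len(p))
    dist = np.sum(np.abs(grades - target) * p)  # expected grade distance
    return float(ce + lam * dist)
```

Under this sketch, confidently predicting grade 1 when the truth is grade 0 is penalized less than confidently predicting grade 4, which is the kind of ordering a DR grading loss needs.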
Sea-Net: Squeeze-And-Excitation Attention Net For Diabetic Retinopathy Grading
Diabetes is one of the most common diseases. Diabetic retinopathy (DR) is
a complication of diabetes that can lead to blindness. Automatic DR
grading based on retinal images provides great diagnostic and prognostic
value for treatment planning. However, the subtle differences among
severity levels make it difficult to capture the important features with
conventional methods. To alleviate these problems, we propose a new deep
learning architecture for robust DR grading, referred to as SEA-Net, in
which spatial attention and channel attention are carried out alternately
and boost each other, improving the classification performance. In
addition, a hybrid loss function is proposed to further maximize the
inter-class distance and reduce the intra-class variability. Experimental
results show the effectiveness of the proposed architecture.
Comment: Accepted to ICIP 202
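The alternation of spatial and channel attention can be sketched at array level as follows; the squeeze-and-excitation weights here are fixed illustrative values rather than learned parameters, and the block shapes are our assumptions:

```python
import numpy as np

def channel_attention(x, r=2):
    """Squeeze-and-excitation style channel gate on a (C, H, W) map."""
    C = x.shape[0]
    s = x.mean(axis=(1, 2))            # squeeze: global average pool -> (C,)
    W1 = np.ones((C // r, C)) / C      # illustrative fixed excitation weights
    W2 = np.ones((C, C // r)) / (C // r)
    z = np.maximum(W1 @ s, 0.0)        # bottleneck MLP with ReLU
    a = 1.0 / (1.0 + np.exp(-(W2 @ z)))  # sigmoid gate per channel
    return x * a[:, None, None]

def spatial_attention(x):
    """Gate each spatial position by its cross-channel mean."""
    m = x.mean(axis=0)                 # (H, W)
    a = 1.0 / (1.0 + np.exp(-m))
    return x * a[None, :, :]

def sea_block(x):
    # spatial and channel attention applied alternately, as in the
    # SEA-Net design sketched by the abstract
    return channel_attention(spatial_attention(x))
```

In the real network both gates would be learned and stacked repeatedly; this sketch only shows how the two attention maps compose.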
Stochastic Simulation on System Reliability and Component Probabilistic Importance of Road Network
Because of the combinatorial explosion problem, it is difficult to use analytical probability methods to compute the system reliability of large networks. This paper develops a stochastic simulation (Monte Carlo-based) method to study the system reliability and component probabilistic importance of road networks. The proposed method accounts for two characteristics of practical road networks: both link (roadway segment) and node (intersection) components are considered, and the reliability of a link or node component may take an in-between state, i.e., a value between 0 and 1. The method is implemented in the object-oriented programming language C++ and integrated into a RARN-MGG (reliability analysis of road networks using Monte Carlo, GIS, and grid) system. Finally, two numerical examples, based on a simple road network and a large real road network, respectively, are carried out to characterize the feasibility and demonstrate the strength of the stochastic simulation method.
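The core Monte Carlo procedure can be sketched compactly: sample component states from their survival probabilities, check network connectivity, and estimate each component's importance by how often flipping its state changes the system outcome. The function names and the flip-based importance measure are our illustrative choices, not the paper's exact definitions:

```python
import random

def simulate_reliability(components, connected, n_trials=4000, seed=0):
    """components: dict name -> survival probability in [0, 1]
    connected(up_set): True if the network works given the surviving
    components (links and nodes alike).
    Returns (system reliability estimate, per-component importance:
    the fraction of trials in which that component is critical)."""
    rng = random.Random(seed)
    ok = 0
    critical = {c: 0 for c in components}
    for _ in range(n_trials):
        up = {c for c, p in components.items() if rng.random() < p}
        works = connected(up)
        ok += works
        for c in components:
            # flip component c and see whether the outcome changes
            alt = (up - {c}) if c in up else (up | {c})
            if connected(alt) != works:
                critical[c] += 1
    R = ok / n_trials
    importance = {c: k / n_trials for c, k in critical.items()}
    return R, importance
```

For a series network of one link and one node, each with reliability 0.9, the estimate converges to roughly 0.81, and each component is critical whenever the other one is up.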
Semi-Supervised Self-Taught Deep Learning for Finger Bones Segmentation
Segmentation stands at the forefront of many high-level vision tasks. In
this study, we focus on segmenting finger bones within a newly introduced
semi-supervised self-taught deep learning framework consisting of a
student network and a stand-alone teacher module. The whole system is
trained in a life-long learning manner, wherein at each step the teacher
module provides a refinement for the student network to learn from on
newly available unlabeled data. Experimental results demonstrate the
superiority of the proposed method over conventional supervised deep
learning methods.
Comment: IEEE BHI 2019 accepted
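One round of the self-taught loop can be sketched as follows; the toy per-pixel logistic student and the thresholding teacher are placeholders for the paper's actual modules, chosen only to show the student-teacher data flow:

```python
import numpy as np

def teacher_refine(prob):
    # stand-alone teacher module, sketched as a simple sharpening step
    # (the paper's teacher is more elaborate; this is a placeholder)
    return (prob > 0.5).astype(float)

def self_taught_round(student_w, unlabeled_x, lr=0.5):
    """One life-long learning step: the student predicts soft masks on
    newly arrived unlabeled data, the teacher refines them into
    pseudo-labels, and the student takes a gradient step toward the
    refinement. The student here is a toy logistic model."""
    prob = 1.0 / (1.0 + np.exp(-student_w * unlabeled_x))
    pseudo = teacher_refine(prob)
    grad = np.mean((prob - pseudo) * unlabeled_x)  # d(cross-entropy)/dw
    return student_w - lr * grad
```

Iterating the round makes the student increasingly confident in the teacher-refined decisions, which is the boosting effect the abstract describes.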
NLTGCR: A class of Nonlinear Acceleration Procedures based on Conjugate Residuals
This paper develops a new class of nonlinear acceleration algorithms based on
extending conjugate residual-type procedures from linear to nonlinear
equations. The main algorithm has strong similarities with Anderson
acceleration as well as with inexact Newton methods, depending on which
variant is implemented. We prove theoretically and verify experimentally,
on a variety of problems ranging from simulation experiments to deep
learning applications, that our method is a powerful accelerated
iterative algorithm.
Comment: Under Review
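A depth-one sketch of the conjugate-residual idea applied to a nonlinear system F(x) = 0 looks like the following; it keeps only the latest direction (the actual algorithm maintains a window of conjugate directions) and uses a finite-difference Jacobian-vector product, so it is a simplification of the paper's method, not a faithful implementation:

```python
import numpy as np

def nltgcr1(F, x0, n_iter=100, eps=1e-7):
    """Depth-one nonlinear residual-minimizing iteration: at each step,
    take the step along the residual direction that minimizes the
    linearized residual norm, with J(x) @ p approximated by finite
    differences. For linear F this reduces to the classical minimal
    residual iteration."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        r = -F(x)                              # residual of F(x) = 0
        p = r                                  # search direction
        Jp = (F(x + eps * p) - F(x)) / eps     # ~ J(x) @ p
        denom = Jp @ Jp
        if denom < 1e-30:
            break                              # converged / degenerate
        alpha = (r @ Jp) / denom               # least-squares step length
        x = x + alpha * p
    return x
```

On a small symmetric positive definite linear system this iteration converges to the exact solution, which is a useful sanity check before trying genuinely nonlinear problems.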
MS-MT: Multi-Scale Mean Teacher with Contrastive Unpaired Translation for Cross-Modality Vestibular Schwannoma and Cochlea Segmentation
Domain shift has been a long-standing issue for medical image segmentation.
Recently, unsupervised domain adaptation (UDA) methods have achieved promising
cross-modality segmentation performance by distilling knowledge from a
label-rich source domain to a target domain without labels. In this work, we
propose a multi-scale self-ensembling based UDA framework for automatic
segmentation of two key brain structures, i.e., Vestibular Schwannoma (VS) and
Cochlea on high-resolution T2 images. First, a segmentation-enhanced
contrastive unpaired image translation module is designed for image-level
domain adaptation from source T1 to target T2. Next, multi-scale deep
supervision and consistency regularization are introduced to a mean teacher
network for self-ensemble learning to further close the domain gap.
Furthermore, self-training and intensity augmentation techniques are utilized
to mitigate label scarcity and boost cross-modality segmentation performance.
Our method demonstrates promising segmentation performance, with mean
Dice scores of 83.8% and 81.4% and average symmetric surface distances
(ASSD) of 0.55 mm and 0.26 mm for the VS and Cochlea, respectively, in
the validation phase of the crossMoDA 2022 challenge.
Comment: Accepted by the BrainLes MICCAI proceedings (5th solution for
the MICCAI 2022 Cross-Modality Domain Adaptation (crossMoDA) Challenge)
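The mean-teacher self-ensembling at the core of this framework can be sketched at array level; the EMA update and consistency term below are a generic illustration of the technique, not the paper's full multi-scale network:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.99):
    """Mean-teacher self-ensembling: the teacher's weights track an
    exponential moving average (EMA) of the student's weights."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k]
            for k in teacher}

def consistency_loss(student_out, teacher_out):
    # consistency regularization: penalize disagreement between the
    # student's and teacher's predictions on the same input
    s = np.asarray(student_out, dtype=float)
    t = np.asarray(teacher_out, dtype=float)
    return float(np.mean((s - t) ** 2))
```

In training, the student is updated by gradient descent on the supervised plus consistency losses, while the teacher is updated only through `ema_update`, which smooths the student over time and closes the domain gap more stably.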
Rearrange Indoor Scenes for Human-Robot Co-Activity
We present an optimization-based framework for rearranging indoor
furniture to better accommodate human-robot co-activities. The
rearrangement aims to
afford sufficient accessible space for robot activities without compromising
everyday human activities. To retain human activities, our algorithm preserves
the functional relations among furniture by integrating spatial and semantic
co-occurrence extracted from SUNCG and ConceptNet, respectively. By defining
the robot's accessible space by the amount of open space it can traverse and
the number of objects it can reach, we formulate the rearrangement for
human-robot co-activity as an optimization problem, solved by adaptive
simulated annealing (ASA) and covariance matrix adaptation evolution strategy
(CMA-ES). Our experiments on the SUNCG dataset quantitatively show that
rearranged scenes provide an average of 14% more accessible space and 30% more
objects to interact with. The quality of the rearranged scenes is qualitatively
validated by a human study, indicating the efficacy of the proposed strategy.
Comment: 7 pages, 7 figures; Accepted by ICRA 202
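A generic simulated annealing loop of the kind used for such layout optimization can be sketched as follows; the `cost` and `propose` callables are placeholders for the paper's accessibility and co-occurrence objective and its layout perturbations, and the fixed geometric cooling stands in for the adaptive schedule:

```python
import math
import random

def simulated_annealing(cost, propose, x0, t0=1.0, cooling=0.95,
                        n_iter=500, seed=0):
    """Minimize `cost` by accepting proposed layouts that are better,
    and worse ones with probability exp(-dE / T), while the
    temperature T cools geometrically."""
    rng = random.Random(seed)
    x, e = x0, cost(x0)
    best, best_e = x, e
    T = t0
    for _ in range(n_iter):
        y = propose(x, rng)
        ey = cost(y)
        # accept improvements always; accept regressions stochastically
        if ey < e or rng.random() < math.exp(-(ey - e) / max(T, 1e-12)):
            x, e = y, ey
            if e < best_e:
                best, best_e = x, e
        T *= cooling
    return best, best_e
```

The stochastic acceptance of worse layouts early on lets the search escape local optima in the highly non-convex furniture arrangement space, which is why annealing-style optimizers (alongside CMA-ES) fit this problem.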
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Recently, large-scale pre-trained language-image models like CLIP have shown
extraordinary capabilities for understanding spatial contents, but naively
transferring such models to video recognition still suffers from unsatisfactory
temporal modeling capabilities. Existing methods insert tunable structures into
or in parallel with the pre-trained model, which either requires
back-propagation through the whole pre-trained model and is thus
resource-demanding, or is limited by the temporal reasoning capability of the
pre-trained structure. In this work, we present DiST, which disentangles the
learning of spatial and temporal aspects of videos. Specifically, DiST uses a
dual-encoder structure, where a pre-trained foundation model acts as the
spatial encoder, and a lightweight network is introduced as the temporal
encoder. An integration branch is inserted between the encoders to fuse
spatio-temporal information. The disentangled spatial and temporal learning in
DiST is highly efficient because it avoids the back-propagation of massive
pre-trained parameters. Meanwhile, we empirically show that disentangled
learning with an extra network for integration benefits both spatial and
temporal understanding. Extensive experiments on five benchmarks show
that DiST outperforms existing state-of-the-art methods by convincing
margins. When pre-trained on the large-scale Kinetics-710, we achieve
89.7% on Kinetics-400 with a frozen ViT-L model, which verifies the
scalability of DiST. Code and models can be found at
https://github.com/alibaba-mmai-research/DiST.
Comment: ICCV 2023. Code: https://github.com/alibaba-mmai-research/DiST
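The dual-encoder data flow can be sketched with plain arrays; all three callables below are illustrative placeholders (in DiST the spatial encoder is a frozen pre-trained model, the temporal encoder is a lightweight learned network, and the integration branch is inserted between them):

```python
import numpy as np

def dist_forward(frames, spatial_enc, temporal_enc, integrate):
    """Dual-encoder sketch: the spatial encoder runs independently on
    each frame (frozen, so no gradient would flow through it), the
    temporal encoder runs across the whole frame sequence, and an
    integration branch fuses the two feature streams."""
    spatial = np.stack([spatial_enc(f) for f in frames])  # (T, Ds)
    temporal = temporal_enc(frames)                       # (Dt,)
    return integrate(spatial.mean(axis=0), temporal)
```

Because gradients only need to reach the small temporal encoder and the integration branch, back-propagation through the massive pre-trained spatial model is avoided, which is the efficiency argument the abstract makes.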