Deformable Audio Transformer for Audio Event Detection
Transformers have achieved promising results on a variety of tasks. However,
the quadratic complexity in self-attention computation has limited the
applications, especially in low-resource settings and mobile or edge devices.
Existing works have proposed to exploit hand-crafted attention patterns to
reduce computation complexity. However, such hand-crafted patterns are
data-agnostic and may not be optimal. Hence, it is likely that relevant keys or
values are being reduced, while less important ones are still preserved. Based
on this key insight, we propose a novel deformable audio Transformer for audio
recognition, named DATAR, in which deformable attention, equipped with a pyramid
transformer backbone, is constructed and learned. Such an architecture has
been proven effective in prediction tasks, e.g., event classification.
Moreover, we identify that the deformable attention map computation may
over-simplify the input feature, which can be further enhanced. Hence, we
introduce a learnable input adaptor to alleviate this issue, and DATAR achieves
state-of-the-art performance.
Comment: ICASSP 2024. arXiv admin note: substantial text overlap with
arXiv:2201.00520 by other authors
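The cost argument in this abstract can be illustrated with a minimal sketch. It is not the DATAR implementation: the function names and the hand-picked `indices` are assumptions made for illustration, whereas in a deformable attention model the sampled positions would be learned from data. The sketch only shows that attending to a small subset of keys per query reduces the number of query-key scores from n * n to n * k.

```python
import math


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Numerically stable softmax over the scores.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def sampled_attention(query, keys, values, indices):
    """Attention restricted to the keys/values at `indices`.

    Hypothetical helper: here `indices` is supplied by the caller, while a
    deformable model would predict the sampling positions.
    """
    return attention(query, [keys[i] for i in indices], [values[i] for i in indices])


def score_count(num_tokens, keys_per_query=None):
    """Number of query-key scores: n * n for full attention, n * k when sampled."""
    k = num_tokens if keys_per_query is None else keys_per_query
    return num_tokens * k


if __name__ == "__main__":
    print(score_count(1024))      # full self-attention
    print(score_count(1024, 16))  # 16 sampled keys per query
```

With 1,024 tokens, full attention needs 1,048,576 scores, while 16 sampled keys per query need 16,384. Whether the retained keys are the relevant ones is exactly the issue the abstract raises about hand-crafted patterns.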
TPC-ViT: Token Propagation Controller for Efficient Vision Transformer
Vision transformers (ViTs) have achieved promising results on a variety of
computer vision tasks; however, their quadratic complexity in the number of
input tokens has limited their application, especially in resource-constrained
settings. Previous approaches that employ gradual token reduction to address
this challenge assume that token redundancy in one layer implies redundancy in
all the following layers. We empirically demonstrate that this assumption is
often not correct, i.e., tokens that are redundant in one layer can be useful
in later layers. We employ this key insight to propose a novel token
propagation controller (TPC) that incorporates two different
token distributions, i.e., a pause probability and a restart probability, to control
the reduction and reuse of tokens, respectively, which results in more efficient
token utilization. To improve the estimates of token distributions, we propose
a smoothing mechanism that acts as a regularizer and helps remove noisy
outliers. Furthermore, to improve the training stability of our proposed TPC,
we introduce a model stabilizer that is able to implicitly encode local image
structures and minimize accuracy fluctuations during model training. We present
extensive experimental results on the ImageNet-1K dataset using DeiT, LV-ViT
and Swin models to demonstrate the effectiveness of our proposed method. For
example, compared to baseline models, our proposed method improves the
inference speed of the DeiT-S by 250% while increasing the classification
accuracy by 1.0%.
Comment: Accepted by the main conference of WACV 2024. Well-formatted PDF:
https://drive.google.com/file/d/1Id3oEdYv3OWing1qojQMyjvhZO-gG-Dm/view?usp=sharing
Supplementary:
https://drive.google.com/file/d/15LhYlBdCXtompA0_TLAp_ZJb4_sq2N5V/view?usp=sharin
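The pause and restart idea described in this abstract can be sketched as a per-layer update of which tokens are processed. This is an illustrative toy, not the TPC method: the function name `propagate`, the fixed `threshold`, and the deterministic thresholding rule are all assumptions, since the abstract does not say how the two probabilities are estimated or applied.

```python
def propagate(active, pause_prob, restart_prob, threshold=0.5):
    """Return the token activity mask for the next layer.

    active:       list of bools, True if the token is currently processed.
    pause_prob:   per-token probability of being paused (hypothetical input).
    restart_prob: per-token probability of being reused (hypothetical input).
    threshold:    assumed decision cutoff, not taken from the source.
    """
    next_active = []
    for is_active, p_pause, p_restart in zip(active, pause_prob, restart_prob):
        if is_active:
            # An active token keeps going unless its pause probability is high.
            next_active.append(p_pause <= threshold)
        else:
            # A paused token is not discarded; it can be brought back later.
            next_active.append(p_restart > threshold)
    return next_active


if __name__ == "__main__":
    mask = [True, True, False, False]
    print(propagate(mask, [0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 0.8, 0.2]))
```

The second branch is what separates this from gradual token reduction: a token dropped in one layer stays available for later layers, which matches the abstract's observation that redundancy in one layer does not imply redundancy in all following layers.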
Deep Learning for Automated Medical Image Analysis
Medical imaging is an essential tool in many areas of medicine,
used for both diagnosis and treatment. However, reading medical images and
making diagnoses or treatment recommendations require specially trained medical
specialists. The current practice of reading medical images is labor-intensive,
time-consuming, costly, and error-prone. It would be more desirable to have a
computer-aided system that can automatically make diagnoses and treatment
recommendations. Recent advances in deep learning enable us to rethink how
clinicians diagnose from medical images. In this thesis, we will
introduce 1) mammograms for detecting breast cancers, the most frequently
diagnosed solid cancer for U.S. women, 2) lung CT images for detecting lung
cancers, the most frequently diagnosed malignant cancer, and 3) head and neck
CT images for automated delineation of organs at risk in radiotherapy. First,
we will show how to employ the adversarial concept to generate hard
examples that improve mammogram mass segmentation. Second, we will demonstrate how
to use weakly labeled data for mammogram breast cancer diagnosis by
efficiently designing deep learning for multi-instance learning. Third, the thesis
will walk through the DeepLung system, which combines deep 3D ConvNets and GBM for
automated lung nodule detection and classification. Fourth, we will show how to
use weakly labeled data to improve an existing lung nodule detection system by
integrating deep learning with a probabilistic graphical model. Lastly, we will
demonstrate AnatomyNet, which is thousands of times faster and more accurate
than previous methods on automated anatomy segmentation.
Comment: PhD Thesis
Adversarial Deep Structured Nets for Mass Segmentation from Mammograms
Mass segmentation provides effective morphological features which are
important for mass diagnosis. In this work, we propose a novel end-to-end
network for mammographic mass segmentation which employs a fully convolutional
network (FCN) to model a potential function, followed by a CRF to perform
structured learning. Because the mass distribution varies greatly with pixel
position, the FCN is combined with a position prior. Further, we employ
adversarial training to eliminate over-fitting due to the small sizes of
mammogram datasets. A multi-scale FCN is employed to improve the segmentation
performance. Experimental results on two public datasets, INbreast and
DDSM-BCRP, demonstrate that our end-to-end network achieves better performance
than state-of-the-art approaches.
Code: https://github.com/wentaozhu/adversarial-deep-structural-networks.git
Comment: Accepted by ISBI 2018. arXiv admin note: substantial text overlap with
arXiv:1612.0597
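One plausible way to combine a per-pixel network output with a position prior, as this abstract describes, is a normalized product of the two foreground probabilities. This is a sketch under stated assumptions, not the paper's formulation: the function name `combine_with_prior`, the two-class setup, and the product rule are all hypothetical, and the CRF and adversarial training steps are not modeled.

```python
def combine_with_prior(fg_prob, prior):
    """Fuse network foreground probabilities with a position prior.

    fg_prob: 2D list, network probability that each pixel belongs to the mass.
    prior:   2D list of the same shape, prior probability of mass at each position
             (hypothetical input, e.g. estimated from training masks).
    Returns a 2D list of fused foreground probabilities.
    """
    combined = []
    for row_p, row_w in zip(fg_prob, prior):
        out_row = []
        for p, w in zip(row_p, row_w):
            fg = p * w              # evidence for mass
            bg = (1 - p) * (1 - w)  # evidence for background
            total = fg + bg
            out_row.append(fg / total if total > 0 else 0.0)
        combined.append(out_row)
    return combined


if __name__ == "__main__":
    probs = [[0.8, 0.5], [0.5, 0.2]]
    prior = [[0.5, 0.9], [0.1, 0.5]]
    print(combine_with_prior(probs, prior))
```

Under this rule a prior of 0.5 leaves the network output unchanged, while an uncertain network output of 0.5 takes on the value of the prior, so the prior matters most where the network is least sure.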