RGBT Tracking via Progressive Fusion Transformer with Dynamically Guided Learning
Existing Transformer-based RGBT tracking methods either use cross-attention
to fuse the two modalities, or use self-attention and cross-attention to model
both modality-specific and modality-sharing information. However, the
significant appearance gap between modalities limits the feature representation
ability of certain modalities during the fusion process. To address this
problem, we propose a novel Progressive Fusion Transformer called ProFormer,
which progressively integrates single-modality information into the multimodal
representation for robust RGBT tracking. In particular, ProFormer first uses a
self-attention module to collaboratively extract the multimodal representation,
and then uses two cross-attention modules to let it interact with the features of
the two modalities, respectively. In this way, the modality-specific
information can be well activated in the multimodal representation. Finally, a
feed-forward network is used to fuse two interacted multimodal representations
for the further enhancement of the final multimodal representation. In
addition, existing learning methods of RGBT trackers either fuse multimodal
features into one for final classification, or exploit the relationship between
unimodal branches and fused branch through a competitive learning strategy.
However, they either ignore the learning of single-modality branches or result
in one branch failing to be well optimized. To solve these problems, we propose
a dynamically guided learning algorithm that adaptively uses well-performing
branches to guide the learning of other branches, for enhancing the
representation ability of each branch. Extensive experiments demonstrate that
our proposed ProFormer sets a new state-of-the-art performance on RGBT210,
RGBT234, LasHeR, and VTUAV datasets.
Comment: 13 pages, 9 figures
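The progressive fusion described above (self-attention over both modalities, two cross-attention interactions, then a feed-forward fusion) can be sketched in NumPy. This is a toy illustration under assumed dimensions; the tanh "feed-forward network" and all shapes are placeholders, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def progressive_fusion(rgb, tir):
    # 1) self-attention over concatenated tokens -> multimodal representation
    fused = np.concatenate([rgb, tir], axis=0)
    m = attention(fused, fused, fused)
    # 2) cross-attention: the multimodal representation queries each modality,
    #    activating modality-specific information in it
    m_rgb = attention(m, rgb, rgb)
    m_tir = attention(m, tir, tir)
    # 3) feed-forward fusion of the two interacted representations
    return np.tanh(np.concatenate([m_rgb, m_tir], axis=-1))

rgb = rng.normal(size=(4, 8))   # 4 tokens of dim 8 per modality (illustrative)
tir = rng.normal(size=(4, 8))
out = progressive_fusion(rgb, tir)
print(out.shape)  # (8, 16)
```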
Multi-level feature fusion network combining attention mechanisms for polyp segmentation
Clinically, automated polyp segmentation techniques have the potential to
significantly improve the efficiency and accuracy of medical diagnosis, thereby
reducing the risk of colorectal cancer in patients. Unfortunately, existing
methods suffer from two significant weaknesses that can impact the accuracy of
segmentation. Firstly, features extracted by encoders are not adequately
filtered and utilized. Secondly, semantic conflicts and information redundancy
caused by feature fusion are not attended to. To overcome these limitations, we
propose a novel approach for polyp segmentation, named MLFF-Net, which
leverages multi-level feature fusion and attention mechanisms. Specifically,
MLFF-Net comprises three modules: Multi-scale Attention Module (MAM),
High-level Feature Enhancement Module (HFEM), and Global Attention Module
(GAM). Among these, MAM is used to extract multi-scale information and polyp
details from the shallow output of the encoder. In HFEM, the deep features of
the encoders complement each other by aggregation. Meanwhile, the attention
mechanism redistributes the weight of the aggregated features, weakening the
conflicting redundant parts and highlighting the information useful to the
task. GAM combines encoder and decoder features and computes global
dependencies to prevent receptive-field locality. Experimental
results on five public datasets show that the proposed method not only
segments multiple types of polyps but also outperforms current
state-of-the-art methods in both accuracy and generalization ability.
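The reweighting step described for HFEM (aggregate deep features, then let attention weaken redundant parts) can be illustrated with a minimal channel-attention sketch. The shapes and the global-average-pool-plus-softmax weighting are assumptions for illustration, not the module's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reweight_aggregated(feat_a, feat_b):
    # aggregate two deep feature maps, then redistribute channel weights so
    # that conflicting/redundant channels are weakened and useful ones kept
    agg = feat_a + feat_b                  # (C, H, W) aggregation by summation
    pooled = agg.mean(axis=(1, 2))         # global average pool -> (C,)
    weights = softmax(pooled)              # per-channel attention weights
    return agg * weights[:, None, None]

a = rng.random((16, 8, 8))                 # two encoder feature maps (illustrative)
b = rng.random((16, 8, 8))
out = reweight_aggregated(a, b)
print(out.shape)  # (16, 8, 8)
```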
Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts
In this paper, we tackle the problem of domain shift. Most existing methods
perform training on multiple source domains using a single model, and the same
trained model is used on all unseen target domains. Such solutions are
sub-optimal, as each target domain exhibits its own characteristics to which
the single model is not adapted. Furthermore, expecting single-model training
to learn extensive knowledge from multiple source domains is counterintuitive:
the model is biased toward learning only domain-invariant features and may
suffer from negative knowledge transfer. In this work, we propose a novel framework for
unsupervised test-time adaptation, which is formulated as a knowledge
distillation process to address domain shift. Specifically, we incorporate
Mixture-of-Experts (MoE) as teachers, where each expert is separately trained
on different source domains to maximize their speciality. Given a test-time
target domain, a small set of unlabeled data is sampled to query the knowledge
from MoE. As the source domains are correlated to the target domains, a
transformer-based aggregator then combines the domain knowledge by examining
the interconnection among them. The output is treated as a supervision signal
to adapt a student prediction network toward the target domain. We further
employ meta-learning to enforce the aggregator to distill positive knowledge
and the student network to achieve fast adaptation. Extensive experiments
demonstrate that the proposed method outperforms the state-of-the-art and
validates the effectiveness of each proposed component. Our code is available
at https://github.com/n3il666/Meta-DMoE.
Comment: Accepted at NeurIPS 2022
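The adaptation flow above (query per-domain experts with unlabeled target data, aggregate their knowledge, distill into a student) can be sketched with linear toy models. The agreement-based aggregation and L2 distillation below are stand-ins for the transformer aggregator and the actual meta-learned objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy "experts": one linear classifier per source domain (4 domains, 8-dim, 3 classes)
experts = [rng.normal(size=(8, 3)) for _ in range(4)]

def aggregate(expert_logits):
    # stand-in for the transformer aggregator: weight each expert's output by a
    # softmax over its average agreement with the other experts
    stacked = np.stack(expert_logits)                  # (E, N, C)
    sims = np.einsum('enc,fnc->ef', stacked, stacked).mean(1)
    sims -= sims.max()                                 # numerical stability
    w = np.exp(sims); w /= w.sum()
    return np.einsum('e,enc->nc', w, stacked)          # teacher supervision signal

def distill_step(student_w, x, teacher, lr=0.05):
    # one gradient step moving the student toward the teacher output (L2 distillation)
    pred = x @ student_w
    grad = x.T @ (pred - teacher) / len(x)
    return student_w - lr * grad

x = rng.normal(size=(16, 8))                           # unlabeled target-domain queries
teacher = aggregate([x @ w for w in experts])
student = rng.normal(size=(8, 3))
init_loss = np.mean((x @ student - teacher) ** 2)
for _ in range(200):
    student = distill_step(student, x, teacher)
final_loss = np.mean((x @ student - teacher) ** 2)
print(final_loss < init_loss)
```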
Comparison of astigmatism correction and visual outcomes in mix-and-match implantations of trifocal intraocular lenses with femtosecond laser-assisted arcuate keratotomy and contralateral bifocal Toric intraocular lenses
Introduction: Astigmatism reduces postoperative visual performance after non-toric intraocular lens (IOL) implantation and limits the use of refractive IOLs in cataract surgery. The purpose of this study was to compare the efficacy of astigmatism correction and the postoperative visual outcomes between implantation of a trifocal IOL with femtosecond laser-assisted arcuate keratotomy (FSAK) in one eye and a bifocal toric IOL (TIOL) in the other, in patients with cataract and moderate astigmatism.
Methods: This prospective observational paired-eye study enrolled patients with cataract and corneal astigmatism (CA) between 0.75 and 2.25 D in both eyes. The patients underwent a mix-and-match treatment comprising trifocal IOL implantation with FSAK and bifocal TIOL implantation. We compared the visual acuity (VA) at all distances, defocus curve, postoperative refractive astigmatism (RfA), CA, high-order aberrations, modulation transfer function (MTF) curve, and Strehl ratio between the two eye groups.
Results: In total, 41 patients (82 eyes) were enrolled and completed a 6-month follow-up. The 1- and 3-month uncorrected distance VA and the 3-month uncorrected near VA were greater in eyes with bifocal TIOLs than in those with trifocal IOLs and FSAK (p = 0.036, 0.010, and 0.030, respectively), whereas the latter had greater uncorrected intermediate VA at every visit and greater VA in the intermediate range of the defocus curve (at −1.50 and −2.00 D) than the eyes with bifocal TIOLs. The postoperative RfA of the eyes with trifocal IOL and FSAK was significantly higher than that of the bifocal TIOL-implanted eyes at the 3- and 6-month follow-ups.
Discussion: Both FSAK and TIOL implantation effectively reduce pre-existing moderate astigmatism in patients with cataract. The eyes with bifocal TIOLs had more stable long-term astigmatism correction, whereas those with trifocal IOLs and FSAK had better intermediate VA.
Therefore, a mix-and-match implantation of trifocal IOL with FSAK and contralateral bifocal TIOL could achieve effective astigmatism correction and provide an overall optimal VA
Few-Shot Class-Incremental Learning via Entropy-Regularized Data-Free Replay
Few-shot class-incremental learning (FSCIL) has been proposed aiming to
enable a deep learning system to incrementally learn new classes with limited
data. Recently, a pioneering work claimed that the commonly used replay-based
method in class-incremental learning (CIL) is ineffective and thus not
preferred for FSCIL. If true, this has a significant influence on the field of
FSCIL. In this paper, we show through empirical results that adopting data replay is
surprisingly favorable. However, storing and replaying old data can lead to a
privacy concern. To address this issue, we alternatively propose using
data-free replay that can synthesize data by a generator without accessing real
data. Observing the effectiveness of uncertain data for knowledge
distillation, we impose entropy regularization in the generator training to
encourage more uncertain examples. Moreover, we propose to relabel the
generated data with one-hot-like labels. This modification allows the network
to learn by solely minimizing the cross-entropy loss, which mitigates the
problem of balancing different objectives in the conventional knowledge
distillation approach. Finally, we show extensive experimental results and
analysis on CIFAR-100, miniImageNet and CUB-200 to demonstrate the
effectiveness of our proposed method.
Comment: Accepted by ECCV 2022
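The two ingredients described above, entropy regularization of the generator and one-hot-like relabeling, can be sketched with a toy batch of classifier logits. The logit values and the weight `lam` are illustrative, and the old model is replaced by a fixed array:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(-1)

# logits of the frozen old model on a batch of two generated samples (toy values)
logits = np.array([[2.0, 0.1, 0.1],
                   [0.5, 0.4, 0.6]])
probs = softmax(logits)

# entropy regularization: subtracting mean predictive entropy from the
# generator loss rewards more uncertain (boundary-like) synthetic samples
lam = 0.1
entropy_bonus = lam * entropy(probs).mean()

# one-hot-like relabeling: each generated sample takes the old model's argmax
# class, so the network can train with a single cross-entropy objective
onehot = np.eye(probs.shape[1])[probs.argmax(-1)]
ce = -(onehot * np.log(probs + 1e-12)).sum(-1).mean()
print(onehot)
```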
CRSOT: Cross-Resolution Object Tracking using Unaligned Frame and Event Cameras
Existing datasets for RGB-DVS tracking are collected with the DVS346 camera,
whose resolution is too low for practical applications.
Actually, only visible cameras are deployed in many practical systems, and the
newly designed neuromorphic cameras may have different resolutions. The latest
neuromorphic sensors can output high-definition event streams, but it is very
difficult to achieve strict alignment between events and frames on both spatial
and temporal views. Therefore, how to achieve accurate tracking with unaligned
neuromorphic and visible sensors is a valuable but unresearched problem. In
this work, we formally propose the task of object tracking using unaligned
neuromorphic and visible cameras. We build the first unaligned frame-event
dataset CRSOT collected with a specially built data acquisition system, which
contains 1,030 high-definition RGB-Event video pairs and 304,974 video frames in total. In
addition, we propose a novel unaligned object tracking framework that can
realize robust tracking even using the loosely aligned RGB-Event data.
Specifically, we extract the template and search regions of RGB and Event data
and feed them into a unified ViT backbone for feature embedding. Then, we
propose uncertainty perception modules to encode the RGB and Event features,
respectively; we then propose a modality uncertainty fusion module to
aggregate the two modalities. These three branches are jointly optimized in the
training phase. Extensive experiments demonstrate that our tracker can
exploit the dual modalities for high-performance tracking even without
strict temporal and spatial alignment. The source code, dataset, and
pre-trained models will be released at
https://github.com/Event-AHU/Cross_Resolution_SOT.
Comment: In Peer Review
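One common way to realize uncertainty-aware fusion of the kind described above is inverse-variance weighting, shown here as a hedged sketch; the paper's actual uncertainty perception and fusion modules are learned, whereas this toy version takes per-modality variances as given:

```python
import numpy as np

def uncertainty_fusion(rgb_feat, evt_feat, rgb_var, evt_var):
    # inverse-variance weighting: the less uncertain modality dominates the
    # fused feature (illustrative stand-in for a learned fusion module)
    w_rgb = 1.0 / (rgb_var + 1e-6)
    w_evt = 1.0 / (evt_var + 1e-6)
    return (w_rgb * rgb_feat + w_evt * evt_feat) / (w_rgb + w_evt)

rgb = np.array([1.0, 1.0])                  # toy RGB feature
evt = np.array([0.0, 0.0])                  # toy Event feature
# RGB is assigned low uncertainty, Event high, so the fused feature leans RGB
fused = uncertainty_fusion(rgb, evt, np.array([0.1, 0.1]), np.array([0.9, 0.9]))
print(fused)
```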
Deca: a garbage collection optimizer for in-memory data processing
In-memory caching of intermediate data and active combining of data in shuffle buffers have been shown to be very effective in minimizing the recomputation and I/O cost in big data processing systems such as Spark and Flink. However, it has also been widely reported that these techniques create a large number of long-living data objects in the heap. These objects may quickly saturate the garbage collector, especially when handling a large dataset, and hence limit the scalability of the system. To eliminate this problem, we propose a lifetime-based memory management framework which, by automatically analyzing the user-defined functions and data types, obtains the expected lifetime of the data objects and then allocates and releases memory space accordingly to minimize the garbage collection overhead. In particular, we present Deca, a concrete implementation of our proposal on top of Spark, which transparently decomposes and groups objects with similar lifetimes into byte arrays and releases their space altogether when their lifetimes come to an end. When systems are processing very large data, Deca also provides field-oriented memory pages to ensure high compression efficiency. Extensive experimental studies using both synthetic and real datasets show that, compared to Spark, Deca is able to (1) reduce the garbage collection time by up to 99.9%, (2) reduce the memory consumption by up to 46.6% and the storage space by 23.4%, (3) achieve 1.2× to 22.7× speedup in terms of execution time in cases without data spilling and 16× to 41.6× speedup in cases with data spilling, and (4) provide performance similar to domain-specific systems.
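The core idea of decomposing same-lifetime objects into byte arrays that are freed all at once can be sketched in a few lines. Deca itself operates on the JVM inside Spark; the Python `struct`-based region below is only an analogy, with the record layout (one 8-byte int plus one double) chosen for illustration:

```python
import struct

class LifetimeRegion:
    """Toy region: objects with the same lifetime are packed into one byte
    array, so they impose no per-object GC cost and are released together."""
    RECORD = '<qd'                       # 8-byte int + 8-byte double = 16 bytes
    SIZE = struct.calcsize(RECORD)

    def __init__(self):
        self.buf = bytearray()

    def append_record(self, x: int, y: float):
        # decomposed object -> fixed-size binary record appended to the region
        self.buf += struct.pack(self.RECORD, x, y)

    def record(self, i):
        return struct.unpack_from(self.RECORD, self.buf, i * self.SIZE)

    def release(self):
        # the whole region is freed in one step when its lifetime ends
        self.buf = bytearray()

region = LifetimeRegion()
for i in range(3):
    region.append_record(i, i * 0.5)
print(region.record(2))   # (2, 1.0)
region.release()
```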
Masked Pre-trained Model Enables Universal Zero-shot Denoiser
In this work, we observe that the model, which is trained on vast general
images using masking strategy, has been naturally embedded with the
distribution knowledge regarding natural images, and thus spontaneously attains
the underlying potential for strong image denoising. Based on this observation,
we propose a novel zero-shot denoising paradigm, i.e., Masked Pre-train then
Iterative fill (MPI). MPI pre-trains a model with masking and fine-tunes it for
denoising of a single image with unseen noise degradation. Concretely, the
proposed MPI comprises two key procedures: 1) Masked Pre-training involves
training a model on multiple natural images with random masks to gather
generalizable representations, allowing for practical applications in varying
noise degradation and even in distinct image types. 2) Iterative filling is
devised to efficiently fuse pre-trained knowledge for denoising. Similar to but
distinct from pre-training, random masking is retained to bridge the gap, but
only the predicted parts covered by masks are assembled for efficiency, which
enables high-quality denoising within a limited number of iterations.
Comprehensive experiments across various noisy scenarios underscore the notable
advances of the proposed MPI over previous approaches, with a marked reduction
in inference time. Code is available at https://github.com/krennic999/MPI.git.
Comment: 11 pages, 9 figures
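The iterative-filling procedure above (re-mask the current estimate each step, but assemble only the predicted parts under the mask) can be sketched as follows. A simple local mean filter stands in for the pre-trained masked model, and all sizes, the mask ratio, and the iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_predict(img):
    # stand-in for the pre-trained masked model: a 3x3 local mean filter,
    # used only to illustrate the assembly scheme
    pad = np.pad(img, 1, mode='edge')
    out = np.zeros_like(img)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = pad[i:i + 3, j:j + 3].mean()
    return out

def iterative_fill(noisy, iters=8, ratio=0.5):
    est = noisy.copy()
    for _ in range(iters):
        mask = rng.random(noisy.shape) < ratio   # fresh random mask each step
        pred = masked_predict(est)
        # only the predicted parts covered by the mask are assembled
        est = np.where(mask, pred, est)
    return est

clean = np.ones((16, 16))
noisy = clean + 0.3 * rng.normal(size=clean.shape)
den = iterative_fill(noisy)
print(abs(den - clean).mean() < abs(noisy - clean).mean())
```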