507 research outputs found
CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation
The advancement of computer vision has pushed visual analysis tasks from
still images to the video domain. In recent years, video instance segmentation,
which aims to track and segment multiple objects in video frames, has drawn
much attention for its potential applications in various emerging areas such as
autonomous driving, intelligent transportation, and smart retail. In this
paper, we propose an effective framework for instance-level visual analysis on
video frames, which can simultaneously conduct object detection, instance
segmentation, and multi-object tracking. The core idea of our method is
collaborative multi-task learning, achieved by a novel structure named
associative connections, which links the detection, segmentation, and tracking task heads
in an end-to-end learnable CNN. These additional connections allow information
to propagate across the related tasks and benefit them
simultaneously. We evaluate the proposed method extensively on the KITTI MOTS and
MOTS Challenge datasets and obtain encouraging results.
Visual Recognition with Deep Nearest Centroids
We devise deep nearest centroids (DNC), a conceptually elegant yet
surprisingly effective network for large-scale visual recognition, by
revisiting Nearest Centroids, one of the most classic and simple classifiers.
Current deep models learn the classifier in a fully parametric manner, ignoring
the latent data structure and lacking simplicity and explainability. DNC
instead conducts nonparametric, case-based reasoning; it utilizes sub-centroids
of training samples to describe class distributions and clearly explains the
classification as the proximity of test data and the class sub-centroids in the
feature space. Due to the distance-based nature, the network output
dimensionality is flexible, and all the learnable parameters are only for data
embedding. That means all the knowledge learnt for ImageNet classification can
be completely transferred for pixel recognition learning, under the
"pre-training and fine-tuning" paradigm. Apart from its nested simplicity and
intuitive decision-making mechanism, DNC can even possess ad-hoc explainability
when the sub-centroids are selected as actual training images that humans can
view and inspect. Compared with parametric counterparts, DNC performs better on
image classification (CIFAR-10, ImageNet) and greatly boosts pixel recognition
(ADE20K, Cityscapes), with improved transparency and fewer learnable
parameters, using various network architectures (ResNet, Swin) and segmentation
models (FCN, DeepLabV3, Swin). We feel this work brings fundamental insights
into related fields.
Comment: 23 pages, 8 figures
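The decision rule described in this abstract (assign a test feature to the class whose nearest sub-centroid is closest) can be illustrated with a minimal sketch. It is not the authors' implementation: the sub-centroids here come from a plain, deterministic k-means run per class on hand-made 2-D features, whereas the abstract does not say how DNC estimates them, and the names `kmeans`, `fit_sub_centroids`, and `predict` are hypothetical.

```python
import math

def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means, initialised from the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its closest current centroid.
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def fit_sub_centroids(features_by_class, k):
    """Assumption: sub-centroids are k-means centres of each class's features."""
    return {label: kmeans(feats, k) for label, feats in features_by_class.items()}

def predict(x, sub_centroids):
    """Return the class whose nearest sub-centroid is closest to x."""
    return min(
        sub_centroids,
        key=lambda label: min(math.dist(x, c) for c in sub_centroids[label]),
    )

# Toy embedded features: each class has two well-separated modes.
train = {
    "a": [(0, 0), (0, 1), (10, 10), (10, 11)],
    "b": [(5, 5), (5, 6), (-5, -5), (-5, -6)],
}
sub_centroids = fit_sub_centroids(train, k=2)
print(predict((9, 9), sub_centroids), predict((4, 4), sub_centroids))
```

Because the classifier holds only distances to stored points, the number of classes can change without touching any learned weights, which is the flexibility in output dimensionality that the abstract refers to.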
Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?
As the scale of vision models continues to grow, Visual
Prompt Tuning (VPT) has emerged as a parameter-efficient transfer learning technique and
gained attention for its superior performance compared to traditional
full finetuning. However, the conditions favoring VPT (the "when") and the
underlying rationale (the "why") remain unclear. In this paper, we conduct a
comprehensive analysis across 19 distinct datasets and tasks. To understand the
"when" aspect, we identify the scenarios where VPT proves favorable along two
dimensions: task objectives and data distributions. We find that VPT is
preferable when there is 1) a substantial disparity between the original and
the downstream task objectives (e.g., transitioning from classification to
counting), or 2) a similarity in data distributions between the two tasks
(e.g., both involve natural images). In exploring the "why" dimension, our
results indicate VPT's success cannot be attributed solely to overfitting and
optimization considerations. The unique way VPT preserves original features and
adds parameters appears to be a pivotal factor. Our study provides insights
into VPT's mechanisms and offers guidance for its optimal utilization.
Comment: 29 pages, 19 figures
Phase-locking matter-wave interferometer of vortex states
Matter-wave interferometry of ultracold atoms with different linear momenta
has been extensively studied in theory and experiment. A vortex matter-wave
interferometer with different angular momenta is applicable as a quantum sensor
for measuring rotation, interatomic interactions, geometric phases, etc. Here
we report the first experimental realization of a vortex matter-wave
interferometer by coherently transferring the optical angular momentum to an
ultracold Bose condensate. After producing a lossless interferometer with atoms
only populating the two spin states, we demonstrate that the phase difference
between the interferences in the two spin states is locked at π. We also
demonstrate the robustness of this out-of-phase relation, which is independent
of the angular-momentum difference between the two interfering vortex states,
the constituents of the Raman optical fields, and the expansion of the condensate. The
experimental results agree well with calculations based on the unitary evolution
of the wave packet in quantum mechanics. This work opens a new way to build a
quantum sensor and measure atomic correlations in quantum gases.
Comment: 5 figures
Image Translation as Diffusion Visual Programmers
We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic
image translation framework. Our proposed DVP seamlessly embeds a
condition-flexible diffusion model within the GPT architecture, orchestrating a
coherent sequence of visual programs (i.e., computer vision models) for various
pro-symbolic steps, which span RoI identification, style transfer, and position
manipulation, facilitating transparent and controllable image translation
processes. Extensive experiments demonstrate DVP's remarkable performance,
surpassing concurrent methods. This success can be attributed to several key
features of DVP: First, DVP achieves condition-flexible translation via
instance normalization, enabling the model to eliminate sensitivity caused by
the manual guidance and optimally focus on textual descriptions for
high-quality content generation. Second, the framework enhances in-context
reasoning by deciphering intricate high-dimensional concepts in feature spaces
into more accessible low-dimensional symbols (e.g., [Prompt], [RoI object]),
allowing for localized, context-free editing while maintaining overall
coherence. Last but not least, DVP improves systemic controllability and
explainability by offering explicit symbolic representations at each
programming stage, empowering users to intuitively interpret and modify
results. Our research marks a substantial step towards harmonizing artificial
image translation processes with cognitive intelligence, promising broader
applications.
Comment: 25 pages, 20 figures
Terahertz-driven, all-optical electron gun
Ultrashort electron beams with narrow energy spread, high charge, and low
jitter are essential for resolving phase transitions in metals, semiconductors,
and molecular crystals. These semirelativistic beams, produced by
phototriggered electron guns, are also injected into accelerators for x-ray
light sources. The achievable resolution of these time-resolved electron
diffraction or x-ray experiments has been hindered by surface field and timing
jitter limitations in conventional RF guns, which thus far are <200 MV/m and
>96 fs, respectively. A gun driven by optically-generated single-cycle THz
pulses provides a practical solution to enable not only GV/m surface fields but
also absolute timing stability, since the pulses are generated by the same
laser as the phototrigger. Here, we demonstrate an all-optical THz gun yielding
peak electron energies approaching 1 keV, accelerated by 300 MV/m THz fields in
a novel micron-scale waveguide structure. We also achieve quasimonoenergetic,
sub-keV bunches with 32 fC of charge, which can already be used for
time-resolved low-energy electron diffraction. Such ultracompact,
easy-to-implement guns driven by intrinsically synchronized THz pulses that are pumped
by an amplified arm of the already present photoinjector laser provide a new
tool with the potential to transform accelerator-based science.
Comment: 24 pages, 9 figures
Aberrant Brain Regional Homogeneity and Functional Connectivity of Entorhinal Cortex in Vascular Mild Cognitive Impairment: A Resting-State Functional MRI Study
The aim of this study was to investigate changes in regional homogeneity (ReHo) and the functional connectivity of the entorhinal cortex (EC) in vascular mild cognitive impairment (VaMCI) and to evaluate the relationships between such changes and neuropsychological measures in VaMCI individuals. In all, 31 patients with VaMCI and 32 normal controls (NCs) underwent resting-state functional MRI (rs-fMRI). Differences in whole-brain ReHo and seed-based bilateral EC functional connectivity (EC-FC) were determined. Pearson's correlation was used to evaluate the relationships between regions with significant group differences and different neuropsychological measures. VaMCI patients had lower scores on the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) and higher scores on Activity of Daily Living (ADL) (P < 0.05). VaMCI individuals had significantly lower ReHo in the left cerebellum and right lentiform nucleus than NCs (P < 0.05, TFCE FWE correction). VaMCI subjects showed significant decreases in the FC of the right EC with the right inferior frontal gyrus, right middle frontal gyrus, bilateral pre-central gyrus, and right post-central/superior parietal lobules (P < 0.05, TFCE FWE correction). A significant positive correlation was found between ReHo and MoCA scores for the right lentiform nucleus (r = 0.37, P < 0.05). The right post-central/superior parietal lobules showed a significant positive correlation between right EC-FC and MoCA scores (r = 0.37, P < 0.05). Patterns in ReHo and EC-FC changes in VaMCI patients and their correlations with neuropsychological measures may be a pathophysiological foundation of cognitive impairment, which may aid the early diagnosis of VaMCI.
Transarterial chemoembolization with or without multikinase inhibitors for patients with unresectable hepatocellular carcinoma: a systematic review and meta-analysis of randomized controlled trials
Background: Randomized controlled trials (RCTs) testing the combination therapy of transarterial chemoembolization (TACE) plus multikinase inhibitor (MKI) in patients with unresectable hepatocellular carcinoma (HCC) have yielded inconsistent results. Methods: In this work, a systematic review and meta-analysis was performed to compare the TACE+MKI combination therapy versus TACE monotherapy in HCC patients with time to progression (TTP) adopted as primary outcome. Results: A total of 10 RCTs comprising 2837 patients receiving combination therapy (TACE plus sorafenib, brivanib, orantinib or apatinib) were included. TACE+MKI significantly prolonged TTP (hazard ratio [HR] 0.74, 95% CI 0.62-0.89, p=0.001) versus TACE monotherapy. Subgroup analysis suggested MKI administration before TACE might be preferable to post-TACE MKI for TTP. TACE+MKI also increased objective response rate (ORR) (risk ratio [RR] 1.17, 95% CI 1.03-1.32, p=0.01), but failed to improve overall survival (OS) (HR 0.98, 95% CI 0.86-1.13, p=0.82) and progression-free survival (PFS) (HR 0.75, 95% CI 0.50-1.12, p=0.16). The incidence of any adverse event (AE) did not significantly differ between TACE+MKI and TACE groups (RR 1.17, 95% CI 0.96-1.42, p=0.01), while serious AEs showed significant difference (RR 1.41, 95% CI 1.26-1.59, p<0.0001). Nevertheless, these AEs showing significant difference were mainly associated with MKI toxicities rather than TACE. Conclusions: TACE+MKI combination therapy improved TTP and ORR but not OS and PFS in patients with unresectable HCC. Further high-quality trials are needed to verify these clinical benefits, and our findings could be very informative for future trial design.
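As a reading aid for the effect estimates above, the following sketch shows how a two-sided p-value can be approximately recovered from a reported ratio and its 95% confidence interval. It assumes the interval is a symmetric normal (Wald) interval on the log scale; the abstract does not state how the intervals were computed, and the function name `p_from_ratio_ci` is hypothetical. Because the published figures are rounded, the result is only a rough consistency check.

```python
import math

def p_from_ratio_ci(ratio, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided p-value from a ratio and its 95% CI.

    Assumes the CI is symmetric on the log scale, i.e.
    log(upper) - log(lower) = 2 * z_crit * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# TTP estimate reported in the abstract: HR 0.74, 95% CI 0.62-0.89.
z_ttp, p_ttp = p_from_ratio_ci(0.74, 0.62, 0.89)
print(f"z = {z_ttp:.2f}, p = {p_ttp:.4f}")
```

For the TTP estimate this gives a p-value close to the reported 0.001. The same check can be applied to the other ratios in the abstract, bearing in mind that rounding of the published bounds limits its precision.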