507 research outputs found

    CML-MOTS: Collaborative Multi-task Learning for Multi-Object Tracking and Segmentation

    The advancement of computer vision has pushed visual analysis tasks from still images to the video domain. In recent years, video instance segmentation, which aims to track and segment multiple objects in video frames, has drawn much attention for its potential applications in various emerging areas such as autonomous driving, intelligent transportation, and smart retail. In this paper, we propose an effective framework for instance-level visual analysis on video frames, which can simultaneously conduct object detection, instance segmentation, and multi-object tracking. The core idea of our method is collaborative multi-task learning, which is achieved by a novel structure, named associative connections, among the detection, segmentation, and tracking task heads in an end-to-end learnable CNN. These additional connections allow information propagation across multiple related tasks, so as to benefit these tasks simultaneously. We evaluate the proposed method extensively on the KITTI MOTS and MOTS Challenge datasets and obtain encouraging results.

    Visual Recognition with Deep Nearest Centroids

    We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting Nearest Centroids, one of the most classic and simple classifiers. Current deep models learn the classifier in a fully parametric manner, ignoring the latent data structure and lacking simplicity and explainability. DNC instead conducts nonparametric, case-based reasoning; it utilizes sub-centroids of training samples to describe class distributions and clearly explains the classification as the proximity of test data and the class sub-centroids in the feature space. Due to the distance-based nature, the network output dimensionality is flexible, and all the learnable parameters are only for data embedding. That means all the knowledge learnt for ImageNet classification can be completely transferred for pixel recognition learning, under the "pre-training and fine-tuning" paradigm. Apart from its nested simplicity and intuitive decision-making mechanism, DNC can even possess ad-hoc explainability when the sub-centroids are selected as actual training images that humans can view and inspect. Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, ImageNet) and greatly boosts pixel recognition (ADE20K, Cityscapes), with improved transparency and fewer learnable parameters, using various network architectures (ResNet, Swin) and segmentation models (FCN, DeepLabV3, Swin). We feel this work brings fundamental insights into related fields. Comment: 23 pages, 8 figures
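The distance-based decision rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the features are already embedded, it uses a few plain k-means iterations per class as a stand-in for whatever sub-centroid estimation the paper uses, and the names `fit_sub_centroids` and `predict` are hypothetical.

```python
import numpy as np


def fit_sub_centroids(features, labels, k=2, iters=10, seed=0):
    """Estimate up to k sub-centroids per class (assumption: plain k-means)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    centroids, centroid_labels = [], []
    for c in np.unique(labels):
        x = features[labels == c]
        # Initialise the sub-centroids from randomly chosen samples of this class.
        centers = x[rng.choice(len(x), size=min(k, len(x)), replace=False)]
        for _ in range(iters):
            # Distance of every class sample to every sub-centroid.
            d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
            assign = d.argmin(axis=1)
            for j in range(len(centers)):
                if np.any(assign == j):  # leave empty clusters where they are
                    centers[j] = x[assign == j].mean(axis=0)
        centroids.append(centers)
        centroid_labels.extend([c] * len(centers))
    return np.vstack(centroids), np.array(centroid_labels)


def predict(features, centroids, centroid_labels):
    """Assign each sample the class of its nearest sub-centroid."""
    features = np.asarray(features, dtype=float)
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return centroid_labels[d.argmin(axis=1)]
```

Because the prediction depends only on distances in the feature space, the number of classes can change without changing any learnable parameter, which is the flexibility the abstract refers to.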

    Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?

    As the scale of vision models continues to grow, the emergence of Visual Prompt Tuning (VPT) as a parameter-efficient transfer learning technique has gained attention due to its superior performance compared to traditional full-finetuning. However, the conditions favoring VPT (the "when") and the underlying rationale (the "why") remain unclear. In this paper, we conduct a comprehensive analysis across 19 distinct datasets and tasks. To understand the "when" aspect, we identify the scenarios where VPT proves favorable along two dimensions: task objectives and data distributions. We find that VPT is preferable when there is 1) a substantial disparity between the original and the downstream task objectives (e.g., transitioning from classification to counting), or 2) a similarity in data distributions between the two tasks (e.g., both involve natural images). In exploring the "why" dimension, our results indicate VPT's success cannot be attributed solely to overfitting and optimization considerations. The unique way VPT preserves original features and adds parameters appears to be a pivotal factor. Our study provides insights into VPT's mechanisms, and offers guidance for its optimal utilization. Comment: 29 pages, 19 figures

    Phase-locking matter-wave interferometer of vortex states

    The matter-wave interferometer of ultracold atoms with different linear momenta has been extensively studied in theory and experiment. The vortex matter-wave interferometer with different angular momenta is applicable as a quantum sensor for measuring the rotation, interatomic interaction, geometric phase, etc. Here we report the first experimental realization of a vortex matter-wave interferometer by coherently transferring the optical angular momentum to an ultracold Bose condensate. After producing a lossless interferometer with atoms only populating the two spin states, we demonstrate that the phase difference between the interferences in the two spin states is locked at π. We also demonstrate the robustness of this out-of-phase relation, which is independent of the angular-momentum difference between the two interfering vortex states, constituent of Raman optical fields and expansion of the condensate. The experimental results agree well with the calculation from the unitary evolution of the wave packet in quantum mechanics. This work opens a new way to build a quantum sensor and measure the atomic correlation in quantum gases. Comment: 5 figures

    Image Translation as Diffusion Visual Programmers

    We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion model within the GPT architecture, orchestrating a coherent sequence of visual programs (i.e., computer vision models) for various pro-symbolic steps, which span RoI identification, style transfer, and position manipulation, facilitating transparent and controllable image translation processes. Extensive experiments demonstrate DVP's remarkable performance, surpassing concurrent arts. This success can be attributed to several key features of DVP: First, DVP achieves condition-flexible translation via instance normalization, enabling the model to eliminate sensitivity caused by the manual guidance and optimally focus on textual descriptions for high-quality content generation. Second, the framework enhances in-context reasoning by deciphering intricate high-dimensional concepts in feature spaces into more accessible low-dimensional symbols (e.g., [Prompt], [RoI object]), allowing for localized, context-free editing while maintaining overall coherence. Last but not least, DVP improves systemic controllability and explainability by offering explicit symbolic representations at each programming stage, empowering users to intuitively interpret and modify results. Our research marks a substantial step towards harmonizing artificial image translation processes with cognitive intelligence, promising broader applications. Comment: 25 pages, 20 figures

    Terahertz-driven, all-optical electron gun

    Ultrashort electron beams with narrow energy spread, high charge, and low jitter are essential for resolving phase transitions in metals, semiconductors, and molecular crystals. These semirelativistic beams, produced by phototriggered electron guns, are also injected into accelerators for x-ray light sources. The achievable resolution of these time-resolved electron diffraction or x-ray experiments has been hindered by surface field and timing jitter limitations in conventional RF guns, which thus far are <200 MV/m and >96 fs, respectively. A gun driven by optically generated single-cycle THz pulses provides a practical solution to enable not only GV/m surface fields but also absolute timing stability, since the pulses are generated by the same laser as the phototrigger. Here, we demonstrate an all-optical THz gun yielding peak electron energies approaching 1 keV, accelerated by 300 MV/m THz fields in a novel micron-scale waveguide structure. We also achieve quasimonoenergetic, sub-keV bunches with 32 fC of charge, which can already be used for time-resolved low-energy electron diffraction. Such ultracompact, easy-to-implement guns, driven by intrinsically synchronized THz pulses that are pumped by an amplified arm of the already present photoinjector laser, provide a new tool with the potential to transform accelerator-based science. Comment: 24 pages, 9 figures

    Aberrant Brain Regional Homogeneity and Functional Connectivity of Entorhinal Cortex in Vascular Mild Cognitive Impairment: A Resting-State Functional MRI Study

    The aim of this study was to investigate changes in regional homogeneity (ReHo) and the functional connectivity of the entorhinal cortex (EC) in vascular mild cognitive impairment (VaMCI) and to evaluate the relationships between such changes and neuropsychological measures in VaMCI individuals. In all, 31 patients with VaMCI and 32 normal controls (NCs) underwent rs-fMRI. Differences in whole-brain ReHo and seed-based bilateral EC functional connectivity (EC-FC) were determined. Pearson's correlation was used to evaluate the relationships between regions with significant group differences and different neuropsychological measures. VaMCI patients had lower scores on the Mini-Mental State Examination (MMSE) and Montreal Cognitive Assessment (MoCA) and higher scores on the Activity of Daily Living (ADL) scale (P < 0.05). VaMCI individuals had significantly lower ReHo in the left cerebellum and right lentiform nucleus than NCs (P < 0.05, TFCE FWE correction). VaMCI subjects showed significant decreases in the FC of the right EC in the right inferior frontal gyrus, right middle frontal gyrus, bilateral pre-central gyrus, and right post-central/superior parietal lobules (P < 0.05, TFCE FWE correction). Significant positive correlations were found between ReHo and MoCA scores for the right lentiform nucleus (r = 0.37, P < 0.05). The right post-central/superior parietal lobules showed a significant positive correlation between right EC-FC and MoCA scores (r = 0.37, P < 0.05). Patterns in ReHo and EC-FC changes in VaMCI patients and their correlations with neuropsychological measures may be a pathophysiological foundation of cognitive impairment, which may aid the early diagnosis of VaMCI.

    Transarterial chemoembolization with or without multikinase inhibitors for patients with unresectable hepatocellular carcinoma: a systematic review and meta-analysis of randomized controlled trials

    Background: Randomized controlled trials (RCTs) testing the combination therapy of transarterial chemoembolization (TACE) plus multikinase inhibitor (MKI) in patients with unresectable hepatocellular carcinoma (HCC) have yielded inconsistent results. Methods: In this work, a systematic review and meta-analysis was performed to compare the TACE+MKI combination therapy versus TACE monotherapy in HCC patients with time to progression (TTP) adopted as primary outcome. Results: A total of 10 RCTs comprising 2837 patients receiving combination therapy (TACE plus sorafenib, brivanib, orantinib or apatinib) were included. TACE+MKI significantly prolonged TTP (hazard ratio [HR] 0.74, 95% CI 0.62-0.89, p=0.001) versus TACE monotherapy. Subgroup analysis suggested MKI administration before TACE might be preferable to post-TACE MKI for TTP. TACE+MKI also increased objective response rate (ORR) (risk ratio [RR] 1.17, 95% CI 1.03-1.32, p=0.01), but failed to improve overall survival (OS) (HR 0.98, 95% CI 0.86-1.13, p=0.82) and progression-free survival (PFS) (HR 0.75, 95% CI 0.50-1.12, p=0.16). The incidence of any adverse event (AE) did not significantly differ between TACE+MKI and TACE groups (RR 1.17, 95% CI 0.96-1.42, p=0.01), while serious AEs showed significant difference (RR 1.41, 95% CI 1.26-1.59, p<0.0001). Nevertheless, these AEs showing significant difference were mainly associated with MKI toxicities rather than TACE. Conclusions: TACE+MKI combination therapy improved TTP and ORR but not OS and PFS in patients with unresectable HCC. Further high-quality trials are needed to verify these clinical benefits, and our findings could be very informative for future trial design.