37 research outputs found

    VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders

    Large-scale text-to-image diffusion models have shown impressive capabilities across various generative tasks, enabled by strong vision-language alignment obtained through pre-training. However, most vision-language discriminative tasks require extensive fine-tuning on carefully labeled datasets to acquire such alignment, at great cost in time and computing resources. In this work, we explore directly applying a pre-trained generative diffusion model to the challenging discriminative task of visual grounding without any fine-tuning or additional training data. Specifically, we propose VGDiffZero, a simple yet effective zero-shot visual grounding framework based on text-to-image diffusion models. We also design a comprehensive region-scoring method that considers both the global and local contexts of each isolated proposal. Extensive experiments on RefCOCO, RefCOCO+, and RefCOCOg show that VGDiffZero achieves strong performance on zero-shot visual grounding.

    Provably Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning

    Meta-learning for offline reinforcement learning (OMRL) is an understudied problem with tremendous potential impact, as it could enable RL algorithms in many real-world applications. A popular solution is to infer task identity as an augmented state using a context-based encoder, for which efficient learning of robust task representations remains an open challenge. In this work, we provably improve upon FOCAL, one of the state-of-the-art (SOTA) OMRL algorithms, by incorporating an intra-task attention mechanism and inter-task contrastive learning objectives that robustify task representation learning against sparse rewards and distribution shift. Theoretical analysis and experiments demonstrate the superior performance and robustness of our end-to-end, model-free framework compared to prior algorithms across multiple meta-RL benchmarks. Comment: 21 pages, 7 figures
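
The contrastive objective mentioned above can be illustrated with a generic InfoNCE loss. This is a minimal sketch, not the paper's formulation: it assumes that context embeddings from the same task act as positives and embeddings from other tasks as negatives, and the cosine similarity and temperature of 0.1 are illustrative choices. The function names `cosine` and `info_nce` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Generic InfoNCE loss for one anchor: -log of the softmax weight on the positive.

    Assumption: positive = embedding from the same task, negatives = other tasks.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]
```

Pulling same-task embeddings together and pushing other tasks apart lowers this loss, which is the property a task-representation encoder would be trained to exploit.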

    Dynamics of Associative Polymers with High Density of Reversible Bonds

    We design and synthesize unentangled associative polymers carrying unprecedentedly high fractions of stickers, up to eight per Kuhn segment, that can form strong pairwise hydrogen bonds of $\sim 20\,k_BT$ without microphase separation. The reversible bonds significantly slow down the polymer dynamics but barely change the shape of the linear viscoelastic spectra. Moreover, the structural relaxation time of the associative polymers increases exponentially with the fraction of stickers and exhibits a universal yet non-Arrhenius dependence on the distance from the polymer glass transition temperature. These results cannot be understood within the framework of the classic sticky-Rouse model but are rationalized by a renormalized Rouse model, which highlights an unexpected influence of reversible bonds on the structural relaxation, rather than on the shape of the viscoelastic spectra, for associative polymers with high concentrations of stickers. Comment: 4 figures
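
For a rough sense of scale, one can assume a simple Arrhenius-type picture in which a bond's lifetime grows exponentially with its energy in units of $k_BT$. This is an illustrative assumption, not the renormalized Rouse model the abstract refers to, and the microscopic attempt time $\tau_0$ is hypothetical:

```latex
\tau_b \approx \tau_0 \exp\!\left(\frac{E_b}{k_BT}\right),
\qquad \frac{E_b}{k_BT} \approx 20
\;\Longrightarrow\;
\frac{\tau_b}{\tau_0} \approx e^{20} \approx 4.9 \times 10^{8}
```

Under this assumption, each ~20 k_BT bond would persist roughly eight to nine orders of magnitude longer than the attempt time.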

    Artificial intelligence-based non-invasive tumor segmentation, grade stratification and prognosis prediction for clear-cell renal-cell carcinoma

    Because of the complicated histopathological characteristics of clear-cell renal-cell carcinoma (ccRCC), non-invasive prognosis before operative treatment is crucial for selecting the appropriate treatment. A total of 126,345 computed tomography (CT) images from four independent patient cohorts were included for analysis in this study. We propose a V Bottleneck multi-resolution and focus-organ network (VB-MrFo-Net) that uses a cascade framework for deep learning analysis. The VB-MrFo-Net achieved better tumor segmentation performance than VB-Net, with a Dice score of 0.87. The nuclear-grade prediction model performed best with the logistic regression classifier, with area under the curve (AUC) values ranging from 0.746 to 0.782. Survival analysis revealed that our prediction model could significantly distinguish patients with high survival risk, with a hazard ratio (HR) of 2.49 [95% confidence interval (CI): 1.13-5.45, P = 0.023] in the General cohort. Excellent performance was also verified in The Cancer Genome Atlas cohort, the Clinical Proteomic Tumor Analysis Consortium cohort, and the Kidney Tumor Segmentation Challenge cohort, with HRs of 2.77 (95% CI: 1.58-4.84, P = 0.0019), 3.83 (95% CI: 1.22-11.96, P = 0.029), and 2.80 (95% CI: 1.05-7.47, P = 0.025), respectively. In conclusion, we propose a novel VB-MrFo-Net for renal tumor segmentation and automatic diagnosis of ccRCC. The risk stratification model could accurately distinguish patients with high tumor grade and high survival risk based on non-invasive CT images before surgical treatment, which could provide practical guidance when deciding on treatment options.
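
The Dice score quoted above measures the overlap between a predicted segmentation mask and the ground truth, Dice = 2|A ∩ B| / (|A| + |B|). A minimal sketch, assuming binary masks flattened into 0/1 sequences; the function name `dice_score` and the empty-mask convention are illustrative choices, not taken from the paper:

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks flattened to 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total
```

For example, a prediction of `[1, 1, 0, 0]` against a ground truth of `[1, 0, 0, 0]` shares one voxel out of three labeled ones, giving 2 × 1 / 3 ≈ 0.67.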

    Association between visceral fat area and diabetic retinopathy among people with type 2 diabetes mellitus: a cross-sectional study in Ningbo, Zhejiang Province, China

    Aim: The objective of this study is to investigate the relationship between visceral fat area (VFA) and diabetic retinopathy (DR) in patients with type 2 diabetes mellitus (T2DM) in Ningbo, China.

    Methods: A total of 3,707 subjects with T2DM treated at The First Affiliated Hospital of Ningbo University were enrolled. The presence and severity of diabetic retinopathy were assessed using 45° two-field stereoscopic digital photography. Subjects were categorized into four groups: those without DR (NDR), those with mild non-proliferative DR (mild NPDR), those with moderate non-proliferative DR (moderate NPDR), and those with vision-threatening DR (VTDR). VFA was estimated by bioelectrical impedance. Multinomial logistic regression models were used to evaluate the association between VFA and DR.

    Results: The mean VFA in patients without diabetic retinopathy (NDR) was notably lower than that in patients with diabetic retinopathy (DR) (85.21 ± 37.78 vs. 97.37 ± 44.58 cm², p < 0.001). As the severity of DR increased, VFA increased gradually but not significantly (94.41 ± 43.13 cm², 96.75 ± 40.82 cm², and 100.84 ± 49.34 cm² for mild NPDR, moderate NPDR, and VTDR, respectively; p = 0.294). After adjusting for confounding factors, an association was identified between VFA and the occurrence of DR (OR = 1.020, 95% CI = 1.016–1.024). Regardless of BMI (< 25 kg/m² or ≥ 25 kg/m²), a higher VFA level (≥ 100 cm²) was associated with a higher prevalence of DR (p < 0.001).

    Conclusion: The results of this study indicate a modest association between VFA and the incidence of DR among Chinese patients diagnosed with T2DM in Ningbo.
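
Because logistic regression is linear on the log-odds scale, a per-unit odds ratio compounds multiplicatively over larger increments. Assuming (the abstract does not say) that the reported OR of 1.020 is per 1 cm² of VFA, a 10 cm² increase corresponds to an OR of roughly 1.020^10 ≈ 1.22. The helper below is a hypothetical sketch of that arithmetic:

```python
def scaled_odds_ratio(or_per_unit: float, delta: float) -> float:
    """Odds ratio for a `delta`-unit increase, given a per-unit OR from a logistic model.

    Log-odds are linear in the predictor, so log(OR) scales linearly with delta,
    i.e. OR(delta) = OR(1) ** delta. The unit of the input OR is an assumption.
    """
    return or_per_unit ** delta
```

Applied to the CI bounds under the same assumption, 1.016^10 and 1.024^10 give roughly 1.17 and 1.27. The scaling only holds if the model's linearity in VFA is adequate over the range considered.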

    Clear-Air Turbulence (CAT) Identification with X-Band Dual Polarimetric Radar Based on Bayesian Approach

    Weather radar echoes are seriously contaminated by clear-air turbulence (CAT) echoes, which need to be identified and eliminated to improve the quality of weather radar data. Using data observed in 2018 by five X-band dual-polarimetric radars in Changping, Fangshan, Miyun, Shunyi, and Tongzhou, Beijing, the probability density distributions (PDDs) of the horizontal textures of four radar moments, namely reflectivity factor (ZH), differential reflectivity (ZDR), correlation coefficient (ρHV), and differential propagation phase shift (ΦDP), were derived, and CAT was then identified and removed using a Bayesian method. The results show that radar data quality is effectively improved after CAT is eliminated: (1) the CAT removal rate exceeds 98.2% in the analyzed cases; (2) in areas where CAT occurs frequently, it is effectively suppressed, whereas in areas where it occurs infrequently, some weather echoes at the edges with SNR < 15 dB may be misidentified as CAT. However, meteorological echoes still account for more than 85% of the total echoes, indicating that the error rate is very low and does not affect radar operation.
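
The texture-plus-Bayes pipeline described above can be sketched as a naive-Bayes classifier whose likelihoods are empirical PDDs. Only the overall idea (textures of radar moments, PDDs, a Bayesian decision) comes from the abstract; the texture definition (mean squared difference between adjacent range gates), the equal-width histograms, the equal priors, the feature-independence assumption, and all names (`texture`, `Histogram`, `posterior_cat`) are hypothetical:

```python
import math
from typing import Dict, Sequence


def texture(values: Sequence[float]) -> float:
    """Hypothetical texture: mean squared difference between adjacent range gates (needs >= 2 gates)."""
    diffs = [(b - a) ** 2 for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


class Histogram:
    """Equal-width histogram used as an empirical PDD over [lo, hi)."""

    def __init__(self, samples: Sequence[float], lo: float, hi: float,
                 bins: int, eps: float = 1e-6) -> None:
        self.lo, self.hi, self.bins = lo, hi, bins
        counts = [0] * bins
        for s in samples:
            counts[self._index(s)] += 1
        width = (hi - lo) / bins
        # Normalize counts to a density; eps keeps empty bins from giving zero likelihood.
        self.density = [c / (len(samples) * width) + eps for c in counts]

    def _index(self, x: float) -> int:
        i = int((x - self.lo) / (self.hi - self.lo) * self.bins)
        return min(max(i, 0), self.bins - 1)  # clamp out-of-range values to the edge bins

    def pdf(self, x: float) -> float:
        return self.density[self._index(x)]


def posterior_cat(features: Dict[str, float],
                  pdd_cat: Dict[str, Histogram],
                  pdd_weather: Dict[str, Histogram],
                  prior_cat: float = 0.5) -> float:
    """P(CAT | features) under a naive-Bayes assumption (features treated as independent)."""
    log_cat = math.log(prior_cat)
    log_wx = math.log(1.0 - prior_cat)
    for name, x in features.items():
        log_cat += math.log(pdd_cat[name].pdf(x))
        log_wx += math.log(pdd_weather[name].pdf(x))
    m = max(log_cat, log_wx)  # shift by the max before exponentiating for numerical stability
    p_cat, p_wx = math.exp(log_cat - m), math.exp(log_wx - m)
    return p_cat / (p_cat + p_wx)
```

A gate whose posterior exceeds a chosen threshold, such as 0.5 (an assumed value), would be flagged as CAT and removed; in practice the PDDs would be built from labeled CAT and weather echoes for each of ZH, ZDR, ρHV, and ΦDP.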

    Self-Attention ConvLSTM for Spatiotemporal Forecasting of Short-Term Online Car-Hailing Demand

    As a flourishing basic transportation service in recent years, online car-hailing has achieved great success in metropolitan cities. Accurate spatiotemporal forecasting plays a significant role in deploying networks for online car-hailing demand services. A self-attention mechanism within convolutional long short-term memory (ConvLSTM) is proposed to accurately predict online car-hailing demand. It more effectively addresses ConvLSTM's weakness in capturing spatial correlations over large spatial extents. Furthermore, it generates features by aggregating pairwise similarity scores of features at all positions of the input and memory, thereby capturing long-range spatiotemporal dependencies. First, the online car-hailing trajectory dataset was converted into images after geographic grid matching, and image enhancement was performed by cropping. Then, the effectiveness of the ConvLSTM embedded with a self-attention mechanism (SA-ConvLSTM) was demonstrated by comparing it with existing models. The experimental results showed that the proposed model outperformed the existing models, and that including spatiotemporal information in images yields better predictions than including spatial information in time-series pixels.
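
The idea of aggregating pairwise similarity scores over all positions can be illustrated with scaled dot-product self-attention over flattened grid cells. This is a minimal sketch under stated assumptions: the query, key, and value vectors are taken as given (in a real model they would come from learned projections), the 1/√d scaling is a common convention rather than something the abstract specifies, and `spatial_self_attention` is a hypothetical name:

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def spatial_self_attention(queries: List[Vector],
                           keys: List[Vector],
                           values: List[Vector]) -> List[Vector]:
    """For each query position i: out_i = sum_j softmax_j(q_i . k_j / sqrt(d)) * v_j.

    Every output position mixes information from ALL key/value positions,
    which is what gives attention its long-range reach on a spatial grid.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out
```

To attend over both the input and the memory, as the abstract describes, keys and values from the two sources can be concatenated (e.g. `keys = input_keys + memory_keys`) before the call, so every output position draws on every grid cell in both.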