37 research outputs found

VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders

Authors: Chen Honggang; Huang Siteng; Kang Yachen; Liu Xuyang; Wang Donglin
Publication date: 03/09/2023

Large-scale text-to-image diffusion models have shown impressive capabilities across various generative tasks, enabled by strong vision-language alignment obtained through pre-training. However, most vision-language discriminative tasks require extensive fine-tuning on carefully labeled datasets to acquire such alignment, at great cost in time and computing resources. In this work, we explore directly applying a pre-trained generative diffusion model to the challenging discriminative task of visual grounding without any fine-tuning or additional training data. Specifically, we propose VGDiffZero, a simple yet effective zero-shot visual grounding framework based on text-to-image diffusion models. We also design a comprehensive region-scoring method considering both global and local contexts of each isolated proposal. Extensive experiments on RefCOCO, RefCOCO+, and RefCOCOg show that VGDiffZero achieves strong performance on zero-shot visual grounding.

arXiv.org e-Print Archive

Provably Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning

Authors: Chen Mingzhe; Huang Junzhou; Huang Yuanhao; Li Lanqing; Luo Dijun; Luo Siteng
Publication date: 22/02/2021

Meta-learning for offline reinforcement learning (OMRL) is an understudied problem with tremendous potential impact by enabling RL algorithms in many real-world applications. A popular solution to the problem is to infer task identity as augmented state using a context-based encoder, for which efficient learning of robust task representations remains an open challenge. In this work, we provably improve upon one of the SOTA OMRL algorithms, FOCAL, by incorporating intra-task attention mechanism and inter-task contrastive learning objectives, to robustify task representation learning against sparse reward and distribution shift. Theoretical analysis and experiments are presented to demonstrate the superior performance and robustness of our end-to-end and model-free framework compared to prior algorithms across multiple meta-RL benchmarks. Comment: 21 pages, 7 figures

arXiv.org e-Print Archive

University of Miami: Scholarship Miami

Dynamics of Associative Polymers with High Density of Reversible Bonds

Authors: Cai Li-Heng; Chen Quan; Cheng Shiwang; Ge Ting; Kim Myoeum; Nian Shifeng; Patil Shalin; Zhang Siteng; Zhernenkov Mikhail
Publication venue: American Physical Society (APS)
Publication date: 28/06/2022

We design and synthesize unentangled associative polymers carrying unprecedented high fractions of stickers, up to eight per Kuhn segment, that can form strong pairwise hydrogen bonding of $\sim 20 k_B T$ without microphase separation. The reversible bonds significantly slow down the polymer dynamics but nearly do not change the shape of linear viscoelastic spectra. Moreover, the structural relaxation time of associative polymers increases exponentially with the fraction of stickers and exhibits a universal yet non-Arrhenius dependence on the distance from polymer glass transition temperature. These results cannot be understood within the framework of the classic sticky-Rouse model but are rationalized by a renormalized Rouse model, which highlights an unexpected influence of reversible bonds on the structural relaxation rather than the shape of viscoelastic spectra for associative polymers with high concentrations of stickers. Comment: 4 figures

arXiv.org e-Print Archive

Artificial intelligence-based non-invasive tumor segmentation, grade stratification and prognosis prediction for clear-cell renal-cell carcinoma

Authors: Chen Guihua; Chen Lei; Chen Siteng; Guo Tuanjie; Jiang Beibei; Liu Aie; Pan Xianpan; Song Dandan; Tang Heting; Wang Tao; Wang Xiang; Xue Zhong; Zhang Ning; Zheng Junhua
Publication date: 01/09/2023

Due to the complicated histopathological characteristics of clear-cell renal-cell carcinoma (ccRCC), non-invasive prognosis before operative treatment is crucial in selecting the appropriate treatment. A total of 126 345 computerized tomography (CT) images from four independent patient cohorts were included for analysis in this study. We propose a V Bottleneck multi-resolution and focus-organ network (VB-MrFo-Net) using a cascade framework for deep learning analysis. The VB-MrFo-Net achieved better performance than VB-Net in tumor segmentation, with a Dice score of 0.87. The nuclear-grade prediction model performed best in the logistic regression classifier, with area under curve values from 0.782 to 0.746. Survival analysis revealed that our prediction model could significantly distinguish patients with high survival risk, with a hazard ratio (HR) of 2.49 [95% confidence interval (CI): 1.13-5.45, P = 0.023] in the General cohort. Excellent performance had also been verified in the Cancer Genome Atlas cohort, the Clinical Proteomic Tumor Analysis Consortium cohort, and the Kidney Tumor Segmentation Challenge cohort, with HRs of 2.77 (95%CI: 1.58-4.84, P = 0.0019), 3.83 (95%CI: 1.22-11.96, P = 0.029), and 2.80 (95%CI: 1.05-7.47, P = 0.025), respectively. In conclusion, we propose a novel VB-MrFo-Net for the renal tumor segmentation and automatic diagnosis of ccRCC. The risk stratification model could accurately distinguish patients with high tumor grade and high survival risk based on non-invasive CT images before surgical treatments, which could provide practical advice for deciding treatment options.

EUR Research Repository

Association between visceral fat area and diabetic retinopathy among people with type 2 diabetes mellitus: a cross-sectional study in Ningbo, Zhejiang Province, China

Authors: Bo Li; Dongwei Yao; Li Li; Miao Chen; Shanshan Hua; Siteng Wu
Publication venue: Frontiers Media S.A.
Publication date: 01/02/2024

Aim: The objective of this study is to investigate the relationship between visceral fat area (VFA) and diabetic retinopathy (DR) in the context of type 2 diabetes mellitus (T2DM) within Ningbo, China. Methods: The data of a total of 3,707 subjects with T2DM treated at The First Affiliated Hospital of Ningbo University were enrolled. The existence and severity of diabetic retinopathy were assessed by employing the 45° two-field stereoscopic digital photography. Subjects were categorized into four distinct groups: those without DR (NDR), individuals with mild non-proliferative DR (mild NPDR), people with moderate non-proliferative DR (moderate NPDR), and those suffering from vision-threatening DR (VTDR). Bio-electrical impedance was employed to estimate the visceral fat area (VFA). Multinomial logistic regression models were utilized to evaluate the association between VFA and DR. Results: The mean VFA in patients without diabetic retinopathy (NDR) was notably lower compared to that of patients with diabetic retinopathy (DR) (85.21 ± 37.78 vs. 97.37 ± 44.58 cm², p < 0.001). As the severity of DR increased, VFA increased gradually but insignificantly (94.41 ± 43.13 cm², 96.75 ± 40.82 cm², 100.84 ± 49.34 cm², p = 0.294). After adjusting the confounding factors, there was an association identified between VFA and the occurrence of DR (OR = 1.020, 95% CI = 1.016–1.024). It showed that regardless of BMI, whether it’s less than 25 kg/m² or greater than or equal to 25 kg/m², a higher VFA (≥100 cm²) level came with a higher prevalence of DR (p < 0.001). Conclusion: The outcomes of this research indicate a modest association between VFA and the incidence of DR among Chinese patients who have been diagnosed with T2DM in Ningbo.

Directory of Open Access Journals

Machine Learning Methods for Drug Evaluation and Treatment Assessment

Author: Chen Siteng
Publication venue: The University of Arizona.
Publication date: 01/01/2020

Preclinical drug testing is a key step in evaluating the profile of drug treatment. Many drug tests have been designed for different diseases. For instance, researchers manually count the number of peristaltic waves of Drosophila larvae to assess the severity of amyotrophic lateral sclerosis (ALS). In other cases, pharmacologists have to count dead cells by visual scoring to assess the performance of chemotherapy treatment. Labeling mitosis events is a time-consuming task and is thus prohibitive for large-scale drug screenings. Machine learning algorithms have allowed researchers to dramatically increase the throughput of analyzing large amounts of data. However, current methods require massive ground truth annotations, which are labor-intensive to produce in biomedical experiments. Approaches with few human interventions remain unexplored. This dissertation focuses on three tasks for drug evaluation and treatment assessment. First, we propose a machine learning method to evaluate the effectiveness of a drug for ALS. This method leverages t-Distributed Stochastic Neighbor Embedding (tSNE) and statistical analysis to assess the locomotion behavior of Drosophila larvae and compare the difference between groups with and without the tested drug. Second, we designed a first-of-its-kind weakly supervised deep neural network for dead cell detection and counting. Compared with many existing fully supervised approaches, our approach only requires image-level ground truth. We show classification performance compared to general-purpose and cell classification networks, and report results for the image-level supervised counting task. Last but not least, we propose a sequence-level supervised neural network model using convolutional long short-term memory (ConvLSTM) and convolutional layers to detect mitosis events at the pixel-and-frame level. By using binary labels, the proposed network is able to localize cell division spatially and temporally. We have evaluated our method with stem cell time-lapse images. With significantly less ground truth in the training data, our method achieved competitive performance compared with state-of-the-art fully supervised mitosis detection methods.

The University of Arizona

Sequence-level Supervised Deep Neural Networks for Mitosis Event Detection in Time-Lapse Microscopy Images

Authors: Chen Siteng; Li Ao; Roveda Janet
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/12/2020

Automatic mitosis detection is a key step in measuring cell proliferation and analyzing the responses to various stimuli. Current deep neural networks can learn complex visual features and capture long-range temporal dependencies. However, state-of-the-art mitosis detection models require massive ground truth annotations, which are labor-intensive to produce in biomedical experiments. Therefore, we propose a sequence-level supervised neural network model to detect mitosis events at the pixel-and-frame level. By using binary labels, the proposed network is trained to predict the presence of mitosis for the input microscopy sequences. Then we leverage the feature map produced by the proposed network to localize the cell division. The proposed model achieved a detection F1-score of 0.881. With significantly less ground truth in the training data, our method achieved competitive performance compared with state-of-the-art fully supervised mitosis detection methods. © 2020 IEEE. National Science Foundation.

The University of Arizona

Clear-Air Turbulence (CAT) Identification with X-Band Dual Polarimetric Radar Based on Bayesian Approach

Authors: Jianli Ma; Li Luo; Mingxuan Chen; Siteng Li
Publication venue: MDPI AG
Publication date: 01/12/2021

The echo of weather radar is seriously disturbed by clear-air turbulence echo (CAT), which must be identified and eliminated to improve the data quality of weather radar. Using the data observed in 2018 with the five X-band dual polarimetric radars in Changping, Fangshan, Miyun, Shunyi, and Tongzhou, Beijing, the probability density distribution (PDD) of the horizontal texture of four radar moments, namely reflectivity factor (ZH), differential reflectivity (ZDR), correlation coefficient (ρHV), and differential propagation phase shift (ΦDP), is obtained, and the CAT is then identified and removed using a Bayesian method. The results show that the radar data can be effectively improved after the CAT has been eliminated: (1) the removal rate of CAT is more than 98.2% in the analyzed cases; (2) in the area with high-frequency distribution of CAT, the CAT can be effectively suppressed; in the area with low-frequency distribution, some weather echo at the edge with SNR < 15 dB may be mistakenly identified as CAT, but the proportion of meteorological echoes to the total echoes is more than 85%, which indicates that the error rate is very low and does not affect the radar operation.

Directory of Open Access Journals

Self-Attention ConvLSTM for Spatiotemporal Forecasting of Short-Term Online Car-Hailing Demand

Authors: Hongxia Ge; Rongjun Cheng; Siteng Li; Zhenlei Chen
Publication venue: MDPI AG
Publication date: 01/06/2022

As a flourishing basic transportation service in recent years, online car-hailing has made great achievements in metropolitan cities. Accurate spatiotemporal forecasting plays a significant role in the deployment of a network for online car-hailing demand services. A self-attention mechanism in convolutional long short-term memory (ConvLSTM) is proposed to accurately predict the online car-hailing demand. It more effectively addresses the weakness of ConvLSTM in capturing spatial correlation over a large spatial extent. Furthermore, it generates features by aggregating pair-wise similarity scores of features at all positions of input and memory, and thus captures long-range spatiotemporal dependencies. First, the online car-hailing trajectories dataset was converted into images after geographic grid matching, and image enhancement was performed by cropping. Then, the effectiveness of the ConvLSTM embedded with a self-attention mechanism (SA-ConvLSTM) was demonstrated by comparing it to existing models. The experimental results showed that the proposed model performed better than the existing models, and that including spatiotemporal information in images yields better predictions than including spatial information in time-series pixels.

Directory of Open Access Journals