VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders
Large-scale text-to-image diffusion models have shown impressive capabilities
across various generative tasks, enabled by strong vision-language alignment
obtained through pre-training. However, most vision-language discriminative
tasks require extensive fine-tuning on carefully-labeled datasets to acquire
such alignment, with great cost in time and computing resources. In this work,
we explore directly applying a pre-trained generative diffusion model to the
challenging discriminative task of visual grounding without any fine-tuning or
additional training dataset. Specifically, we propose VGDiffZero, a simple yet
effective zero-shot visual grounding framework based on text-to-image diffusion
models. We also design a comprehensive region-scoring method considering both
global and local contexts of each isolated proposal. Extensive experiments on
RefCOCO, RefCOCO+, and RefCOCOg show that VGDiffZero achieves strong
performance on zero-shot visual grounding.
Provably Improved Context-Based Offline Meta-RL with Attention and Contrastive Learning
Meta-learning for offline reinforcement learning (OMRL) is an understudied
problem with tremendous potential impact by enabling RL algorithms in many
real-world applications. A popular solution to the problem is to infer task
identity as augmented state using a context-based encoder, for which efficient
learning of robust task representations remains an open challenge. In this
work, we provably improve upon one of the SOTA OMRL algorithms, FOCAL, by
incorporating intra-task attention mechanism and inter-task contrastive
learning objectives, to robustify task representation learning against sparse
reward and distribution shift. Theoretical analysis and experiments are
presented to demonstrate the superior performance and robustness of our
end-to-end and model-free framework compared to prior algorithms across
multiple meta-RL benchmarks. Comment: 21 pages, 7 figures
Dynamics of Associative Polymers with High Density of Reversible Bonds
We design and synthesize unentangled associative polymers carrying
unprecedented high fractions of stickers, up to eight per Kuhn segment, that
can form strong pairwise hydrogen bonding of without microphase
separation. The reversible bonds significantly slow down the polymer dynamics
but nearly do not change the shape of linear viscoelastic spectra. Moreover,
the structural relaxation time of associative polymers increases exponentially
with the fraction of stickers and exhibits a universal yet non-Arrhenius
dependence on the distance from polymer glass transition temperature. These
results cannot be understood within the framework of the classic sticky-Rouse
model but are rationalized by a renormalized Rouse model, which highlights an
unexpected influence of reversible bonds on the structural relaxation rather
than the shape of viscoelastic spectra for associative polymers with high
concentrations of stickers. Comment: 4 figures
Artificial intelligence-based non-invasive tumor segmentation, grade stratification and prognosis prediction for clear-cell renal-cell carcinoma
Due to the complicated histopathological characteristics of clear-cell renal-cell carcinoma (ccRCC), non-invasive prognosis before operative treatment is crucial in selecting the appropriate treatment. A total of 126 345 computerized tomography (CT) images from four independent patient cohorts were included for analysis in this study. We propose a V Bottleneck multi-resolution and focus-organ network (VB-MrFo-Net) using a cascade framework for deep learning analysis. The VB-MrFo-Net achieved better performance than VB-Net in tumor segmentation, with a Dice score of 0.87. The nuclear-grade prediction model performed best with the logistic regression classifier, with area under curve values from 0.782 to 0.746. Survival analysis revealed that our prediction model could significantly distinguish patients with high survival risk, with a hazard ratio (HR) of 2.49 [95% confidence interval (CI): 1.13-5.45, P = 0.023] in the General cohort. Excellent performance was also verified in the Cancer Genome Atlas cohort, the Clinical Proteomic Tumor Analysis Consortium cohort, and the Kidney Tumor Segmentation Challenge cohort, with HRs of 2.77 (95% CI: 1.58-4.84, P = 0.0019), 3.83 (95% CI: 1.22-11.96, P = 0.029), and 2.80 (95% CI: 1.05-7.47, P = 0.025), respectively. In conclusion, we propose a novel VB-MrFo-Net for renal tumor segmentation and automatic diagnosis of ccRCC. The risk stratification model could accurately distinguish patients with high tumor grade and high survival risk based on non-invasive CT images before surgical treatment, which could provide practical advice for deciding treatment options.
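The Dice score quoted for tumor segmentation measures the overlap between a predicted mask and a reference mask. The following is a minimal sketch of that metric only, assuming binary masks flattened to 0/1 sequences; the function name `dice_score` and the toy masks are illustrative and are not taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Count pixels that are foreground in both masks.
    intersection = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty; treat this as perfect agreement by convention.
        return 1.0
    return 2.0 * intersection / total


# Toy example (hypothetical masks): 2 overlapping pixels, 3 foreground pixels each.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
score = dice_score(predicted, reference)
```

A score of 1 means the two masks coincide exactly, and 0 means they share no foreground pixels.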
Association between visceral fat area and diabetic retinopathy among people with type 2 diabetes mellitus: a cross-sectional study in Ningbo, Zhejiang Province, China
Aim: The objective of this study is to investigate the relationship between visceral fat area (VFA) and diabetic retinopathy (DR) in the context of type 2 diabetes mellitus (T2DM) within Ningbo, China. Methods: The data of a total of 3,707 subjects with T2DM treated at The First Affiliated Hospital of Ningbo University were enrolled. The existence and severity of diabetic retinopathy were assessed by employing the 45° two-field stereoscopic digital photography. Subjects were categorized into four distinct groups: those without DR (NDR), individuals with mild non-proliferative DR (mild NPDR), people with moderate non-proliferative DR (moderate NPDR), and those suffering from vision-threatening DR (VTDR). Bio-electrical impedance was employed to estimate the visceral fat area (VFA). Multinomial logistic regression models were utilized to evaluate the association between VFA and DR. Results: The mean VFA in patients without diabetic retinopathy (NDR) was notably lower compared to that of patients with diabetic retinopathy (DR) (85.21 ± 37.78 vs. 97.37 ± 44.58 cm², p < 0.001). As the severity of DR increased, VFA increased gradually but insignificantly (94.41 ± 43.13 cm², 96.75 ± 40.82 cm², 100.84 ± 49.34 cm², p = 0.294). After adjusting the confounding factors, there was an association identified between VFA and the occurrence of DR (OR = 1.020, 95% CI = 1.016–1.024). It showed that regardless of BMI, whether it is less than 25 kg/m² or greater than or equal to 25 kg/m², a higher VFA (≥100 cm²) level came with a higher prevalence of DR (p < 0.001). Conclusion: The outcomes of this research indicate a modest association between VFA and the incidence of DR among Chinese patients who have been diagnosed with T2DM in Ningbo.
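An odds ratio from a logistic regression applies per unit of the predictor, so a value near 1 can still add up over a larger difference. The sketch below rescales the reported OR and its confidence limits to a larger VFA difference. It assumes the OR is expressed per 1 cm² of VFA and that the model is log-linear in VFA; the abstract states neither, and the 10 cm² step is an arbitrary illustration.

```python
import math


def scaled_odds_ratio(or_per_unit, delta):
    """Odds ratio for a `delta`-unit change, given the per-unit odds ratio.

    In a logistic model the log-odds are linear in the predictor, so the
    per-unit log odds ratio is multiplied by the size of the change.
    """
    return math.exp(delta * math.log(or_per_unit))


# Reported values from the abstract; the per-cm² unit is an assumption.
reported_or, ci_low, ci_high = 1.020, 1.016, 1.024
delta_cm2 = 10  # hypothetical difference in VFA

or_10 = scaled_odds_ratio(reported_or, delta_cm2)
ci_10 = (scaled_odds_ratio(ci_low, delta_cm2),
         scaled_odds_ratio(ci_high, delta_cm2))
```

Under these assumptions, a 10 cm² difference corresponds to an odds ratio of about 1.22.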
Machine Learning Methods for Drug Evaluation and Treatment Assessment
Preclinical drug testing is a key step in evaluating the profile of a drug treatment. Many drug tests have been designed for different diseases. For instance, researchers manually count the number of peristaltic waves of drosophila larvae to assess the severity of amyotrophic lateral sclerosis (ALS). In other cases, pharmacologists have to count dead cells by visual scoring to assess the performance of chemotherapy treatment. Labeling mitosis events is a time-consuming task and is thus prohibitive for large-scale drug screenings. Machine learning algorithms have allowed researchers to dramatically increase the throughput of analyzing large amounts of data. However, the current methods require massive ground truth annotations, which are labor intensive in biomedical experiments. Approaches with few human interventions remain unexplored. This dissertation focuses on three tasks for drug evaluation and treatment assessment. First, we propose a machine learning method to evaluate the effectiveness of a drug for ALS. This method leverages t-Distributed Stochastic Neighbor Embedding (tSNE) and statistical analysis to assess the locomotion behavior of drosophila larvae and compare the difference between groups with and without the tested drug. Second, we designed a first-of-its-kind weakly supervised deep neural network for dead cell detection and counting. Compared with many existing fully supervised approaches, our approach only requires image-level ground truth. We show classification performance compared to general purpose and cell classification networks, and report results for the image-level supervised counting task. Last but not least, we propose a sequence-level supervised neural network model using convolutional long short-term memory (ConvLSTM) and convolutional layers to detect mitosis events at pixel-and-frame level. By using binary labels, the proposed network is able to localize the cell division spatially and temporally.
We have evaluated our method with stem cell time-lapse images. With significantly less ground truth in the training data, our method achieved competitive performance compared with the state-of-the-art fully supervised mitosis detection methods.
Sequence-level Supervised Deep Neural Networks for Mitosis Event Detection in Time-Lapse Microscopy Images
Automatic mitosis detection is a key step in measuring cell proliferation and analyzing the responses to various stimuli. Current deep neural networks can learn complex visual features and capture long-range temporal dependencies. However, the state-of-the-art mitosis detection models require massive ground truth annotations, which are labor intensive in biomedical experiments. Therefore, we propose a sequence-level supervised neural network model to detect mitosis events at pixel-and-frame level. By using binary labels, the proposed network is trained to predict the presence of mitosis for the input microscopy sequences. Then we leverage the feature map produced by the proposed network to localize the cell division. The proposed model achieved a detection F1-score of 0.881. With significantly less ground truth in the training data, our method achieved competitive performance compared with the state-of-the-art fully supervised mitosis detection methods. © 2020 IEEE. National Science Foundation.
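The detection F1-score quoted above is the harmonic mean of precision and recall. As a minimal sketch of how such a score is computed from detection counts (the counts below are hypothetical and are not the paper's results, and the paper's rule for matching detections to ground truth is not reproduced here):

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from detection counts."""
    detected = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical counts: 8 correct detections, 2 spurious, 2 missed events.
example_f1 = f1_score(true_pos=8, false_pos=2, false_neg=2)
```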
Clear-Air Turbulence (CAT) Identification with X-Band Dual Polarimetric Radar Based on Bayesian Approach
Weather radar echoes are seriously contaminated by clear-air turbulence (CAT) echoes, which need to be identified and eliminated to improve weather radar data quality. Using data observed in 2018 with five X-band dual polarimetric radars in Changping, Fangshan, Miyun, Shunyi, and Tongzhou, Beijing, the probability density distributions (PDD) of the horizontal texture of four radar moments, reflectivity factor (ZH), differential reflectivity (ZDR), correlation coefficient (ρHV), and differential propagation phase shift (ΦDP), are obtained, and then the CAT is identified and removed using a Bayesian method. The results show that the radar data can be effectively improved after the CAT has been eliminated: (1) the removal rate of CAT is more than 98.2% in the analyzed cases; (2) in areas with a high-frequency distribution of CAT, the CAT can be effectively suppressed; in areas with a low-frequency distribution, some weather echo at the edge with SNR < 15 dB may be mistakenly identified as CAT, but the proportion of meteorological echoes to the total echoes is more than 85%, which indicates that the error rate is very low and does not affect radar operation.
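A Bayesian identification of this kind combines per-feature likelihoods into a posterior probability that a radar gate contains CAT. The sketch below is a generic naive Bayes classifier over texture features, not the paper's method: it assumes Gaussian class-conditional densities and independent features, whereas the paper derives empirical probability density distributions from observations. The feature name and all parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used here as a stand-in likelihood."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_cat(features, cat_params, weather_params, prior_cat=0.5):
    """Posterior probability of the CAT class under a naive Bayes assumption.

    features: dict mapping feature name to observed texture value.
    cat_params / weather_params: dict mapping feature name to (mean, std).
    """
    joint_cat = prior_cat
    joint_weather = 1.0 - prior_cat
    for name, value in features.items():
        joint_cat *= gaussian_pdf(value, *cat_params[name])
        joint_weather *= gaussian_pdf(value, *weather_params[name])
    evidence = joint_cat + joint_weather
    if evidence == 0.0:
        # Both likelihoods underflowed; fall back to the prior.
        return prior_cat
    return joint_cat / evidence


# Hypothetical parameters for a single texture feature.
cat_params = {"zh_texture": (6.0, 2.0)}
weather_params = {"zh_texture": (2.0, 2.0)}
p_cat = posterior_cat({"zh_texture": 5.5}, cat_params, weather_params)
```

A gate would then be flagged as CAT when the posterior exceeds a chosen threshold, for example 0.5.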
Self-Attention ConvLSTM for Spatiotemporal Forecasting of Short-Term Online Car-Hailing Demand
As a flourishing basic transportation service in recent years, online car-hailing has made great achievements in metropolitan cities. Accurate spatiotemporal forecasting plays a significant role in the deployment of a network for online car-hailing demand services. A self-attention mechanism in convolutional long short-term memory (ConvLSTM) is proposed to accurately predict the online car-hailing demand. It more effectively addresses the weakness that ConvLSTM is not good at capturing spatial correlation over a large spatial extent. Furthermore, it generates features by aggregating pair-wise similarity scores of features at all positions of the input and memory, and thus captures long-range spatiotemporal dependencies. First, the online car-hailing trajectories dataset was converted into images after geographic grid matching, and image enhancement was performed by cropping. Then, the effectiveness of the ConvLSTM embedded with a self-attention mechanism (SA-ConvLSTM) was demonstrated by comparing it to existing models. The experimental results showed that the proposed model performed better than the existing models, and that including spatiotemporal information in images yields better predictions than including spatial information in time-series pixels.
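The aggregation of pair-wise similarity scores over all positions described above can be illustrated with plain scaled dot-product self-attention. This is a minimal sketch under simplifying assumptions: queries, keys, and values are the raw feature vectors with no learned projections, and the memory branch and ConvLSTM gating of SA-ConvLSTM are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Aggregate features over all positions with dot-product attention.

    features: list of N position vectors (each a list of C floats), for
    example a feature map flattened over its spatial grid.
    Returns N vectors, each a similarity-weighted average of all inputs.
    """
    channels = len(features[0])
    scale = math.sqrt(channels)
    outputs = []
    for query in features:
        # Pair-wise similarity of this position with every position.
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in features]
        weights = softmax(scores)
        outputs.append([sum(w * value[c] for w, value in zip(weights, features))
                        for c in range(channels)])
    return outputs


# Toy 3-position, 2-channel feature map (hypothetical values).
attended = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because each output position draws on every other position, distant grid cells can influence one another in a single step, which a local convolution kernel cannot do.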