FITA: Fine-grained Image-Text Aligner for Radiology Report Generation
Radiology report generation aims to automatically generate detailed and
coherent descriptive reports for given radiology images. Previous work has mainly
focused on refining fine-grained image features or leveraging external
knowledge. However, the precise alignment of fine-grained image features with
corresponding text descriptions has not been considered. This paper presents a
novel method called Fine-grained Image-Text Aligner (FITA) to construct
fine-grained alignment for image and text features. It has three novel designs:
Image Feature Refiner (IFR), Text Feature Refiner (TFR) and Contrastive Aligner
(CA). IFR and TFR aim to learn fine-grained image and text features,
respectively. We achieve this by leveraging saliency maps to effectively fuse
symptoms with corresponding abnormal visual regions, and by utilizing a
meticulously constructed triplet set for training. Finally, the CA module aligns
fine-grained image and text features with a contrastive loss for precise
alignment. Results show that our method surpasses existing methods on the
widely used benchmark.
Comment: 11 pages, 3 figures
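As a rough illustration of the kind of image-text alignment the CA module performs, the sketch below implements a symmetric InfoNCE-style contrastive loss; the feature shapes, the temperature value, and the function name are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_feats, txt_feats, temperature=0.07):
    """InfoNCE-style loss pulling matched image/text features together.

    img_feats, txt_feats: (batch, dim) fine-grained feature vectors;
    matched pairs share the same batch index. Shapes and the
    temperature value are illustrative assumptions.
    """
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / temperature          # (batch, batch) similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric loss: image-to-text and text-to-image directions
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```

Matched image and text features are pulled together while all other pairs in the batch serve as negatives, which is the standard contrastive recipe for precise cross-modal alignment.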
Masked and Permuted Implicit Context Learning for Scene Text Recognition
Scene Text Recognition (STR) is difficult because of the variations in text
styles, shapes, and backgrounds. Though the integration of linguistic
information enhances models' performance, existing methods based on either
permuted language modeling (PLM) or masked language modeling (MLM) have their
pitfalls. PLM's autoregressive decoding lacks foresight into subsequent
characters, while MLM overlooks inter-character dependencies. Addressing these
problems, we propose a masked and permuted implicit context learning network
for STR, which unifies PLM and MLM within a single decoder, inheriting the
advantages of both approaches. We utilize the training procedure of PLM, and to
integrate MLM, we incorporate word length information into the decoding process
and replace the undetermined characters with mask tokens. In addition, perturbation
training is employed to make the model more robust to potential length-prediction
errors. Our empirical evaluations demonstrate the effectiveness of the model: it
not only achieves superior performance on the common benchmarks but also delivers
a substantial improvement on the more challenging Union14M-Benchmark.
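The following sketch shows one way the unified PLM/MLM input construction described above could look: at each step of a sampled permutation order, already-decoded characters are revealed and all undetermined positions carry mask tokens, so the decoder always sees the word length. The function name, tensor layout, and mask id are assumptions for illustration.

```python
import torch

MASK_ID = 0  # hypothetical id of the mask token

def build_permuted_masked_inputs(target_ids, perm):
    """Sketch of unified PLM/MLM inputs: at step t of the permutation
    order, characters not yet decoded are replaced with mask tokens,
    so the decoder sees the full word length while predicting the
    character at position perm[t].

    target_ids: (length,) tensor of gold character ids.
    perm: permutation of range(length) giving the decoding order.
    """
    length = target_ids.size(0)
    steps = []
    for t in range(length):
        revealed = perm[:t]                        # positions decoded so far
        x = torch.full((length,), MASK_ID, dtype=target_ids.dtype)
        x[revealed] = target_ids[revealed]         # keep decoded characters
        steps.append(x)                            # step t predicts perm[t]
    return torch.stack(steps)                      # (length, length)
```

For a 3-character word with perm = [2, 0, 1], the decoder sees [M, M, M], then [M, M, c2], then [c0, M, c2], predicting positions 2, 0, and 1 in turn.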
VidGen-1M: A Large-Scale Dataset for Text-to-video Generation
The quality of video-text pairs fundamentally determines the upper bound of
text-to-video models. Currently, the datasets used for training these models
suffer from significant shortcomings, including low temporal consistency,
poor-quality captions, substandard video quality, and imbalanced data
distribution. The prevailing video curation process, which depends on image
models for tagging and manual rule-based curation, leads to a high
computational load and leaves behind unclean data. As a result, there is a lack
of appropriate training datasets for text-to-video models. To address this
problem, we present VidGen-1M, a superior training dataset for text-to-video
models. Produced through a coarse-to-fine curation strategy, this dataset
guarantees high-quality videos and detailed captions with excellent temporal
consistency. When used to train a video generation model, this dataset leads to
experimental results that surpass those obtained with other models.
Comment: project page: https://sais-fuxi.github.io/projects/vidgen-1
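As a hedged sketch of what a coarse-to-fine curation strategy can look like in code, the snippet below runs cheap quality filters over every clip and reserves expensive captioning and video-text alignment scoring for the survivors; all function names and thresholds are illustrative assumptions, not VidGen-1M's actual pipeline.

```python
def coarse_to_fine_curate(clips, coarse_score, captioner, alignment_score,
                          coarse_thresh=0.5, fine_thresh=0.8):
    """Hypothetical two-stage curation: inexpensive coarse checks run
    over every clip, and only the survivors pay for captioning and
    fine-grained video-text alignment scoring. All callables and
    thresholds here are illustrative assumptions.
    """
    # Coarse stage: cheap per-clip quality screening
    survivors = [c for c in clips if coarse_score(c) >= coarse_thresh]

    # Fine stage: caption each survivor, keep only well-aligned pairs
    dataset = []
    for clip in survivors:
        caption = captioner(clip)
        if alignment_score(clip, caption) >= fine_thresh:
            dataset.append((clip, caption))
    return dataset
```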
IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
Scene text recognition has attracted increasing attention due to
its diverse applications. Most state-of-the-art methods adopt an
encoder-decoder framework with the attention mechanism, autoregressively
generating text from left to right. Despite their convincing performance, this
sequential decoding strategy constrains inference speed. Conversely,
non-autoregressive models provide faster, simultaneous predictions but often
sacrifice accuracy. Although utilizing an explicit language model can improve
performance, it increases the computational load. Moreover, separating linguistic
knowledge from visual information may harm the final prediction. In this paper,
we propose an alternative solution, using a parallel and iterative decoder that
adopts an easy-first decoding strategy. Furthermore, we regard text recognition
as an image-based conditional text generation task and utilize the discrete
diffusion strategy, ensuring exhaustive exploration of bidirectional contextual
information. Extensive experiments demonstrate that the proposed approach
achieves superior results on the benchmark datasets, including both Chinese and
English text images.
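The sketch below illustrates the easy-first, parallel, iterative decoding loop the abstract outlines: every round predicts all positions simultaneously, commits only the most confident still-masked positions, and feeds the result back in. The model interface and mask-token id are assumptions, and the discrete-diffusion training objective is omitted.

```python
import math
import torch

MASK_ID = 0  # hypothetical id of the mask token

@torch.no_grad()
def easy_first_decode(model, image_feats, length, num_iters=4):
    """Parallel, iterative, easy-first decoding sketch. The interface
    `model(image_feats, tokens)` returning (length, vocab) logits is an
    assumed one, not the paper's actual API.
    """
    tokens = torch.full((length,), MASK_ID, dtype=torch.long)
    per_round = math.ceil(length / num_iters)      # positions committed per round
    for _ in range(num_iters):
        masked = (tokens == MASK_ID).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break                                  # every position is decided
        logits = model(image_feats, tokens)        # predict all positions at once
        probs, preds = logits.softmax(-1).max(-1)  # per-position confidence
        # Easy-first: commit the most confident of the still-masked positions
        order = probs[masked].argsort(descending=True)
        commit = masked[order[:per_round]]
        tokens[commit] = preds[commit]
    return tokens
```

Committing a fixed budget of positions per round keeps the number of forward passes constant regardless of text length, which is where the speed advantage over left-to-right decoding comes from.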
Incorporating the 10th Edition Institute of Transportation Engineers (ITE) Trip Generation Rates Into Virginia Department of Transportation Guidelines
The Institute of Transportation Engineers (ITE) released the 10th edition of Trip Generation (TG) in 2017, which significantly updated its database; some of its trip generation rates were substantially lower than those of earlier editions. This study aims to investigate the applicability of the TG 10th edition in various Virginia contexts and to recommend how to incorporate it into state guidelines. The research team surveyed 31 state transportation agencies to obtain a clear understanding of current practices in the adoption of trip rates and trip estimation approaches. We systematically compared trip rates of the TG 9th and 10th editions using hypothesis tests and identified land uses with significant rate reductions. Trip generation data were collected from 37 sites in Virginia during weekday PM peaks for mixed-use sites and for single-use sites with significantly reduced 10th edition rates (multi-family low-rise and general office). To investigate the use of trip rates in different settings, general offices in both general urban/suburban and dense multi-use urban settings were considered. For mixed-use developments, we explored combinations of four internal trip capture models and TG rates of the 9th and 10th editions to identify the best trip estimation approach. Because all trip data were collected after the outbreak of the COVID-19 pandemic, StreetLight data were used to adjust trip counts for the impacts of COVID-19. This study recommends that VDOT's Office of Land Use provide guidance to VDOT districts to accept traffic impact analysis reports that use ITE's TG 10th edition and the 3rd edition of the Trip Generation Handbook, including the Handbook's methodology for estimating internal capture at mixed-use developments.
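As a minimal sketch of the kind of edition-to-edition rate comparison the study describes, the snippet below runs a two-sample t-test on hypothetical per-site PM-peak trip rates; all numbers are illustrative, not data from the study.

```python
from scipy import stats

# Hypothetical per-site PM-peak trip rates (trips per dwelling unit) for
# one land use under the TG 9th and 10th editions; values are illustrative.
rates_9th = [0.62, 0.58, 0.71, 0.66, 0.60]
rates_10th = [0.51, 0.47, 0.56, 0.49, 0.53]

# Two-sample t-test for a significant difference in mean trip rate
t_stat, p_value = stats.ttest_ind(rates_9th, rates_10th)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Significant rate reduction between editions")
```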
Neuronally released vasoactive intestinal polypeptide alters atrial electrophysiological properties and may promote atrial fibrillation
BACKGROUND: Vagal hyperactivity promotes atrial fibrillation (AF), which has been almost exclusively attributed to acetylcholine. Vasoactive intestinal polypeptide (VIP) and acetylcholine are neurotransmitters co-released during vagal stimulation. Exogenous VIP has been shown to promote AF by shortening action potential duration (APD), increasing APD spatial heterogeneity, and causing intra-atrial conduction block.
OBJECTIVE: The purpose of this study was to investigate the effects of neuronally released VIP on atrial electrophysiologic properties during vagal stimulation.
METHODS: We used a specific VIP antagonist (H9935) to uncover the effects of endogenous VIP released during vagal stimulation in canine hearts.
RESULTS: H9935 significantly attenuated (1) the vagally induced shortening of atrial effective refractory period and widening of the atrial vulnerability window during stimulation of the cervical vagosympathetic trunks (VCNS) and (2) vagal effects on APD during stimulation through the fat-pad ganglion plexus (VGPS). Atropine completely abolished these vagal effects during VCNS and VGPS. In contrast, VGPS-induced slowing of local conduction velocity was completely abolished by either the VIP antagonist or atropine. In pacing-induced AF during VGPS, maximal dominant frequencies and their spatial gradients were reduced significantly by H9935 and, even more markedly, by atropine. Furthermore, VIP release in the atria during vagal stimulation was inhibited by atropine, which may account for the concealment of VIP effects under muscarinic blockade.
CONCLUSION: Neuronally released VIP contributes to vagal effects on atrial electrophysiologic properties and affects the pathophysiology of vagally induced AF. Neuronal release of VIP in the atria is inhibited by muscarinic blockade, a novel mechanism by which VIP effects are concealed by atropine during vagal stimulation.
