
    FITA: Fine-grained Image-Text Aligner for Radiology Report Generation

    Radiology report generation aims to automatically produce detailed and coherent descriptive reports from radiology images. Previous work mainly focused on refining fine-grained image features or leveraging external knowledge; however, the precise alignment of fine-grained image features with their corresponding text descriptions has not been considered. This paper presents a novel method called Fine-grained Image-Text Aligner (FITA) to construct fine-grained alignment between image and text features. It has three novel designs: an Image Feature Refiner (IFR), a Text Feature Refiner (TFR), and a Contrastive Aligner (CA). IFR and TFR learn fine-grained image and text features, respectively. We achieve this by leveraging saliency maps to effectively fuse symptoms with their corresponding abnormal visual regions, and by training on a meticulously constructed triplet set. Finally, the CA module aligns the fine-grained image and text features using a contrastive loss for precise alignment. Results show that our method surpasses existing methods on the widely used benchmark.
    Comment: 11 pages, 3 figures
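The contrastive alignment idea in the abstract above can be sketched with a symmetric InfoNCE-style loss over paired image/text features. This is a minimal illustration, not the paper's exact formulation; the function name, temperature value, and feature shapes are assumptions.

```python
import numpy as np

def contrastive_alignment_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE-style loss; row i of each (N, D) array is a matched pair."""
    # L2-normalise so the dot product is a cosine similarity.
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) similarity matrix

    def cross_entropy(l):
        # Diagonal entries are the positive (matched) pairs.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimising this pulls matched image/text feature pairs together and pushes mismatched pairs apart, which is the general mechanism a contrastive aligner relies on.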

    Masked and Permuted Implicit Context Learning for Scene Text Recognition

    Scene Text Recognition (STR) is difficult because of variations in text styles, shapes, and backgrounds. Though integrating linguistic information enhances model performance, existing methods based on either permuted language modeling (PLM) or masked language modeling (MLM) have their pitfalls: PLM's autoregressive decoding lacks foresight into subsequent characters, while MLM overlooks inter-character dependencies. To address these problems, we propose a masked and permuted implicit context learning network for STR, which unifies PLM and MLM within a single decoder, inheriting the advantages of both approaches. We use the training procedure of PLM, and to integrate MLM we incorporate word-length information into the decoding process and replace the undetermined characters with mask tokens. In addition, perturbation training is employed to make the model more robust against potential length-prediction errors. Our empirical evaluations demonstrate the performance of our model: it not only achieves superior performance on the common benchmarks but also a substantial improvement of 9.1% on the more challenging Union14M-Benchmark.
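The "predicted length plus mask tokens" decoding described above can be illustrated with a toy loop: start from a fully masked string of the predicted length, then repeatedly commit the single most confident still-masked position, so every prediction conditions on already-decoded characters on both sides. This is a hypothetical sketch, not the paper's implementation; `predict_fn` and its return shape are assumptions.

```python
MASK = "[M]"

def iterative_mask_decode(predict_fn, length, max_steps=None):
    """Decode a word of known `length` by filling mask tokens one at a time.

    predict_fn(chars) returns, for each position, a (best_char, confidence)
    pair; each step commits the most confident still-masked position.
    """
    chars = [MASK] * length
    for _ in range(max_steps or length):
        masked = [i for i, c in enumerate(chars) if c == MASK]
        if not masked:
            break
        preds = predict_fn(chars)
        best = max(masked, key=lambda i: preds[i][1])  # easiest position first
        chars[best] = preds[best][0]
    return "".join(chars)
```

Because the sequence always has a fixed number of slots, the model sees bidirectional context (decoded characters and mask tokens) at every step, which is the property the abstract attributes to combining MLM with PLM-style training.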

    VidGen-1M: A Large-Scale Dataset for Text-to-video Generation

    The quality of video-text pairs fundamentally determines the upper bound of text-to-video models. The datasets currently used to train these models suffer from significant shortcomings, including low temporal consistency, poor-quality captions, substandard video quality, and imbalanced data distribution. The prevailing video curation process, which depends on image models for tagging and on manual rule-based curation, incurs a high computational load and leaves behind unclean data. As a result, appropriate training datasets for text-to-video models are lacking. To address this problem, we present VidGen-1M, a superior training dataset for text-to-video models. Produced through a coarse-to-fine curation strategy, this dataset guarantees high-quality videos and detailed captions with excellent temporal consistency. When used to train a video generation model, this dataset has led to experimental results that surpass those obtained with other models.
    Comment: project page: https://sais-fuxi.github.io/projects/vidgen-1

    IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition

    Scene text recognition has attracted increasing attention owing to its diverse applications. Most state-of-the-art methods adopt an encoder-decoder framework with an attention mechanism, autoregressively generating text from left to right. Despite their convincing performance, this sequential decoding strategy constrains inference speed. Conversely, non-autoregressive models provide faster, simultaneous predictions but often sacrifice accuracy. Although utilizing an explicit language model can improve performance, it adds computational load; moreover, separating linguistic knowledge from visual information may harm the final prediction. In this paper, we propose an alternative solution: a parallel and iterative decoder that adopts an easy-first decoding strategy. Furthermore, we regard text recognition as an image-based conditional text generation task and utilize a discrete diffusion strategy, ensuring exhaustive exploration of bidirectional contextual information. Extensive experiments demonstrate that the proposed approach achieves superior results on the benchmark datasets, including both Chinese and English text images.
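The parallel, iterative, easy-first strategy above can be sketched as a loop that re-predicts all positions in parallel each round and commits a shrinking batch of the most confident masked positions, in the spirit of discrete-diffusion unmasking schedules. This is an illustrative sketch under assumed names, not the paper's actual network.

```python
import math

MASK = -1  # placeholder token id for an undecided position

def parallel_easy_first(score_fn, length, iterations=4):
    """All positions start masked; each iteration, score_fn(tokens) returns a
    (token, confidence) pair per position, and the top-k most confident masked
    positions are committed, with k chosen so all masks clear by the last step."""
    tokens = [MASK] * length
    for step in range(iterations):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        preds = score_fn(tokens)
        remaining = iterations - step
        k = math.ceil(len(masked) / remaining)  # linear unmasking schedule
        for i in sorted(masked, key=lambda i: -preds[i][1])[:k]:
            tokens[i] = preds[i][0]
    return tokens
```

Unlike left-to-right decoding, each round sees the partially decoded sequence on both sides of every mask, and a fixed, small number of iterations bounds latency regardless of text length.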

    Incorporating the 10th Edition Institute of Transportation Engineers (ITE) Trip Generation Rates Into Virginia Department of Transportation Guidelines

    The Institute of Transportation Engineers (ITE) released the 10th edition of Trip Generation (TG) in 2017, which significantly updated its database; some of its trip generation rates are substantially lower than those of earlier editions. This study aims to investigate the applicability of the TG 10th edition in various Virginia contexts and to recommend how to incorporate it into state guidelines. The research team surveyed 31 state transportation agencies to obtain a clear understanding of current practices in the adoption of trip rates and trip estimation approaches. We systematically compared the trip rates of the TG 9th and 10th editions using hypothesis tests and identified land uses with significant rate reductions. Trip generation data were collected from 37 sites in Virginia during weekday PM peaks for mixed-use sites and for single-use sites with significantly reduced 10th edition rates (multi-family low-rise and general office). To investigate the use of trip rates in different settings, general offices in both general urban/suburban and dense multi-use urban settings were considered. For mixed-use developments, we explored combinations of four internal trip capture models and the TG rates of the 9th and 10th editions to identify the best trip estimation approach. Because all trip data were collected after the outbreak of the COVID-19 pandemic, StreetLight data were used to adjust trip counts to account for the impacts of COVID-19. This study recommends that VDOT's Office of Land Use provide guidance to VDOT districts to accept traffic impact analysis reports that use ITE's 10th edition of Trip Generation and the 3rd edition of the Trip Generation Handbook. It is further recommended that the Office of Land Use provide guidance to the districts to accept traffic impact analysis reports that estimate internal capture for mixed-use developments using the methodology presented in the 3rd edition of the Trip Generation Handbook.
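The trip estimation described above reduces to simple arithmetic: an ITE-style average-rate estimate multiplies a published rate by the size of the independent variable, and the internal-capture adjustment for mixed-use sites removes trips that stay inside the development. The rates and capture percentage below are made-up placeholders for illustration, not actual ITE Trip Generation values.

```python
def estimate_trips(rate_per_unit, units):
    """ITE-style average-rate estimate: trips = rate x independent-variable size."""
    return rate_per_unit * units

def mixed_use_external_trips(component_trips, internal_capture_pct):
    """External trips after removing internally captured trips, in the spirit of
    the Trip Generation Handbook internal-capture adjustment."""
    gross = sum(component_trips)
    return gross * (1.0 - internal_capture_pct / 100.0)

# Placeholder example: an office component and a residential component.
office = estimate_trips(1.15, 200)       # hypothetical rate per 1,000 sq ft
residential = estimate_trips(0.51, 300)  # hypothetical rate per dwelling unit
external = mixed_use_external_trips([office, residential], 20.0)
```

Lower 10th edition rates feed directly into the first multiplication, which is why the rate reductions the study tests can materially change the external-trip totals used in traffic impact analysis.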

    Neuronally released vasoactive intestinal polypeptide alters atrial electrophysiological properties and may promote atrial fibrillation

    BACKGROUND: Vagal hyperactivity promotes atrial fibrillation (AF), which has been almost exclusively attributed to acetylcholine. Vasoactive intestinal polypeptide (VIP) and acetylcholine are neurotransmitters co-released during vagal stimulation. Exogenous VIP has been shown to promote AF by shortening action potential duration (APD), increasing APD spatial heterogeneity, and causing intra-atrial conduction block. OBJECTIVE: The purpose of this study was to investigate the effects of neuronally released VIP on atrial electrophysiologic properties during vagal stimulation. METHODS: We used a specific VIP antagonist (H9935) to uncover the effects of endogenous VIP released during vagal stimulation in canine hearts. RESULTS: H9935 significantly attenuated (1) the vagally induced shortening of the atrial effective refractory period and widening of the atrial vulnerability window during stimulation of the cervical vagosympathetic trunks (VCNS) and (2) vagal effects on APD during stimulation through the fat-pad ganglion plexus (VGPS). Atropine completely abolished these vagal effects during VCNS and VGPS. In contrast, VGPS-induced slowing of local conduction velocity was completely abolished by either the VIP antagonist or atropine. In pacing-induced AF during VGPS, maximal dominant frequencies and their spatial gradients were reduced significantly by H9935 and, more pronouncedly, by atropine. Furthermore, VIP release in the atria during vagal stimulation was inhibited by atropine, which may account for the concealment of VIP effects under muscarinic blockade. CONCLUSION: Neuronally released VIP contributes to vagal effects on atrial electrophysiologic properties and affects the pathophysiology of vagally induced AF. Neuronal release of VIP in the atria is inhibited by muscarinic blockade, a novel mechanism by which VIP effects are concealed by atropine during vagal stimulation.