140 research outputs found

    Masked Vision-Language Transformers for Scene Text Recognition

    Scene text recognition (STR) enables computers to recognize and read text in a variety of real-world scenes. Recent STR models benefit from taking linguistic information into consideration in addition to visual cues. We propose a novel Masked Vision-Language Transformer (MVLT) to capture both explicit and implicit linguistic information. Our encoder is a Vision Transformer, and our decoder is a multi-modal Transformer. MVLT is trained in two stages: in the first stage, we design an STR-tailored pretraining method based on a masking strategy; in the second stage, we fine-tune our model and adopt an iterative correction method to improve performance. MVLT attains superior results compared to state-of-the-art STR models on several benchmarks. Our code and model are available at https://github.com/onealwj/MVLT. Comment: The paper is accepted by the 33rd British Machine Vision Conference (BMVC 2022).
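    The masking-based pretraining mentioned above can be illustrated with a generic token-masking routine in the spirit of masked language modeling. This is a hedged sketch only, not the paper's actual strategy: the `mask_tokens` helper, the `[MASK]` symbol, and the 30% ratio are assumptions for illustration.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_ratio=0.3, rng=random):
    """Randomly replace a fraction of tokens with a mask symbol and
    keep the originals as prediction targets (illustrative only; the
    paper's exact STR-tailored masking strategy may differ)."""
    n_mask = max(1, int(len(tokens) * mask_ratio))
    idx = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in idx else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in idx}  # positions the model must predict
    return masked, targets
```

    During pretraining, the model would be asked to recover `targets` from the masked sequence, forcing it to learn the implicit linguistic structure of text.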

    Design and implementation of an IoT based indoor air quality detector with multiple communication interfaces

    Indoor air quality monitoring has attracted increasing attention with the rapid industrialization and urbanization of modern society, as people typically spend more than 80% of their time in indoor environments. A novel indoor air quality detector (IAQD) integrated with multiple communication interfaces has been designed, built, programmed, deployed, and tested to meet the requirements of a wide variety of scenarios. The IAQD measures indoor air quality data, including temperature, humidity, CO2, dust, and formaldehyde, in a timely manner. Using state-of-the-art Internet of Things (IoT) technologies, the IAQD integrates Modbus, LoRa, WiFi, GPRS, and NB-IoT communication interfaces, which enables it to be applied to wired communications, short-range wireless communications, and remote transmission to the cloud. The cloud software allows users to track the indoor air quality of their home, office, or industrial site from anywhere. The performance of the IAQD is evaluated in terms of packet loss rate and time delay; the evaluation is demonstrated and analyzed in an office environment over a week. Experimental results show that the proposed system is effective in measuring air-quality status and provides excellent consistency and stability.
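    The two evaluation metrics named above, packet loss rate and time delay, are straightforward to compute from a test log. The sketch below shows one possible formulation, assuming simple packet counts and per-packet delay samples; the function and parameter names are hypothetical, not taken from the paper.

```python
def link_stats(sent, received, delays_ms):
    """Packet loss rate and average delay for one communication interface.

    sent / received: packet counts over the test window;
    delays_ms: per-packet delays in milliseconds for the packets that arrived.
    (Illustrative only; the paper does not specify its logging format.)
    """
    loss_rate = 1.0 - received / sent if sent else 0.0
    avg_delay = sum(delays_ms) / len(delays_ms) if delays_ms else 0.0
    return loss_rate, avg_delay
```

    Each of the five interfaces (Modbus, LoRa, WiFi, GPRS, NB-IoT) could be scored this way over the week-long window and then compared.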

    Genetic Dissection of Disease Resistance to the Blue Mold Pathogen, Peronospora tabacina, in Tobacco

    Tobacco blue mold, caused by the obligately biotrophic oomycete pathogen Peronospora tabacina D.B. Adam, is a major foliar disease that results in significant losses in tobacco-growing areas. Natural resistance to P. tabacina has not been identified in any variety of common tobacco. Complete resistance, conferred by RBM1, was found in N. debneyi and was transferred into cultivated tobacco by crossing. In the present study, we characterized RBM1-mediated resistance to blue mold in tobacco and show that the hypersensitive response (HR) plays an important role in the host defense reactions. Genetic mapping indicated that the disease resistance locus resides on chromosome 7. The genetic markers linked to this gene and the genetic map we generated will not only benefit tobacco breeders for variety improvement but will also facilitate the positional cloning of RBM1 for biologists.

    DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

    Controllable video generation has gained significant attention in recent years. However, two main limitations persist. First, most existing works focus on either text-, image-, or trajectory-based control, leading to an inability to achieve fine-grained control in videos. Second, trajectory control research is still in its early stages, with most experiments conducted on simple datasets like Human3.6M; this constraint limits models' capability to process open-domain images and effectively handle complex curved trajectories. In this paper, we propose DragNUWA, an open-domain diffusion-based video generation model. To tackle the issue of insufficient control granularity in existing works, we simultaneously introduce text, image, and trajectory information to provide fine-grained control over video content from semantic, spatial, and temporal perspectives. To resolve the problem of limited open-domain trajectory control in current research, we propose trajectory modeling with three components: a Trajectory Sampler (TS) to enable open-domain control of arbitrary trajectories, a Multiscale Fusion (MF) module to control trajectories at different granularities, and an Adaptive Training (AT) strategy to generate consistent videos that follow trajectories. Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control of video generation. The homepage is https://www.microsoft.com/en-us/research/project/dragnuwa/.

    Using Left and Right Brains Together: Towards Vision and Language Planning

    Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision-making capabilities on a variety of tasks. However, they inherently perform planning within the language space, lacking visual and spatial imagination. In contrast, humans use both the left and right hemispheres of the brain for language and visual planning during the thinking process. We therefore introduce a novel vision-language planning framework that performs concurrent visual and language planning for tasks with inputs of any form. Our framework incorporates visual planning to capture intricate environmental details, while language planning enhances the logical coherence of the overall system. We evaluate the effectiveness of our framework across vision-language tasks, vision-only tasks, and language-only tasks. The results demonstrate the superior performance of our approach, indicating that the integration of visual and language planning yields more contextually aware task execution. Comment: 19 pages, 13 figures.

    Meta-Analysis of the Correlation between Apparent Diffusion Coefficient and Standardized Uptake Value in Malignant Disease

    The objective of this meta-analysis is to explore the correlation between the apparent diffusion coefficient (ADC) on diffusion-weighted MR and the standardized uptake value (SUV) of 18F-FDG on PET/CT in patients with cancer. Databases including PubMed (MEDLINE included), EMBASE, and the Cochrane Database of Systematic Reviews were searched for relevant original English-language articles that explored the correlation between SUV and ADC. After applying Fisher's r-to-z transformation, correlation coefficient (r) values were extracted from each study and 95% confidence intervals (CIs) were calculated. Sensitivity and subgroup analyses based on tumor type were performed to investigate the potential heterogeneity. Forty-nine studies comprising 1927 patients were eligible for the meta-analysis. The pooled r for all studies was −0.35 (95% CI: −0.42 to −0.28) with notable heterogeneity (I2 = 78.4%; P < 0.01). In the cancer-type subgroup analysis, combined ADC/SUV correlation coefficients ranged from −0.12 (lymphoma, n = 5) to −0.59 (pancreatic cancer, n = 2). We conclude that there is an overall negative correlation between ADC and SUV in patients with cancer. Higher correlations were found in brain tumors, cervical carcinoma, and pancreatic cancer. However, larger, prospective studies are warranted to validate these findings in different cancer types.
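    The Fisher r-to-z pooling step described above can be sketched as a minimal fixed-effect computation with inverse-variance weights, using the standard approximation var(z) ≈ 1/(n − 3). This is an illustration of the general technique, not the authors' analysis code, and the function name is hypothetical.

```python
import math

def pool_correlations(rs, ns, z_crit=1.96):
    """Pool per-study correlation coefficients via Fisher's r-to-z
    transformation (fixed-effect model, inverse-variance weights).

    rs: correlation coefficient from each study;
    ns: sample size of each study;
    returns (pooled r, (CI lower, CI upper)) on the r scale.
    """
    zs = [math.atanh(r) for r in rs]   # r -> z
    ws = [n - 3 for n in ns]           # weight = 1 / var(z) = n - 3
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    se = math.sqrt(1.0 / sum(ws))
    lo, hi = z_bar - z_crit * se, z_bar + z_crit * se
    # back-transform the estimate and CI bounds to the r scale
    return math.tanh(z_bar), (math.tanh(lo), math.tanh(hi))
```

    A random-effects model (e.g. DerSimonian-Laird) would be more appropriate given the reported heterogeneity (I2 = 78.4%), but the fixed-effect version shows the core transformation.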