67 research outputs found

    GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER

    Video generation necessitates both global coherence and local realism. This work presents GLOBER, a novel non-autoregressive method that first generates global features to obtain comprehensive global guidance and then synthesizes video frames based on those features to produce coherent videos. Specifically, we propose a video auto-encoder in which a video encoder encodes videos into global features and a video decoder, built on a diffusion model, decodes the global features and synthesizes video frames non-autoregressively. For maximum flexibility, the video decoder perceives temporal information through normalized frame indexes, enabling it to synthesize arbitrary sub-clips with predetermined starting and ending frame indexes. Moreover, a novel adversarial loss is introduced to improve the global coherence and local realism of the synthesized frames. Finally, we employ a diffusion-based video generator to fit the global features output by the video encoder. Extensive experiments demonstrate the effectiveness and efficiency of the proposed method, which achieves new state-of-the-art results on multiple benchmarks.
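
    The normalized-frame-index conditioning described in the abstract can be sketched in a few lines. All names below are illustrative stand-ins, not the GLOBER code base; the "decoder" is a toy function, standing in for the diffusion model, that only shows how a shared global feature plus per-frame indexes yields an arbitrary sub-clip.

    ```python
    import numpy as np

    def normalized_frame_indexes(start, end, total_frames):
        """Map absolute frame positions to [0, 1] so the decoder can
        synthesize an arbitrary sub-clip independent of clip length."""
        idx = np.arange(start, end + 1)
        return idx / (total_frames - 1)

    def decode_subclip(global_feature, start, end, total_frames):
        """Toy stand-in for the diffusion decoder: every frame is produced
        from the same global feature, conditioned on its normalized index,
        so all frames can be generated in parallel (non-autoregressively)."""
        t = normalized_frame_indexes(start, end, total_frames)
        return global_feature[None, :] * (1.0 + t[:, None])

    g = np.ones(4)                       # pretend global feature from the video encoder
    frames = decode_subclip(g, start=2, end=5, total_frames=16)
    print(frames.shape)                  # (4, 4): four frames decoded at once
    ```

    Because each frame depends only on the global feature and its own index, any sub-clip with predetermined start and end indexes can be requested directly.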

    VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

    Vision and text have been fully explored in contemporary video-text foundational models, while other modalities such as audio and subtitles in videos have not received sufficient attention. In this paper, we establish connections between multi-modality video tracks, including Vision, Audio, and Subtitle, and Text by exploring an automatically generated large-scale omni-modality video caption dataset called VAST-27M. Specifically, we first collect 27 million open-domain video clips and separately train a vision captioner and an audio captioner to generate vision and audio captions. Then, we employ an off-the-shelf Large Language Model (LLM) to integrate the generated captions, together with subtitles and instructional prompts, into omni-modality captions. Based on the proposed VAST-27M dataset, we train an omni-modality video-text foundational model named VAST, which can perceive and process vision, audio, and subtitle modalities from video, and better supports various tasks including vision-text, audio-text, and multi-modal video-text tasks (retrieval, captioning, and QA). Extensive experiments demonstrate the effectiveness of the proposed VAST-27M corpus and VAST foundation model. VAST achieves 22 new state-of-the-art results on various cross-modality benchmarks. Code, model, and dataset will be released at https://github.com/TXH-mercury/VAST. (23 pages, 5 figures)
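
    The LLM-based caption integration step amounts to assembling the per-modality captions and subtitle into one instruction prompt. A minimal sketch, assuming a hypothetical prompt template (VAST-27M's actual instruction templates are not reproduced here):

    ```python
    def build_omni_caption_prompt(vision_cap, audio_cap, subtitle):
        """Assemble the inputs an off-the-shelf LLM would fuse into a single
        omni-modality caption. The wording of the instruction is hypothetical."""
        return (
            "Combine the following descriptions of one video clip into a "
            "single fluent caption.\n"
            f"Vision: {vision_cap}\n"
            f"Audio: {audio_cap}\n"
            f"Subtitle: {subtitle}\n"
        )

    prompt = build_omni_caption_prompt(
        "a chef slices onions in a kitchen",
        "rhythmic chopping sounds with background music",
        "today we are making French onion soup",
    )
    print(prompt)
    ```

    The returned string would then be sent to the LLM, whose completion becomes the clip's omni-modality caption.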

    Hyperspectral band selection using crossover based gravitational search algorithm

    Band selection is an important data dimensionality reduction tool in hyperspectral images (HSIs). To identify the most informative band subset among the hundreds of highly correlated bands in HSIs, a novel hyperspectral band selection method using a crossover based gravitational search algorithm (CGSA) is presented in this paper. In this method, the discriminative capability of each band subset is evaluated by a combined optimization criterion, constructed from the overall classification accuracy and the size of the band subset. As the criterion evolves, the subset is updated using the V-shaped transfer function based CGSA. Ultimately, the band subset with the best fitness value is selected. Experiments on two public hyperspectral datasets, i.e. the Indian Pines dataset and the Pavia University dataset, have been conducted to test the performance of the proposed method. Comparing experimental results against the basic GSA and the PSOGSA (hybrid PSO and GSA) revealed that all three GSA variants can considerably reduce the band dimensionality of HSIs without damaging their classification accuracy. Moreover, the CGSA outperforms the other two GSA variants in both effectiveness and efficiency.
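
    Two pieces of the method lend themselves to a short sketch: the V-shaped transfer function that turns a continuous GSA velocity into a bit-flip probability, and the combined criterion that trades classification accuracy against subset size. The weight `w` below is an assumed trade-off parameter, not a value from the paper.

    ```python
    import numpy as np

    def v_transfer(velocity):
        """V-shaped transfer function: maps a continuous GSA velocity to the
        probability of flipping the corresponding band-selection bit."""
        return np.abs(np.tanh(velocity))

    def fitness(accuracy, n_selected, n_total, w=0.9):
        """Combined criterion as described in the abstract: reward overall
        classification accuracy, penalize large band subsets.
        w is an illustrative weight, not taken from the paper."""
        return w * accuracy + (1 - w) * (1 - n_selected / n_total)

    # A subset with slightly lower accuracy but far fewer bands can win.
    f_small = fitness(accuracy=0.90, n_selected=20, n_total=200)
    f_large = fitness(accuracy=0.91, n_selected=180, n_total=200)
    print(f_small > f_large)  # True
    ```

    The size penalty is what pushes the search toward compact band subsets rather than marginal accuracy gains.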

    Robust estimation of bacterial cell count from optical density

    Optical density (OD) is widely used to estimate the density of cells in liquid culture, but it cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses the instrument's effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements. In our study, fluorescence-per-cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
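
    The recommended microsphere calibration reduces, in its simplest form, to fitting a proportionality constant between OD and known particle count within the instrument's linear range. The numbers below are invented for illustration and are not data from the study.

    ```python
    import numpy as np

    # Hypothetical calibration data: a two-fold serial dilution of silica
    # microspheres with known particle counts, and the OD each dilution reads.
    particles = np.array([8e8, 4e8, 2e8, 1e8, 5e7])
    od        = np.array([0.80, 0.40, 0.20, 0.10, 0.05])

    # Within the linear range, OD is proportional to particle count;
    # least-squares slope through the origin gives particles per OD unit.
    particles_per_od = np.sum(particles * od) / np.sum(od * od)

    def estimate_cell_count(sample_od):
        """Convert a sample OD to an estimated cell count via the microsphere
        calibration. Assumes the sample stays in the linear range."""
        return sample_od * particles_per_od

    print(estimate_cell_count(0.25))  # 2.5e8 (for this toy calibration)
    ```

    In practice, points falling off the fitted line flag the edge of the instrument's effective linear range, which is one of the quality-control benefits the study highlights.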

    Case Study of Vegetative Treatment System Performance

    Vegetative treatment systems (VTS) are analyzed as a possible alternative to holding ponds for managing runoff from cattle feedlots. This study was conducted to validate a VTS model against previously measured data from a Minnehaha County VTS site, South Dakota. The performance of the VTS was evaluated by determining water inflow, outflow, and precipitation, and mass inflow and outflow on the vegetative treatment areas (VTA). The balances were determined using all VTA inflows and outflows, nutrient concentrations of the inflow water, and soil nutrient concentrations within the VTA. Seven years of VTS data were simulated for the Minnehaha County site, with two years of measured data available for comparison. There were no simulated surface water releases from the VTA during the simulation period from 2007 to 2013; the VTS was fully successful in preventing surface water releases. The simulated nitrate concentration, total Kjeldahl nitrogen concentration, and total phosphorus concentration in the water for the years 2009 and 2010 were compared with measured data. All showed close similarity to the measured data, indicating that the model accurately simulated the performance of the VTS. These results indicate that the VTS has potential to become a treatment system for the containment of both water and nutrients in feedlot runoff.
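
    The mass balance the study evaluates can be sketched as nutrient mass in minus nutrient mass out of the treatment area. The function and figures below are illustrative only, not the study's model or data.

    ```python
    def vta_mass_balance(inflow_m3, outflow_m3, conc_in_mg_per_l, conc_out_mg_per_l):
        """Nutrient mass retained by a vegetative treatment area (kg):
        mass entering in runoff minus mass leaving in released water.
        1 m^3 * 1 mg/L = 1 g, hence the /1000 to get kg."""
        mass_in_kg = inflow_m3 * conc_in_mg_per_l / 1000.0
        mass_out_kg = outflow_m3 * conc_out_mg_per_l / 1000.0
        return mass_in_kg - mass_out_kg

    # With no surface releases (outflow = 0), as in the simulated period,
    # all nutrient mass entering the VTA is retained.
    retained = vta_mass_balance(inflow_m3=500.0, outflow_m3=0.0,
                                conc_in_mg_per_l=40.0, conc_out_mg_per_l=0.0)
    print(retained)  # 20.0 kg retained
    ```

    The zero-outflow case is why the simulated system contains all nutrients by construction; the measured-versus-simulated concentration comparison is what validates the model itself.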

    Exploring the Interface to Aid the Operator’s Situation Awareness in Supervisory Control of Multiple Drones

    Unmanned Aerial Vehicles (UAVs), commonly called drones, have recently been applied in manifold fields. With the development of UAV autonomy, the next generation of drone applications is moving towards team-based, multi-drone operations. This also promotes the transition of the operator role to the supervisory control of multiple UAVs. Situation awareness (SA) is a significant concept in this context for evaluating human performance in complex systems. This thesis proposes a human-system interface for monitoring multiple autonomous UAVs simultaneously by a single operator and investigates how to decrease the impact of task switching among different UAVs on the operator's SA. Tasks in the context of fleet mission control are defined with different levels of urgency. Several design strategies were derived to address the research question. In conclusion, using similar interface layouts across tasks is generally effective in decreasing the impact of task switching. An appropriately designed alert system specifically mitigates the impact of task switching towards higher-urgency tasks/interfaces. Moreover, a reasonable division of interface areas and presentation of information according to its importance are significant, especially for task switching towards lower-urgency tasks/interfaces.

    Scene Changes Understanding Framework Based on Graph Convolutional Networks and Swin Transformer Blocks for Monitoring LCLU Using High-Resolution Remote Sensing Images

    High-resolution remote sensing images with rich land surface structure can provide data support for accurately understanding detailed change information of land cover and land use (LCLU) at different times. In this study, we present a novel scene change understanding framework for remote sensing that includes scene classification and change detection. To enhance the feature representation of images, a robust label semantic relation learning (LSRL) network based on EfficientNet is presented for scene classification. It consists of a semantic relation learning module based on graph convolutional networks and a similarity-based joint expression learning framework. Since bi-temporal remote sensing image pairs include spectral information in both temporal and spatial dimensions, LCLU change monitoring can be improved by exploiting the relationships between different spatial and temporal locations. Therefore, a change detection method based on Swin Transformer blocks (STB-CD) is presented to obtain contextual relationships between targets. Experimental results on the LEVIR-CD, NWPU-RESISC45, and AID datasets demonstrate the superiority of LSRL and STB-CD over other state-of-the-art methods.
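
    The graph-convolutional core of the semantic relation learning module can be sketched as one propagation step over a label co-occurrence graph. The toy graph, shapes, and weights below are illustrative, not the paper's actual configuration.

    ```python
    import numpy as np

    def gcn_layer(features, adjacency, weight):
        """One graph-convolution step over a label semantic graph:
        add self-loops, symmetrically normalize the adjacency, aggregate
        neighbor label embeddings, then apply a linear map and ReLU."""
        a_hat = adjacency + np.eye(adjacency.shape[0])
        d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
        return np.maximum(a_norm @ features @ weight, 0.0)  # ReLU

    # Three scene labels with 4-d embeddings; labels 0 and 1 co-occur.
    x = np.random.default_rng(0).normal(size=(3, 4))
    adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
    w = np.eye(4)
    out = gcn_layer(x, adj, w)
    print(out.shape)  # (3, 4): one refined embedding per label
    ```

    Stacking such layers lets related labels share evidence, which is the mechanism the LSRL module relies on to enrich scene-classification features.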

