
    Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models

    Accurate video moment retrieval (VMR) requires universal visual-textual correlations that can handle unknown vocabulary and unseen scenes. However, learned correlations tend to be either biased, when derived from a limited amount of moment-text data that is hard to scale up because of prohibitive annotation costs (fully supervised), or unreliable, when only video-text pairwise relationships are available without fine-grained temporal annotations (weakly supervised). Recently, vision-language models (VLMs) have demonstrated a new transfer-learning paradigm that benefits diverse vision tasks through universal visual-textual correlations derived from large-scale vision-language pairs collected from the web; this paradigm has also benefited VMR when the models are fine-tuned in the target domain. In this work, we propose a zero-shot method that adapts generalisable visual-textual priors from arbitrary VLMs to facilitate moment-text alignment without requiring access to VMR data. To this end, we devise a conditional feature refinement module that generates boundary-aware visual features conditioned on text queries, enabling better understanding of moment boundaries. Additionally, we design a bottom-up proposal generation strategy that mitigates the impact of domain discrepancies and breaks complex-query retrieval down into individual action retrievals, thereby maximizing the benefits of the VLM. Extensive experiments on three VMR benchmark datasets demonstrate the notable performance advantages of our zero-shot algorithm, especially in the novel-word and novel-location out-of-distribution setups. Comment: Accepted by WACV 202
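To make the idea of retrieving moments with a frozen VLM more concrete, here is a minimal sketch of a generic zero-shot baseline: each frame embedding is scored against the query embedding by cosine similarity, and maximal runs of consecutive above-threshold frames are grouped into candidate moments ranked by their mean score. This is not the paper's conditional feature refinement module or its bottom-up proposal strategy; the function names, the fixed threshold, and the assumption that frame and query embeddings come from the same CLIP-style VLM are all hypothetical.

```python
import numpy as np


def cosine_scores(frame_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Cosine similarity between each frame embedding (T x D) and one query embedding (D,).

    Assumption: both embeddings were produced by the same frozen VLM.
    """
    f = frame_emb / np.linalg.norm(frame_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    return f @ t


def bottom_up_proposals(scores: np.ndarray, threshold: float):
    """Group maximal runs of consecutive frames whose score is >= threshold.

    Hypothetical helper: returns (start, end, mean_score) tuples with `end` exclusive,
    sorted by mean score in descending order.
    """
    proposals = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i  # a new run of relevant frames begins
        elif s < threshold and start is not None:
            proposals.append((start, i, float(np.mean(scores[start:i]))))
            start = None  # the run has ended
    if start is not None:  # close a run that reaches the last frame
        proposals.append((start, len(scores), float(np.mean(scores[start:]))))
    return sorted(proposals, key=lambda p: p[2], reverse=True)
```

For example, per-frame scores `[0.1, 0.2, 0.7, 0.8, 0.75, 0.1, 0.6, 0.65, 0.2]` with a threshold of 0.5 yield the frame spans `[2, 5)` and `[6, 8)`, with the first ranked higher. A fixed threshold is the simplest grouping rule; it is sensitive to the domain gap between web images and video frames, which is one reason a learned or query-conditioned approach can help.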

    Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

    The correlation between vision and text is essential for video moment retrieval (VMR); however, existing methods rely heavily on separately pre-trained feature extractors for visual and textual understanding. Without sufficient temporal boundary annotations, it is non-trivial to learn universal video-text alignments. In this work, we explore multi-modal correlations derived from large-scale image-text data to facilitate generalisable VMR. To address the limited ability of image-text pre-training models to capture changes in video, we propose a generic method, referred to as Visual-Dynamic Injection (VDI), to strengthen the model's understanding of video moments. Whilst existing VMR methods focus on building temporally aware video features, awareness of text descriptions of temporal changes is also critical, yet it is overlooked in pre-training that matches static images with sentences. Therefore, we extract visual context and spatial dynamic information from video frames and explicitly enforce their alignment with the phrases describing video changes (e.g. verbs). By doing so, potentially relevant visual and motion patterns in videos are encoded in (injected into) the corresponding text embeddings, enabling more accurate video-text alignment. We conduct extensive experiments on two VMR benchmark datasets (Charades-STA and ActivityNet-Captions) and achieve state-of-the-art performance. In particular, VDI yields notable advantages when tested on the out-of-distribution splits, where the test samples involve novel scenes and vocabulary. Comment: CVPR202
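The abstract states that alignment between verb-phrase embeddings and visual-dynamic features is "explicitly enforced" but does not name the objective. As an illustration only, the sketch below uses a symmetric InfoNCE (CLIP-style contrastive) loss, which is an assumption and not confirmed as VDI's loss. Row `i` of `text_emb` stands in for a phrase embedding and row `i` of `visual_emb` for its paired visual-dynamic feature; the temperature of 0.07 is likewise an assumed value.

```python
import numpy as np


def log_softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(text_emb: np.ndarray, visual_emb: np.ndarray,
                       temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss (assumed objective): pair i of each matrix is the positive.

    text_emb: (N, D) phrase embeddings, e.g. for verbs (hypothetical inputs).
    visual_emb: (N, D) visual-dynamic features for the same N samples.
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature  # (N, N) scaled cosine similarities
    idx = np.arange(len(t))
    # Text-to-visual: each text row is classified over all visual columns.
    loss_t2v = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Visual-to-text: each visual column is classified over all text rows.
    loss_v2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float((loss_t2v + loss_v2t) / 2)
```

Minimizing this loss pulls each phrase embedding toward its own visual-dynamic feature and away from those of other samples in the batch, which is one way the visual and motion patterns could be "injected" into the text embeddings.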

    Global projection of flood risk with a bivariate framework under 1.5–3.0°C warming levels

    Global warming increases the atmospheric water-holding capacity, consequently altering the frequency and intensity of extreme hydrological events. River floods characterized by a large peak flow or a prolonged duration can amplify the risk of social disruption and affect ecosystem stability. However, previous studies have mostly focused on univariate flood magnitude characteristics, such as flood peak or volume, and there is still limited understanding of how joint flood characteristics (i.e., magnitude and duration) might co-evolve under different warming levels. Here, we develop a systematic bivariate framework to project future flood risk in 11,528 catchments across the globe. By constructing the joint distribution of flood peak and duration with copulas, we examine global flood risk under varying levels of global warming (i.e., within a range of 1.5–3.0°C above pre-industrial levels). The flood projections are produced by driving five calibrated lumped hydrological models (HMs) with bias-adjusted simulations from five global climate models (GCMs) under three shared socioeconomic pathways (SSP126, SSP370, and SSP585). On average, warming from 1.5 to 3.0°C tends to amplify flood peaks and lengthen flood durations across almost all continents, but the changes are not unidirectional and vary regionally. The joint return period (JRP) of the historical (1985–2014) 50-year flood event is projected to decrease to a median of 36 years under a medium emission pathway at the 1.5°C warming level. Finally, we evaluate the drivers of these JRP changes in the future climate and quantify the uncertainty arising from the different GCMs, SSPs, and HMs. Our findings highlight the importance of limiting greenhouse gas emissions to slow down global warming and of developing climate adaptation strategies to address future flood hazards.
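A joint return period is computed from the copula that links the marginal non-exceedance probabilities of flood peak (`u`) and duration (`v`). The abstract does not say which copula family or which JRP definition was used, so the sketch below assumes the Gumbel-Hougaard copula and shows both the common "AND" definition (peak and duration both exceeded) and the "OR" definition (either exceeded), with a mean interarrival time `mu` of 1 year as for annual maxima. For a 50-year event in each marginal, `u = v = 1 - 1/50 = 0.98`.

```python
import math


def gumbel_copula(u: float, v: float, theta: float) -> float:
    """Gumbel-Hougaard copula C(u, v) (assumed family); theta >= 1, theta == 1 is independence."""
    if theta < 1:
        raise ValueError("Gumbel-Hougaard copula requires theta >= 1")
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-(s ** (1.0 / theta)))


def theta_from_kendall_tau(tau: float) -> float:
    """Gumbel parameter from Kendall's tau: tau = 1 - 1/theta, valid for 0 <= tau < 1."""
    return 1.0 / (1.0 - tau)


def joint_return_period_and(u: float, v: float, theta: float, mu: float = 1.0) -> float:
    """'AND' JRP: P(U > u and V > v) = 1 - u - v + C(u, v)."""
    return mu / (1.0 - u - v + gumbel_copula(u, v, theta))


def joint_return_period_or(u: float, v: float, theta: float, mu: float = 1.0) -> float:
    """'OR' JRP: P(U > u or V > v) = 1 - C(u, v)."""
    return mu / (1.0 - gumbel_copula(u, v, theta))
```

With `u = v = 0.98`, independence (`theta = 1`) gives an "AND" JRP of 2500 years, while stronger dependence between peak and duration pulls it down toward 50 years. This is why the dependence structure, not just the marginals, matters when interpreting a change such as the drop from 50 to a median of 36 years.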

    High-resolution water level and storage variation datasets for 338 reservoirs in China during 2010–2021

    Reservoirs and dams are essential infrastructure in water management; thus, information on their surface water area (SWA), water surface elevation (WSE), and reservoir water storage change (RWSC) is crucial for understanding their properties and interactions in hydrological and biogeochemical cycles. However, knowledge of these reservoir characteristics is scarce or inconsistent at the national scale. Here, we introduce comprehensive datasets for 338 reservoirs in China, with a total storage capacity of 470.6 km³ (50 % of China's reservoir storage capacity). Given the scarcity of publicly available gauged observations and of operational satellite applications for hydrological cycles, we use multiple satellite altimetry missions (SARAL/AltiKa, Sentinel-3A and Sentinel-3B, CryoSat-2, Jason-3, and ICESat-2) and imagery from Landsat and Sentinel-2 to produce a comprehensive reservoir dataset of WSE, SWA, and RWSC during 2010–2021. Validation against gauged measurements from 93 reservoirs demonstrates the relatively high accuracy and reliability of our remotely sensed datasets. (1) In the gauge comparisons of RWSC, the median Pearson correlation coefficient (CC), normalized root mean square error (NRMSE), and root mean square error (RMSE) are 0.89, 11 %, and 0.021 km³, respectively, and 91 % of the validated reservoirs (83 of 91) have good RMSE values ranging from 0.002 to 0.31 km³ and NRMSE values smaller than 20 %. (2) Comparisons of WSE retracked from six satellite altimeters against gauges show good agreement. Specifically, the percentages of reservoirs with good or moderate RMSE values smaller than 1.0 m for CryoSat-2 (validated in 30 reservoirs), SARAL/AltiKa (9), Sentinel-3A (34), Sentinel-3B (25), Jason-3 (11), and ICESat-2 (26) are 77 %, 75 %, 79 %, 87 %, 81 %, and 82 %, respectively. By taking advantage of six satellite altimeters, we are able to densify WSE observations across spatiotemporal scales. Statistically, around 96 % of the validated reservoirs (71 of 74) have RMSE values below 1.0 m, while 57 % (42 of 74) have good data quality with RMSE values below 0.6 m. Overall, our study fills this data gap in comprehensive reservoir information for China and provides strong support for studies of hydrological processes, water resources, and other topics. The dataset is publicly available on Zenodo at https://doi.org/10.5281/zenodo.7251283 (Shen et al., 2021).
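The abstract reports storage changes derived from WSE and SWA, and validates them with CC, RMSE, and NRMSE, but gives neither the storage formula nor the NRMSE normalization. The sketch below assumes two common choices: a truncated-pyramid (frustum) approximation for the volume change between two observations, and NRMSE normalized by the range of the gauged series. Both are assumptions for illustration, and the function names are hypothetical.

```python
import numpy as np


def storage_change_km3(h1_m: float, h2_m: float, a1_km2: float, a2_km2: float) -> float:
    """Storage change between two dates under an assumed frustum (truncated-pyramid) shape.

    h1_m, h2_m: water surface elevations in metres; a1_km2, a2_km2: surface areas in km^2.
    Converting the height difference to km makes the result km^3.
    """
    dh_km = (h2_m - h1_m) / 1000.0
    return dh_km * (a1_km2 + a2_km2 + np.sqrt(a1_km2 * a2_km2)) / 3.0


def validation_metrics(obs, est) -> dict:
    """Pearson CC, RMSE, and NRMSE of satellite estimates against gauged observations.

    Assumption: NRMSE is normalised by the range (max - min) of the observations.
    """
    obs = np.asarray(obs, dtype=float)
    est = np.asarray(est, dtype=float)
    cc = float(np.corrcoef(obs, est)[0, 1])
    rmse = float(np.sqrt(np.mean((est - obs) ** 2)))
    nrmse = rmse / float(obs.max() - obs.min())
    return {"CC": cc, "RMSE": rmse, "NRMSE": nrmse}
```

For instance, a 1 m rise in a reservoir whose area stays at 10 km² corresponds to 0.01 km³ of added storage. Other normalizations of NRMSE (for example, by the mean of the observations) would give different percentages, so the 11 % and 20 % figures above should be read with the paper's own definition.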

    Observation-constrained projection of flood risks and socioeconomic exposure in China

    As the planet warms, the atmosphere's water-vapor-holding capacity rises, leading to more intense precipitation extremes. River floods with high peak discharge or long duration can increase the likelihood of infrastructure failure and enhance ecosystem vulnerability. However, changes in flood peak and duration and the corresponding socioeconomic exposure under climate change are still poorly understood. This study employs a bivariate framework to quantify changes in flood risks and their socioeconomic impacts in 204 catchments in China between the past (1985–2014) and the future (2071–2100). Future daily river streamflow is projected using a cascade modeling chain that combines the outputs of five bias-corrected global climate models (GCMs) under three CMIP6 shared socioeconomic pathways (SSP1-26, SSP3-70, and SSP5-85), a machine learning model, and four hydrological models. We also use a copula function to build the joint distribution of flood peak and duration and calculate the joint return periods of the bivariate flood hazard. Finally, the exposure of population and regional gross domestic product to floods is investigated at the national scale. Our results indicate that flood peak and duration are likely to increase by 25%–100% in the majority of catchments by the late 21st century, depending on the shared socioeconomic pathway. China is projected to experience a significant increase in bivariate flood risks even under the lowest emission pathway, with 24.0 million dollars/km² and 608 people/km² exposed under a moderate emissions scenario (SSP3-70). These findings have direct implications for hazard mitigation and climate adaptation policies in China.
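The abstract says the GCM outputs are bias-corrected before they drive the hydrological and machine learning models, but it does not name the correction method. Empirical quantile mapping is one widely used technique, and the sketch below shows it under that assumption: each model value is mapped through the transfer function between modeled and observed historical quantiles. The function name and the choice of 100 quantiles are hypothetical.

```python
import numpy as np


def quantile_map(model_hist: np.ndarray, obs_hist: np.ndarray,
                 model_future: np.ndarray, n_quantiles: int = 100) -> np.ndarray:
    """Empirical quantile mapping (assumed method) of model values onto the observed distribution.

    model_hist, obs_hist: historical model and observed series for the same variable.
    model_future: model values to correct.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    q_model = np.quantile(model_hist, probs)  # non-decreasing model quantiles
    q_obs = np.quantile(obs_hist, probs)      # matching observed quantiles
    # Locate each value among the historical model quantiles, then read off the
    # observed value at the same probability. np.interp holds values outside the
    # historical range at the end quantiles (no extrapolation).
    return np.interp(model_future, q_model, q_obs)
```

If a model is uniformly 2 units too high, the mapping removes that offset; unlike a simple mean shift, it can also correct biases that differ between the bulk and the tails of the distribution, which matters for flood extremes. Holding out-of-range values at the end quantiles is a simplification that understates changes beyond the historical maximum.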

    Bioactive conformational generation of small molecules: A comparative analysis between force-field and multiple empirical criteria based methods

    Background: Conformational sampling of small molecules plays an essential role in the drug discovery research pipeline. In a previous study, we developed a conformational generation method called Cyndi, based on a multi-objective evolution algorithm (MOEA). In this work, Cyndi was updated to incorporate the MMFF94 force field, in addition to the Tripos force field used in the previous version, to assess conformational energy more rationally. Using both force fields on a larger dataset of 742 bioactive conformations of small ligands extracted from the PDB, we performed a comparative analysis between a pure force-field-based method (FFBM) and a multiple-empirical-criteria-based method (MECBM) combined with different force fields.
    Results: Our analysis reveals that incorporating multiple empirical rules can significantly improve the accuracy of conformational generation. MECBM, which takes both empirical and force-field criteria as objective functions, reproduces about 54% (within 1 Å RMSD) of the bioactive conformations in the 742-molecule test set, much higher than the pure force-field method (FFBM, about 37%). MECBM also achieves more complete and efficient sampling of the conformational space: the average size of the unique-conformation ensemble per molecule is about 6 times larger than that of FFBM, while the time required for conformational generation is nearly the same. Furthermore, as a complementary comparison between methods with and without empirical biases, we also tested the performance of the three conformational generation methods in MacroModel combined with different force fields. Compared with the methods in MacroModel, MECBM is more competitive in retrieving bioactive conformations in terms of accuracy and has a much lower computational cost.
    Conclusions: By combining different energy terms with several empirical criteria, MECBM can produce more reasonable conformational ensembles with high accuracy at approximately the same computational cost as FFBM. Our analysis also reveals that the performance of conformational generation is independent of the type of force field used to characterize conformational accessibility. Moreover, subsequent energy minimization is not necessary and may even reduce the diversity of the conformational ensemble. These results motivate us to explore more empirical criteria, such as geometric restraints, during conformational generation, which may improve performance in combination with energetic accessibility, regardless of the force field adopted.
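The success criterion above counts a bioactive conformation as reproduced when some generated conformer lies within 1 Å RMSD of it. The abstract does not specify the atom selection or superposition protocol, so the sketch below assumes matched heavy-atom coordinates in the same atom order and an optimal rigid superposition computed with the Kabsch algorithm. The function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal rigid superposition (Kabsch).

    Assumption: rows of P and Q refer to the same atoms in the same order.
    """
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P @ R.T  # rotate P onto Q
    return float(np.sqrt(((P_rot - Q) ** 2).sum(axis=1).mean()))


def is_reproduced(bioactive: np.ndarray, ensemble, cutoff: float = 1.0) -> bool:
    """True if any conformer in the ensemble lies within `cutoff` (in Å) of the bioactive pose."""
    return min(kabsch_rmsd(conf, bioactive) for conf in ensemble) <= cutoff
```

Averaging `is_reproduced` over every ligand in a test set gives the kind of percentage quoted above. A larger ensemble of unique conformers raises the chance that one falls under the cutoff, which helps explain why broader sampling can improve this metric at similar cost.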