Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese
The tremendous success of CLIP (Radford et al., 2021) has promoted the
research and application of contrastive learning for vision-language
pretraining. In this work, we construct a large-scale dataset of image-text
pairs in Chinese, where most data are retrieved from publicly available
datasets, and we pretrain Chinese CLIP models on the new dataset. We develop 5
Chinese CLIP models of multiple sizes, spanning from 77 to 958 million
parameters. Furthermore, we propose a two-stage pretraining method, where the
model is first trained with the image encoder frozen and then trained with all
parameters being optimized, to achieve enhanced model performance. Our
comprehensive experiments demonstrate that Chinese CLIP achieves
state-of-the-art performance on MUGE, Flickr30K-CN, and COCO-CN in both the
zero-shot and finetuning setups, and that it achieves competitive performance
in zero-shot image classification on the ELEVATER benchmark (Li et al., 2022).
We have released our code, models, and demos at
https://github.com/OFA-Sys/Chinese-CLI
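The contrastive objective that CLIP-style pretraining relies on can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the function names, the temperature value of 0.07, and the plain-list embeddings are assumptions made for illustration. The sketch scores every image against every text in a batch and penalizes the model when the matching pair is not the highest-scoring one, in both directions.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    image_embs[k] and text_embs[k] are assumed to describe the same pair.
    The temperature is an assumed constant here, not a value from the abstract.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: each row should peak on its diagonal entry.
    image_to_text = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: same idea applied to the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    text_to_image = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return (image_to_text + text_to_image) / 2


if __name__ == "__main__":
    # Toy batch: two pairs whose embeddings already line up.
    print(clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In the two-stage scheme described in the abstract, a loss of this kind would first be minimized with the image encoder's parameters held fixed and then with all parameters updated; the sketch covers only the loss, not the training schedule.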
Hierarchical Cross-Modality Semantic Correlation Learning Model for Multimodal Summarization
Multimodal summarization with multimodal output (MSMO) generates a summary with both textual and visual content. Multimodal news reports contain heterogeneous content, which makes MSMO nontrivial. Moreover, different modalities of data in a news report are observed to correlate hierarchically. Traditional MSMO methods handle the different modalities indiscriminately by learning a single representation for the whole data, which does not adapt directly to heterogeneous content and hierarchical correlation. In this paper, we propose a hierarchical cross-modality semantic correlation learning model (HCSCL) to learn the intra- and inter-modal correlations in multimodal data. HCSCL adopts a graph network to encode the intra-modal correlation. Then, a hierarchical fusion framework is proposed to learn the hierarchical correlation between text and images. Furthermore, we construct a new dataset with relevant image annotation and image object label information to provide supervision for the learning procedure. Extensive experiments on the dataset show that HCSCL significantly outperforms the baseline methods in automatic summarization metrics and fine-grained diversity tests.
A New Urban Waterlogging Simulation Method Based on Multi-Factor Correlation
Waterlogging simulation is a key technology for solving urban waterlogging problems. The current waterlogging modeling process is relatively complex and demands a large amount of basic data, which hinders rapid modeling and wide adoption. In this study, we evaluated the correlation between rainfall and waterlogging using the following factors: terrain, evaporation, infiltration, pipe drainage capacity, and river flood water level. By quantifying the influence of each factor on rainfall, we established a simplified model that quickly calculates waterlogging depth from input rainfall. Waterlogging data were collected from Guangzhou, China, to set up the multi-factor correlation model and to verify its simulation results. After the added or lost value is added to or deducted from the original rainfall, the relationship between net rainfall and maximum water depth is better than that between original rainfall and maximum water depth. Establishing a stable multi-factor correlation model for a waterlogging point requires data from at least three historical waterlogging events for parameter calibration by sensitivity analysis. In a comparison of simulations at four waterlogging points, the multi-factor correlation model (error = −13%) showed the smallest error in simulating the maximum water volume, followed by the Mike Urban model (error = −19%) and the SWMM model (error = 20%). Furthermore, the multi-factor correlation model and the SWMM model required the least calculation time (less than 1 s), followed by the Mike Urban model (about half a minute). By analyzing the waterlogging data of Guangzhou, 42 waterlogging points that met the modeling conditions were selected to further validate the multi-factor correlation model. Each waterlogging point was modeled based on the historical field, and the last rainstorm was used for model verification. The mean error between the simulated and measured maximum waterlogging was 3%, and the R² value was 0.718. In summary, the multi-factor correlation model requires less basic data, has a simple modeling process and wide applicability, and makes intelligent parameter adjustment easy to realize, which makes it better suited to the urgent requirements of current urban waterlogging prediction. The model results may prove accurate and provide scientific decision support for the prevention and control of urban waterlogging.
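The workflow outlined in this abstract (adjust rainfall to net rainfall, relate net rainfall to maximum water depth, then report a mean error and R²) can be sketched as follows. The abstract does not state the functional form of the model, so the straight-line fit, the function names, and the sign convention for error (simulated minus measured, divided by measured) are all assumptions made for illustration, not the authors' method.

```python
def net_rainfall(rainfall_mm, gains_mm=(), losses_mm=()):
    """Original rainfall plus added values minus loss values (all in mm).

    Which factor counts as a gain or a loss is left to the caller; the
    abstract names the factors but not their individual signs.
    """
    return rainfall_mm + sum(gains_mm) - sum(losses_mm)


def fit_depth_model(net_rain, max_depth):
    """Least-squares line: max_depth ~ slope * net_rain + intercept.

    A linear form is an assumption. At least three events are required,
    following the calibration requirement stated in the abstract.
    """
    if len(net_rain) != len(max_depth):
        raise ValueError("inputs must have the same length")
    if len(net_rain) < 3:
        raise ValueError("need at least three historical events")
    n = len(net_rain)
    mean_x = sum(net_rain) / n
    mean_y = sum(max_depth) / n
    sxx = sum((x - mean_x) ** 2 for x in net_rain)
    if sxx == 0:
        raise ValueError("net rainfall values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(net_rain, max_depth))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_depth(slope, intercept, net_rain_value):
    """Simulated maximum water depth for one event."""
    return slope * net_rain_value + intercept


def r_squared(measured, simulated):
    """Coefficient of determination between measured and simulated values."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1 - ss_res / ss_tot


def mean_relative_error(measured, simulated):
    """Mean of (simulated - measured) / measured; negative means underestimation."""
    return sum((s - m) / m for m, s in zip(measured, simulated)) / len(measured)
```

A typical use would calibrate `fit_depth_model` on the historical events of one waterlogging point and then call `predict_depth` on the most recent rainstorm, mirroring the calibrate-then-verify split described above.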