Search CORE

76 research outputs found

Text-Only Training for Visual Storytelling

Author: Li Houqiang
Lu Zhenbo
Wang Yuechen
Zhou Wengang
Publication date: 17/08/2023

Visual storytelling aims to generate a narrative based on a sequence of images, necessitating both vision-language alignment and coherent story generation. Most existing solutions depend on paired image-text training data, which can be costly to collect and challenging to scale. To address this, we formulate visual storytelling as a visual-conditioned story generation problem and propose a text-only training method that separates the learning of cross-modality alignment and story generation. Our approach specifically leverages the cross-modality pre-trained CLIP model to integrate visual control into a story generator, trained exclusively on text data. Moreover, we devise a training-free visual condition planner that accounts for the temporal structure of the input image sequence while balancing global and local visual content. The distinctive advantage of requiring only text data for training enables our method to learn from external text story data, enhancing the generalization capability of visual storytelling. We conduct extensive experiments on the VIST benchmark, showcasing the effectiveness of our approach in both in-domain and cross-domain settings. Further evaluations on expression diversity and human assessment underscore the superiority of our method in terms of informativeness and robustness. Comment: ACM MM 202

arXiv.org e-Print Archive

Partial entropy in finite-temperature phase transitions

Author: Junpeng Cao
M. N. Barber
Qian Niu
T. Tanaka
Wengang Lu
Xiaoling Cui
Yupeng Wang
Zhang Qi
Publication venue: American Physical Society (APS)
Publication date: 27/11/2007

It is shown that the von Neumann entropy, a measure of quantum entanglement, does have its classical counterpart in thermodynamic systems, which we call partial entropy. Close to the critical temperature the partial entropy shows perfect finite-size scaling behavior even for quite small system sizes. This provides a powerful tool to quantify finite-temperature phase transitions as demonstrated on the classical Ising model on a square lattice and the ferromagnetic Heisenberg model on a cubic lattice. Comment: 4 pages, 6 figures, Revised versio

arXiv.org e-Print Archive

Crossref

I$^2$MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

Author: Deng Jiajun
Li Houqiang
Lu Zhenbo
Mao Yunyao
Ouyang Wanli
Zhou Wengang
Publication date: 24/10/2023

Recent progress in self-supervised 3D human action representation learning is largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, because they are optimized to distinguish self-augmented samples, models struggle with numerous similar positive instances when action categories are limited. In this work, we tackle these problems by introducing a general Inter- and Intra-modal Mutual Distillation (I$^2$MD) framework. In I$^2$MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process. Unlike existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training. To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy. In IMD, the Dynamic Neighbors Aggregation (DNA) mechanism is first introduced, where an additional cluster-level discrimination branch is instantiated in each modality. It adaptively aggregates highly correlated neighboring features, forming local cluster-level contrasting. Mutual distillation is then performed between the two branches for cross-level knowledge exchange. Extensive experiments on three datasets show that our approach sets a series of new records. Comment: submitted to IJCV. arXiv admin note: substantial text overlap with arXiv:2208.1244
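The bidirectional distillation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each modality's "knowledge" is summarized as a list of similarity scores between one sample and a shared set of neighbors, and it uses a symmetric KL divergence as the distillation objective. The function names, the temperature value, and the example scores are all hypothetical.

```python
import math


def softmax(scores, temperature):
    """Turn similarity scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_distillation_loss(sim_a, sim_b, temperature=0.1):
    """Symmetric KL between the neighbor distributions of two modalities.

    sim_a and sim_b hold similarity scores (assumed to lie in [-1, 1])
    of the same sample to the same neighbors, one list per modality.
    Each modality acts as teacher for the other, so both directions
    of the divergence are summed.
    """
    p_a = softmax(sim_a, temperature)
    p_b = softmax(sim_b, temperature)
    return kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a)


# Hypothetical similarity scores for a "joint" and a "motion" modality.
joint_sims = [0.9, 0.2, -0.1, 0.4]
motion_sims = [0.7, 0.5, -0.3, 0.1]
loss = mutual_distillation_loss(joint_sims, motion_sims)
```

The loss is zero when both modalities rank the neighbors identically and grows as their distributions diverge, which is the property a bidirectional distillation term needs.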

arXiv.org e-Print Archive

UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

Author: Feng Hao
Huang Can
Li Houqiang
Lu Jinghui
Tang Jingqun
Wang Zijian
Zhou Wengang
Publication date: 19/08/2023

In the era of Large Language Models (LLMs), tremendous strides have been made in the field of multimodal understanding. However, existing advanced algorithms are limited in how effectively they utilize the immense representation capabilities and rich world knowledge inherent to these large pre-trained models, and the beneficial connections among tasks in text-rich scenarios have not been sufficiently explored. In this work, we introduce UniDoc, a novel multimodal model equipped with text detection and recognition capabilities, which existing approaches lack. Moreover, UniDoc capitalizes on the beneficial interactions among tasks to enhance the performance of each individual task. To implement UniDoc, we perform unified multimodal instruction tuning on the contributed large-scale instruction-following datasets. Quantitative and qualitative experimental results show that UniDoc sets state-of-the-art scores across multiple challenging benchmarks. To the best of our knowledge, this is the first large multimodal model capable of simultaneous text detection, recognition, spotting, and understanding.

arXiv.org e-Print Archive

SinDiffusion: Learning a Diffusion Model from a Single Natural Image

Author: Bao Jianmin
Chen Dong
Chen Dongdong
Li Houqiang
Wang Weilun
Yuan Lu
Zhou Wengang
Publication date: 22/11/2022

We present SinDiffusion, leveraging denoising diffusion models to capture the internal distribution of patches from a single natural image. SinDiffusion significantly improves the quality and diversity of generated samples compared with existing GAN-based approaches. It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales, which is the default setting in prior work. This avoids the accumulation of errors, which causes characteristic artifacts in generated results. Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics; therefore, we redesign the network structure of the diffusion model. Coupling these two designs enables us to generate photorealistic and diverse images from a single image. Furthermore, SinDiffusion can be applied to various applications, such as text-guided image generation and image outpainting, due to the inherent capability of diffusion models. Extensive experiments on a wide range of images demonstrate the superiority of our proposed method for modeling the patch distribution.

arXiv.org e-Print Archive

Research on ride comfort and safety of vehicle under limited conditions based on dynamical tire model

Author: Chen Enli
Lu Yongjie
Zhang Junning
Zheng Wengang
Publication venue: JVE International Ltd.
Publication date: 31/03/2017

While a vehicle is driving, its tires transfer the vehicle's load to the road through their contact patches under varying driving and road conditions. The tire-road contact is usually modeled as a spring and damping element and simplified to a point contact. In addition, a static friction model is typically adopted, which ignores the physical properties of friction and the dynamic process by which the friction force builds up. This is far from sufficient for current vehicle and road safety design. In this paper, ADAMS software is used to establish a multi-body dynamics model of a heavy vehicle, and actual vehicle data are used to check the virtual sample vehicle. The Stribeck dynamic friction property is introduced into the tire model for the rolling contact between tire and pavement, and the interface between Simulink and ADAMS is used to build a complete vehicle dynamic model that reflects the process of dynamic contact between tire and road. The correctness and usability of the dynamic tire model are then verified by comparison with the classic Pac2002 tire model. To study the dynamic behavior of heavy vehicles on special road sections, the finite element method (FEM) is applied in a new method for constructing complex 3D road models, which is used to build roads of different classes and long downhill paths with different S-curves. Using the event editor and drive control file, simulations analyze the influence of different speeds, different classes of random roads, different slopes and different road adhesion models on ride comfort, and speed limit standards under different conditions are proposed, so as to provide a theoretical basis for road alignment design and reasonable driving speeds. Finally, the influence of different speeds, different classes of random roads and different slopes on driving safety, and the rules governing how it changes, are discussed in terms of the tire radial forces, aligning torque, sideslip angle and roll angle.
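The abstract does not give the friction law it uses, so the following is only a sketch of the commonly cited Stribeck friction curve, in which the friction coefficient falls from a static level toward a Coulomb level as sliding speed increases and then rises slowly through a viscous term. The parameter names and default values are hypothetical and are not taken from the paper.

```python
import math


def stribeck_friction_coefficient(
    sliding_speed,
    mu_coulomb=0.6,      # assumed kinetic (Coulomb) friction level
    mu_static=0.9,       # assumed static friction level
    stribeck_speed=0.5,  # assumed characteristic speed in m/s
    exponent=2.0,        # assumed shape exponent of the decay
    viscous=0.01,        # assumed viscous coefficient in s/m
):
    """Signed friction coefficient for a given sliding speed in m/s.

    The magnitude follows
        mu_c + (mu_s - mu_c) * exp(-(|v| / v_s) ** exponent) + viscous * |v|
    and the result carries the sign of the sliding speed. At exactly
    zero speed the function returns 0.0, because the static friction
    force is set by the applied load rather than by this curve.
    """
    if sliding_speed == 0.0:
        return 0.0
    speed = abs(sliding_speed)
    decay = math.exp(-((speed / stribeck_speed) ** exponent))
    magnitude = mu_coulomb + (mu_static - mu_coulomb) * decay + viscous * speed
    return math.copysign(magnitude, sliding_speed)


# Sample the curve at a few sliding speeds.
curve = [(v, stribeck_friction_coefficient(v)) for v in (0.01, 0.1, 0.5, 1.0, 5.0)]
```

Unlike a static friction model with a single constant coefficient, this curve makes the available friction depend on the sliding speed at the contact patch, which is the kind of dynamic behavior the abstract says a static model ignores.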

Maintenance, Reliability and Condition Monitoring

Research on ride comfort and safety of vehicle under limited conditions based on dynamical tire model

Author: Enli Chen
Junning Zhang
Wengang Zheng
Yongjie Lu
Publication venue: JVE International Ltd.
Publication date: 01/03/2017

Journal of Vibroengineering

Crossref

Directory of Open Access Journals

JVE International

Journal of Mechatronics and Artificial Intelligence in Engineering

Cross-Scale Study of the High-Steep Reservoir Banks under Different Mechanical States

Author: Luqi Wang
Wang Lu
Wengang Zhang
Xuecheng Gao
Yulin Zou
Publication venue: GeoScienceWorld
Publication date: 01/03/2022

The deformation of high-steep rocky banks is caused by the self-weight of the overlying rock mass and the fluctuation of reservoir water. In this paper, newly developed testing equipment and the particle flow code (PFC) were used to complete a cross-scale study of high-steep rocky banks under different mechanical states. The test conditions involved the dry state, the saturated state, and hydraulic coupling states under different confining pressures. Combined with micrographs of the fractured surface under different mechanical states, the results show that the presence of water reduces the bond contact and accelerates the deformation of the particles, ultimately leading to an increase in plastic deformation and a decrease in the peak strength of the rock mass. Compared to the saturated state, the water in the hydraulic coupling state was not transferred even though the storage space was compressed; thus, the water pressure would further promote the extension of the microcracks. When considering the fluctuations of the reservoir water, the changes in the mechanical state may accelerate the degradation rate of the rock mass. The related methods can provide data support and a theoretical basis for predicting the evolution trend of high-steep rocky reservoir banks.

Directory of Open Access Journals