Text-Only Training for Visual Storytelling
Visual storytelling aims to generate a narrative based on a sequence of
images, necessitating both vision-language alignment and coherent story
generation. Most existing solutions predominantly depend on paired image-text
training data, which can be costly to collect and challenging to scale. To
address this, we formulate visual storytelling as a visual-conditioned story
generation problem and propose a text-only training method that separates the
learning of cross-modality alignment and story generation. Our approach
specifically leverages the cross-modality pre-trained CLIP model to integrate
visual control into a story generator, trained exclusively on text data.
Moreover, we devise a training-free visual condition planner that accounts for
the temporal structure of the input image sequence while balancing global and
local visual content. The distinctive advantage of requiring only text data for
training enables our method to learn from external text story data, enhancing
the generalization capability of visual storytelling. We conduct extensive
experiments on the VIST benchmark, showcasing the effectiveness of our approach
in both in-domain and cross-domain settings. Further evaluations on expression
diversity and human assessment underscore the superiority of our method in
terms of informativeness and robustness. Comment: ACM MM 202
Partial entropy in finite-temperature phase transitions
It is shown that the von Neumann entropy, a measure of quantum entanglement,
does have its classical counterpart in thermodynamic systems, which we call
partial entropy. Close to the critical temperature the partial entropy shows
perfect finite-size scaling behavior even for quite small system sizes. This
provides a powerful tool to quantify finite-temperature phase transitions as
demonstrated on the classical Ising model on a square lattice and the
ferromagnetic Heisenberg model on a cubic lattice. Comment: 4 pages, 6 figures, Revised versio
IMD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation
Recent progress in self-supervised 3D human action representation learning
is largely attributed to contrastive learning. However, in conventional
contrastive frameworks, the rich complementarity between different skeleton
modalities remains under-explored. Moreover, when optimized to distinguish
self-augmented samples, models struggle with numerous similar positive
instances in the case of limited action categories. In this work, we tackle the
aforementioned problems by introducing a general Inter- and Intra-modal Mutual
Distillation (IMD) framework. In IMD, we first re-formulate the
cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process.
Different from existing distillation solutions that transfer the knowledge of a
pre-trained and fixed teacher to the student, in CMD, the knowledge is
continuously updated and bidirectionally distilled between modalities during
pre-training. To alleviate the interference of similar samples and exploit
their underlying contexts, we further design the Intra-modal Mutual
Distillation (IMD) strategy. In IMD, the Dynamic Neighbors Aggregation (DNA)
mechanism is first introduced, where an additional cluster-level discrimination
branch is instantiated in each modality. It adaptively aggregates
highly-correlated neighboring features, forming local cluster-level
contrasting. Mutual distillation is then performed between the two branches for
cross-level knowledge exchange. Extensive experiments on three datasets show
that our approach sets a series of new records. Comment: submitted to IJCV. arXiv admin note: substantial text overlap with
arXiv:2208.1244
UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding
In the era of Large Language Models (LLMs), tremendous strides have been made
in the field of multimodal understanding. However, existing advanced algorithms
fall short of effectively utilizing the immense representation capabilities
and rich world knowledge inherent to these large pre-trained models, and the
beneficial connections among tasks within the context of text-rich scenarios
have not been sufficiently explored. In this work, we introduce UniDoc, a novel
multimodal model equipped with text detection and recognition capabilities,
which are deficient in existing approaches. Moreover, UniDoc capitalizes on the
beneficial interactions among tasks to enhance the performance of each
individual task. To implement UniDoc, we perform unified multimodal instruction
tuning on the contributed large-scale instruction-following datasets.
Quantitative and qualitative experimental results show that UniDoc sets
state-of-the-art scores across multiple challenging benchmarks. To the best of
our knowledge, this is the first large multimodal model capable of simultaneous
text detection, recognition, spotting, and understanding.
SinDiffusion: Learning a Diffusion Model from a Single Natural Image
We present SinDiffusion, leveraging denoising diffusion models to capture
the internal distribution of patches from a single natural image. SinDiffusion
significantly improves the quality and diversity of generated samples compared
with existing GAN-based approaches. It is based on two core designs. First,
SinDiffusion is trained with a single model at a single scale instead of
multiple models with progressive growing of scales which serves as the default
setting in prior work. This avoids the accumulation of errors, which causes
characteristic artifacts in generated results. Second, we identify that a
patch-level receptive field of the diffusion network is crucial and effective
for capturing the image's patch statistics; we therefore redesign the network
structure of the diffusion model. Coupling these two designs enables us to
generate photorealistic and diverse images from a single image. Furthermore,
SinDiffusion can be applied to various applications, e.g., text-guided image
generation and image outpainting, due to the inherent capability of diffusion
models. Extensive experiments on a wide range of images demonstrate the
superiority of our proposed method for modeling the patch distribution.
Research on ride comfort and safety of vehicle under limited conditions based on dynamical tire model
While a vehicle is driving, its tires transfer the vehicle's loads to the road within their contact patches under varied driving and road conditions. The contact between tire and road is usually modeled as a spring and damping element and simplified to a point contact. In addition, a static friction model is typically adopted, which ignores the physical properties of friction and the dynamic process by which the friction force is established. This is far from sufficient for current vehicle and road safety design. In this paper, ADAMS software is used to establish a multi-body dynamics model of a heavy vehicle, and actual vehicle data are used to check the virtual sample vehicle. The Stribeck dynamic friction property is introduced into the tire model for the rolling contact between tire and pavement, and the Simulink interface with ADAMS is used to build a complete vehicle dynamic model that reflects the dynamic contact process between tire and road. The correctness and applicability of the dynamic tire model are verified through comparison with the classic Pac2002 tire model. To study the dynamic behavior of heavy vehicles on special road sections, the finite element method (FEM) is applied to develop a new construction method for complex 3D road models, which is used to build roads of different classes and long downhill paths with different S-curves. Using the event editor and drive control file, simulations analyze the influence of different speeds, classes of random roads, slopes, and road adhesion models on ride comfort, and speed limit standards under different conditions are proposed to provide a theoretical basis for road alignment design and reasonable driving speeds.
Finally, the influence of different speeds, classes of random roads, and slopes on driving safety is discussed in terms of the radial force of each tire, aligning torque, sideslip angle, and roll angle.
Cross-Scale Study of the High-Steep Reservoir Banks under Different Mechanical States
The deformation of high-steep rocky banks is caused by the self-weight of overlying rock mass and the fluctuation of reservoir water. In this paper, the newly developed testing equipment and the particle flow code (PFC) were used to complete the cross-scale study of the high-steep rocky banks under different mechanical states. The test conditions involved the dry state, saturated state, and hydraulic coupling states under different confining pressures. Combined with the micrographs of the fractured surface under different mechanical states, it can be found that the participation of the water could reduce the bond contact and accelerate the deformation of the particles, ultimately leading to an increase in the plastic deformation and a decrease in the peak strength of the rock mass. Compared to the saturated state, the water in the hydraulic coupling state was not transferred though the storage space was compressed; thus, the water pressure would further promote the extension of the microcracks. When considering the fluctuations of the reservoir water, the changes in the mechanical state may accelerate the degradation rate of the rock mass. The related methods can provide data support and a theoretical basis to the evolution trend of high-steep rocky reservoir banks.