76 research outputs found

    Text-Only Training for Visual Storytelling

    Visual storytelling aims to generate a narrative based on a sequence of images, necessitating both vision-language alignment and coherent story generation. Most existing solutions predominantly depend on paired image-text training data, which can be costly to collect and challenging to scale. To address this, we formulate visual storytelling as a visual-conditioned story generation problem and propose a text-only training method that separates the learning of cross-modality alignment and story generation. Our approach specifically leverages the cross-modality pre-trained CLIP model to integrate visual control into a story generator, trained exclusively on text data. Moreover, we devise a training-free visual condition planner that accounts for the temporal structure of the input image sequence while balancing global and local visual content. The distinctive advantage of requiring only text data for training enables our method to learn from external text story data, enhancing the generalization capability of visual storytelling. We conduct extensive experiments on the VIST benchmark, showcasing the effectiveness of our approach in both in-domain and cross-domain settings. Further evaluations on expression diversity and human assessment underscore the superiority of our method in terms of informativeness and robustness. Comment: ACM MM 202

    Partial entropy in finite-temperature phase transitions

    It is shown that the von Neumann entropy, a measure of quantum entanglement, has a classical counterpart in thermodynamic systems, which we call partial entropy. Close to the critical temperature, the partial entropy shows perfect finite-size scaling behavior even for quite small system sizes. This provides a powerful tool to quantify finite-temperature phase transitions, as demonstrated on the classical Ising model on a square lattice and the ferromagnetic Heisenberg model on a cubic lattice. Comment: 4 pages, 6 figures, Revised version

    I²MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation

    Recent progress in self-supervised 3D human action representation learning is largely attributed to contrastive learning. However, in conventional contrastive frameworks, the rich complementarity between different skeleton modalities remains under-explored. Moreover, because they are optimized to distinguish self-augmented samples, models struggle with numerous similar positive instances when action categories are limited. In this work, we tackle these problems by introducing a general Inter- and Intra-modal Mutual Distillation (I²MD) framework. In I²MD, we first re-formulate the cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process. Unlike existing distillation solutions that transfer the knowledge of a pre-trained and fixed teacher to the student, in CMD the knowledge is continuously updated and bidirectionally distilled between modalities during pre-training. To alleviate the interference of similar samples and exploit their underlying contexts, we further design the Intra-modal Mutual Distillation (IMD) strategy. In IMD, the Dynamic Neighbors Aggregation (DNA) mechanism is first introduced, where an additional cluster-level discrimination branch is instantiated in each modality. It adaptively aggregates highly correlated neighboring features, forming local cluster-level contrasting. Mutual distillation is then performed between the two branches for cross-level knowledge exchange. Extensive experiments on three datasets show that our approach sets a series of new records. Comment: submitted to IJCV. arXiv admin note: substantial text overlap with arXiv:2208.1244

    UniDoc: A Universal Large Multimodal Model for Simultaneous Text Detection, Recognition, Spotting and Understanding

    In the era of Large Language Models (LLMs), tremendous strides have been made in the field of multimodal understanding. However, existing advanced algorithms fall short of effectively utilizing the immense representation capabilities and rich world knowledge inherent to these large pre-trained models, and the beneficial connections among tasks in text-rich scenarios have not been sufficiently explored. In this work, we introduce UniDoc, a novel multimodal model equipped with text detection and recognition capabilities, which existing approaches lack. Moreover, UniDoc capitalizes on the beneficial interactions among tasks to enhance the performance of each individual task. To implement UniDoc, we perform unified multimodal instruction tuning on the contributed large-scale instruction-following datasets. Quantitative and qualitative experimental results show that UniDoc sets state-of-the-art scores across multiple challenging benchmarks. To the best of our knowledge, this is the first large multimodal model capable of simultaneous text detection, recognition, spotting, and understanding.

    SinDiffusion: Learning a Diffusion Model from a Single Natural Image

    We present SinDiffusion, which leverages denoising diffusion models to capture the internal distribution of patches from a single natural image. SinDiffusion significantly improves the quality and diversity of generated samples compared with existing GAN-based approaches. It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale, instead of the multiple models with progressively growing scales that serve as the default setting in prior work. This avoids the accumulation of errors, which causes characteristic artifacts in generated results. Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics, so we redesign the network structure of the diffusion model. Coupling these two designs enables us to generate photorealistic and diverse images from a single image. Furthermore, SinDiffusion can be applied to various applications, including text-guided image generation and image outpainting, due to the inherent capability of diffusion models. Extensive experiments on a wide range of images demonstrate the superiority of our proposed method for modeling the patch distribution.

    Research on ride comfort and safety of vehicle under limited conditions based on dynamical tire model

    While a vehicle is driving, its tires transmit the vehicle's loads to the road through their contact patches under varying driving and road conditions. The tire-road contact is commonly modeled as a spring and damping element and simplified to a point contact. In addition, a static friction model is usually adopted, which ignores the physical properties of friction and the dynamic process by which the friction force builds up. This is far from sufficient for current vehicle and road safety design. In this paper, ADAMS software is used to establish a multi-body dynamics model of a heavy vehicle, and actual vehicle data are used to validate the virtual vehicle. The Stribeck dynamic friction property is introduced into the tire model for rolling contact between tire and pavement, and the Simulink-ADAMS interface is used to build a complete vehicle dynamic model that reflects the dynamic contact process between tire and road. The correctness and effectiveness of the dynamic tire model are verified by comparison with the classic Pac2002 tire model. To study the dynamic behavior of heavy vehicles on special road sections, the finite element method (FEM) is used to develop a new method for constructing complex 3D road models, covering roads of different classes and long downhill sections with different S-curves. Using the event editor and drive control file, simulations analyze the influence of different speeds, random road classes, slopes, and road adhesion conditions on ride comfort, and speed limit standards under these conditions are proposed to provide a theoretical basis for road alignment design and reasonable driving speeds. Finally, the influence of different speeds, random road classes, and slopes on driving safety is discussed in terms of the tire radial forces, aligning torque, sideslip angle, and roll angle.

    Cross-Scale Study of the High-Steep Reservoir Banks under Different Mechanical States

    The deformation of high-steep rocky banks is caused by the self-weight of the overlying rock mass and the fluctuation of reservoir water. In this paper, newly developed testing equipment and the particle flow code (PFC) were used to complete a cross-scale study of high-steep rocky banks under different mechanical states. The test conditions involved the dry state, the saturated state, and hydraulic coupling states under different confining pressures. Combined with micrographs of the fractured surfaces under the different mechanical states, the results show that the presence of water reduces the bond contact and accelerates the deformation of the particles, ultimately leading to an increase in plastic deformation and a decrease in the peak strength of the rock mass. Compared to the saturated state, the water in the hydraulic coupling state was not transferred even though the storage space was compressed; thus, the water pressure would further promote the extension of the microcracks. When the fluctuations of the reservoir water are considered, the changes in mechanical state may accelerate the degradation of the rock mass. The related methods can provide data support and a theoretical basis for assessing the evolution trend of high-steep rocky reservoir banks.