182 research outputs found

    Multimodal Data Augmentation for Image Captioning using Diffusion Models

    Image captioning, an important vision-language task, often requires a tremendous number of finely labeled image-caption pairs for learning the underlying alignment between images and texts. In this paper, we propose a multimodal data augmentation method, leveraging a recent text-to-image model called Stable Diffusion, to expand the training set via high-quality generation of image-caption pairs. Extensive experiments on the MS COCO dataset demonstrate the advantages of our approach over several benchmark methods, and particularly a significant boost when fewer training instances are available. In addition, models trained on our augmented datasets also outperform prior unpaired image captioning methods by a large margin. Finally, further improvement in training efficiency and effectiveness can be obtained by intentionally filtering the generated data based on quality assessment.

    The finite element modeling and stability prediction of high-speed spindle system dynamics with spindle-holder-tool joints

    The stability of a high-speed spindle system directly affects surface finish and tool life, and is an important factor in evaluating its performance. Meanwhile, the spindle dynamics and cutting stability are significantly affected by the structure and dynamics of the spindle-holder-tool joints. The joints are simplified as distributed springs, and an FEM modeling process for the spindle system is proposed based on the idea of a parallel rotor system. Taking a vertical machining center as an example, the effectiveness of the modeling method is verified. Starting from the stability evaluation criteria and different ways of obtaining the FRF, the factors influencing the unconditional and conditional stability regions are analyzed. Based on the proposed model, the influence laws of cutting stability on cutting force amplitude and speed are characterized by the three-dimensional lobes, limit cutting depths and lobe intersections, which provide the theoretical basis for optimizing the processing and improving the cutting stability.

    Interaction induced decay of a heteronuclear two-atom system

    Two-atom systems in small traps are of fundamental interest, first of all for understanding the role of interactions in degenerate cold gases and for the creation of quantum gates in quantum information processing with single-atom traps. One of the key quantities is the inelastic relaxation (decay) time when one of the atoms or both are in a higher hyperfine state. Here we measure this quantity in a heteronuclear system of $^{87}$Rb and $^{85}$Rb in a micro optical trap and demonstrate experimentally and theoretically the presence of both fast and slow relaxation processes, depending on the choice of the initial hyperfine states. The developed experimental method allows us to single out a particular relaxation process and, in this sense, our experiment is a "superclean platform" for collisional physics studies. Our results also have implications for engineering of quantum states via controlled collisions and creation of two-qubit quantum gates. Comment: 8 pages, 3 figures

    IoT and Wearable Devices-Enhanced Information Provision of AR Glasses: A Multi-Modal Analysis in Aviation Industry

    While Augmented Reality (AR) glasses are now instrumental in industries for delivering work-related information, the current one-size-fits-all information provision of AR glasses fails to cater to diverse workers’ needs and environmental conditions. We propose a framework for harnessing Internet of Things (IoT) and wearable technology to improve the adaptability and customization of information provision by AR. As a preliminary exploration, this short paper develops a multi-modal data processing system for work performance classification in the aviation industry. Using machine learning algorithms for multi-modal feature extraction and classifier construction, this framework provides a more objective and consistent evaluation of work performance compared to single-modal approaches. The proposed analytics architecture can provide valuable insights for other industries struggling to implement IoT and mixed reality.

    Effectiveness of Post-Traumatic Growth Intervention to Promote Positive Post-Traumatic Traits in Chinese Breast Cancer Patients: A Randomized Controlled Trial

    Objective: The purpose of this study was to evaluate the effectiveness of post-traumatic growth (PTG) model-based intervention to improve positive psychological traits in Chinese breast cancer patients. Design: A randomized controlled trial of a psychological group intervention based on the PTG model. Methods: The Clinical Trial was registered on 17 August 2019 at Chinese Clinical Trials.gov with Registration number ChiCTR1900025264. A total of 92 patients with breast cancer were recruited. The participants were randomly assigned to the experimental group (n = 46) and the control group (n = 46). A six-session psychological group intervention based on the PTG model was implemented in the experimental group, and a six-session health education program was implemented in the control group. The outcomes were measured at baseline (pre-intervention), 3 weeks, and 6 weeks after the intervention. The primary outcome was post-traumatic growth assessed by the Chinese version of the Post-Traumatic Growth Inventory (PTGI); secondary outcomes included psychological resilience, family resilience, rumination, and self-disclosure. Results: A total of 87 patients with breast cancer completed this study, including 44 patients in the experimental group and 43 patients in the control group. There was no significant difference in baseline data of breast cancer patients between the two groups except for the treatment regimen (p > 0.05). The two groups were compared after the intervention; the interaction effects between the total scores of post-traumatic growth, family resilience, and self-disclosure and the time term were statistically significant (p < 0.05), indicating that the trend of change in total scores of post-traumatic growth, family resilience, and self-disclosure differed between the experimental and control groups over time, and the score improvements in the experimental group were significantly higher than those in the control group. The comparison of psychological resilience and total score of rumination at each time point was statistically significant (p < 0.05), indicating that group intervention based on the PTG model could improve the psychological recovery ability and rumination level of the experimental group. Conclusion: The psychological group intervention based on the PTG model significantly improved post-traumatic growth, family resilience, and self-disclosure in patients with breast cancer. However, the impact on psychological resilience and rumination was relatively small. Long-term intervention is needed to further test the effect of the PTG model on psychological resilience and rumination.

    A Close Look at Spatial Modeling: From Attention to Convolution

    Vision Transformers have shown great promise recently for many vision tasks due to the insightful architecture design and attention mechanism. By revisiting the self-attention responses in Transformers, we empirically observe two interesting issues. First, Vision Transformers present a query-irrelevant behavior at deep layers, where the attention maps exhibit nearly consistent contexts in global scope, regardless of the query patch position (also head-irrelevant). Second, the attention maps are intrinsically sparse: few tokens dominate the attention weights, and introducing the knowledge from ConvNets would largely smooth the attention and enhance the performance. Motivated by the above observations, we generalize the self-attention formulation to abstract a query-irrelevant global context directly and further integrate the global context into convolutions. The resulting model, a Fully Convolutional Vision Transformer (i.e., FCViT), consists purely of convolutional layers and firmly inherits the merits of both the attention mechanism and convolutions, including dynamic property, weight sharing, and short- and long-range feature modeling. Experimental results demonstrate the effectiveness of FCViT. With fewer than 14M parameters, our FCViT-S12 outperforms related work ResT-Lite by 3.7% top-1 accuracy on ImageNet-1K. When scaling FCViT to larger models, we still perform better than the previous state-of-the-art ConvNeXt with even fewer parameters. FCViT-based models also demonstrate promising transferability to downstream tasks, like object detection, instance segmentation, and semantic segmentation. Codes and models are made available at: https://github.com/ma-xu/FCViT