
    Existing Weld Seam Recognition and Tracking Based on Sub Region Image Processing

    This paper proposes a new weld seam recognition algorithm for tracking existing weld seams, based on a sub-region neural network. Each original image is first reduced by half and converted to grayscale, and then divided into 96 sub-images. A three-layer sub-region neural network is applied to each sub-image, and the classifications of the 96 sub-images are synthesized into the weld seam recognition result for the whole image. For training, 5000 samples are collected and labeled into two categories; 4000 of them are used as training data and the remaining 1000 as testing data. By adjusting the number of hidden-layer nodes, the recognition accuracy reaches 92%. Experimental results show excellent performance on various types of weld seams. The new algorithm is therefore effective and offers several advantages: the network structure is simple, less training time is required, and, notably, the number of weld seam features remains unchanged even though sub-images are the input to the network.
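    A minimal Python sketch of the pipeline described above: halve and grayscale the image, split it into 96 sub-images, classify each with a small three-layer network, and synthesize the per-tile labels. The 12 x 8 grid, the sigmoid hidden layer, and all function names are illustrative assumptions, and the network weights are assumed to be already trained.

    # Hypothetical sketch of the sub-region recognition pipeline; grid size,
    # activation, and names are assumptions, not taken from the paper.
    import numpy as np

    def split_into_tiles(gray, rows=8, cols=12):
        """Split a grayscale image into rows*cols equal sub-images (96 by default)."""
        h, w = gray.shape
        th, tw = h // rows, w // cols
        return [gray[r*th:(r+1)*th, c*tw:(c+1)*tw]
                for r in range(rows) for c in range(cols)]

    def mlp_predict(tile, W1, b1, W2, b2):
        """Three-layer network: input pixels -> sigmoid hidden layer -> two-class output."""
        x = tile.astype(np.float32).ravel() / 255.0
        h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))
        logits = W2 @ h + b2
        return int(np.argmax(logits))        # assumed convention: 1 = tile contains weld seam

    def recognize_seam(gray_half, params):
        """Classify all 96 sub-images and return the synthesized per-tile seam map."""
        tiles = split_into_tiles(gray_half)
        labels = [mlp_predict(t, *params) for t in tiles]
        return np.array(labels).reshape(8, 12)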

    MO-VLN: A Multi-Task Benchmark for Open-set Zero-Shot Vision-and-Language Navigation

    Given a natural language instruction, a general-purpose robot must comprehend it and find the target object or location from visual observations, even in unexplored environments. Most agents rely on massive, diverse training data to generalize better, which requires expensive labeling effort. They also tend to focus on common objects and a narrow set of tasks, and so cannot handle different types of instructions. To facilitate research in open-set vision-and-language navigation, we propose a benchmark named MO-VLN that tests the effectiveness and generalization of agents in a multi-task setting. First, we develop a 3D simulator built with Unreal Engine 5 that renders realistic scenes with more realistic lighting and detail. The simulator contains three industrially relevant scenes: a cafe, a restaurant, and a nursing home. It also includes many uncommon objects, such as takeaway cups and medical adhesive tape, which make it more challenging than existing environments. Inspired by the recent success of large language models (e.g., ChatGPT, Vicuna), we construct diverse, high-quality instruction data without human annotation. MO-VLN provides four tasks: 1) goal-conditioned navigation given a specific object category (e.g., "fork"); 2) goal-conditioned navigation given a simple instruction (e.g., "Search for and move towards a tennis ball"); 3) step-by-step instruction following; and 4) finding an abstract object based on a high-level instruction (e.g., "I am thirsty"). Comment: 18 pages
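    For illustration only, the four task types could be represented by a small task schema such as the following; the MOVLNTask class, its field names, and the placeholder instruction strings are assumptions, not the benchmark's actual data format or API.

    # Hypothetical task schema for the four MO-VLN settings; not the benchmark API.
    from dataclasses import dataclass
    from typing import Literal, Optional

    TaskType = Literal["object_goal", "simple_instruction",
                       "step_by_step", "abstract_instruction"]

    @dataclass
    class MOVLNTask:
        task_type: TaskType
        scene: str                       # "cafe", "restaurant", or "nursing home"
        instruction: str                 # language input given to the agent
        target_category: Optional[str]   # set for goal-conditioned tasks

    tasks = [
        MOVLNTask("object_goal", "cafe", "fork", target_category="fork"),
        MOVLNTask("simple_instruction", "restaurant",
                  "Search for and move towards a tennis ball",
                  target_category="tennis ball"),
        MOVLNTask("step_by_step", "nursing home",
                  "<step-by-step route description>", target_category=None),
        MOVLNTask("abstract_instruction", "cafe", "I am thirsty",
                  target_category=None),
    ]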

    SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models

    Diffusion models, which have become popular text-to-image generation models, can produce high-quality, content-rich images guided by textual prompts. However, existing models show limited semantic understanding and commonsense reasoning when the input prompt is a concise narrative, resulting in low-quality image generation. To improve their handling of narrative prompts, we propose a simple yet effective parameter-efficient fine-tuning approach for pre-trained diffusion models, the Semantic Understanding and Reasoning adapter (SUR-adapter). We first collect and annotate a new dataset, SURD, consisting of more than 57,000 semantically corrected multi-modal samples; each sample contains a simple narrative prompt, a complex keyword-based prompt, and a high-quality image. We then align the semantic representation of narrative prompts with that of the complex prompts and transfer knowledge from large language models (LLMs) to the SUR-adapter via knowledge distillation, so that it acquires the semantic understanding and reasoning needed to build high-quality textual representations for text-to-image generation. Experiments integrating multiple LLMs and popular pre-trained diffusion models show that our approach enables diffusion models to understand and reason over concise natural language without degrading image quality. Our approach makes text-to-image diffusion models easier to use and improves the user experience, demonstrating its potential for advancing user-friendly text-to-image generation by bridging the semantic gap between simple narrative prompts and complex keyword-based prompts. The code is released at https://github.com/Qrange-group/SUR-adapter. Comment: accepted by ACM MM 2023
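    As a rough illustration of the adapter-plus-distillation idea (not the released SUR-adapter implementation), a small residual adapter could be trained so that embeddings of narrative prompts move towards the embeddings of their matched keyword-based prompts and towards LLM-derived features; the module sizes, the loss weighting, and all names below are assumptions.

    # Minimal PyTorch sketch of adapter fine-tuning with a distillation term.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PromptAdapter(nn.Module):
        def __init__(self, dim=768, hidden=1024):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))

        def forward(self, text_emb):
            # Residual adapter: keep the frozen encoder's embedding, add a correction.
            return text_emb + self.net(text_emb)

    def adapter_loss(adapter, narrative_emb, complex_emb, llm_feat, w=0.5):
        """Align adapted narrative embeddings with complex-prompt embeddings
        and with LLM features (assumed here to share the same dimension)."""
        adapted = adapter(narrative_emb)
        align = F.mse_loss(adapted, complex_emb)
        distill = 1.0 - F.cosine_similarity(adapted, llm_feat, dim=-1).mean()
        return align + w * distill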

    ASR: Attention-alike Structural Re-parameterization

    Structural re-parameterization (SRP) is a deep learning technique that converts between different network architectures through equivalent parameter transformations. Structures added during training to improve performance can be transformed away at inference, avoiding their extra cost in parameter size and inference time, which gives SRP great potential for industrial and practical applications. Existing SRP methods cover many commonly used components, such as normalization, pooling, and multi-branch convolution. However, the widely used self-attention modules cannot be handled directly by SRP, because they act on the backbone network multiplicatively and their output is input-dependent at inference time, which limits SRP's application scenarios. In this paper, we conduct extensive experiments from a statistical perspective and discover an interesting phenomenon, the Stripe Observation: channel attention values quickly approach constant vectors during training. This observation inspires a simple yet effective attention-alike structural re-parameterization (ASR) that achieves SRP for a given network while retaining the effectiveness of the self-attention mechanism. Extensive experiments on several standard benchmarks demonstrate that ASR generally improves the performance of existing backbone networks, self-attention modules, and SRP methods without any elaborate model crafting. We also analyze its limitations and provide experimental or theoretical evidence for the strong robustness of the proposed ASR. Comment: Technical report
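    The Stripe Observation suggests a simple fusion at inference time: if a channel-attention module ends up producing an (almost) constant per-channel scale vector s, then "convolution followed by multiplication by s" is equivalent to a single convolution with rescaled filters and bias. The PyTorch sketch below shows that folding step; it illustrates the general idea only, and the actual ASR fusion may differ in detail.

    # Fold a constant per-channel scale s into a Conv2d: s[c] * conv(x)[c] == fused(x)[c].
    import torch
    import torch.nn as nn

    @torch.no_grad()
    def fold_constant_attention(conv: nn.Conv2d, s: torch.Tensor) -> nn.Conv2d:
        fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                          conv.stride, conv.padding, conv.dilation, conv.groups,
                          bias=True)
        fused.weight.copy_(conv.weight * s.view(-1, 1, 1, 1))   # scale each output filter
        bias = conv.bias if conv.bias is not None else torch.zeros(conv.out_channels)
        fused.bias.copy_(bias * s)                               # scale the bias as well
        return fused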

    Ambient conditions disordered-ordered phase transition of two-dimensional interfacial water molecules dependent on charge dipole moment

    Phase transitions of water molecules are commonly expected to occur only under extreme conditions, such as nanoconfinement, high pressure, or low temperature. Here we report a disordered-ordered phase transition of two-dimensional interfacial water molecules under ambient conditions, observed in molecular-dynamics simulations. The phase transition depends strongly on the charge dipole moment of the solid surface, the product of the charge value and the dipole length. It can be identified by a sharp change in the water-water interaction energy and in the order parameters of the two-dimensional interfacial water monolayer in response to a tiny change of dipole moment near the critical value. The critical dipole moment of the solid surface can be used to classify materials by whether they induce the distinct ordered phase of surface water, which may in turn affect surface wetting, friction, and other properties.
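    For illustration, the critical dipole moment could be located from post-processed simulation output as the point where the monitored quantity (water-water interaction energy or order parameter) jumps most sharply between neighbouring dipole values; the function below is a generic sketch, and its inputs are placeholders rather than data from the paper.

    # Locate the sharpest change of an observable along a dipole-moment scan.
    import numpy as np

    def critical_dipole(dipole_moments, observable):
        d = np.asarray(dipole_moments, dtype=float)
        y = np.asarray(observable, dtype=float)
        order = np.argsort(d)                # sort the scan by dipole moment
        d, y = d[order], y[order]
        jumps = np.abs(np.diff(y))           # change between neighbouring dipole values
        k = int(np.argmax(jumps))
        return 0.5 * (d[k] + d[k + 1])       # midpoint of the sharpest change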