31 research outputs found

    Audio-Visual Segmentation

    We propose to explore a new problem called audio-visual segmentation (AVS), in which the goal is to output a pixel-level map of the object(s) that produce sound at the time of the image frame. To facilitate this research, we construct the first audio-visual segmentation benchmark (AVSBench), providing pixel-wise annotations for the sounding objects in audible videos. Two settings are studied with this benchmark: 1) semi-supervised audio-visual segmentation with a single sound source and 2) fully-supervised audio-visual segmentation with multiple sound sources. To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process. We also design a regularization loss to encourage the audio-visual mapping during training. Quantitative and qualitative experiments on the AVSBench compare our approach to several existing methods from related tasks, demonstrating that the proposed method is promising for building a bridge between the audio and pixel-wise visual semantics. Code is available at https://github.com/OpenNLPLab/AVSBench.
    Comment: ECCV 2022; Correct the equation (3) and update the notation of the evaluation metrics in the last arxiv version; Code is available at https://github.com/OpenNLPLab/AVSBenc

    Fine-grained Audible Video Description

    We explore a new task for audio-visual-language modeling called fine-grained audible video description (FAVD). It aims to provide detailed textual descriptions for the given audible videos, including the appearance and spatial locations of each object, the actions of moving objects, and the sounds in videos. Existing visual-language modeling tasks often concentrate on visual cues in videos while undervaluing the language and audio modalities. On the other hand, FAVD requires not only audio-visual-language modeling skills but also paragraph-level language generation abilities. We construct the first fine-grained audible video description benchmark (FAVDBench) to facilitate this research. For each video clip, we first provide a one-sentence summary of the video, i.e., the caption, followed by 4-6 sentences describing the visual details and 1-2 audio-related descriptions at the end. The descriptions are provided in both English and Chinese. We create two new metrics for this task: an EntityScore to gauge the completeness of entities in the visual descriptions, and an AudioScore to assess the audio descriptions. As a preliminary approach to this task, we propose an audio-visual-language transformer that extends an existing video captioning model with an additional audio branch. We combine the masked language modeling and auto-regressive language modeling losses to optimize our model so that it can produce paragraph-level descriptions. We illustrate the efficiency of our model in audio-visual-language modeling by evaluating it against the proposed benchmark using both conventional captioning metrics and our proposed metrics. We further put our benchmark to the test in video generation models, demonstrating that employing fine-grained video descriptions can create more intricate videos than using captions.
    Comment: accepted to CVPR 2023, Xuyang Shen, Dong Li and Jinxing Zhou contribute equally, code link: github.com/OpenNLPLab/FAVDBench, dataset link: www.avlbench.opennlplab.c

    AIM Studies on Reactions FNCX → FXCN (X = O, S, and Se)


    Simulation of Microscopic Fracture Behavior in Nanocomposite Ceramic Tool Materials

    In this paper, the microstructures of nanocomposite ceramic tool materials are represented through Voronoi tessellation. A cohesive element model is established to simulate crack propagation by introducing cohesive elements with fracture criteria into the microstructure models. Both intergranular and transgranular cracking are considered. The influences of nanoparticle size, microstructure type, nanoparticle volume content and interface fracture energy are each analyzed. The simulation results show that the nanoparticles change the fracture pattern from the intergranular mode seen in single-phase materials to a mixed intergranular–transgranular mode. It is mainly the nanoparticles along grain boundaries that drive this change in fracture pattern. Microstructures with smaller nanoparticles, in which more nanoparticles are dispersed along matrix grain boundaries, have higher fracture toughness. Microstructures with a nanoparticle volume content of 15% show the most pronounced transgranular fracture and the highest critical fracture energy release rate. A strong interface is useful for enhancing the fracture toughness of nanocomposite ceramic tool materials.

    A Fast Identification Method of Gunshot Types Based on Knowledge Distillation

    To reduce the large size of gunshot recognition network models and to address insufficient real-time detection performance in urban combat, this paper proposes a fast gunshot type recognition method based on knowledge distillation. First, the muzzle blast and the shock wave generated by the gunshot are preprocessed, and the quality of the gunshot recognition dataset is improved using the Log-Mel spectra of these two signals. Second, a teacher network is constructed using 10 two-dimensional residual modules, and a student network is designed using depthwise separable convolution. Third, the lightweight student network is made to learn the gunshot features under the guidance of the pre-trained large-scale teacher network. Finally, the network’s accuracy, model size, and recognition time are tested using the AudioSet dataset and the NIJ Grant 2016-DN-BX-0183 gunshot dataset. The findings demonstrate that the proposed algorithm achieved 95.6% and 83.5% accuracy on the two datasets, the speed was 0.5 s faster, and the model size was reduced to 2.5 MB. The proposed method is of good practical value in the field of gunshot recognition.
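
    The abstract above does not state which distillation loss is used. As a rough illustration of the teacher-student step only, the sketch below assumes the widely used soft-target formulation: a hard-label cross-entropy term blended with a temperature-scaled KL divergence between teacher and student outputs. The function names and the `temperature` and `alpha` parameters are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Temperature-scaled log-softmax, stabilised with the log-sum-exp trick."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - lse for s in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Blend of hard-label cross-entropy and soft-target KL divergence.

    temperature and alpha are illustrative defaults (assumptions),
    not values reported in the abstract.
    """
    # Hard term: cross-entropy of the student against the true class index.
    hard = -log_softmax(student_logits)[label]

    # Soft term: KL(teacher || student) on temperature-softened distributions.
    log_t = log_softmax(teacher_logits, temperature)
    log_s = log_softmax(student_logits, temperature)
    soft = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))

    # The T^2 factor keeps soft-term gradients on a comparable scale.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    teacher = [3.0, 0.5, -1.0]   # made-up logits for three gunshot classes
    student = [1.0, 0.8, -0.2]
    print(distillation_loss(student, teacher, label=0))
```

    With `alpha=1.0` the loss reduces to ordinary supervised training of the student; lowering `alpha` gives more weight to matching the teacher's softened outputs.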

    Collaborative Evolution Mechanism and Simulation of Construction Waste Recycling Stakeholders Based on Social Network

    With the continuous advancement of urbanization, large-scale construction activities generate a huge amount of construction waste, which aggravates environmental pollution, wastes resources, and damages the appearance of cities. Construction waste recycling can effectively address these problems. However, the recycling rate of construction waste in China is low. Therefore, this paper first uses literature analysis and a questionnaire survey to analyze the factors that influence construction waste resource utilization and to determine the key influencing factors and the stakeholders in the process, and then uses social network analysis to identify the core stakeholders. On this basis, this paper selects construction enterprises and recycling enterprises as the game subjects, with the government and the public as the external environment, to explore the influence of the external environment on the cooperation behavior of the two stakeholders, and uses Matlab simulation to analyze the influence of external variables on the evolution of their decision-making behavior. The results show that the government, construction enterprises, recycling enterprises and the public are the four core stakeholders of the construction waste recycling system; they have the power to control information transmission among the other stakeholders and play a major supporting role in the smooth implementation of construction waste recycling projects. Among them, construction enterprises and recycling enterprises play the pivotal role in the system, while the government and the public form the external environment that exerts incentive and regulatory effects on it. The difference between the benefits and costs of the two stakeholders and the intensity of the external environment's effect determine the stable state of the system: the stronger the effect of the external environment and the larger the difference, the more the behavior of the two tends toward the recycling, on-site recycling strategy. Government penalties and rewards can effectively reduce the illegal dumping of construction waste, although excessive penalties and rewards have limitations in controlling illegal dumping. Public participation can effectively improve the efficiency of government supervision. The results help to deepen understanding of the behavior, needs and cooperation of stakeholders in the construction waste recycling market, improve the efficiency of cooperation between construction enterprises and recycling enterprises, and provide management insights for construction waste recycling practice.

    Insight into the Effects of Electrostatic Potentials on the Conversion Mechanism of the Hydrogen-Bonded Complexes and Carbon-Bonded Complexes: An Ab Initio and Quantum Theory of “Atoms in Molecules” Investigation

    Carbon bonds and hydrogen bonds are common noncovalent interactions. Although recent experimental and computational advances have been made on these interactions, little is known about the conversion mechanism between them. Here, MP2 calculations with the aug-cc-pVDZ basis set (aug-cc-pVDZ-pp for Sn) were used to optimize the geometries of the hydrogen-bonded complexes MH3F···HCN (M = C, Si, Ge, and Sn), the carbon-bonded complexes HCN···MH3F (M = C, Si, Ge, and Sn), and the transition states, and the conversion mechanism between these two types of interactions was investigated. The molecular electrostatic potential, especially the σ-hole, is directly related to the degree of flatness of the intrinsic reaction coordinate (IRC) curve. The energy barriers from the hydrogen-bonded complexes to the carbon-bonded complexes are 6.99, 7.73, 10.56, and 13.59 kJ·mol⁻¹. The energy barriers from the carbon-bonded complexes to the hydrogen-bonded complexes are 4.65, 7.81, 9.10, and 13.04 kJ·mol⁻¹. The breakage and formation of the bonds along the reaction paths are discussed through topological analysis of the electron density. The energy barriers are clearly related to the width of the structure transition region (STR). The first-derivative curve of the IRC energy surface versus the reaction coordinate shows a maximum peak and a minimum peak, reflecting the structural transition states in the ring STRs.
