260 research outputs found

    Motion and Context-Aware Audio-Visual Conditioned Video Prediction

    The existing state-of-the-art method for audio-visual conditioned video prediction uses the latent codes of the audio-visual frames from a multimodal stochastic network and a frame encoder to predict the next visual frame. However, directly inferring the per-pixel intensity of the next visual frame from the latent codes is extremely challenging because of the high-dimensional image space. To this end, we propose to decouple audio-visual conditioned video prediction into motion and appearance modeling. The first part is the multimodal motion estimation module, which learns motion information as optical flow from the given audio-visual clip. The second part is the context-aware refinement module, which uses the predicted optical flow to warp the current visual frame into the next visual frame and refines it based on the given audio-visual context. Experimental results show that our method achieves competitive results on existing benchmarks. Comment: under consideration at Computer Vision and Image Understanding
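The warping step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say whether warping is forward or backward, nor which interpolation is used, so backward warping with bilinear sampling and border clamping is an assumption here, and the function name `warp_frame` is hypothetical. The learned refinement module is not modeled.

```python
def warp_frame(frame, flow):
    """Backward-warp a grayscale frame with a dense flow field.

    frame: H x W list of lists of floats.
    flow:  H x W list of (dx, dy) tuples. Output pixel (x, y) samples
           frame at (x + dx, y + dy), clamped to the image border.
    This is an illustrative assumption, not the paper's implementation.
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the sampling position so it stays inside the frame.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(sx), int(sy)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbouring pixels.
            top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
            bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


frame = [[0.0, 1.0, 2.0],
         [3.0, 4.0, 5.0]]
# Every output pixel looks one pixel to the right in the source frame.
shift_right = [[(1.0, 0.0)] * 3 for _ in range(2)]
warped = warp_frame(frame, shift_right)
```

With the uniform flow above, each row of `warped` is the source row shifted left by one pixel, with the last column repeated because of the border clamping.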

    Carrier Dynamics in Submonolayer InGaAs/GaAs Quantum Dots

    Carrier dynamics of submonolayer (SML) InGaAs/GaAs quantum dots (QDs) were studied by micro-photoluminescence (MPL), selectively excited photoluminescence (SEPL), and time-resolved photoluminescence (TRPL). MPL and SEPL show the coexistence of localized and delocalized states, and different local phonon modes. TRPL reveals shorter recombination lifetimes and longer capture times for the QDs with higher emission energy. This suggests that the smallest SML QDs are formed by perfectly vertically correlated 2D InAs islands, having the highest In content and the lowest emission energy, while a slight deviation from the perfectly vertical correlation produces larger QDs with lower In content and higher emission energy. Comment: 12 pages, 5 figures

    Towards Robust Few-shot Point Cloud Semantic Segmentation

    Few-shot point cloud semantic segmentation aims to train a model to quickly adapt to new unseen classes with only a handful of support set samples. However, the noise-free assumption in the support set can be easily violated in many practical real-world settings. In this paper, we focus on improving the robustness of few-shot point cloud segmentation under the detrimental influence of noisy support sets during testing time. To this end, we first propose Component-level Clean Noise Separation (CCNS) representation learning to learn discriminative feature representations that separate the clean samples of the target classes from the noisy samples. Leveraging the well-separated clean and noisy support samples from our CCNS, we further propose a Multi-scale Degree-based Noise Suppression (MDNS) scheme to remove the noisy shots from the support set. We conduct extensive experiments on various noise settings on two benchmark datasets. Our results show that the combination of CCNS and MDNS significantly improves the performance. Our code is available at https://github.com/Pixie8888/R3DFSSeg. Comment: BMVC 202

    Rethink Cross-Modal Fusion in Weakly-Supervised Audio-Visual Video Parsing

    Existing works on weakly-supervised audio-visual video parsing adopt the hybrid attention network (HAN) as the multi-modal embedding to capture the cross-modal context. It embeds the audio and visual modalities with a shared network, where the cross-attention is performed at the input. However, such an early fusion method highly entangles the two non-fully correlated modalities and leads to sub-optimal performance in detecting single-modality events. To deal with this problem, we propose the messenger-guided mid-fusion transformer to reduce the uncorrelated cross-modal context in the fusion. The messengers condense the full cross-modal context into a compact representation to only preserve useful cross-modal information. Furthermore, because microphones capture audio events from all directions while cameras only record visual events within a restricted field of view, unaligned cross-modal context from audio occurs more frequently for visual event predictions. We thus propose cross-audio prediction consistency to suppress the impact of irrelevant audio information on visual event prediction. Experiments consistently illustrate the superior performance of our framework compared to existing state-of-the-art methods. Comment: WACV 202

    A Natural Wind Defrosting, Nano-coated Antibacterial Self-cleaning Energy-saving Health Air-cooled Refrigerator

    The air-cooled frost-free household refrigerator is popular in the market because of its large size and frost-free operation. However, the evaporator defrost process consumes a large amount of electrical energy, which limits the wide adoption of this type of refrigerator. In addition, because of its structure, the evaporator and air duct cannot be cleaned manually, which leads to bacterial growth and contamination of stored food. This research developed a self-cleaning, energy-saving health refrigerator that uses indoor natural wind for defrosting and an ultra-hydrophilic nano-titanium dioxide coating for photocatalytic sterilization. In an experimental comparison under the same operating time and operating conditions, the refrigeration mode saves 1.5% and the defrost process saves 95% of the energy, the amount of frosting is reduced by 23%, the temperature change of the freezer is less than 7 °C, and the sterilization rate of the nano-coating reaches 80%.

    Lattice piecewise affine approximation of explicit nonlinear model predictive control with application to trajectory tracking of mobile robot

    To promote the widespread use of mobile robots in diverse fields, the performance of trajectory tracking must be ensured. To address the constraints and nonlinear features associated with mobile robot systems, we apply nonlinear model predictive control (MPC) to realize the trajectory tracking of mobile robots. Specifically, to alleviate the online computational complexity of nonlinear MPC, this paper devises a lattice piecewise affine (PWA) approximation method that can approximate both the nonlinear system and the control law of explicit nonlinear MPC. The kinematic model of the mobile robot is successively linearized along the trajectory to obtain a linear time-varying description of the system, which is then expressed using a lattice PWA model. Subsequently, the nonlinear MPC problem can be transformed into a series of linear MPC problems. Furthermore, to reduce the complexity of the online calculation of multiple linear MPC problems, we approximate the optimal solution of the linear MPC by using the lattice PWA model. That is, for different sampling states, the optimal control inputs are obtained, and lattice PWA approximations are constructed for the state-control pairs. Simulations are performed to evaluate the performance of our method in comparison with the linear MPC and explicit linear MPC frameworks. The results show that, compared with the explicit linear MPC, our method has a higher online computing speed and can decrease the offline computing time without significantly increasing the tracking error.
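As a rough illustration of what a lattice PWA model looks like, the sketch below evaluates the commonly used max-min (disjunctive) lattice form f(x) = max_i min_{j in I_i} (a_j · x + b_j). The abstract does not state which exact form the paper uses, so this form is an assumption; the function name `lattice_pwa` and the saturation example are also hypothetical and only meant to show how a few affine pieces and index sets encode a piecewise affine law.

```python
def lattice_pwa(x, affine_pieces, index_sets):
    """Evaluate f(x) = max_i min_{j in index_sets[i]} (a_j . x + b_j).

    x:             sequence of floats (the state).
    affine_pieces: list of (a_j, b_j), with a_j a sequence matching x.
    index_sets:    list of index tuples, one per min-term.
    Assumed max-min form; not taken from the paper itself.
    """
    # Evaluate every affine piece once at the query point.
    values = [sum(a_k * x_k for a_k, x_k in zip(a, x)) + b
              for a, b in affine_pieces]
    # Take the min inside each term, then the max over all terms.
    return max(min(values[j] for j in idx) for idx in index_sets)


# Hypothetical 1-D example: a saturated linear law u = max(min(x, 1), -1).
pieces = [((1.0,), 0.0),    # piece 0: u = x
          ((0.0,), 1.0),    # piece 1: u = 1  (upper bound)
          ((0.0,), -1.0)]   # piece 2: u = -1 (lower bound)
index_sets = [(0, 1), (2,)]

u_inside = lattice_pwa((0.3,), pieces, index_sets)
u_upper = lattice_pwa((5.0,), pieces, index_sets)
u_lower = lattice_pwa((-5.0,), pieces, index_sets)
```

Online evaluation only needs the affine pieces and the index sets, with no search over polyhedral regions, which is one reason such a representation can be attractive for fast evaluation of an explicit control law.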

    Generalized Few-Shot Point Cloud Segmentation Via Geometric Words

    Existing fully-supervised point cloud segmentation methods suffer in dynamic testing environments with emerging new classes. Few-shot point cloud segmentation algorithms address this problem by learning to adapt to new classes at the sacrifice of segmentation accuracy for the base classes, which severely impedes their practicality. This largely motivates us to present the first attempt at a more practical paradigm of generalized few-shot point cloud segmentation, which requires the model to generalize to new categories with only a few support point clouds and simultaneously retain the capability to segment base classes. We propose geometric words to represent geometric components shared between the base and novel classes, and incorporate them into a novel geometric-aware semantic representation to facilitate better generalization to the new classes without forgetting the old ones. Moreover, we introduce geometric prototypes to guide the segmentation with geometric prior knowledge. Extensive experiments on S3DIS and ScanNet consistently illustrate the superior performance of our method over baseline methods. Our code is available at: https://github.com/Pixie8888/GFS-3DSeg_GWs. Comment: Accepted by ICCV 202