15 research outputs found

    Hierarchical Masked 3D Diffusion Model for Video Outpainting

    Full text link
    Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it poses the additional challenge of maintaining the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We train the 3D diffusion model with mask modeling, which allows us to use multiple guide frames to connect the results of multiple video clip inferences, thereby ensuring temporal consistency and reducing jitter between adjacent frames. Meanwhile, we extract global frames of the video as prompts and, via cross-attention, guide the model to obtain information beyond the current video clip. We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem. The existing coarse-to-fine pipeline uses only an infilling strategy, which degrades results because the time interval between sparse frames is too large. Our pipeline benefits from the bidirectional learning of mask modeling and can therefore employ a hybrid strategy of infilling and interpolation when generating sparse frames. Experiments show that our method achieves state-of-the-art results on video outpainting tasks. More results are provided at https://fanfanda.github.io/M3DDM/.
    Comment: ACM MM 2023 accepted
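The mask-modeling idea above can be sketched in a few lines: during training, each frame of a clip is randomly kept as a "guide" frame (conditioning) or masked (to be generated), so guide frames can appear on either side of masked ones, enabling both infilling and interpolation at inference time. This is a minimal toy sketch; the function names and the guide probability are illustrative, not the paper's actual configuration.

```python
import random

def sample_frame_mask(num_frames, guide_prob=0.5, seed=None):
    """Randomly mark each frame as a guide frame (True) or a frame to
    be generated (False). Guides at arbitrary positions let a model
    both infill (guides on one side) and interpolate (guides on both
    sides). guide_prob is an illustrative choice."""
    rng = random.Random(seed)
    mask = [rng.random() < guide_prob for _ in range(num_frames)]
    # Guarantee at least one masked frame so there is something to generate.
    if all(mask):
        mask[rng.randrange(num_frames)] = False
    return mask

def split_clip(frames, mask):
    """Partition a clip into guide frames (conditioning) and target frames."""
    guides  = [f for f, keep in zip(frames, mask) if keep]
    targets = [f for f, keep in zip(frames, mask) if not keep]
    return guides, targets
```

At inference, the same mechanism lets overlapping clips share a few already-generated frames as guides, which is how the abstract's "multiple guide frames" connect successive clip inferences without jitter.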

    PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

    Full text link
    3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields. In this paper, we propose Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for 3D object detection on point clouds. First, we propose a novel 3D detector, PV-RCNN, which boosts 3D detection performance by deeply integrating the feature learning of point-based set abstraction and voxel-based sparse convolution through two novel steps, i.e., voxel-to-keypoint scene encoding and keypoint-to-grid RoI feature abstraction. Second, we propose an advanced framework, PV-RCNN++, for more efficient and accurate 3D object detection. It consists of two major improvements: sectorized proposal-centric sampling for efficiently producing more representative keypoints, and VectorPool aggregation for better aggregating local point features with much less resource consumption. With these two strategies, our PV-RCNN++ is about 3× faster than PV-RCNN while also achieving better performance. The experiments demonstrate that our proposed PV-RCNN++ framework achieves state-of-the-art 3D detection performance on the large-scale and highly competitive Waymo Open Dataset with 10 FPS inference speed on a detection range of 150m × 150m.
    Comment: Accepted by International Journal of Computer Vision (IJCV), code is available at https://github.com/open-mmlab/OpenPCDet
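The sectorized proposal-centric sampling mentioned above combines two ideas: restrict keypoint candidates to points near object proposals, then split the candidates into angular sectors around the scene center and sample each sector independently, so sampling parallelizes and keypoints cover the scene evenly. The 2-D sketch below illustrates the idea under those assumptions; all parameter names and values are illustrative, not the actual PV-RCNN++ implementation.

```python
import math
import random

def sectorized_proposal_centric_sampling(points, centers, radius,
                                         num_sectors, num_keypoints, seed=0):
    """Toy 2-D sketch: (1) keep only points within `radius` of a
    proposal center, (2) bucket survivors into angular sectors about
    the origin, (3) sample a proportional share of keypoints from each
    sector. Real PV-RCNN++ runs farthest-point sampling per sector on
    3-D LiDAR points; random sampling here is a simplification."""
    rng = random.Random(seed)
    # Step 1: proposal-centric filtering.
    near = [p for p in points
            if any(math.dist(p, c) <= radius for c in centers)]
    # Step 2: sectorize by polar angle.
    sectors = [[] for _ in range(num_sectors)]
    for p in near:
        ang = math.atan2(p[1], p[0]) % (2 * math.pi)
        sectors[int(ang / (2 * math.pi) * num_sectors) % num_sectors].append(p)
    # Step 3: sample each sector independently.
    per_sector = max(1, num_keypoints // num_sectors)
    keypoints = []
    for s in sectors:
        rng.shuffle(s)
        keypoints.extend(s[:per_sector])
    return keypoints[:num_keypoints]
```

Filtering before sampling is what makes the keypoints "more representative": background points far from any proposal never compete for the keypoint budget.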

    Self-Paced AutoEncoder

    No full text

    Spatial prediction of the geological hazard vulnerability of mountain road network using machine learning algorithms

    No full text
    Abstract: The current assessment indices for geological hazard vulnerability of mountain road networks are relatively simple, and the assessment methods in use are subjective, complex, and inefficient. This study proposes a prediction model for geological hazard vulnerability assessment of mountain road networks that incorporates machine learning algorithms. First, based on the quantification of the characteristics of the mountain road network and local rescue forces, an objective and reasonable index-based vulnerability assessment system for the mountain road network was constructed by combining population, economic, and material factors. Second, the fuzzy analytic hierarchy process (FAHP) and AHP-TOPSIS were applied to develop vulnerability assessment models for a preliminary vulnerability assessment of different road types. Third, the results of the preliminary vulnerability assessment were used as the sample set to build road vulnerability prediction models using support vector machine (SVM), random forest (RF), and back-propagation neural network (BPNN) algorithms. Finally, five-fold cross-validation and statistical accuracy analysis were conducted to determine the most reasonable model with the highest prediction accuracy for geological hazard vulnerability mapping of the mountain road network. The results indicated that the vulnerability prediction model trained on the FAHP sample set with the RF algorithm demonstrated the highest accuracy and robustness.
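The index-based scoring in the first two steps can be sketched as a weighted sum of normalized indicators: each road segment's population, economic, and material indicators are min-max normalized across segments and combined with expert-derived weights, as an AHP/FAHP-style method would produce. The indicator names and weights below are illustrative assumptions, not the study's actual index system.

```python
def vulnerability_score(segments, weights):
    """Toy sketch of an index-based vulnerability score.

    segments: list of dicts mapping indicator name -> raw value.
    weights:  dict mapping indicator name -> weight (assumed to sum to 1,
              e.g. derived from FAHP pairwise comparisons).
    Returns one score in [0, 1] per road segment.
    """
    names = list(weights)
    lo = {n: min(seg[n] for seg in segments) for n in names}
    hi = {n: max(seg[n] for seg in segments) for n in names}
    scores = []
    for seg in segments:
        s = 0.0
        for n in names:
            span = hi[n] - lo[n]
            norm = (seg[n] - lo[n]) / span if span else 0.0  # min-max normalize
            s += weights[n] * norm
        scores.append(s)
    return scores
```

In the study's pipeline, scores like these label the sample set on which the SVM, RF, and BPNN predictors are then trained, so the learned model can map vulnerability without re-running the expert weighting for every segment.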

    Video Object Detection with Locally-Weighted Deformable Neighbors

    No full text
    Deep convolutional neural networks have achieved great success on various image recognition tasks. However, it is nontrivial to transfer existing networks to video, because most of them are developed for static images. Frame-by-frame processing is suboptimal because the temporal information that is vital for video understanding is discarded entirely. Furthermore, frame-by-frame processing is slow and inefficient, which hinders practical usage. In this paper, we propose LWDN (Locally-Weighted Deformable Neighbors) for video object detection without a time-consuming optical flow extraction network. LWDN can latently align high-level features between keyframes and keyframes or non-keyframes. Inspired by (Zhu et al. 2017a) and (Hetang et al. 2017), who propose to aggregate features between keyframes, we adopt a brain-inspired memory mechanism to propagate and update a memory feature from keyframe to keyframe. We call this process Memory-Guided Propagation. With such a memory mechanism, the discriminative ability of features in both keyframes and non-keyframes is enhanced, which helps to improve detection accuracy. Extensive experiments on the VID dataset demonstrate that our method achieves a superior speed-accuracy trade-off, i.e., 76.3% on the challenging VID dataset while maintaining 20 fps on a Titan X GPU.
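The Memory-Guided Propagation described above can be sketched as a running memory over keyframe features: the memory is initialized at the first keyframe, updated at each subsequent keyframe by an exponential moving average, and blended into every frame's feature. This is a 1-D toy sketch under assumed update and blend rules; real LWDN operates on latently aligned high-level feature maps, and the momentum and blend weights here are illustrative.

```python
def memory_guided_propagation(frame_feats, key_interval=3, momentum=0.8):
    """Toy 1-D sketch: maintain a memory feature across keyframes and
    blend it into every frame's feature to stabilize detection.

    frame_feats:  per-frame scalar features (stand-ins for feature maps).
    key_interval: every key_interval-th frame is a keyframe (assumption).
    momentum:     EMA weight for the memory update (assumption).
    """
    memory = None
    enhanced = []
    for i, feat in enumerate(frame_feats):
        if i % key_interval == 0:
            # Keyframe: update the memory via exponential moving average.
            memory = feat if memory is None else momentum * memory + (1 - momentum) * feat
        # Every frame (key or non-key) is enhanced with the memory feature.
        enhanced.append(0.5 * feat + 0.5 * memory)
    return enhanced
```

Because non-keyframes reuse the propagated memory rather than recomputing heavy per-frame context, the scheme suggests why LWDN keeps accuracy high while avoiding per-frame optical flow.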