15 research outputs found

    Hierarchical Masked 3D Diffusion Model for Video Outpainting

    Full text link
    Video outpainting aims to adequately complete missing areas at the edges of video frames. Compared to image outpainting, it poses the additional challenge of maintaining the temporal consistency of the filled area. In this paper, we introduce a masked 3D diffusion model for video outpainting. We train the 3D diffusion model with mask modeling, which allows us to use multiple guide frames to connect the results of multiple video clip inferences, thereby ensuring temporal consistency and reducing jitter between adjacent frames. Meanwhile, we extract global frames of the video as prompts and, via cross-attention, guide the model to obtain information beyond the current video clip. We also introduce a hybrid coarse-to-fine inference pipeline to alleviate the artifact accumulation problem. The existing coarse-to-fine pipeline uses only an infilling strategy, which degrades results because the time interval between sparse frames is too large. Our pipeline benefits from the bidirectional learning of mask modeling and can therefore employ a hybrid strategy of infilling and interpolation when generating sparse frames. Experiments show that our method achieves state-of-the-art results on video outpainting tasks. More results are provided at https://fanfanda.github.io/M3DDM/.
    Comment: ACM MM 2023 accepted
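The mask-modeling idea above can be sketched in a few lines: during training, each frame of a clip is randomly kept as a "guide" frame (conditioning) or masked (to be generated), so guide frames can appear on either side of masked ones, enabling both infilling and interpolation at inference time. This is a minimal toy sketch; the function names and the guide probability are illustrative, not the paper's actual configuration.

```python
import random

def sample_frame_mask(num_frames, guide_prob=0.5, seed=None):
    """Randomly mark each frame as a guide frame (True) or a frame to
    be generated (False). Guides at arbitrary positions let a model
    both infill (guides on one side) and interpolate (guides on both
    sides). guide_prob is an illustrative choice."""
    rng = random.Random(seed)
    mask = [rng.random() < guide_prob for _ in range(num_frames)]
    # Guarantee at least one masked frame so there is something to generate.
    if all(mask):
        mask[rng.randrange(num_frames)] = False
    return mask

def split_clip(frames, mask):
    """Partition a clip into guide frames (conditioning) and target frames."""
    guides  = [f for f, keep in zip(frames, mask) if keep]
    targets = [f for f, keep in zip(frames, mask) if not keep]
    return guides, targets
```

At inference, the same mechanism lets overlapping clips share a few already-generated frames as guides, which is how the abstract's "multiple guide frames" connect successive clip inferences without jitter.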

    PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

    Full text link
    3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields. In this paper, we propose Point-Voxel Region-based Convolutional Neural Networks (PV-RCNNs) for 3D object detection on point clouds. First, we propose a novel 3D detector, PV-RCNN, which boosts 3D detection performance by deeply integrating the feature learning of point-based set abstraction and voxel-based sparse convolution through two novel steps, i.e., voxel-to-keypoint scene encoding and keypoint-to-grid RoI feature abstraction. Second, we propose an advanced framework, PV-RCNN++, for more efficient and accurate 3D object detection. It consists of two major improvements: sectorized proposal-centric sampling for efficiently producing more representative keypoints, and VectorPool aggregation for better aggregating local point features with much less resource consumption. With these two strategies, our PV-RCNN++ is about 3× faster than PV-RCNN while also achieving better performance. The experiments demonstrate that our proposed PV-RCNN++ framework achieves state-of-the-art 3D detection performance on the large-scale and highly competitive Waymo Open Dataset with 10 FPS inference speed on a detection range of 150m × 150m.
    Comment: Accepted by International Journal of Computer Vision (IJCV), code is available at https://github.com/open-mmlab/OpenPCDet
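The sectorized proposal-centric sampling mentioned above combines two ideas: restrict keypoint candidates to points near object proposals, then split the candidates into angular sectors around the scene center and sample each sector independently, so sampling parallelizes and keypoints cover the scene evenly. The 2-D sketch below illustrates the idea under those assumptions; all parameter names and values are illustrative, not the actual PV-RCNN++ implementation.

```python
import math
import random

def sectorized_proposal_centric_sampling(points, centers, radius,
                                         num_sectors, num_keypoints, seed=0):
    """Toy 2-D sketch: (1) keep only points within `radius` of a
    proposal center, (2) bucket survivors into angular sectors about
    the origin, (3) sample a proportional share of keypoints from each
    sector. Real PV-RCNN++ runs farthest-point sampling per sector on
    3-D LiDAR points; random sampling here is a simplification."""
    rng = random.Random(seed)
    # Step 1: proposal-centric filtering.
    near = [p for p in points
            if any(math.dist(p, c) <= radius for c in centers)]
    # Step 2: sectorize by polar angle.
    sectors = [[] for _ in range(num_sectors)]
    for p in near:
        ang = math.atan2(p[1], p[0]) % (2 * math.pi)
        sectors[int(ang / (2 * math.pi) * num_sectors) % num_sectors].append(p)
    # Step 3: sample each sector independently.
    per_sector = max(1, num_keypoints // num_sectors)
    keypoints = []
    for s in sectors:
        rng.shuffle(s)
        keypoints.extend(s[:per_sector])
    return keypoints[:num_keypoints]
```

Filtering before sampling is what makes the keypoints "more representative": background points far from any proposal never compete for the keypoint budget.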

    Self-Paced AutoEncoder

    No full text

    Spatial prediction of the geological hazard vulnerability of mountain road network using machine learning algorithms

    No full text
    Abstract: The current assessment indices for geological hazard vulnerability of mountain road networks are relatively simple, and the assessment methods in use are subjective, complex, and inefficient. This study proposes a prediction model for geological hazard vulnerability assessment of mountain road networks that incorporates machine learning algorithms. First, based on the quantification of the characteristics of the mountain road network and local rescue forces, an objective and reasonable index-based vulnerability assessment system for the mountain road network was constructed by combining population, economic, and material factors. Second, the fuzzy analytic hierarchy process (FAHP) and AHP-TOPSIS were applied to develop vulnerability assessment models for a preliminary vulnerability assessment of different road types. Third, the results of the preliminary vulnerability assessment were used as the sample set to build road vulnerability prediction models using support vector machine (SVM), random forest (RF), and back-propagation neural network (BPNN) algorithms. Finally, five-fold cross-validation and statistical accuracy analysis were conducted to determine the most reasonable model with the highest prediction accuracy for geological hazard vulnerability mapping of the mountain road network. The results indicated that the vulnerability prediction model trained on the FAHP sample set with the RF algorithm demonstrated the highest accuracy and robustness.
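The index-based scoring in the first two steps can be sketched as a weighted sum of normalized indicators: each road segment's population, economic, and material indicators are min-max normalized across segments and combined with expert-derived weights, as an AHP/FAHP-style method would produce. The indicator names and weights below are illustrative assumptions, not the study's actual index system.

```python
def vulnerability_score(segments, weights):
    """Toy sketch of an index-based vulnerability score.

    segments: list of dicts mapping indicator name -> raw value.
    weights:  dict mapping indicator name -> weight (assumed to sum to 1,
              e.g. derived from FAHP pairwise comparisons).
    Returns one score in [0, 1] per road segment.
    """
    names = list(weights)
    lo = {n: min(seg[n] for seg in segments) for n in names}
    hi = {n: max(seg[n] for seg in segments) for n in names}
    scores = []
    for seg in segments:
        s = 0.0
        for n in names:
            span = hi[n] - lo[n]
            norm = (seg[n] - lo[n]) / span if span else 0.0  # min-max normalize
            s += weights[n] * norm
        scores.append(s)
    return scores
```

In the study's pipeline, scores like these label the sample set on which the SVM, RF, and BPNN predictors are then trained, so the learned model can map vulnerability without re-running the expert weighting for every segment.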

    Video Object Detection with Locally-Weighted Deformable Neighbors

    No full text
    Deep convolutional neural networks have achieved great success on various image recognition tasks. However, it is nontrivial to transfer existing networks to video, because most of them are developed for static images. Frame-by-frame processing is suboptimal because the temporal information that is vital for video understanding is discarded entirely. Furthermore, frame-by-frame processing is slow and inefficient, which hinders practical usage. In this paper, we propose LWDN (Locally-Weighted Deformable Neighbors) for video object detection without a time-consuming optical flow extraction network. LWDN can latently align high-level features between keyframes and keyframes or non-keyframes. Inspired by (Zhu et al. 2017a) and (Hetang et al. 2017), who propose to aggregate features between keyframes, we adopt a brain-inspired memory mechanism to propagate and update a memory feature from keyframe to keyframe. We call this process Memory-Guided Propagation. With such a memory mechanism, the discriminative ability of features in both keyframes and non-keyframes is enhanced, which helps to improve detection accuracy. Extensive experiments on the VID dataset demonstrate that our method achieves a superior speed-accuracy trade-off, i.e., 76.3% on the challenging VID dataset while maintaining 20 fps on a Titan X GPU.
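The Memory-Guided Propagation described above can be sketched as a running memory over keyframe features: the memory is initialized at the first keyframe, updated at each subsequent keyframe by an exponential moving average, and blended into every frame's feature. This is a 1-D toy sketch under assumed update and blend rules; real LWDN operates on latently aligned high-level feature maps, and the momentum and blend weights here are illustrative.

```python
def memory_guided_propagation(frame_feats, key_interval=3, momentum=0.8):
    """Toy 1-D sketch: maintain a memory feature across keyframes and
    blend it into every frame's feature to stabilize detection.

    frame_feats:  per-frame scalar features (stand-ins for feature maps).
    key_interval: every key_interval-th frame is a keyframe (assumption).
    momentum:     EMA weight for the memory update (assumption).
    """
    memory = None
    enhanced = []
    for i, feat in enumerate(frame_feats):
        if i % key_interval == 0:
            # Keyframe: update the memory via exponential moving average.
            memory = feat if memory is None else momentum * memory + (1 - momentum) * feat
        # Every frame (key or non-key) is enhanced with the memory feature.
        enhanced.append(0.5 * feat + 0.5 * memory)
    return enhanced
```

Because non-keyframes reuse the propagated memory rather than recomputing heavy per-frame context, the scheme suggests why LWDN keeps accuracy high while avoiding per-frame optical flow.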