Dense Voxel 3D Reconstruction Using a Monocular Event Camera
Event cameras are sensors inspired by biological systems that specialize in
capturing changes in brightness. These emerging cameras offer many advantages
over conventional frame-based cameras, including high dynamic range, high frame
rates, and extremely low power consumption. Due to these advantages, event
cameras have increasingly been adopted in various fields, such as frame
interpolation, semantic segmentation, odometry, and SLAM. However, their use
in 3D reconstruction for VR applications remains underexplored. Previous
methods in this field mainly focused on 3D reconstruction through depth map
estimation. Methods that produce dense 3D reconstruction generally require
multiple cameras, while methods that utilize a single event camera can only
produce a semi-dense result. Other single-camera methods that can produce dense
3D reconstruction rely on creating a pipeline that either incorporates the
aforementioned methods or other existing Structure from Motion (SfM) or
Multi-view Stereo (MVS) methods. In this paper, we propose a novel approach for
solving dense 3D reconstruction using only a single event camera. To the best
of our knowledge, our work is the first attempt in this regard. Our preliminary
results demonstrate that the proposed method can produce visually
distinguishable dense 3D reconstructions directly without requiring pipelines
like those used by existing methods. Additionally, we have created a synthetic
dataset with object scans using an event camera simulator. This
dataset will help accelerate other relevant research in this field.
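As context for the abstract above, event streams are often converted into a dense tensor before being fed to a reconstruction network. The sketch below is a minimal, hypothetical illustration of one common such representation, a voxel grid of accumulated event polarities; the function name and dimensions are assumptions, not this paper's actual pipeline.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate events into a voxel grid.

    events: (N, 4) array of (x, y, t, polarity), polarity in {-1, +1}.
    Returns a (num_bins, height, width) grid of summed polarities.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = events[:, 2]
    # Normalize timestamps to [0, num_bins - 1] and assign each event a bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    bins = np.clip(t_norm.astype(int), 0, num_bins - 1)
    xs = events[:, 0].astype(int)
    ys = events[:, 1].astype(int)
    ps = events[:, 3]
    # Unbuffered in-place add handles repeated (bin, y, x) indices correctly.
    np.add.at(grid, (bins, ys, xs), ps)
    return grid

# Four synthetic events on a 4x4 sensor, split into two temporal bins.
events = np.array([
    [0, 0, 0.00, +1],
    [1, 1, 0.25, -1],
    [2, 2, 0.60, +1],
    [3, 3, 1.00, +1],
])
grid = events_to_voxel_grid(events, num_bins=2, height=4, width=4)
```

A learned 3D reconstruction model would consume such grids in place of conventional intensity frames.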
Fine-grained Activity Classification In Assembly Based On Multi-visual Modalities
Assembly activity recognition and prediction help to improve productivity, quality control, and safety measures in smart factories. This study aims to sense, recognize, and predict a worker's continuous fine-grained assembly activities in a manufacturing platform. We propose a two-stage network for workers' fine-grained activity classification by leveraging scene-level and temporal-level activity features. The first stage is a feature awareness block that extracts scene-level features from multi-visual modalities, including red, green, and blue (RGB) and hand skeleton frames. We use the transfer learning method in the first stage and compare three different pre-trained feature extraction models. Then, we transmit the feature information from the first stage to the second stage to learn the temporal-level features of activities. The second stage consists of Recurrent Neural Network (RNN) layers and a final classifier. We compare the performance of two different RNNs in the second stage, the Long Short-Term Memory (LSTM) and the Gated Recurrent Unit (GRU). The partial video observation method is used in the prediction of fine-grained activities. In the experiments using the trimmed activity videos, our model achieves an accuracy of >99% on our dataset and >98% on the public dataset UCF 101, outperforming the state-of-the-art models. The prediction model achieves an accuracy of >97% in predicting activity labels using 50% of the onset activity video information. In the experiments using an untrimmed video with continuous assembly activities, we combine our recognition and prediction models and achieve an accuracy of >91% in real time, surpassing the state-of-the-art models for the recognition of continuous assembly activities.
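The second stage described above (RNN layers over per-frame features, then a classifier) can be sketched in plain numpy. This is an illustrative toy, assuming random placeholder weights and made-up dimensions; it shows the GRU recurrence and softmax classification, not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
feat_dim, hidden_dim, num_classes, seq_len = 8, 16, 5, 10

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# GRU parameters for the update gate (z), reset gate (r), candidate state (n).
W = {g: rng.normal(0, 0.1, (hidden_dim, feat_dim)) for g in "zrn"}
U = {g: rng.normal(0, 0.1, (hidden_dim, hidden_dim)) for g in "zrn"}

def gru_forward(x_seq):
    """Run the GRU over a sequence of per-frame feature vectors."""
    h = np.zeros(hidden_dim)
    for x in x_seq:
        z = sigmoid(W["z"] @ x + U["z"] @ h)         # update gate
        r = sigmoid(W["r"] @ x + U["r"] @ h)         # reset gate
        n = np.tanh(W["n"] @ x + U["n"] @ (r * h))   # candidate state
        h = (1 - z) * h + z * n
    return h

W_out = rng.normal(0, 0.1, (num_classes, hidden_dim))

def classify(x_seq):
    """Softmax over class logits computed from the final hidden state."""
    logits = W_out @ gru_forward(x_seq)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

# Stand-in for stage-one features of a 10-frame clip.
probs = classify(rng.normal(size=(seq_len, feat_dim)))
```

The partial-observation prediction setting corresponds to calling `classify` on a truncated prefix of the feature sequence.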
Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks
Deep learning models have achieved excellent recognition results on
large-scale video benchmarks. However, they perform poorly when applied to
videos with rare scenes or objects, primarily due to the bias of existing video
datasets. We tackle this problem from two different angles: algorithm and
dataset. From the perspective of algorithms, we propose Spatial-aware
Multi-Aspect Debiasing (SMAD), which incorporates both explicit debiasing with
multi-aspect adversarial training and implicit debiasing with the spatial
actionness reweighting module, to learn a more generic representation invariant
to non-action aspects. To neutralize the intrinsic dataset bias, we propose
OmniDebias, which selectively leverages web data for joint training and
achieves higher performance with far less web data. To verify their
effectiveness, we establish evaluation protocols and perform extensive
experiments on both re-distributed splits of existing datasets and a new
evaluation dataset focusing on actions with rare scenes. We also show that
the debiased representation can generalize better when transferred to other
datasets and tasks.
Comment: ECCVW 202
Experimental study on thermal runaway risk of 18650 lithium ion battery under side-heating condition
Sparse Array Enabled Near-Field Communications: Beam Pattern Analysis and Hybrid Beamforming Design
Extremely large-scale array (XL-array) has emerged as a promising technology
to enable near-field communications for achieving enhanced spectrum efficiency
and spatial resolution, by drastically increasing the number of antennas.
However, this also inevitably incurs higher hardware and energy cost, which may
not be affordable in future wireless systems. To address this issue, we propose
in this paper to exploit two types of sparse arrays (SAs) for enabling
near-field communications. Specifically, we first consider the linear sparse
array (LSA) and characterize its near-field beam pattern. It is shown that
despite the achieved beam-focusing gain, the LSA introduces several undesired
grating-lobes, which have beam power comparable to that of the main-lobe and are
focused on specific regions. An efficient hybrid beamforming design is then
proposed for the LSA to deal with the potential strong inter-user interference
(IUI). Next, we consider another form of SA, called extended coprime array
(ECA), which is composed of two LSA subarrays with different (coprime)
inter-antenna spacing. By characterizing the ECA near-field beam pattern, we
show that compared with the LSA with the same array sparsity, the ECA can
greatly suppress the beam power of near-field grating-lobes thanks to the
offset effect of the two subarrays, albeit with a larger number of
grating-lobes. This motivates us to propose a customized two-phase hybrid
beamforming design for the ECA. Finally, numerical results are presented to
demonstrate the rate performance gain of the proposed two SAs over the
conventional uniform linear array (ULA).
Comment: In this paper, we propose to exploit sparse arrays for enabling
near-field communications and characterize their unique beam patterns for
facilitating the hybrid beamforming design
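The near-field beam pattern the abstract analyzes can be illustrated numerically: place LSA elements with spacing widened by a sparsity factor, match a beamformer to a focal point using exact (spherical-wavefront) distances, and scan the correlation over candidate locations. The carrier frequency, aperture, and sparsity below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

c, f = 3e8, 30e9                  # speed of light, assumed 30 GHz carrier
lam = c / f
N, sparsity = 64, 4               # 64 elements, spacing = sparsity * lambda/2
d = (np.arange(N) - (N - 1) / 2) * sparsity * lam / 2   # element x-positions

def steering(x, y):
    """Near-field steering vector toward point (x, y), unit-normalized."""
    r = np.hypot(x - d, y)        # exact distance from each element
    return np.exp(-2j * np.pi * r / lam) / np.sqrt(N)

focus = steering(0.0, 5.0)        # matched filter focused 5 m broadside
xs = np.linspace(-3, 3, 601)
pattern = np.array([np.abs(focus.conj() @ steering(x, 5.0)) for x in xs])

# The main lobe sits at x = 0 (pattern value 1 by construction); with
# sparsity > 1, additional grating lobes appear at other x positions,
# which is the interference mechanism the abstract describes.
main = pattern[300]               # xs[300] == 0.0
```

Plotting `pattern` against `xs` for different sparsity factors (and for the coprime two-subarray layout) would visualize the grating-lobe suppression effect discussed above.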
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
The recent advancements in text-to-3D generation mark a significant milestone
in generative models, unlocking new possibilities for creating imaginative 3D
assets across various real-world scenarios. However, existing methods often fall
short of rendering detailed, high-quality 3D models. This problem is especially
prevalent among the many methods built on Score Distillation Sampling (SDS). This
paper identifies a notable deficiency of SDS: it produces inconsistent,
low-quality update directions for the 3D model, causing an over-smoothing
effect. To address this, we propose a novel approach called Interval Score
Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes
interval-based score matching to counteract over-smoothing. Furthermore, we
incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline.
Extensive experiments show that our model largely outperforms the
state-of-the-art in quality and training efficiency.
Comment: The first two authors contributed equally to this work. Our code will
be available at: https://github.com/EnVision-Research/LucidDreamer
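For context, the SDS gradient the abstract refers to is, in the formulation popularized by the earlier DreamFusion line of work (not taken from this paper), commonly written as

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\theta)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,
    \bigl(\epsilon_\phi(x_t; y, t) - \epsilon\bigr)\,
    \frac{\partial x}{\partial \theta} \right]
```

where $x$ is the rendered image with parameters $\theta$, $x_t$ its noised version at timestep $t$, $\epsilon_\phi$ the diffusion model's noise prediction conditioned on prompt $y$, and $w(t)$ a timestep weighting. The randomness of $\epsilon$ across updates is one source of the inconsistent directions the abstract mentions; ISM replaces this with interval-based score matching along deterministic diffusing trajectories.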
When Urban Region Profiling Meets Large Language Models
Urban region profiling from web-sourced data is of utmost importance for
urban planning and sustainable development. We are witnessing a rising trend of
applying LLMs to various fields, especially in multi-modal data research such
as vision-language learning, where the text modality serves as supplementary
information for the image. Since the textual modality has never been introduced
into modality combinations in urban region profiling, we aim to answer two
fundamental questions in this paper: i) Can the textual modality enhance urban
region profiling? ii) If so, in what ways and with regard to which aspects?
To answer the questions, we leverage the power of Large Language Models (LLMs)
and introduce the first-ever LLM-enhanced framework that integrates the
knowledge of textual modality into urban imagery profiling, named LLM-enhanced
Urban Region Profiling with Contrastive Language-Image Pretraining (UrbanCLIP).
Specifically, it first generates a detailed textual description for each
satellite image by an open-source Image-to-Text LLM. Then, the model is trained
on the image-text pairs, seamlessly unifying natural language supervision for
urban visual representation learning, jointly with contrastive loss and
language modeling loss. Results on predicting three urban indicators in four
major Chinese metropolises demonstrate its superior performance, with an
average improvement of 6.1% on R^2 compared to the state-of-the-art methods.
Our code and the image-language dataset will be released upon paper
notification.
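The CLIP-style training objective the abstract describes (natural language supervision via a contrastive loss over image-text pairs) can be sketched in numpy. The embeddings below are random placeholders standing in for UrbanCLIP's encoder outputs; the batch size and temperature are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, dim, temperature = 4, 32, 0.07

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

img = l2_normalize(rng.normal(size=(batch, dim)))   # satellite-image embeddings
txt = l2_normalize(rng.normal(size=(batch, dim)))   # generated-text embeddings

# Pairwise cosine similarities scaled by temperature; row i should match col i.
logits = img @ txt.T / temperature

def cross_entropy(logits, targets):
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(targets)), targets].mean()

targets = np.arange(batch)  # the i-th image is paired with the i-th caption
# Symmetric contrastive loss: image-to-text and text-to-image averaged.
loss = 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))
```

In the full framework this contrastive term would be combined with a language modeling loss, as the abstract states.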