689 research outputs found

A Generalized Multi-Modal Fusion Detection Framework

Authors: Cui Leichao, Li Xiuxian, Meng Min, Mo Xiaoyu
Publication date: 29/06/2023

LiDAR point clouds have become the most common data source in autonomous driving. However, due to the sparsity of point clouds, accurate and reliable detection cannot be achieved in specific scenarios. Because of their complementarity with point clouds, images are getting increasing attention. Although existing fusion methods have had some success, they either perform hard fusion or do not fuse in a direct manner. In this paper, we propose a generic 3D detection framework called MMFusion, using multi-modal features. The framework aims to achieve accurate fusion between LiDAR and images to improve 3D detection in complex scenes. Our framework consists of two separate streams, the LiDAR stream and the camera stream, which are compatible with any single-modal feature extraction network. The Voxel Local Perception Module in the LiDAR stream enhances local feature representation, and then the Multi-modal Feature Fusion Module selectively combines feature output from different streams to achieve better fusion. Extensive experiments have shown that our framework not only outperforms existing benchmarks but also improves their detection, especially for detecting cyclists and pedestrians on KITTI benchmarks, with strong robustness and generalization capabilities. Hopefully, our work will stimulate more research into multi-modal fusion for autonomous driving tasks.

arXiv.org e-Print Archive

Augmenting Reinforcement Learning with Transformer-based Scene Representation Learning for Decision-making of Autonomous Driving

Authors: Huang Zhiyu, Liu Haochen, Lv Chen, Mo Xiaoyu
Publication date: 25/08/2023

Decision-making for urban autonomous driving is challenging due to the stochastic nature of interactive traffic participants and the complexity of road structures. Although reinforcement learning (RL)-based decision-making is a promising way to handle urban driving scenarios, it suffers from low sample efficiency and poor adaptability. In this paper, we propose Scene-Rep Transformer to improve the RL decision-making capabilities with better scene representation encoding and sequential predictive latent distillation. Specifically, a multi-stage Transformer (MST) encoder is constructed to model not only the interaction awareness between the ego vehicle and its neighbors but also intention awareness between the agents and their candidate routes. A sequential latent Transformer (SLT) with self-supervised learning objectives is employed to distill the future predictive information into the latent scene representation, in order to reduce the exploration space and speed up training. The final decision-making module based on soft actor-critic (SAC) takes as input the refined latent scene representation from the Scene-Rep Transformer and outputs driving actions. The framework is validated in five challenging simulated urban scenarios with dense traffic, and its performance is demonstrated quantitatively by substantial improvements in data efficiency and in success rate, safety, and efficiency. The qualitative results reveal that our framework is able to extract the intentions of neighbor agents to help make decisions and deliver more diversified driving behaviors.

arXiv.org e-Print Archive

Learning to Compose and Reason with Language Tree Structures for Visual Grounding

Authors: He Xiangnan, Hong Richang, Liu Daqing, Mo Xiaoyu, Zhang Hanwang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

Grounding natural language in images, such as localizing "the black dog on the left of the tree", is one of the core problems in artificial intelligence, as it needs to comprehend the fine-grained and compositional language space. However, existing solutions rely on the association between the holistic language features and visual features, while neglecting the compositional reasoning implied in the language. In this paper, we propose a natural language grounding model that can automatically compose a binary tree structure for parsing the language and then perform visual reasoning along the tree in a bottom-up fashion. We call our model RVG-TREE: Recursive Grounding Tree, which is inspired by the intuition that any language expression can be recursively decomposed into two constituent parts, and the grounding confidence score can be recursively accumulated from the grounding scores returned by the sub-trees. RVG-TREE can be trained end-to-end by using the Straight-Through Gumbel-Softmax estimator, which allows the gradients from the continuous score functions to pass through the discrete tree construction. Experiments on several benchmarks show that our model achieves state-of-the-art performance with more explainable reasoning. Comment: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI).

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Identifying novel potential drug targets for endometriosis via plasma proteome screening

Authors: Liangbin Zhao, Tian Tao, Xiaoyu Mo
Publication venue: Frontiers Media S.A.
Publication date: 01/07/2024

Background: Endometriosis (EM) is a chronic painful condition that predominantly affects women of reproductive age. Currently, surgery or medication can only provide limited symptom relief. This study used a comprehensive genetic analytical approach to explore potential drug targets for EM in the plasma proteome. Methods: In this study, 2,923 plasma proteins were selected as exposure and EM as outcome for two-sample Mendelian randomization (MR) analyses. The plasma proteomic data were derived from the UK Biobank Pharmaceutical Proteomics Project (UKB-PPP), while the EM dataset came from the FinnGen consortium R10 release. Several sensitivity analyses were performed, including summary-data-based MR (SMR) analyses, the heterogeneity in dependent instruments (HEIDI) test, reverse MR analyses, the Steiger detection test, and Bayesian co-localization analyses. Furthermore, a proteome-wide association study (PWAS) and single-cell transcriptomic analyses were also conducted to validate the findings. Results: Six significant (p < 3.06 × 10⁻⁵) plasma protein-EM pairs were identified by MR analyses. Of these, EPHB4 (OR = 1.40, 95% CI: 1.20 - 1.63), FSHB (OR = 3.91, 95% CI: 3.13 - 4.87), RSPO3 (OR = 1.60, 95% CI: 1.38 - 1.86), SEZ6L2 (OR = 1.44, 95% CI: 1.23 - 1.68) and WASHC3 (OR = 2.00, 95% CI: 1.54 - 2.59) were identified as risk factors, whereas KDR (OR = 0.80, 95% CI: 0.75 - 0.90) was found to be a protective factor. All six plasma proteins passed the SMR test (P < 8.33 × 10⁻³), but only four plasma proteins passed the HEIDI heterogeneity test (P_HEIDI > 0.05), namely FSHB, RSPO3, SEZ6L2 and EPHB4. These four proteins showed strong evidence of co-localization (PPH4 > 0.7). In particular, RSPO3 and EPHB4 were replicated in the validated PWAS. Single-cell analyses revealed high expression of SEZ6L2 and EPHB4 in stromal and epithelial cells within EM lesions, while RSPO3 exhibited elevated expression in stromal cells and fibroblasts. Conclusion: Our study identified FSHB, RSPO3, SEZ6L2, and EPHB4 as potential drug targets for EM and highlighted the critical role of stromal and epithelial cells in disease development. These findings provide new insights into the diagnosis and treatment of EM.
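The odds ratios and confidence intervals quoted in this abstract can be put on the log scale with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes each reported interval is a Wald-type 95% CI that is symmetric around the log odds ratio, and the helper name `log_or_summary` is hypothetical.

```python
import math


def log_or_summary(odds_ratio, ci_lower, ci_upper, z_crit=1.96):
    """Return (log-OR, standard error, z-score) from an OR and its 95% CI.

    Assumption: the CI is a Wald interval, symmetric on the log scale,
    so its width equals 2 * z_crit * SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z_crit)
    return beta, se, beta / se


# OR and 95% CI values as quoted in the abstract above.
reported = {
    "FSHB": (3.91, 3.13, 4.87),
    "RSPO3": (1.60, 1.38, 1.86),
    "SEZ6L2": (1.44, 1.23, 1.68),
    "EPHB4": (1.40, 1.20, 1.63),
}

for protein, (or_value, lower, upper) in reported.items():
    beta, se, z = log_or_summary(or_value, lower, upper)
    print(f"{protein}: log-OR={beta:.3f}, SE={se:.3f}, z={z:.2f}")
```

A larger absolute z-score corresponds to a smaller p-value under a normal approximation; the thresholds used in the study itself are the ones stated in the abstract.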

Directory of Open Access Journals

Recent progress in carbon dots for anti-pathogen applications in oral cavity

Authors: Chuqiang Yin, Guotai Li, Jianning Mo, Qihui Zhou, Ting Wang, Xiaoyu Wang, Yuying Jiang
Publication venue: Frontiers Media S.A.
Publication date: 01/09/2023

Background: Oral microbial infections are among the most common diseases. Their progression not only results in the irreversible destruction of teeth and other oral tissues but is also closely linked to oral cancers and systemic diseases. However, traditional antibiotic treatment of oral infections is not effective enough due to microbial resistance and drug blocking by oral biofilms, along with the passive dilution of the drug at the infection site in the oral environment. Aim of review: Besides the traditional antibiotic treatment, carbon dots (CDs) have recently emerged as antimicrobial and microbial imaging agents because of their excellent (bio)physicochemical performance. Their application in treating oral infections has received widespread attention, as witnessed by the increasing number of publications in this field. However, to date, no comprehensive review has analyzed their effectiveness and mechanism. Herein, as a step toward addressing this gap, this review aims to discuss the recent advances in CDs against diverse oral pathogens and thus propose novel strategies in the treatment of oral microbial infections. Key scientific concepts of review: In this manuscript, the recent progress of CDs against oral pathogens is summarized for the first time. We highlight the antimicrobial abilities of CDs against oral planktonic bacteria, intracellular bacteria, oral pathogenic biofilms, and fungi. Next, we introduce their microbial imaging and detection capabilities and discuss the prospects of CDs in early diagnosis of oral infection and pathogen microbiological examination. Lastly, we discuss the perspectives on clinical translation and the current limitations of CDs in the treatment of oral microbial infections.

Directory of Open Access Journals

Map-adaptive multimodal trajectory prediction using hierarchical graph neural networks

Authors: Liu Haochen, Lv Chen, Mo Xiaoyu, Xing Yang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2023

Predicting the multimodal future motions of neighboring agents is essential for an autonomous vehicle to navigate complex scenarios. It is challenging as the motion of an agent is affected by the complex interaction among itself, other agents, and the local roads. Unlike most existing works, which predict a fixed number of possible future motions of an agent, we propose a map-adaptive predictor that can predict a variable number of future trajectories of an agent according to the number of lanes with candidate centerlines (CCLs). The predictor predicts not only future motions guided by single CCLs but also a scene-reasoning prediction and a motion-maintaining prediction. These three kinds of predictions are produced integrally via a single graph operation. We represent the driving scene with a heterogeneous hierarchical graph containing nodes of two types. An agent node contains its dynamics feature encoded from its historical states, and a CCL node contains the CCL's sequential feature. We propose a hierarchical graph operator (HGO) with an edge-masking technique to regulate the information flow in graph operations and obtain the encoded scene feature for the trajectory decoder. Experiments on two large-scale real-world driving datasets show that our method realizes map-adaptive prediction and outperforms strong baselines.

CERES Research Repository (Cranfield Univ.)

DR-NTU (Digital Repository of NTU)

Multi-agent trajectory prediction with heterogeneous edge-enhanced graph attention network

Authors: Huang Zhiyu, Lv Chen, Mo Xiaoyu, Xing Yang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2022

Simultaneous trajectory prediction for multiple heterogeneous traffic participants is essential for safe and efficient operation of connected automated vehicles under complex driving situations. Two main challenges for this task are to handle the varying number of heterogeneous target agents and to jointly consider multiple factors that affect their future motions. This is because different kinds of agents have different motion patterns, and their behaviors are jointly affected by their individual dynamics, their interactions with surrounding agents, and the traffic infrastructure. A trajectory prediction method handling these challenges will benefit the downstream decision-making and planning modules of autonomous vehicles. To meet these challenges, we propose a three-channel framework together with a novel Heterogeneous Edge-enhanced graph ATtention network (HEAT). Our framework is able to deal with the heterogeneity of the target agents and traffic participants involved. Specifically, agents' dynamics are extracted from their historical states using type-specific encoders. The inter-agent interactions are represented with a directed edge-featured heterogeneous graph and processed by the designed HEAT network to extract interaction features. In addition, the map features are shared across all agents by introducing a selective gate mechanism. Finally, the trajectories of multiple agents are predicted simultaneously. Validations using both urban and highway driving datasets show that the proposed model can realize simultaneous trajectory predictions for multiple agents under complex traffic situations and achieve state-of-the-art performance with respect to prediction accuracy. The achieved final displacement error (FDE@3sec) is 0.66 meters under urban driving, demonstrating the feasibility and effectiveness of the proposed approach.
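For readers unfamiliar with the metric quoted above, final displacement error is commonly defined as the Euclidean distance between the predicted and the true position at the last step of the prediction horizon. The sketch below follows that common definition; it is not the authors' evaluation code, the function names are hypothetical, and the abstract does not say how errors are averaged over agents, so the simple mean here is an assumption.

```python
import math


def final_displacement_error(predicted, ground_truth):
    """Distance between the last predicted and last true (x, y) positions."""
    (pred_x, pred_y) = predicted[-1]
    (true_x, true_y) = ground_truth[-1]
    return math.hypot(pred_x - true_x, pred_y - true_y)


def mean_fde(predicted_trajectories, ground_truth_trajectories):
    """Average FDE over agents (assumed aggregation: unweighted mean)."""
    errors = [
        final_displacement_error(pred, truth)
        for pred, truth in zip(predicted_trajectories, ground_truth_trajectories)
    ]
    return sum(errors) / len(errors)


# Toy trajectories in meters, made up purely for illustration.
predictions = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
]
truths = [
    [(0.0, 0.0), (1.0, 0.1), (2.3, 0.4)],
    [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)],
]

print(f"mean FDE: {mean_fde(predictions, truths):.3f} m")
```

An "FDE@3sec" figure would use the position three seconds into the future as the final step, so the trajectories passed in would be truncated to that horizon.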

CERES Research Repository (Cranfield Univ.)

DR-NTU (Digital Repository of NTU)