
    UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase

    Point-, voxel-, and range-views are three representative forms of point clouds. All of them provide accurate 3D measurements but lack color and texture information. RGB images are a natural complement to these point cloud views, and fully exploiting the complementary information of both modalities enables more robust perception. In this paper, we present a unified multi-modal LiDAR segmentation network, termed UniSeg, which leverages the information of RGB images and three views of the point cloud, and accomplishes semantic segmentation and panoptic segmentation simultaneously. Specifically, we first design the Learnable cross-Modal Association (LMA) module to automatically fuse voxel-view and range-view features with image features; it fully utilizes the rich semantic information of images and is robust to calibration errors. Then, the enhanced voxel-view and range-view features are transformed into the point space, where the three views of point cloud features are further fused adaptively by the Learnable cross-View Association (LVA) module. Notably, UniSeg achieves promising results on three public benchmarks, i.e., SemanticKITTI, nuScenes, and Waymo Open Dataset (WOD); it ranks 1st in two benchmark challenges: the LiDAR semantic segmentation challenge of nuScenes and the panoptic segmentation challenge of SemanticKITTI. Besides, we construct the OpenPCSeg codebase, which is the largest and most comprehensive outdoor LiDAR segmentation codebase. It contains most of the popular outdoor LiDAR segmentation algorithms and provides reproducible implementations. The OpenPCSeg codebase will be made publicly available at https://github.com/PJLab-ADG/PCSeg. (Comment: ICCV 2023; 21 pages; 9 figures; 18 tables; code at https://github.com/PJLab-ADG/PCSeg)
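
    The sketch below illustrates, in broad strokes, what a learnable cross-modal fusion step of the kind the LMA module describes might look like: LiDAR-view features attend over a small set of image features gathered near their projected location, which is one way to tolerate calibration errors. This is an assumption-laden illustration, not the authors' implementation; all class, parameter, and dimension names are invented for the example.

```python
# Minimal sketch of learnable cross-modal fusion in the spirit of UniSeg's LMA
# module. NOT the paper's code; names and dimensions are illustrative only.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse LiDAR-view features with image features via learned attention."""
    def __init__(self, lidar_dim: int, img_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.query = nn.Linear(lidar_dim, hidden_dim)  # LiDAR features ask "what do I need?"
        self.key = nn.Linear(img_dim, hidden_dim)      # candidate image features answer
        self.value = nn.Linear(img_dim, hidden_dim)
        self.out = nn.Linear(lidar_dim + hidden_dim, lidar_dim)

    def forward(self, lidar_feat: torch.Tensor, img_feat: torch.Tensor) -> torch.Tensor:
        # lidar_feat: (N, lidar_dim) features for N voxels / range pixels
        # img_feat:   (N, K, img_dim) K image features gathered around each
        #             projected LiDAR element, to absorb small calibration errors
        q = self.query(lidar_feat).unsqueeze(1)                              # (N, 1, H)
        k = self.key(img_feat)                                               # (N, K, H)
        v = self.value(img_feat)                                             # (N, K, H)
        attn = torch.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)   # (N, K)
        fused_img = (attn.unsqueeze(-1) * v).sum(dim=1)                      # (N, H)
        return self.out(torch.cat([lidar_feat, fused_img], dim=-1))          # (N, lidar_dim)

# Toy usage: enhanced = CrossModalFusion(64, 256)(voxel_feat, gathered_img_feat)
```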

    On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving

    The pursuit of autonomous driving technology hinges on the sophisticated integration of perception, decision-making, and control systems. Traditional approaches, both data-driven and rule-based, have been hindered by their inability to grasp the nuances of complex driving environments and the intentions of other road users. This has been a significant bottleneck, particularly in the development of the common sense reasoning and nuanced scene understanding necessary for safe and reliable autonomous driving. The advent of Visual Language Models (VLMs) represents a novel frontier in realizing fully autonomous vehicle driving. This report provides an exhaustive evaluation of the latest state-of-the-art VLM, GPT-4V(ision), and its application in autonomous driving scenarios. We explore the model's abilities to understand and reason about driving scenes, make decisions, and ultimately act in the capacity of a driver. Our comprehensive tests span from basic scene recognition to complex causal reasoning and real-time decision-making under varying conditions. Our findings reveal that GPT-4V demonstrates superior performance in scene understanding and causal reasoning compared to existing autonomous systems. It showcases the potential to handle out-of-distribution scenarios, recognize intentions, and make informed decisions in real driving contexts. However, challenges remain, particularly in direction discernment, traffic light recognition, vision grounding, and spatial reasoning tasks. These limitations underscore the need for further research and development. The project is now available on GitHub for interested parties to access and utilize: https://github.com/PJLab-ADG/GPT4V-AD-Exploration
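
    As a rough illustration of how such an evaluation query could be issued, the snippet below sends a single driving-scene image and a driver-role prompt to a GPT-4V-style model through the OpenAI Python SDK. The prompt wording, file name, and model identifier are assumptions for illustration and are not the report's actual evaluation protocol.

```python
# Illustrative sketch only: querying a GPT-4V-style model about a driving scene.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

with open("front_camera_frame.jpg", "rb") as f:  # hypothetical dashcam frame
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # model name at the time of the report; may differ today
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "You are the driver. Describe the scene, the traffic light state, "
                     "and the intentions of nearby road users, then state your next "
                     "driving action."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=300,
)
print(response.choices[0].message.content)
```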

    Clinicopathological characteristics of gastric cancer patients with dermatomyositis and analysis of perioperative management: a case series study

    Background: This study aimed to investigate the clinical characteristics of gastric cancer (GC) patients with dermatomyositis (DM) and to summarize the perioperative outcomes. Methods: The clinical and pathological data of five patients diagnosed with co-occurring DM and GC (DM-GC group), admitted to the Department of Gastrointestinal Surgery at Renji Hospital, Shanghai Jiao Tong University, between January 2012 and April 2023, were retrospectively analyzed. Their data were compared with 618 GC patients (GC-1 group) treated from September 2016 to August 2017 and with 35 matched GC patients (GC-2 group) meticulously screened from 14,580 GC cases treated between January 2012 and April 2023. The matching criteria were identical gender, age, tumor location, TNM stage, and surgical procedure (seven GC patients were matched for each DM-GC patient). Results: The DM-GC group comprised four female patients and one male patient; the female proportion was significantly higher than in the GC-1 group (P = 0.032). In the DM-GC group, four DM patients were diagnosed with GC within 12 months and one within 15 months. Four patients presented with varying degrees of skin rash and muscle weakness, while one patient had elevated CK levels as the typical symptom. The proportions of elevated preoperative tumor markers (CA19-9 and CA-125) in the DM-GC group were significantly higher than in the GC-2 group (CA19-9: 100 vs. 28.6%, P = 0.002; CA-125: 40 vs. 2.9%, P = 0.003). Moreover, the postoperative complication rate and the length of hospital stay were significantly higher in the DM-GC group than in the GC-2 group (complication rate: 40 vs. 8.6%, P = 0.047; hospital stay: 15 days (range: 9–28) vs. 9 days (range: 8–10), P = 0.021). Conclusion: GC patients with dermatomyositis are more prone to postoperative complications and longer hospital stays.
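
    For readers interested in how group comparisons of this kind are typically computed, the sketch below shows one plausible reconstruction using Fisher's exact test for the complication rates and the Mann-Whitney U test for hospital stay. The abstract does not name the tests used, so the test choice is an assumption; the complication counts are derived from the reported percentages, while the stay values are hypothetical placeholders within the reported ranges.

```python
# Illustrative sketch only; not the authors' statistical code.
from scipy.stats import fisher_exact, mannwhitneyu

# Postoperative complications: DM-GC 2/5 (40%) vs matched GC-2 3/35 (8.6%)
odds_ratio, p_complications = fisher_exact([[2, 3], [3, 32]])
print(f"complication rate comparison: p = {p_complications:.3f}")

# Length of hospital stay (days); hypothetical values within the reported ranges,
# for illustration only (one value per matched patient in practice).
dm_gc_stay = [9, 12, 15, 20, 28]
gc2_stay = [8, 8, 9, 9, 9, 10, 10]
stat, p_stay = mannwhitneyu(dm_gc_stay, gc2_stay, alternative="two-sided")
print(f"hospital stay comparison: p = {p_stay:.3f}")
```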

    Speech fusion to face: bridging the gap between human's vocal characteristics and facial imaging

    While deep learning technologies are now capable of generating realistic images that can fool human observers, research efforts are turning to the synthesis of images for more concrete and application-specific purposes. Facial image generation based on vocal characteristics from speech is one such important yet challenging task. It is a key enabler of influential image-generation use cases, especially for businesses in public security and entertainment. Existing solutions to the speech2face problem render limited image quality and fail to preserve facial similarity, due to the lack of a quality dataset for training and of appropriate integration of vocal features. In this paper, we investigate these key technical challenges and propose Speech Fusion to Face, or SF2F in short, to address the issue of facial image quality and the poor connection between the vocal feature domain and modern image generation models. By adopting new strategies and approaches, we demonstrate a dramatic performance boost over the state-of-the-art solution, doubling the recall of individual identity and lifting the quality score from 15 to 19, as measured by the mutual information score with a VGGFace classifier. (Bachelor of Engineering, Electrical and Electronic Engineering)
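
    To make the speech-to-face pipeline concrete, the sketch below shows the general shape of such a system: a voice encoder maps a mel-spectrogram to an identity embedding, and an image decoder renders a face from that embedding. The architecture, layer sizes, and output resolution are illustrative assumptions and are not the thesis' actual design.

```python
# Minimal speech-to-face sketch; NOT the SF2F architecture.
import torch
import torch.nn as nn

class VoiceEncoder(nn.Module):
    def __init__(self, n_mels: int = 80, embed_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.proj = nn.Linear(256, embed_dim)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, time) -> unit-norm identity embedding (batch, embed_dim)
        h = self.conv(mel).mean(dim=-1)  # average-pool over time
        return nn.functional.normalize(self.proj(h), dim=-1)

class FaceDecoder(nn.Module):
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(embed_dim, 512 * 4 * 4)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # embedding -> low-resolution RGB face (batch, 3, 32, 32)
        return self.up(self.fc(z).view(-1, 512, 4, 4))

mel = torch.randn(2, 80, 200)               # two toy utterances
faces = FaceDecoder()(VoiceEncoder()(mel))  # (2, 3, 32, 32)
```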

    Image Thresholding Improves 3-Dimensional Convolutional Neural Network Diagnosis of Different Acute Brain Hemorrhages on Computed Tomography Scans

    Intracranial hemorrhage is a medical emergency that requires urgent diagnosis and immediate treatment to improve patient outcomes. Machine learning algorithms can be used to perform medical image classification and assist clinicians in diagnosing radiological scans. In this paper, we apply 3-dimensional convolutional neural networks (3D CNNs) to classify computed tomography (CT) brain scans into normal scans (N) and abnormal scans containing subarachnoid hemorrhage (SAH), intraparenchymal hemorrhage (IPH), acute subdural hemorrhage (ASDH), and brain polytrauma hemorrhage (BPH). The dataset used consists of 399 volumetric CT brain images, representing approximately 12,000 images, from the National Neuroscience Institute, Singapore. We used a 3D CNN to perform both 2-class classification (normal versus a specific abnormal class) and 4-class classification (normal, SAH, IPH, and ASDH). We apply image thresholding at the pre-processing step, which improves 3D CNN classification accuracy and performance by accentuating the pixel intensities that contribute most to feature discrimination. For 2-class classification, the F1 scores for various pairs of medical diagnoses ranged from 0.706 to 0.902 without thresholding. With thresholding implemented, the F1 scores improved and ranged from 0.919 to 0.952. Our results are comparable to, and in some cases exceed, the results published in other work applying 3D CNNs to CT or magnetic resonance imaging (MRI) brain scan classification. This work represents a direct application of a 3D CNN to a real hospital scenario involving a medically emergent CT brain diagnosis.
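
    The sketch below shows the two ideas described above in minimal form: an intensity-thresholding pre-processing step applied to a CT volume, followed by a small 3D CNN classifier. The threshold window, network layout, and volume size are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of CT thresholding + 3D CNN classification; illustrative only.
import numpy as np
import torch
import torch.nn as nn

def threshold_ct(volume: np.ndarray, low: float = 0.0, high: float = 90.0) -> np.ndarray:
    """Zero out intensities outside a window so hyperdense blood stands out."""
    out = volume.copy()
    out[(out < low) | (out > high)] = 0.0
    return out

class Small3DCNN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, depth, height, width) thresholded CT volume
        return self.classifier(self.features(x).flatten(1))

# Toy usage: one 32-slice volume classified into {N, SAH, IPH, ASDH}.
vol = threshold_ct(np.random.randn(32, 128, 128).astype(np.float32) * 40 + 40)
logits = Small3DCNN()(torch.from_numpy(vol)[None, None])
```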