UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase
Point, voxel, and range views are three representative forms of point
clouds. All of them provide accurate 3D measurements but lack color and texture
information. RGB images are a natural complement to these point cloud views, and
fully exploiting their complementary information enables more robust
perception. In this paper, we present a unified multi-modal LiDAR segmentation
network, termed UniSeg, which leverages the information of RGB images and three
views of the point cloud, and accomplishes semantic segmentation and panoptic
segmentation simultaneously. Specifically, we first design the Learnable
cross-Modal Association (LMA) module to automatically fuse voxel-view and
range-view features with image features; it fully utilizes the rich semantic
information of images and is robust to calibration errors. Then, the enhanced
voxel-view and range-view features are transformed to the point space, where
the three views of point cloud features are further fused adaptively by the
Learnable cross-View Association (LVA) module. Notably, UniSeg achieves
promising results on three public benchmarks, i.e., SemanticKITTI, nuScenes,
and the Waymo Open Dataset (WOD); it ranks 1st in two benchmark challenges:
the LiDAR semantic segmentation challenge of nuScenes and the panoptic
segmentation challenge of SemanticKITTI. Besides, we construct the OpenPCSeg
codebase, which is the largest and most comprehensive outdoor LiDAR
segmentation codebase. It contains most of the popular outdoor LiDAR
segmentation algorithms and provides reproducible implementations. The
OpenPCSeg codebase will be made publicly available at
https://github.com/PJLab-ADG/PCSeg.
Comment: ICCV 2023; 21 pages; 9 figures; 18 tables; Code at
https://github.com/PJLab-ADG/PCSeg
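The abstract's idea of learnably fusing a LiDAR view with projected image features can be illustrated with a minimal gated-fusion sketch. This is a generic stand-in for the LMA concept, not the paper's implementation: the function name, the sigmoid gate, and the weight shapes are all assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cross_modal_fusion(lidar_feat, image_feat, w_gate, b_gate):
    """Fuse aligned LiDAR-view features with projected image features
    via a learned sigmoid gate (a generic sketch of cross-modal fusion).

    lidar_feat, image_feat: (N, C) arrays of aligned per-point features.
    w_gate: (2C, C) gating weights; b_gate: (C,) gating bias.
    """
    concat = np.concatenate([lidar_feat, image_feat], axis=1)  # (N, 2C)
    gate = sigmoid(concat @ w_gate + b_gate)                   # (N, C), in (0, 1)
    # Convex combination: the gate decides, per channel, how much
    # image information to blend into the LiDAR features.
    return gate * image_feat + (1.0 - gate) * lidar_feat

# Toy usage with random features.
rng = np.random.default_rng(0)
N, C = 4, 8
fused = gated_cross_modal_fusion(
    rng.normal(size=(N, C)), rng.normal(size=(N, C)),
    rng.normal(size=(2 * C, C)) * 0.1, np.zeros(C),
)
print(fused.shape)  # (4, 8)
```

Because the gate is learned, such a module can down-weight image features where camera-LiDAR calibration is unreliable, which is one plausible reading of the robustness claim above.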
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
The pursuit of autonomous driving technology hinges on the sophisticated
integration of perception, decision-making, and control systems. Traditional
approaches, both data-driven and rule-based, have been hindered by their
inability to grasp the nuance of complex driving environments and the
intentions of other road users. This has been a significant bottleneck,
particularly in the development of common sense reasoning and nuanced scene
understanding necessary for safe and reliable autonomous driving. The advent of
Visual Language Models (VLMs) represents a novel frontier in realizing fully
autonomous driving. This report provides an exhaustive evaluation of
the latest state-of-the-art VLM, GPT-4V(ision), and its application in
autonomous driving scenarios. We explore the model's abilities to understand
and reason about driving scenes, make decisions, and ultimately act in the
capacity of a driver. Our comprehensive tests span from basic scene recognition
to complex causal reasoning and real-time decision-making under varying
conditions. Our findings reveal that GPT-4V demonstrates superior performance
in scene understanding and causal reasoning compared to existing autonomous
systems. It showcases the potential to handle out-of-distribution scenarios,
recognize intentions, and make informed decisions in real driving contexts.
However, challenges remain, particularly in direction discernment, traffic
light recognition, vision grounding, and spatial reasoning tasks. These
limitations underscore the need for further research and development. The
project is now available on GitHub for interested parties to access and
utilize: \url{https://github.com/PJLab-ADG/GPT4V-AD-Exploration}
Clinicopathological characteristics of gastric cancer patients with dermatomyositis and analysis of perioperative management: a case series study
Background: This study aimed to investigate the clinical characteristics of gastric cancer (GC) patients with dermatomyositis (DM) and to summarize their perioperative outcomes.
Methods: The clinical and pathological data of five patients diagnosed with co-occurring DM and GC (DM-GC group), admitted to the Department of Gastrointestinal Surgery at Renji Hospital, Shanghai Jiao Tong University, between January 2012 and April 2023, were retrospectively analyzed. Their data were compared with those of 618 GC patients (GC-1 group) treated from September 2016 to August 2017 and 35 GC patients (GC-2 group) meticulously screened from 14,580 GC cases treated from January 2012 to April 2023. The matching criteria included identical gender, age, tumor location, TNM stage, and surgical procedure (7 GC patients were matched to each DM-GC patient).
Results: The DM-GC group comprised four female patients and one male patient; the female proportion was significantly higher (P = 0.032) than that of the GC-1 group. In the DM-GC group, four DM patients were diagnosed with GC within 12 months and one within 15 months. Four patients presented with varying degrees of skin rash and muscle weakness, while one patient had elevated creatine kinase (CK) levels as the typical symptom. The proportions of elevated preoperative tumor markers (CA19-9 and CA-125) in the DM-GC group were significantly higher than in the GC-2 group (CA19-9: 100 vs. 28.6%, P = 0.002; CA-125: 40 vs. 2.9%, P = 0.003). Moreover, the postoperative complication rate and the length of hospital stay were significantly higher in the DM-GC group than in the GC-2 group (complication rate: 40 vs. 8.6%, P = 0.047; hospital stay: 15 days (range: 9–28) vs. 9 days (range: 8–10), P = 0.021).
Conclusion: GC patients with dermatomyositis are more prone to postoperative complications and longer hospital stays.
Speech fusion to face : bridging the gap between human's vocal characteristics and facial imaging
While deep learning technologies are now capable of generating realistic images that confuse humans, research efforts are turning to the synthesis of images for more concrete and application-specific purposes. Facial image generation based on vocal characteristics from speech is one such important yet challenging task. It is the key enabler of influential use cases of image generation, especially for businesses in public security and entertainment. Existing solutions to the speech2face problem render limited image quality and fail to preserve facial similarity due to the lack of quality datasets for training and of appropriate integration of vocal features. In this paper, we investigate these key technical challenges and propose Speech Fusion to Face, or SF2F in short, attempting to address the issue of facial image quality and the poor connection between the vocal feature domain and modern image generation models. By adopting new strategies and approaches, we demonstrate a dramatic performance boost over the state-of-the-art solution, doubling the recall of individual identity and lifting the quality score from 15 to 19 based on the mutual information score with a VGGFace classifier.
Bachelor of Engineering (Electrical and Electronic Engineering)
Image Thresholding Improves 3-Dimensional Convolutional Neural Network Diagnosis of Different Acute Brain Hemorrhages on Computed Tomography Scans
Intracranial hemorrhage is a medical emergency that requires urgent diagnosis and immediate treatment to improve patient outcomes. Machine learning algorithms can be used to perform medical image classification and assist clinicians in diagnosing radiological scans. In this paper, we apply 3-dimensional convolutional neural networks (3D CNNs) to classify computed tomography (CT) brain scans into normal scans (N) and abnormal scans containing subarachnoid hemorrhage (SAH), intraparenchymal hemorrhage (IPH), acute subdural hemorrhage (ASDH), or brain polytrauma hemorrhage (BPH). The dataset consists of 399 volumetric CT brain images, representing approximately 12,000 images, from the National Neuroscience Institute, Singapore. We used a 3D CNN to perform both 2-class (normal versus a specific abnormal class) and 4-class classification (normal, SAH, IPH, ASDH). We apply image thresholding at the pre-processing step, which improves 3D CNN classification accuracy and performance by accentuating the pixel intensities that contribute most to feature discrimination. For 2-class classification, the F1 scores for various pairs of medical diagnoses ranged from 0.706 to 0.902 without thresholding. With thresholding implemented, the F1 scores improved, ranging from 0.919 to 0.952. Our results are comparable to, and in some cases exceed, the results published in other work applying 3D CNNs to CT or magnetic resonance imaging (MRI) brain scan classification. This work represents a direct application of a 3D CNN to a real hospital scenario involving a medically emergent CT brain diagnosis.
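The thresholding step described above can be sketched as CT intensity windowing: clip Hounsfield units to a window and rescale, so values outside it saturate and the discriminative soft-tissue range is accentuated. The brain-window parameters below (level 40 HU, width 80 HU) are a common radiological convention assumed for illustration; the paper's exact thresholds may differ.

```python
import numpy as np

def window_ct(hu_image, level=40, width=80):
    """Clip a CT image (in Hounsfield units) to a window and rescale to [0, 1].

    Generic intensity-thresholding pre-processing: intensities below the
    window map to 0, those above map to 1, and the in-window range is
    stretched linearly, accentuating soft-tissue contrast.
    """
    lo, hi = level - width / 2, level + width / 2
    clipped = np.clip(hu_image.astype(np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)

# Toy 2x2 slice: air (-1000 HU) and bone (300 HU) saturate at 0 and 1,
# while the soft-tissue value (40 HU) lands mid-window.
slice_hu = np.array([[-1000, 0], [40, 300]])
print(window_ct(slice_hu))  # [[0.  0. ] [0.5 1. ]]
```

Applied volume-wise before feeding scans to the 3D CNN, this kind of windowing is one plausible realization of "accentuating the pixel intensities that contribute most to feature discrimination."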