A Language Agent for Autonomous Driving
Human-level driving is an ultimate goal of autonomous driving. Conventional
approaches formulate autonomous driving as a perception-prediction-planning
framework, yet their systems do not capitalize on the inherent reasoning
ability and experiential knowledge of humans. In this paper, we propose a
fundamental paradigm shift from current pipelines, exploiting Large Language
Models (LLMs) as a cognitive agent to integrate human-like intelligence into
autonomous driving systems. Our approach, termed Agent-Driver, transforms the
traditional autonomous driving pipeline by introducing a versatile tool library
accessible via function calls, a cognitive memory of common sense and
experiential knowledge for decision-making, and a reasoning engine capable of
chain-of-thought reasoning, task planning, motion planning, and
self-reflection. Powered by LLMs, our Agent-Driver is endowed with intuitive
common sense and robust reasoning capabilities, thus enabling a more nuanced,
human-like approach to autonomous driving. We evaluate our approach on the
large-scale nuScenes benchmark, where extensive experiments show that Agent-Driver outperforms state-of-the-art driving methods by a large margin. It also demonstrates superior interpretability and few-shot learning ability compared with these methods. Code will be released.

Comment: Project Page: https://usc-gvl.github.io/Agent-Driver
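As a rough illustration of the architecture the abstract describes (a tool library exposed through function calls, an experiential memory, and a reason-plan-reflect cycle), here is a minimal Python sketch. Every name in it (call_llm, TOOL_LIBRARY, drive_step, the stub tools) is hypothetical and stands in for components the paper does not specify; it is not Agent-Driver's actual interface.

```python
# Hypothetical sketch of the agent loop described above: a tool library
# exposed via function calls, an experiential memory, and a
# reason -> plan -> self-reflect cycle. All names (call_llm, TOOL_LIBRARY,
# drive_step, the stub tools) are illustrative, not the paper's API.
from typing import Callable

def detect_objects(scene: dict) -> list:
    """Stub perception tool; a real system would query a detector."""
    return scene.get("objects", [])

def predict_motion(objects: list) -> list:
    """Stub prediction tool; emits trivial constant-velocity forecasts."""
    return [{"id": o.get("id"), "forecast": "constant-velocity"} for o in objects]

TOOL_LIBRARY: dict[str, Callable] = {
    "detect_objects": detect_objects,
    "predict_motion": predict_motion,
}

MEMORY = [  # toy stand-in for the cognitive memory of experiences
    "Slow down when a pedestrian is near a crosswalk.",
]

def call_llm(prompt: str) -> str:
    """Placeholder LLM call; returns a canned response so the demo runs."""
    return "TOOL:detect_objects | PLAN: keep lane, reduce speed"

def drive_step(scene: dict) -> str:
    context = "\n".join(MEMORY)  # 1) retrieve relevant experiences
    # 2) chain-of-thought prompt combining scene, memory, and tool options
    plan = call_llm(f"Scene: {scene}\nMemory: {context}\nChoose tools and plan.")
    # 3) execute whichever tools the LLM requested via function calls
    for name, tool in TOOL_LIBRARY.items():
        if f"TOOL:{name}" in plan:
            tool(scene)
    # 4) self-reflection: ask the LLM to critique and revise its own plan
    return call_llm(f"Reflect on this plan and revise if needed: {plan}")

if __name__ == "__main__":
    print(drive_step({"objects": [{"id": 1, "type": "pedestrian"}]}))
```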
CLIP²: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data
Contrastive Language-Image Pre-training, benefiting from large-scale
unlabeled text-image pairs, has demonstrated great performance in open-world
vision understanding tasks. However, due to the limited availability of text-3D data pairs, adapting the success of 2D Vision-Language Models (VLMs) to the 3D space remains an open problem. Existing works that leverage VLMs for 3D understanding
generally resort to constructing intermediate 2D representations for the 3D
data, but at the cost of losing 3D geometry information. To take a step toward
open-world 3D vision understanding, we propose Contrastive Language-Image-Point
Cloud Pretraining (CLIP²) to directly learn the transferable 3D point cloud
representation in realistic scenarios with a novel proxy alignment mechanism.
Specifically, we exploit naturally existing correspondences in 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies
from those complex scenarios. On top of that, we propose a cross-modal
contrastive objective to learn semantic and instance-level aligned point cloud
representation. Experimental results on both indoor and outdoor scenarios show
that our learned 3D representation has great transfer ability in downstream
tasks, including zero-shot and few-shot 3D recognition, which boosts the
state-of-the-art methods by large margins. Furthermore, we provide analyses of the capability of different representations in real scenarios and present an optional ensemble scheme.

Comment: To appear at CVPR 2023
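The cross-modal contrastive objective described here can be illustrated with a standard symmetric InfoNCE loss over aligned text-image-point proxies. The sketch below (assuming PyTorch) is a generic instance of that loss family with illustrative shapes, temperature, and random "proxies"; it is not the paper's implementation.

```python
# Minimal sketch (assuming PyTorch) of a symmetric InfoNCE-style
# cross-modal contrastive loss over aligned text/image/point embeddings.
# Shapes, temperature, and the random "proxies" are illustrative only;
# this is a generic instance of the loss family, not the paper's code.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Row i of `a` and row i of `b` are positives; all other rows are negatives."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature   # (N, N) cosine-similarity matrix
    targets = torch.arange(a.size(0))  # diagonal entries are the positives
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

# Toy proxies: N aligned text/image/point instances with D-dim embeddings.
N, D = 8, 32
text, image, point = (torch.randn(N, D) for _ in range(3))

# Pull point features toward both the language and the image spaces.
loss = info_nce(point, text) + info_nce(point, image)
print(loss.item())
```

Treating each diagonal pair of the similarity matrix as a positive and all off-diagonal pairs as negatives is what aligns point features with both the language and image embedding spaces at the instance level.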
Peri-operative Takotsubo syndrome after non-cardiac surgery: a retrospective nested case–control study
Aims: Takotsubo syndrome (TTS) is an acute, reversible cardiac dysfunction that may occur during the peri-operative period and among patients with serious illness. We aimed to evaluate the clinical characteristics, peri-operative management, and prognosis of peri-operative TTS (pTTS) and to explore the factors associated with pTTS.

Methods: We conducted a retrospective nested case–control study using the database of patients who underwent in-hospital non-cardiac surgeries between January 2017 and December 2020 in Peking University Third Hospital. Cases were adult patients diagnosed with TTS at discharge, each matched with four controls based on operative type. Multivariable conditional logistic regression was used to identify the factors associated with pTTS. The area under the curve (AUC) was used to evaluate diagnostic efficacy.

Results: Among the 128 536 patients who underwent non-cardiac surgery, 20 patients with pTTS and 80 patients without were enrolled in this study. The incidence of pTTS was about 0.016% in our centre. The median age of patients with pTTS was 52.5 (38.25, 76.25) years, and 90% of them were female. Fifty per cent (9 cases) of the female patients were pre-menopausal. Caesarean section had the highest proportion of pTTS (30% of the pTTS cases), with an incidence of caesarean-section-related pTTS of 0.06% in our centre. A high prevalence of a non-apical ballooning pattern of regional wall motion abnormality (seven cases, 35%) and a high mortality (two cases, 10%) were observed. Left ventricular ejection fraction (LVEF) of patients with pTTS was significantly decreased (41.7 ± 8.8%). In the acute phase, supportive treatments aiming to reduce life-threatening complications were the main treatment strategies. After systematic treatment, LVEF improved significantly (63.1 ± 13.5%), with a median LVEF recovery time of 7.48 days. Leucocyte count [odds ratio (OR): 4.59; 95% confidence interval (CI): 1.10–19.15], haemoglobin (HGB) (OR: 10.52; 95% CI: 1.04–106.36), and the revised cardiac risk index (RCRI) score (OR: 6.30; 95% CI: 1.05–37.88) were the factors significantly associated with pTTS. The RCRI score performed poorly in the prediction of pTTS (AUC: 0.630; 95% CI: 0.525–0.735). After adding leucocyte count and HGB to the RCRI score, the AUC improved significantly (AUC: 0.768; 95% CI: 0.671–0.865; P = 0.001).

Conclusions: Patients with pTTS differ from those with common TTS in several respects: a higher proportion of pre-menopausal women, a higher prevalence during caesarean section, a higher prevalence of a non-apical ballooning pattern of regional wall motion abnormality, and higher mortality. The RCRI score performed poorly in the evaluation of pTTS; adding HGB and leucocyte count to the RCRI score could significantly improve its predictive performance.
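To make the AUC comparison concrete, the toy sketch below (Python with scikit-learn, entirely synthetic data, not the study's cohort) contrasts a model using the RCRI score alone with one that adds leucocyte count and haemoglobin. It uses plain logistic regression for simplicity, whereas the study used conditional logistic regression to respect the matched case–control design.

```python
# Toy illustration (entirely synthetic data, NOT the study's cohort) of
# comparing the AUC of the RCRI score alone against RCRI plus leucocyte
# count (WBC) and haemoglobin (HGB). Plain logistic regression is used
# for simplicity; the study used conditional logistic regression to
# respect the matched case-control design.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 100                                   # 20 cases + 80 matched controls
y = np.r_[np.ones(20), np.zeros(80)]      # 1 = pTTS case, 0 = control
rcri = rng.integers(0, 4, n) + y          # RCRI score with weak case signal
wbc = rng.normal(8, 2, n) + 2 * y         # leucocyte count (10^9/L)
hgb = rng.normal(130, 15, n) - 10 * y     # haemoglobin (g/L)

X_base = rcri.reshape(-1, 1)              # RCRI only
X_full = np.c_[rcri, wbc, hgb]            # RCRI + WBC + HGB

for name, X in [("RCRI only", X_base), ("RCRI + WBC + HGB", X_full)]:
    probs = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
    print(f"{name}: AUC = {roc_auc_score(y, probs):.3f}")
```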