8 research outputs found

    A Language Agent for Autonomous Driving

    Human-level driving is an ultimate goal of autonomous driving. Conventional approaches formulate autonomous driving as a perception-prediction-planning framework, yet their systems do not capitalize on the inherent reasoning ability and experiential knowledge of humans. In this paper, we propose a fundamental paradigm shift from current pipelines, exploiting Large Language Models (LLMs) as a cognitive agent to integrate human-like intelligence into autonomous driving systems. Our approach, termed Agent-Driver, transforms the traditional autonomous driving pipeline by introducing a versatile tool library accessible via function calls, a cognitive memory of common sense and experiential knowledge for decision-making, and a reasoning engine capable of chain-of-thought reasoning, task planning, motion planning, and self-reflection. Powered by LLMs, our Agent-Driver is endowed with intuitive common sense and robust reasoning capabilities, thus enabling a more nuanced, human-like approach to autonomous driving. We evaluate our approach on the large-scale nuScenes benchmark, and extensive experiments substantiate that our Agent-Driver outperforms state-of-the-art driving methods by a large margin. Our approach also demonstrates superior interpretability and few-shot learning ability compared with these methods. Code will be released. Comment: Project Page: https://usc-gvl.github.io/Agent-Driver

    CLIP^2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

    Contrastive Language-Image Pre-training, benefiting from large-scale unlabeled text-image pairs, has demonstrated great performance in open-world vision understanding tasks. However, due to the limited number of Text-3D data pairs, adapting the success of 2D Vision-Language Models (VLMs) to the 3D space remains an open problem. Existing works that leverage VLMs for 3D understanding generally resort to constructing intermediate 2D representations for the 3D data, at the cost of losing 3D geometry information. To take a step toward open-world 3D vision understanding, we propose Contrastive Language-Image-Point Cloud Pretraining (CLIP^2) to directly learn a transferable 3D point cloud representation in realistic scenarios with a novel proxy alignment mechanism. Specifically, we exploit naturally existing correspondences in 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios. On top of that, we propose a cross-modal contrastive objective to learn semantic and instance-level aligned point cloud representations. Experimental results on both indoor and outdoor scenarios show that our learned 3D representation has great transfer ability in downstream tasks, including zero-shot and few-shot 3D recognition, where it improves on state-of-the-art methods by large margins. Furthermore, we provide analyses of the capability of different representations in real scenarios and present an optional ensemble scheme. Comment: To appear at CVPR 202

    Peri-operative Takotsubo syndrome after non-cardiac surgery: a retrospective nested case–control study

    Aims: Takotsubo syndrome (TTS) is an acute reversible cardiac dysfunction that may occur during the peri-operative period and among patients with serious illness. We aimed to evaluate the clinical characteristics, peri-operative management, and prognosis of peri-operative TTS (pTTS) and explore the factors associated with pTTS. Methods: We conducted a retrospective nested case–control study using the database of patients who underwent in-hospital non-cardiac surgeries between January 2017 and December 2020 in Peking University Third Hospital. Cases were adult patients diagnosed with TTS at discharge, each matched with four controls based on operative type. Multivariable conditional logistic regression was used to identify the factors associated with pTTS. The area under the curve (AUC) was used to evaluate the diagnostic efficacy. Results: Among the 128 536 patients who underwent non-cardiac surgery, 20 patients with pTTS and 80 patients without were enrolled in this study. The incidence of pTTS was about 0.016% in our centre. The median age of patients with pTTS was 52.5 (38.25, 76.25) years, and 90% of them were female. Fifty per cent (9 cases) of female patients were pre-menopausal. Caesarean section accounted for the highest proportion of pTTS (30% of the pTTS cases), with an incidence of caesarean section-related pTTS of 0.06% in our centre. A high prevalence of the non-apical ballooning pattern of regional wall motion abnormality (seven cases, 35%) and a high mortality (two cases, 10%) were observed. Left ventricular ejection fraction (LVEF) of patients with pTTS was significantly decreased (41.7 ± 8.8%). In the acute phase, supportive treatments aiming to reduce life-threatening complications were the main treatment strategies. After systematic treatment, significant improvements were observed in LVEF (63.1 ± 13.5%), with a median LVEF recovery time of 7.48 days. Leucocyte count [odds ratio (OR): 4.59; 95% confidence interval (CI): 1.10–19.15], haemoglobin (HGB) (OR: 10.52; 95% CI: 1.04–106.36), and the revised cardiac risk index (RCRI) score (OR: 6.30; 95% CI: 1.05–37.88) were the factors significantly associated with pTTS. The RCRI score performed poorly in the prediction of pTTS (AUC: 0.630; 95% CI: 0.525–0.735). After adding leucocyte count and HGB to the RCRI score, the AUC was significantly improved (AUC: 0.768; 95% CI: 0.671–0.865; P = 0.001). Conclusions: Patients with pTTS differ from those with common TTS in several respects, including a higher proportion of pre-menopausal women, a higher prevalence during caesarean section, a higher prevalence of the non-apical ballooning pattern of regional wall motion abnormality, and higher mortality. The RCRI score performed poorly in the evaluation of pTTS. Adding HGB and leucocyte count to the RCRI score could significantly improve its predictive performance.