135 research outputs found

    DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

    Camera-based 3D object detectors are welcome due to their wider deployment and lower price than LiDAR sensors. We revisit the prior stereo model DSGN and its stereo volume construction for representing both 3D geometry and semantics. We polish the stereo modeling and propose our approach, DSGN++, which aims to improve information flow throughout the 2D-to-3D pipeline in three main aspects. First, to effectively lift the 2D information to the stereo volume, we propose depth-wise plane sweeping (DPS), which allows denser connections and extracts depth-guided features. Second, to better grasp differently spaced features, we present a novel stereo volume -- Dual-view Stereo Volume (DSV) -- that integrates front-view and top-view features and reconstructs sub-voxel depth in the camera frustum. Third, as the foreground region becomes less dominant in 3D space, we propose, for the first time, a multi-modal data editing strategy -- Stereo-LiDAR Copy-Paste -- which ensures cross-modal alignment and improves data efficiency. Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that our method consistently outperforms other camera-based 3D detectors for all categories. Code will be released at https://github.com/chenyilun95/DSGN2

    Towards Learning a Generalist Model for Embodied Navigation

    Building a generalist agent that can interact with the world is an intriguing target of AI systems, spurring research on embodied navigation, where an agent is required to navigate according to instructions or respond to queries. Despite the major progress attained, previous works primarily focus on task-specific agents and lack generalizability to unseen scenarios. Recently, LLMs have presented remarkable capabilities across various fields, and provided a promising opportunity for embodied navigation. Drawing on this, we propose the first generalist model for embodied navigation, NaviLLM. It adapts LLMs to embodied navigation by introducing schema-based instruction. The schema-based instruction flexibly casts various tasks into generation problems, thereby unifying a wide range of tasks. This approach allows us to integrate diverse data sources from various datasets into the training, equipping NaviLLM with a wide range of capabilities required by embodied navigation. We conduct extensive experiments to evaluate the performance and generalizability of our model. The experimental results demonstrate that our unified model achieves state-of-the-art performance on CVDN, SOON, and ScanQA. Specifically, it surpasses the previous state-of-the-art method by a significant margin of 29% in goal progress on CVDN. Moreover, our model also demonstrates strong generalizability and presents impressive results on unseen tasks, e.g., embodied question answering and 3D captioning. Comment: Accepted by CVPR 2024 (14 pages, 3 figures

    LITA: Language Instructed Temporal-Localization Assistant

    There has been tremendous progress in multimodal Large Language Models (LLMs). Recent works have extended these models to video input with promising instruction following capabilities. However, an important missing piece is temporal localization. These models cannot accurately answer the "When?" questions. We identify three key aspects that limit their temporal localization capabilities: (i) time representation, (ii) architecture, and (iii) data. We address these shortcomings by proposing Language Instructed Temporal-Localization Assistant (LITA) with the following features: (1) We introduce time tokens that encode timestamps relative to the video length to better represent time in videos. (2) We introduce SlowFast tokens in the architecture to capture temporal information at fine temporal resolution. (3) We emphasize temporal localization data for LITA. In addition to leveraging existing video datasets with timestamps, we propose a new task, Reasoning Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for learning and evaluating this task. Reasoning temporal localization requires both the reasoning and temporal localization of Video LLMs. LITA demonstrates strong performance on this challenging task, nearly doubling the temporal mean intersection-over-union (mIoU) of baselines. In addition, we show that our emphasis on temporal localization also substantially improves video-based text generation compared to existing Video LLMs, including a 36% relative improvement of Temporal Understanding. Code is available at: https://github.com/NVlabs/LIT
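The two quantitative ideas in the LITA abstract, time tokens that encode timestamps relative to video length and temporal mean intersection-over-union (mIoU), can be illustrated with a minimal sketch. The function names, the default of 100 tokens, and the rounding rule are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
def time_token_index(timestamp_s, video_length_s, num_tokens=100):
    """Map an absolute timestamp to a 1-based relative time-token index.

    num_tokens and the rounding scheme are hypothetical choices; the
    abstract only says tokens encode time relative to the video length.
    """
    if video_length_s <= 0:
        raise ValueError("video_length_s must be positive")
    # Clamp the relative position to [0, 1] so out-of-range inputs stay valid.
    frac = min(max(timestamp_s / video_length_s, 0.0), 1.0)
    return round(frac * (num_tokens - 1)) + 1


def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) intervals in seconds."""
    (ps, pe), (gs, ge) = pred, gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def mean_temporal_iou(preds, gts):
    """Average temporal IoU over paired predictions and ground truths."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("preds and gts must be non-empty and equal length")
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(preds)
```

Because the index depends only on the ratio of timestamp to video length, the same token vocabulary covers videos of any duration, which is the point of a relative encoding.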

    CLEVA: Chinese Language Models EVAluation Platform

    With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue. The absence of a comprehensive Chinese benchmark that thoroughly assesses a model's performance, the unstandardized and incomparable prompting procedure, and the prevalent risk of contamination pose major challenges in the current evaluation of Chinese LLMs. We present CLEVA, a user-friendly platform crafted to holistically evaluate Chinese LLMs. Our platform employs a standardized workflow to assess LLMs' performance across various dimensions, regularly updating a competitive leaderboard. To alleviate contamination, CLEVA curates a significant proportion of new data and develops a sampling strategy that guarantees a unique subset for each leaderboard round. Empowered by an easy-to-use interface that requires just a few mouse clicks and a model API, users can conduct a thorough evaluation with minimal coding. Large-scale experiments featuring 23 Chinese LLMs have validated CLEVA's efficacy. Comment: EMNLP 2023 System Demonstrations camera-read

    LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

    With the recent significant advancements in large multi-modal models (LMMs), the importance of their grounding capability in visual chat is increasingly recognized. Despite recent efforts to enable LMMs to support grounding, their capabilities for grounding and chat are usually separate, and their chat performance drops dramatically when asked to ground. The problem is the lack of a dataset for grounded visual chat (GVC). Existing grounding datasets only contain short captions. To address this issue, we have created GVC data that allows for the combination of grounding and chat capabilities. To better evaluate the GVC capabilities, we have introduced a benchmark called Grounding-Bench. Additionally, we have proposed a model design that can support GVC and various types of visual prompts by connecting segmentation models with language models. Experimental results demonstrate that our model outperforms other LMMs on Grounding-Bench. Furthermore, our model achieves competitive performance on classic grounding benchmarks like RefCOCO/+/g and Flickr30K Entities. Our code will be released at https://github.com/UX-Decoder/LLaVA-Grounding

    Causal association between self-reported fatigue and coronary artery disease: a bidirectional two-sample Mendelian randomization analysis

    Background: Observational studies have reported an association between fatigue and coronary artery disease (CAD), but whether the association is causal is unclear. Method: We conducted a bidirectional Mendelian randomization (MR) study using publicly available genome-wide association studies (GWAS) data. The inverse-variance weighted (IVW) method was used as the primary analysis. We applied three complementary methods, namely weighted median, MR-Egger regression, and MR pleiotropy residual sum and outlier (MR-PRESSO), to evaluate the sensitivity and horizontal pleiotropy of the results. Result: Self-reported fatigue had a causal effect on coronary artery atherosclerosis (CAA) (OR 1.047, 95% CI 1.033–1.062), myocardial infarction (MI) (OR 1.027, 95% CI 1.014–1.039) and coronary heart disease (CHD) (OR 1.037, 95% CI 1.021–1.053). We did not find a significant reverse causality between self-reported fatigue and CAD. Given the heterogeneity revealed by MR-Egger regression, we employed the IVW random effect model. For the effect of fatigue on CHD, and for the reverse analyses of CAA and MI on fatigue, the MR-PRESSO test found horizontal pleiotropy. No significant outliers were found. Conclusion: The MR analysis reveals a causal relationship between self-reported fatigue and CAD. The results should be interpreted with caution due to horizontal pleiotropy.
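The inverse-variance weighted (IVW) estimate named in the abstract can be sketched as follows. This is the textbook fixed-effect form computed from per-variant summary statistics, whereas the study reports using a random-effect model, so treat it as an illustration of the idea and not a reproduction of the authors' analysis. The function and argument names are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    Each list holds one value per genetic variant: the variant-exposure
    effect, the variant-outcome effect, and the standard error of the
    variant-outcome effect.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome^2.
    numerator = sum(bx * by / se**2
                    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return beta, se


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

When every variant implies the same ratio of outcome effect to exposure effect, the IVW estimate equals that ratio; heterogeneity across variants is what motivates the random-effect variant and the sensitivity analyses listed in the abstract.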

    Transient ischemic attack and coronary artery disease: a two-sample Mendelian randomization analysis

    Background: Although observational studies have shown that patients who experienced transient ischemic attacks (TIAs) had a higher risk of coronary artery disease (CAD), the causal relationship is ambiguous. Methods: We conducted a two-sample Mendelian randomization (MR) study to analyze the causal relationship between TIA and CAD using data from the FinnGen genome-wide association study. Analysis was performed using the inverse-variance weighted (IVW) method. The robustness of the results was evaluated using MR-Egger regression, the weighted median, MR pleiotropy residual sum and outlier (MR-PRESSO), and multivariable MR analysis. Results: Results from the IVW random-effect model showed that TIA was associated with an increased risk of coronary artery atherosclerosis (OR 1.17, 95% CI 1.06–1.28, P = 0.002), ischemic heart disease (OR 1.15, 95% CI 1.04–1.27, P = 0.007), and myocardial infarction (OR 1.15, 95% CI 1.02–1.29, P = 0.025). In addition, heterogeneity and horizontal pleiotropy were observed in the ischemic heart disease results, while the sensitivity analysis revealed no evidence of horizontal pleiotropy in other outcomes. Conclusions: This MR study demonstrated a potential causal relationship between TIA and CAD. Further research should be conducted to investigate the mechanism underlying the association.
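A reader can roughly cross-check reported odds ratios, confidence intervals and P values against each other using a normal approximation on the log-odds scale. The sketch below is an assumption-laden approximation, not the authors' procedure: it assumes a symmetric 95% CI on the log scale, and because the published figures are rounded, the recovered P value will only approximately match the reported one. The function names are hypothetical.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Recover the log-odds standard error from a 95% CI on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def two_sided_p(odds_ratio, lower, upper):
    """Approximate two-sided P value under a normal (Wald) approximation."""
    se = se_from_ci(lower, upper)
    z_stat = math.log(odds_ratio) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail probability of a
    # standard normal variable.
    return math.erfc(abs(z_stat) / math.sqrt(2))


# Values taken from the abstract above (coronary artery atherosclerosis).
p_caa = two_sided_p(1.17, 1.06, 1.28)
```

An approximate P value of the same order of magnitude as the reported one suggests the three numbers are mutually consistent; a large discrepancy would be a reason to consult the full text.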

    Metagenomic surveillance and comparative genomic analysis of Chlamydia psittaci in patients with pneumonia

    Chlamydia psittaci, a strictly intracellular bacterium, is an underestimated etiologic agent leading to infections in a broad range of animals and mild illness or pneumonia in humans. In this study, the metagenomes of bronchoalveolar lavage fluids from patients with pneumonia were sequenced and highly abundant C. psittaci was found. The target-enriched metagenomic reads were recruited to reconstruct draft genomes with more than 99% completeness. Two C. psittaci strains from novel sequence types were detected and these were closely related to the animal-borne isolates derived from the lineages of ST43 and ST28, indicating that zoonotic transmission of C. psittaci would promote its worldwide prevalence. Comparative genomic analysis combined with public isolate genomes revealed that the pan-genome of C. psittaci possessed a more stable gene repertoire than those of extracellular bacteria, with ~90% of the genes per genome being conserved core genes. Furthermore, evidence of significant positive selection was identified in 20 virulence-associated gene products, particularly bacterial membrane-embedded proteins and type three secretion machines, which may play important roles in pathogen-host interactions. This survey uncovered novel strains of C. psittaci causing pneumonia, and the evolutionary analysis characterized prominent gene candidates involved in bacterial adaptation to immune pressures. The metagenomic approach is of significance to the surveillance of difficult-to-culture intracellular pathogens and to research into the molecular epidemiology and evolutionary biology of C. psittaci.