DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors
Camera-based 3D object detectors are attractive because they are more widely
deployable and cheaper than LiDAR sensors. We revisit DSGN, a prior stereo
model, focusing on how it constructs stereo volumes to represent both 3D
geometry and semantics. We refine this stereo modeling and propose our
approach, DSGN++, which aims to improve information flow throughout the
2D-to-3D pipeline in three main aspects. First, to effectively lift 2D
information into the stereo volume, we propose depth-wise plane sweeping (DPS),
which allows denser connections and extracts depth-guided features. Second, to
better capture features in different spaces, we present a novel stereo volume,
the Dual-view Stereo Volume (DSV), which integrates front-view and top-view
features and reconstructs sub-voxel depth in the camera frustum. Third, because
the foreground region becomes less dominant in 3D space, we propose the first
multi-modal data editing strategy, Stereo-LiDAR Copy-Paste, which ensures cross-modal
alignment and improves data efficiency. Without bells and whistles, extensive
experiments in various modality setups on the popular KITTI benchmark show that
our method consistently outperforms other camera-based 3D detectors for all
categories. Code will be released at https://github.com/chenyilun95/DSGN2
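The general idea of lifting 2D features into a stereo volume by sweeping depth planes can be illustrated with a toy 1D scanline. This is a minimal sketch of generic plane-sweep stereo, not DSGN++'s depth-wise plane sweeping (DPS), whose details the abstract does not give. It assumes rectified cameras, so a candidate depth z maps to a disparity d = f·B/z. The focal length, baseline, depth candidates, and function names are illustrative assumptions, and the hand-crafted absolute-difference cost is used only for readability; learned detectors typically build the volume from CNN features instead.

```python
from typing import List


def depth_to_disparity(depth: float, focal: float, baseline: float) -> float:
    """Rectified stereo: disparity (px) = focal length (px) * baseline (m) / depth (m)."""
    return focal * baseline / depth


def plane_sweep_costs(left: List[float], right: List[float], depths: List[float],
                      focal: float, baseline: float) -> List[List[float]]:
    """Build a toy cost volume: costs[k][u] compares left pixel u with the right
    pixel it would match if the scene point were at depths[k]."""
    width = len(left)
    costs = []
    for z in depths:
        d = round(depth_to_disparity(z, focal, baseline))  # nearest-pixel shift
        row = []
        for u in range(width):
            v = u - d  # corresponding column in the right image
            # Absolute feature difference; no valid match outside the image.
            row.append(abs(left[u] - right[v]) if 0 <= v < width else float("inf"))
        costs.append(row)
    return costs


def best_depth_per_pixel(costs: List[List[float]], depths: List[float]) -> List[float]:
    """Winner-take-all: pick the depth candidate with the lowest cost per pixel."""
    return [depths[min(range(len(depths)), key=lambda k: costs[k][u])]
            for u in range(len(costs[0]))]


# Usage: the right scanline is the left one shifted by 2 px (padded with -1).
left = [3, 8, 1, 9, 4, 7, 2, 6, 5, 0]
right = [1, 9, 4, 7, 2, 6, 5, 0, -1, -1]
depths = [50.0, 25.0, 12.5]  # disparities 1, 2 and 4 px for focal=100, baseline=0.5
costs = plane_sweep_costs(left, right, depths, focal=100.0, baseline=0.5)
print(best_depth_per_pixel(costs, depths))  # pixels 2..9 recover depth 25.0
```

Each depth plane becomes one slice of the volume, which is why denser or better-guided connections between the 2D features and these slices matter for the downstream 3D detector.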
Towards Learning a Generalist Model for Embodied Navigation
Building a generalist agent that can interact with the world is an
intriguing goal of AI systems, and it has spurred research on embodied
navigation, where an agent is required to navigate according to instructions or
respond to queries. Despite major progress, previous works
primarily focus on task-specific agents and lack generalizability to unseen
scenarios. Recently, LLMs have shown remarkable capabilities across various
fields, offering a promising opportunity for embodied navigation. Drawing
on this, we propose the first generalist model for embodied navigation,
NaviLLM. It adapts LLMs to embodied navigation by introducing schema-based
instruction. The schema-based instruction flexibly casts various tasks into
generation problems, thereby unifying a wide range of tasks. This approach
allows us to integrate diverse data sources from various datasets into the
training, equipping NaviLLM with a wide range of capabilities required by
embodied navigation. We conduct extensive experiments to evaluate the
performance and generalizability of our model. The experimental results
demonstrate that our unified model achieves state-of-the-art performance on
CVDN, SOON, and ScanQA. Specifically, it surpasses the previous
state-of-the-art method by a significant margin of 29% in goal progress on
CVDN. Moreover, our model also demonstrates strong generalizability and
presents impressive results on unseen tasks, e.g., embodied question answering
and 3D captioning.
Comment: Accepted by CVPR 2024 (14 pages, 3 figures)
LITA: Language Instructed Temporal-Localization Assistant
There has been tremendous progress in multimodal Large Language Models
(LLMs). Recent works have extended these models to video input with promising
instruction following capabilities. However, an important missing piece is
temporal localization. These models cannot accurately answer "When?"
questions. We identify three key aspects that limit their temporal localization
capabilities: (i) time representation, (ii) architecture, and (iii) data. We
address these shortcomings by proposing Language Instructed
Temporal-Localization Assistant (LITA) with the following features: (1) We
introduce time tokens that encode timestamps relative to the video length to
better represent time in videos. (2) We introduce SlowFast tokens in the
architecture to capture temporal information at fine temporal resolution. (3)
We emphasize temporal localization data for LITA. In addition to leveraging
existing video datasets with timestamps, we propose a new task, Reasoning
Temporal Localization (RTL), along with the dataset, ActivityNet-RTL, for
learning and evaluating this task. Reasoning temporal localization requires
both the reasoning and the temporal localization abilities of Video LLMs. LITA demonstrates
strong performance on this challenging task, nearly doubling the temporal mean
intersection-over-union (mIoU) of baselines. In addition, we show that our
emphasis on temporal localization also substantially improves video-based text
generation compared to existing Video LLMs, including a 36% relative
improvement in Temporal Understanding. Code is available at:
https://github.com/NVlabs/LIT
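Two ideas from the LITA abstract can be made concrete in a few lines: time tokens that encode timestamps relative to the video length, and the temporal intersection-over-union behind the reported mIoU. The sketch below is a minimal illustration under stated assumptions, not LITA's released code: the vocabulary size of 100 tokens, the `<k>` token format, bucket-centre decoding, and all helper names are hypothetical. Normalizing by duration means the same token denotes the same fraction of any video, whatever its length.

```python
from typing import List, Tuple

NUM_TIME_TOKENS = 100  # assumption: size of the relative time-token vocabulary

Interval = Tuple[float, float]  # (start, end) in seconds


def timestamp_to_token(t: float, duration: float, n: int = NUM_TIME_TOKENS) -> str:
    """Encode an absolute timestamp as a relative time token <1> ... <n>."""
    frac = min(max(t / duration, 0.0), 1.0)  # position relative to video length
    k = min(int(frac * n) + 1, n)            # 1-based bucket index
    return f"<{k}>"


def token_to_timestamp(token: str, duration: float, n: int = NUM_TIME_TOKENS) -> float:
    """Decode a time token to the centre of its bucket, in seconds."""
    k = int(token.strip("<>"))
    return (k - 0.5) / n * duration


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def mean_temporal_iou(preds: List[Interval], gts: List[Interval]) -> float:
    """Average temporal IoU over paired predictions and ground truths."""
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(gts)


# Usage: a 120 s video; 30 s lies a quarter of the way through.
print(timestamp_to_token(30.0, 120.0))         # <26>
print(temporal_iou((0.0, 10.0), (5.0, 15.0)))  # overlap 5 s / union 15 s
```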
CLEVA: Chinese Language Models EVAluation Platform
With the continuous emergence of Chinese Large Language Models (LLMs),
evaluating a model's capabilities has become an increasingly important issue.
The absence of a comprehensive Chinese benchmark that thoroughly assesses a
model's performance, the unstandardized and incomparable prompting procedure,
and the prevalent risk of contamination pose major challenges in the current
evaluation of Chinese LLMs. We present CLEVA, a user-friendly platform crafted
to holistically evaluate Chinese LLMs. Our platform employs a standardized
workflow to assess LLMs' performance across various dimensions and regularly
updates a competitive leaderboard. To alleviate contamination, CLEVA curates a
significant proportion of new data and develops a sampling strategy that
guarantees a unique subset for each leaderboard round. Empowered by an
easy-to-use interface that requires just a few mouse clicks and a model API,
users can conduct a thorough evaluation with minimal coding. Large-scale
experiments featuring 23 Chinese LLMs have validated CLEVA's efficacy.
Comment: EMNLP 2023 System Demonstrations camera-ready
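The abstract does not say how CLEVA's sampling strategy guarantees a unique subset for each leaderboard round. One simple scheme with that property is to shuffle the data pool once with a fixed seed and hand each round its own consecutive, non-overlapping slice. The sketch below assumes exactly that scheme; the function name, seed, and subset size are hypothetical and are not taken from CLEVA.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def round_subset(pool: Sequence[T], round_idx: int, subset_size: int,
                 seed: int = 0) -> List[T]:
    """Return the evaluation subset for leaderboard round `round_idx` (0-based).

    Hypothetical scheme: one fixed-seed shuffle of the pool, then the
    round_idx-th consecutive slice, so no two rounds share an item.
    """
    order = list(range(len(pool)))
    random.Random(seed).shuffle(order)  # same permutation on every call
    start = round_idx * subset_size
    if start + subset_size > len(pool):
        raise ValueError("pool exhausted: curate new data before this round")
    return [pool[i] for i in order[start:start + subset_size]]


# Usage: 100 items split into disjoint rounds of 10
pool = list(range(100))
print(round_subset(pool, 0, 10), round_subset(pool, 1, 10))
```

Raising an error when the pool is exhausted mirrors the abstract's point that fresh data must keep being curated to avoid reusing (and thus leaking) test items.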
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
With the recent significant advancements in large multi-modal models (LMMs),
the importance of their grounding capability in visual chat is increasingly
recognized. Despite recent efforts to enable LMMs to support grounding, their
capabilities for grounding and chat are usually separate, and their chat
performance drops dramatically when asked to ground. The underlying problem is
the lack of a dataset for grounded visual chat (GVC): existing grounding datasets
contain only short captions. To address this issue, we have created GVC data that
combines grounding and chat capabilities. To better evaluate GVC capabilities,
we have introduced a benchmark called Grounding-Bench. Additionally, we have
proposed a model design that can support
GVC and various types of visual prompts by connecting segmentation models with
language models. Experimental results demonstrate that our model outperforms
other LMMs on Grounding-Bench. Furthermore, our model achieves competitive
performance on classic grounding benchmarks like RefCOCO/+/g and Flickr30K
Entities. Our code will be released at
https://github.com/UX-Decoder/LLaVA-Grounding
Causal association between self-reported fatigue and coronary artery disease: a bidirectional two-sample Mendelian randomization analysis
Background: Observational studies have reported an association between fatigue and coronary artery disease (CAD), but whether this association is causal is unclear.
Method: We conducted a bidirectional Mendelian randomization (MR) study using publicly available genome-wide association study (GWAS) data. The inverse-variance weighted (IVW) method was used as the primary analysis. We applied three complementary methods, weighted median, MR-Egger regression, and MR pleiotropy residual sum and outlier (MR-PRESSO), to evaluate the sensitivity of the results and test for horizontal pleiotropy.
Result: Self-reported fatigue had a causal effect on coronary artery atherosclerosis (CAA) (OR 1.047, 95% CI 1.033–1.062), myocardial infarction (MI) (OR 1.027, 95% CI 1.014–1.039), and coronary heart disease (CHD) (OR 1.037, 95% CI 1.021–1.053). We found no significant reverse causal effect of CAD on self-reported fatigue. Given the heterogeneity revealed by MR-Egger regression, we employed the IVW random-effects model. The MR-PRESSO test detected horizontal pleiotropy in the analysis of fatigue on CHD and in the reverse analyses of CAA and MI on fatigue. No significant outliers were found.
Conclusion: The MR analysis reveals a causal relationship between self-reported fatigue and CAD. The results should be interpreted with caution because of horizontal pleiotropy.
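The IVW estimate used as the primary analysis, and the odds ratios with 95% CIs it is reported as, can be sketched from per-variant summary statistics. This is a minimal illustration, not the authors' pipeline: it computes the standard IVW estimate sum(w·bx·by) / sum(w·bx²) with w = 1/se_y², its fixed-effect standard error, Cochran's Q, and a multiplicative random-effects standard error (the fixed-effect SE inflated by sqrt(Q/(J−1)) when that exceeds 1, one common convention). The function names and the three-variant input are made-up assumptions, not the study's GWAS data.

```python
import math
from typing import List, Tuple


def ivw(beta_x: List[float], beta_y: List[float],
        se_y: List[float]) -> Tuple[float, float, float, float]:
    """IVW causal estimate from per-variant summary statistics.

    beta_x: variant-exposure effects; beta_y, se_y: variant-outcome effects
    and their standard errors. Returns (estimate, fixed-effect SE,
    multiplicative random-effects SE, Cochran's Q).
    """
    w = [1.0 / (s * s) for s in se_y]  # inverse-variance weights
    den = sum(wj * bx * bx for wj, bx in zip(w, beta_x))
    beta = sum(wj * bx * by for wj, bx, by in zip(w, beta_x, beta_y)) / den
    se_fixed = math.sqrt(1.0 / den)
    # Cochran's Q: heterogeneity of the variant-specific estimates around beta
    q = sum(wj * (by - beta * bx) ** 2 for wj, bx, by in zip(w, beta_x, beta_y))
    phi = q / (len(beta_x) - 1) if len(beta_x) > 1 else 0.0
    # Multiplicative random effects: inflate the SE when there is over-dispersion
    se_random = se_fixed * max(1.0, math.sqrt(phi))
    return beta, se_fixed, se_random, q


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds estimate into an OR with a 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Usage with made-up summary statistics for three variants
beta, se_f, se_r, q = ivw([0.10, 0.20, 0.30], [0.012, 0.020, 0.033], [0.01, 0.01, 0.01])
print(odds_ratio_ci(beta, se_r))
```

Switching from the fixed-effect to the random-effects SE, as the abstract describes, widens the confidence interval but leaves the point estimate unchanged.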
Transient ischemic attack and coronary artery disease: a two-sample Mendelian randomization analysis
Background: Although observational studies have shown that patients who experienced transient ischemic attacks (TIAs) had a higher risk of coronary artery disease (CAD), the causal relationship remains unclear.
Methods: We conducted a two-sample Mendelian randomization (MR) study to analyze the causal relationship between TIA and CAD using data from the FinnGen genome-wide association study. The primary analysis used the inverse-variance weighted (IVW) method. The robustness of the results was evaluated using MR-Egger regression, the weighted median, MR pleiotropy residual sum and outlier (MR-PRESSO), and multivariable MR analysis.
Results: The IVW random-effects model showed that TIA was associated with an increased risk of coronary artery atherosclerosis (OR 1.17, 95% CI 1.06–1.28, P = 0.002), ischemic heart disease (OR 1.15, 95% CI 1.04–1.27, P = 0.007), and myocardial infarction (OR 1.15, 95% CI 1.02–1.29, P = 0.025). Heterogeneity and horizontal pleiotropy were observed in the ischemic heart disease results, whereas the sensitivity analysis revealed no evidence of horizontal pleiotropy for the other outcomes.
Conclusions: This MR study demonstrated a potential causal relationship between TIA and CAD. Further research should investigate the mechanism underlying the association.
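MR-Egger regression, one of the robustness checks listed above, regresses the variant-outcome effects on the variant-exposure effects with an intercept, using inverse-variance weights, after orienting every variant so that its exposure effect is positive. Under the InSIDE assumption the slope estimates the causal effect, and a non-zero intercept suggests directional horizontal pleiotropy. The sketch below computes only the two point estimates (standard errors and the intercept test are omitted); the function name and toy data are illustrative assumptions, not the study's analysis.

```python
from typing import List, Tuple


def mr_egger(beta_x: List[float], beta_y: List[float],
             se_y: List[float]) -> Tuple[float, float]:
    """MR-Egger point estimates: (slope = causal estimate, intercept = pleiotropy)."""
    # Orient each variant so that its effect on the exposure is positive
    bx = [abs(x) for x in beta_x]
    by = [y if x >= 0 else -y for x, y in zip(beta_x, beta_y)]
    w = [1.0 / (s * s) for s in se_y]  # inverse-variance weights
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, bx)) / sw  # weighted means
    my = sum(wi * yi for wi, yi in zip(w, by)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, bx))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, bx, by))
    slope = sxy / sxx  # weighted least-squares slope
    intercept = my - slope * mx
    return slope, intercept


# Usage: toy data built as beta_y = 0.05 + 0.5 * beta_x, with one variant
# reported for the opposite allele (both signs flipped)
print(mr_egger([-0.1, 0.2, 0.3, 0.4], [-0.10, 0.15, 0.20, 0.25], [0.02, 0.01, 0.03, 0.02]))
```

Unlike IVW, which forces the regression line through the origin, MR-Egger lets the intercept absorb an average pleiotropic effect, at the cost of lower precision.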
Metagenomic surveillance and comparative genomic analysis of Chlamydia psittaci in patients with pneumonia
Chlamydia psittaci, a strictly intracellular bacterium, is an underestimated etiologic agent that causes infections in a broad range of animals and mild illness or pneumonia in humans. In this study, we sequenced the metagenomes of bronchoalveolar lavage fluid from patients with pneumonia and found a high abundance of C. psittaci. The target-enriched metagenomic reads were recruited to reconstruct draft genomes with more than 99% completeness. Two C. psittaci strains of novel sequence types were detected; both were closely related to animal-borne isolates from the ST43 and ST28 lineages, suggesting that zoonotic transmission contributes to the worldwide prevalence of C. psittaci. Comparative genomic analysis including public isolate genomes revealed that the pan-genome of C. psittaci has a more stable gene repertoire than those of extracellular bacteria, with ~90% of the genes in each genome being conserved core genes. Furthermore, evidence of significant positive selection was identified in 20 virulence-associated gene products, particularly bacterial membrane-embedded proteins and type III secretion machinery, which may play important roles in pathogen-host interactions. This survey uncovered novel strains of C. psittaci that cause pneumonia, and the evolutionary analysis identified prominent candidate genes involved in bacterial adaptation to immune pressure. The metagenomic approach is valuable for the surveillance of difficult-to-culture intracellular pathogens and for research into the molecular epidemiology and evolutionary biology of C. psittaci.
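The "~90% of the genes per genome being conserved core genes" figure comes from a pan-genome comparison. In its simplest form, the core genome is the set of gene families present in every genome, the pan-genome is their union, and each genome's core fraction is the share of its genes that belong to the core. The sketch below assumes gene families have already been clustered into shared identifiers and uses a strict 100% presence threshold by default; both the threshold and the toy gene sets are illustrative assumptions, and the abstract does not state which tool or cutoff the authors used.

```python
from typing import Dict, Set


def core_genome(genomes: Dict[str, Set[str]], min_presence: float = 1.0) -> Set[str]:
    """Gene families present in at least `min_presence` of the genomes."""
    counts: Dict[str, int] = {}
    for genes in genomes.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    n = len(genomes)
    return {g for g, c in counts.items() if c / n >= min_presence}


def pan_genome(genomes: Dict[str, Set[str]]) -> Set[str]:
    """All gene families seen in any genome."""
    return set().union(*genomes.values())


def core_fraction_per_genome(genomes: Dict[str, Set[str]],
                             min_presence: float = 1.0) -> Dict[str, float]:
    """Share of each genome's gene families that belong to the core genome."""
    core = core_genome(genomes, min_presence)
    return {name: len(genes & core) / len(genes) for name, genes in genomes.items()}


# Usage with three toy genomes (gene family IDs are hypothetical)
genomes = {"A": {"g1", "g2", "g3", "g4"}, "B": {"g1", "g2", "g3", "g5"}, "C": {"g1", "g2", "g3"}}
print(core_fraction_per_genome(genomes))
```

A high core fraction, as reported for C. psittaci, means most of each genome's repertoire is shared across strains, which is what "a more stable gene repertoire" refers to.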