Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions
Visual language navigation (VLN) is an embodied task demanding a wide range
of skills encompassing understanding, perception, and planning. For such a
multifaceted challenge, previous VLN methods rely entirely on a single model's own
thinking to make predictions within one round. However, existing models, even
the most advanced large language model GPT4, still struggle to handle
multiple tasks through single-round self-thinking. In this work, drawing inspiration
from the expert consultation meeting, we introduce a novel zero-shot VLN
framework. Within this framework, large models possessing distinct abilities
serve as domain experts. Our proposed navigation agent, DiscussNav,
can actively discuss with these experts to collect essential information before
moving at every step. These discussions cover critical navigation subtasks like
instruction understanding, environment perception, and completion estimation.
Through comprehensive experiments, we demonstrate that discussions with domain
experts can effectively facilitate navigation by perceiving
instruction-relevant information, correcting inadvertent errors, and sifting
through inconsistent movement decisions. Results on the
representative VLN task R2R show that our method surpasses the leading
zero-shot VLN model by a large margin on all metrics. Additionally, real-robot
experiments demonstrate clear advantages of our method over single-round
self-thinking.
Comment: Submitted to ICRA 202
Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark
Existing multimodal task-oriented dialog data fails to demonstrate the
diverse expressions of user subjective preferences and recommendation acts in
the real-life shopping scenario. This paper introduces a new dataset SURE
(Multimodal Recommendation Dialog with SUbjective PREference), which contains
12K shopping dialogs in complex store scenes. The data is built in two phases
with human annotations to ensure quality and diversity. SURE is well-annotated
with subjective preferences and recommendation acts proposed by sales experts.
A comprehensive analysis is given to reveal the distinguishing features of
SURE. Three benchmark tasks are then proposed on the data to evaluate the
capability of multimodal recommendation agents. Based on SURE, we propose a
baseline model, powered by a state-of-the-art multimodal model, for these
tasks.
Comment: ACL 202
EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction using Head-Mounted Augmented Reality Device
Augmented Reality (AR) has been used to facilitate surgical guidance during
External Ventricular Drain (EVD) surgery, reducing the risks of misplacement in
manual operations. The pivotal challenge during this procedure is accurately
estimating the spatial relationship between pre-operative images and the actual
patient anatomy in the AR environment. In this research, we propose a novel
framework utilizing Time of Flight (ToF) depth sensors integrated in
commercially available AR Head Mounted Devices (HMD) for precise EVD surgical
guidance. As previous studies have demonstrated depth errors in ToF sensors, we
first conducted a comprehensive assessment of the properties of this error on
AR-HMDs. Subsequently, a depth error model and a patient-specific model parameter
identification method are introduced to obtain accurate surface information. After
that, a tracking procedure combining retro-reflective markers and point clouds
is proposed for accurate head tracking, where the head surface is reconstructed
using ToF sensor data for spatial registration, avoiding fixing tracking
targets rigidly on the patient's cranium. Firstly, ToF
sensor depth value error was revealed on human skin, indicating the
significance of depth correction. Our results showed that the ToF sensor depth
error was reduced by over using proposed depth correction method on head
phantoms in different materials. Meanwhile, the head surface reconstructed with
corrected depth data achieved sub-millimeter accuracy. Experiment on a sheep
head revealed reconstruction error. Furthermore, a user study was
conducted to evaluate the performance of the proposed framework in simulated EVD surgery,
where 5 surgeons performed 9 k-wire injections on a head phantom with virtual
guidance. Results of this study revealed translational
accuracy and orientational accuracy
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Visually-grounded dialog systems, which integrate multiple modes of
communication such as text and visual inputs, have become an increasingly
popular area of investigation. However, the absence of a standardized
evaluation framework poses a challenge in assessing the development of this
field. To this end, we propose VDialogUE, a Visually-grounded
Dialogue benchmark for Unified Evaluation. It
defines five core multi-modal dialogue tasks and covers six datasets.
Furthermore, in order to provide a comprehensive assessment of the model's
performance across all tasks, we developed a novel evaluation metric called
VDscore, which is based on the Analytic Hierarchy Process (AHP) method.
Additionally, we present a straightforward yet efficient baseline model, named
VISIT (VISually-grounded dIalog
Transformer), to promote the advancement of general multi-modal
dialogue systems. It progressively builds its multi-modal foundation and
dialogue capability via a two-stage pre-training strategy.
We believe that the VDialogUE benchmark, along with the evaluation scripts
and our baseline models, will accelerate the development of visually-grounded
dialog systems and lead to the development of more sophisticated and effective
pre-trained models
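
As a rough illustration of how an AHP-based aggregate such as VDscore could be formed, the sketch below derives task weights from a pairwise comparison matrix and combines per-task scores into a single number. It is only a sketch under stated assumptions: the comparison matrix, the task scores, the row geometric-mean approximation of the AHP priority vector, and the function names `ahp_weights` and `weighted_score` are all hypothetical and are not taken from the VDialogUE paper.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important criterion i is than
    criterion j (so pairwise[j][i] should be its reciprocal).
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def weighted_score(task_scores, weights):
    """Combine per-task scores into one aggregate using the AHP weights."""
    return sum(s * w for s, w in zip(task_scores, weights))


# Hypothetical 3-task comparison matrix (task 0 judged most important).
comparison = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparison)
aggregate = weighted_score([70.0, 35.0, 14.0], weights)
```

For a perfectly consistent matrix like this one, the geometric-mean method recovers the underlying weights (here 4/7, 2/7 and 1/7); with inconsistent human judgments, the principal-eigenvector method is the more common choice and a consistency check would normally be added.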
A Novel Linear Spectrum Frequency Feature Extraction Technique for Warship Radio Noise Based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise, Duffing Chaotic Oscillator, and Weighted-Permutation Entropy
Warships play an important role in the modern sea battlefield. Research on the line spectrum features of warship radio noise signals helps to classify and recognize different types of warships, and provides critical information for the sea battlefield. In this paper, we propose a novel linear spectrum frequency feature extraction technique for warship radio noise based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), Duffing chaotic oscillator (DCO), and weighted-permutation entropy (W-PE). The proposed technique, named CEEMDAN-DCO-W-PE, has the following advantages over other linear spectrum frequency feature extraction techniques: (i) as an adaptive data-driven algorithm, CEEMDAN has more accurate and more reliable decomposition performance than empirical mode decomposition (EMD) and ensemble EMD (EEMD), and there is no need to preset parameters such as the decomposition level and basis function; (ii) DCO can detect the linear spectrum of narrow-band periodic warship signals by exploiting its sensitivity to weak periodic signals and its immunity to noise; and (iii) W-PE is used in underwater acoustic signal feature extraction for the first time, and compared with traditional permutation entropy (PE), W-PE incorporates amplitude information to some extent. First, warship radio noise signals are decomposed by CEEMDAN into intrinsic mode functions (IMFs) ordered from high frequency to low frequency. Then, DCO is used to detect the linear spectrum of the low-frequency IMFs. Finally, the linear spectrum frequency of the low-frequency IMFs is determined using W-PE. The experimental results show that the proposed technique can accurately extract the line spectrum frequency of the simulation signals, and has a higher classification and recognition rate than the traditional techniques for real warship radio noise signals
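
To make the difference between PE and W-PE concrete, here is a minimal sketch of weighted-permutation entropy in its commonly used form, where each embedded window contributes to its ordinal pattern in proportion to the window's variance rather than by a simple count. The function name, the default embedding order and delay, and the normalization by log(order!) are assumptions chosen for illustration; the abstract does not state the parameters used by the authors.

```python
import math
from collections import defaultdict


def weighted_permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Weighted-permutation entropy of a 1-D sequence (illustrative sketch)."""
    n_vectors = len(signal) - (order - 1) * delay
    if n_vectors <= 0:
        raise ValueError("signal too short for the chosen order and delay")

    pattern_weight = defaultdict(float)
    total_weight = 0.0
    for start in range(n_vectors):
        window = [signal[start + k * delay] for k in range(order)]
        mean = sum(window) / order
        # The window variance carries the amplitude information that plain PE drops.
        weight = sum((x - mean) ** 2 for x in window) / order
        # Ordinal pattern: indices of the window sorted by value.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        pattern_weight[pattern] += weight
        total_weight += weight

    if total_weight == 0.0:
        return 0.0  # constant signal: no amplitude variation at all

    entropy = 0.0
    for w in pattern_weight.values():
        if w > 0.0:
            p = w / total_weight
            entropy -= p * math.log(p)
    if normalize:
        entropy /= math.log(math.factorial(order))
    return entropy
```

A strictly increasing sequence produces a single ordinal pattern and therefore an entropy of zero, while irregular signals move the normalized value toward one.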
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they perform poorly when complex relative positions and information alignments are involved, which creates a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph (SPRING), which can reason over multi-hop spatial relations and connect them with visual attributes in crowded situated scenarios. Specifically, we design two types of Multimodal Question Answering (MQA) tasks to pretrain the agent. All QA pairs used during pretraining are generated from novel Incremental Layout Graphs (ILG). QA pair difficulty labels automatically annotated by ILG are used to promote MQA-based Curriculum Learning. Experimental results verify SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both SIMMC 1.0 and SIMMC 2.0 datasets. We release our code and data at https://github.com/LYX0501/SPRING
Alterations of Hematologic and Hematopoietic Parameters in Mice Exposed to Pulsed Electromagnetic Field
Effects of pulsed electromagnetic field (PEMF) on hematology and hematopoiesis might vary with different PEMF parameters. The purpose of this study was to evaluate the possible effects of PEMF exposure at different numbers of pulses on hematologic and hematopoietic parameters in mice. Groups of male BALB/c mice were whole-body exposed or sham exposed (control) to PEMF at 100, 1000, and 10000 pulses. After PEMF exposure, blood samples and bone marrow cells of mice were collected for hematologic examinations, bone marrow nucleated cell counting, colony-forming units of granulocyte-macrophage (CFU-GM) colony assay, and serum granulocyte-macrophage colony-stimulating factor (GM-CSF) assay. Compared with the control group, white blood cells (WBC) and lymphocytes (LYM) were significantly increased in the 100 and 1000 pulses exposed groups but unchanged in the 10000 pulses exposed group. Red blood cells (RBC), hemoglobin (HGB), and platelets (PLT) were unchanged in all exposed groups. There was no significant difference in mouse bone marrow nucleated cell number between the control group and each exposed group 7 days after PEMF exposure. The CFU-GM clone number of bone marrow cells and serum GM-CSF level were significantly increased in the 100 and 1000 pulses exposed groups but unchanged in the 10000 pulses exposed group. Our results indicated that PEMF exposure at fewer pulses may induce statistically significant alterations in some hematologic and hematopoietic parameters of mice, whereas no changes were found in the groups exposed to more pulses
Acute thiamethoxam exposure induces hepatotoxicity and neurotoxicity in juvenile Chinese mitten crab (Eriocheir sinensis)
The similar nervous system structure of crustaceans and insects and the high water solubility of thiamethoxam may make thiamethoxam more severely toxic to crustaceans. However, the effects of thiamethoxam on crustaceans are unclear. Therefore, a 96-h acute toxicity test was performed to explore the hepatotoxic and neurotoxic effects of thiamethoxam on Chinese mitten crab (Eriocheir sinensis) at concentrations of 0 µg/L, 150 µg/L and 300 µg/L. The antioxidant and detoxification systems (including phases I and II) were significantly activated after exposure of juvenile crabs to thiamethoxam for 24 h in the 300 µg/L group, whereas the toxic activation effect in the 150 µg/L group was delayed. Moreover, a similar pattern was observed for the transcription levels of immune-related genes. Further analysis of inflammatory signaling pathway-related genes showed that exposure to 300 µg/L thiamethoxam for 24 h may induce a pro-inflammatory response through the NF-κB pathway. In contrast, the gene expression levels in the 150 µg/L group were significantly upregulated compared with the 0 µg/L group after 96 h. In addition, although acute exposure to 150 µg/L thiamethoxam did not seem to induce significant neurotoxicity, the acetylcholinesterase activity was significantly decreased in the 300 µg/L group after thiamethoxam exposure for 96 h. Correspondingly, exposure to 300 µg/L thiamethoxam for 24 h resulted in significantly downregulated transcriptional levels of synaptic transmission-related genes (e.g. dopamine-, gamma-aminobutyric acid- and serotonin-related receptors). Therefore, thiamethoxam may be harmful and pose potential toxic threats such as neurotoxicity and metabolic damage to crustaceans