23 research outputs found

    Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions

    Visual language navigation (VLN) is an embodied task demanding a wide range of skills encompassing understanding, perception, and planning. For such a multifaceted challenge, previous VLN methods rely entirely on a single model's own reasoning to make predictions within one round. However, existing models, even the most advanced large language model GPT4, still struggle to handle multiple tasks through single-round self-thinking. In this work, drawing inspiration from expert consultation meetings, we introduce a novel zero-shot VLN framework. Within this framework, large models possessing distinct abilities serve as domain experts. Our proposed navigation agent, namely DiscussNav, can actively discuss with these experts to collect essential information before moving at every step. These discussions cover critical navigation subtasks such as instruction understanding, environment perception, and completion estimation. Through comprehensive experiments, we demonstrate that discussions with domain experts can effectively facilitate navigation by perceiving instruction-relevant information, correcting inadvertent errors, and sifting through inconsistent movement decisions. The performance on the representative VLN task R2R shows that our method surpasses the leading zero-shot VLN model by a large margin on all metrics. Additionally, real-robot experiments display the obvious advantages of our method over single-round self-thinking. Comment: Submitted to ICRA 202

    Multimodal Recommendation Dialog with Subjective Preference: A New Challenge and Benchmark

    Existing multimodal task-oriented dialog data fails to demonstrate the diverse expressions of user subjective preferences and recommendation acts in real-life shopping scenarios. This paper introduces a new dataset, SURE (Multimodal Recommendation Dialog with SUbjective PREference), which contains 12K shopping dialogs in complex store scenes. The data is built in two phases with human annotations to ensure quality and diversity. SURE is well annotated with subjective preferences and recommendation acts proposed by sales experts. A comprehensive analysis is given to reveal the distinguishing features of SURE. Three benchmark tasks are then proposed on the data to evaluate the capability of multimodal recommendation agents. Based on SURE, we propose a baseline model, powered by a state-of-the-art multimodal model, for these tasks. Comment: ACL 202

    EVD Surgical Guidance with Retro-Reflective Tool Tracking and Spatial Reconstruction using Head-Mounted Augmented Reality Device

    Augmented Reality (AR) has been used to facilitate surgical guidance during External Ventricular Drain (EVD) surgery, reducing the risks of misplacement in manual operations. During this procedure, the pivotal challenge is the accurate estimation of the spatial relationship between pre-operative images and actual patient anatomy in the AR environment. In this research, we propose a novel framework utilizing Time of Flight (ToF) depth sensors integrated in commercially available AR Head Mounted Devices (HMD) for precise EVD surgical guidance. As previous studies have demonstrated depth errors for ToF sensors, we first conducted a comprehensive assessment of the properties of this error on AR-HMDs. Subsequently, a depth error model and a patient-specific model parameter identification method are introduced for accurate surface information. After that, a tracking procedure combining retro-reflective markers and point clouds is proposed for accurate head tracking, where the head surface is reconstructed using ToF sensor data for spatial registration, avoiding fixing tracking targets rigidly on the patient's cranium. First, a ToF sensor depth value error of 7.580 ± 1.488 mm was revealed on human skin, indicating the significance of depth correction. Our results showed that the ToF sensor depth error was reduced by over 85% using the proposed depth correction method on head phantoms in different materials. Meanwhile, the head surface reconstructed with corrected depth data achieved sub-millimeter accuracy. An experiment on a sheep head revealed a 0.79 mm reconstruction error. Furthermore, a user study was conducted to assess the performance of the proposed framework in simulated EVD surgery, where 5 surgeons performed 9 k-wire injections on a head phantom with virtual guidance. Results of this study revealed 2.09 ± 0.16 mm translational accuracy and 2.97 ± 0.91° orientational accuracy

    VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue

    Visually-grounded dialog systems, which integrate multiple modes of communication such as text and visual inputs, have become an increasingly popular area of investigation. However, the absence of a standardized evaluation framework poses a challenge in assessing the development of this field. To this end, we propose VDialogUE, a Visually-grounded Dialogue benchmark for Unified Evaluation. It defines five core multi-modal dialogue tasks and covers six datasets. Furthermore, in order to provide a comprehensive assessment of the model's performance across all tasks, we developed a novel evaluation metric called VDscore, which is based on the Analytic Hierarchy Process (AHP) method. Additionally, we present a straightforward yet efficient baseline model, named VISIT (VISually-grounded dIalog Transformer), to promote the advancement of general multi-modal dialogue systems. It progressively builds its multi-modal foundation and dialogue capability via a two-stage pre-training strategy. We believe that the VDialogUE benchmark, along with the evaluation scripts and our baseline models, will accelerate the development of visually-grounded dialog systems and lead to the development of more sophisticated and effective pre-trained models
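The abstract above says only that VDscore is based on AHP; it does not give the formula. As a rough illustration of the general AHP idea (not the authors' actual metric), the sketch below derives importance weights from a pairwise comparison matrix using the common row geometric-mean approximation, then combines per-task scores with those weights. The function names `ahp_weights` and `weighted_score`, the example matrix, and the example scores are all hypothetical.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important item i is than item j,
    so pairwise[j][i] is expected to be 1 / pairwise[i][j].
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def weighted_score(task_scores, weights):
    """Combine per-task scores into one number using the given weights."""
    return sum(s * w for s, w in zip(task_scores, weights))


# Hypothetical comparison of three tasks: task 0 is judged twice as
# important as task 1 and four times as important as task 2.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparisons)
overall = weighted_score([0.70, 0.50, 0.90], weights)  # made-up task scores
```

For a perfectly consistent comparison matrix such as this one, the geometric-mean weights coincide with the principal-eigenvector weights used in the full AHP procedure.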

    A Novel Linear Spectrum Frequency Feature Extraction Technique for Warship Radio Noise Based on Complete Ensemble Empirical Mode Decomposition with Adaptive Noise, Duffing Chaotic Oscillator, and Weighted-Permutation Entropy

    Warships play an important role in the modern sea battlefield. Research on the line spectrum features of warship radio noise signals helps to classify and recognize different types of warships, and provides critical information for the sea battlefield. In this paper, we propose a novel linear spectrum frequency feature extraction technique for warship radio noise based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), Duffing chaotic oscillator (DCO), and weighted-permutation entropy (W-PE). The proposed technique, named CEEMDAN-DCO-W-PE, has the following advantages over other linear spectrum frequency feature extraction techniques: (i) as an adaptive data-driven algorithm, CEEMDAN has more accurate and more reliable decomposition performance than empirical mode decomposition (EMD) and ensemble EMD (EEMD), and there is no need to preset parameters such as decomposition level and basis function; (ii) DCO can detect the linear spectrum of narrow-band periodic warship signals by exploiting its sensitivity to weak periodic signals and its immunity to noise; and (iii) W-PE is used in underwater acoustic signal feature extraction for the first time, and compared with traditional permutation entropy (PE), W-PE incorporates amplitude information to some extent. First, warship radio noise signals are decomposed by CEEMDAN into a set of intrinsic mode functions (IMFs) ordered from high frequency to low frequency. Then, DCO is used to detect the linear spectrum of the low-frequency IMFs. Finally, the linear spectrum frequency of the low-frequency IMFs is determined using W-PE. The experimental results show that the proposed technique can accurately extract the line spectrum frequency of the simulation signals, and has a higher classification and recognition rate than the traditional techniques for real warship radio noise signals
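To make the W-PE step above concrete, here is a minimal sketch of weighted permutation entropy in its commonly used form, where each ordinal pattern is weighted by the variance of its embedding window so that amplitude information enters the entropy. The function name, the default `order` and `delay`, and the normalization by log(order!) are assumptions for illustration; the abstract does not state the paper's parameter choices.

```python
import math
from collections import defaultdict


def weighted_permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Weighted permutation entropy of a 1-D sequence.

    Each embedding window contributes its variance (not a plain count)
    to the probability mass of its ordinal pattern.
    """
    n_windows = len(signal) - (order - 1) * delay
    if n_windows <= 0:
        raise ValueError("signal too short for the chosen order and delay")

    pattern_weight = defaultdict(float)
    for i in range(n_windows):
        window = [signal[i + j * delay] for j in range(order)]
        mean = sum(window) / order
        variance = sum((x - mean) ** 2 for x in window) / order
        # Ordinal pattern: the indices that would sort the window.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        pattern_weight[pattern] += variance

    total = sum(pattern_weight.values())
    if total == 0.0:
        return 0.0  # a constant signal carries no weighted pattern mass

    entropy = 0.0
    for weight in pattern_weight.values():
        if weight > 0.0:
            p = weight / total
            entropy -= p * math.log(p)

    if normalize:
        entropy /= math.log(math.factorial(order))
    return entropy
```

With normalization the result lies between 0 and 1: a strictly monotonic sequence yields 0 because only one ordinal pattern occurs, while broadband noise approaches 1.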

    SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph

    Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality. In this paper, we propose a Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph (SPRING), which can reason over multi-hop spatial relations and connect them with visual attributes in crowded situated scenarios. Specifically, we design two types of Multimodal Question Answering (MQA) tasks to pretrain the agent. All QA pairs used during pretraining are generated from novel Incremental Layout Graphs (ILG). QA pair difficulty labels automatically annotated by ILG are used to promote MQA-based Curriculum Learning. Experimental results verify SPRING's effectiveness, showing that it significantly outperforms state-of-the-art approaches on both SIMMC 1.0 and SIMMC 2.0 datasets. We release our code and data at https://github.com/LYX0501/SPRING

    Alterations of Hematologic and Hematopoietic Parameters in Mice Exposed to Pulsed Electromagnetic Field

    Effects of pulsed electromagnetic field (PEMF) on hematology and hematopoiesis might vary with different PEMF parameters. The purpose of this study was to evaluate the possible effects of PEMF exposure at different pulse counts on hematologic and hematopoietic parameters in mice. Groups of male BALB/c mice were whole-body exposed or sham exposed (control) to PEMF at 100, 1000, and 10000 pulses. After PEMF exposure, blood samples and bone marrow cells of mice were collected for hematologic examinations, bone marrow nucleated cell counting, colony-forming units of granulocyte-macrophage (CFU-GM) colony assay, and serum granulocyte-macrophage colony-stimulating factor (GM-CSF) assay. Compared with the control group, white blood cells (WBC) and lymphocytes (LYM) were significantly increased in the 100 and 1000 pulses exposed groups but unchanged in the 10000 pulses exposed group. Red blood cells (RBC), hemoglobin (HGB), and platelets (PLT) were unchanged in all exposed groups. There was no significant difference in mouse bone marrow nucleated cell number between the control group and each exposed group 7 days after PEMF exposure. The CFU-GM clone number of bone marrow cells and serum GM-CSF level were significantly increased in the 100 and 1000 pulses exposed groups but unchanged in the 10000 pulses exposed group. Our results indicated that PEMF exposure at fewer pulses may induce statistically significant alterations in some hematologic and hematopoietic parameters of mice, whereas no changes were found in the groups exposed to more pulses

    Acute thiamethoxam exposure induces hepatotoxicity and neurotoxicity in juvenile Chinese mitten crab (Eriocheir sinensis)

    The similar nervous system structure of crustaceans and insects and the high water solubility of thiamethoxam may make thiamethoxam more severely toxic to crustaceans. However, the effects of thiamethoxam on crustaceans are unclear. Therefore, a 96-h acute toxicity test was performed to explore the hepatotoxic and neurotoxic effects of thiamethoxam on Chinese mitten crab (Eriocheir sinensis) at concentrations of 0 µg/L, 150 µg/L and 300 µg/L. The antioxidant and detoxification systems (including phases I and II) were significantly activated after exposure of juvenile crabs to thiamethoxam for 24 h in the 300 µg/L group, whereas the toxic activation effect in the 150 µg/L group was delayed. Moreover, a similar pattern was observed for the transcription levels of immune-related genes. Further analysis of inflammatory signaling pathway-related genes showed that thiamethoxam exposure at 300 µg/L for 24 h may induce a pro-inflammatory response through the NF-κB pathway. In contrast, the gene expression levels in the 150 µg/L group were significantly upregulated compared with the 0 µg/L group after 96 h. In addition, although acute exposure to 150 µg/L thiamethoxam did not seem to induce significant neurotoxicity, the acetylcholinesterase activity was significantly decreased in the 300 µg/L group after thiamethoxam exposure for 96 h. Correspondingly, thiamethoxam exposure at 300 µg/L for 24 h resulted in significantly downregulated transcriptional levels of synaptic transmission-related genes (e.g. dopamine-, gamma-aminobutyric acid- and serotonin-related receptors). Therefore, thiamethoxam may be harmful and pose potential toxic threats such as neurotoxicity and metabolic damage to crustaceans