134 research outputs found

    Show, Recall, and Tell: Image Captioning with Recall Mechanism

    Generating natural and accurate descriptions in image captioning has always been a challenge. In this paper, we propose a novel recall mechanism to imitate the way humans conduct captioning. There are three parts in our recall mechanism: recall unit, semantic guide (SG) and recalled-word slot (RWS). The recall unit is a text-retrieval module designed to retrieve recalled words for images. SG and RWS are designed for the best use of recalled words. The SG branch can generate a recalled context, which can guide the process of generating the caption. The RWS branch is responsible for copying recalled words to the caption. Inspired by the pointing mechanism in text summarization, we adopt a soft switch to balance the generated-word probabilities between SG and RWS. In the CIDEr optimization step, we also introduce an individual recalled-word reward (WR) to boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr / SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 / 22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the results of other state-of-the-art methods. Comment: Published in AAAI 202
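
The soft switch mentioned in the abstract can be illustrated with a minimal sketch. It assumes the switch is a single scalar in [0, 1] that linearly blends a generation distribution (standing in for SG) with a copy distribution (standing in for RWS), as in pointer-style summarization models. The function name, the dictionary representation, and the toy probabilities are hypothetical and are not taken from the paper.

```python
def mix_with_soft_switch(p_generate, p_copy, switch):
    """Blend two word distributions with a scalar soft switch in [0, 1].

    switch = 1.0 means "copy only", switch = 0.0 means "generate only".
    This is an illustrative assumption, not the paper's exact formulation.
    """
    if not 0.0 <= switch <= 1.0:
        raise ValueError("switch must lie in [0, 1]")
    vocab = set(p_generate) | set(p_copy)
    return {
        word: switch * p_copy.get(word, 0.0)
        + (1.0 - switch) * p_generate.get(word, 0.0)
        for word in vocab
    }


# Toy distributions (hypothetical values).
p_sg = {"a": 0.5, "dog": 0.3, "cat": 0.2}   # generated-word probabilities
p_rws = {"dog": 0.75, "frisbee": 0.25}      # probabilities over recalled words
mixed = mix_with_soft_switch(p_sg, p_rws, switch=0.4)
```

Because both inputs sum to one and the weights sum to one, the blended result is again a probability distribution; a recalled word absent from the generation vocabulary ("frisbee" here) can still receive probability mass.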

    Scaling Team Coordination on Graphs with Reinforcement Learning

    This paper studies Reinforcement Learning (RL) techniques to enable team coordination behaviors in graph environments, where support actions among teammates reduce the costs of traversing certain risky edges in a centralized manner. While classical approaches can solve this non-standard multi-agent path planning problem by converting the original Environment Graph (EG) into a Joint State Graph (JSG) to implicitly incorporate the support actions, those methods do not scale well to large graphs and teams. To address this curse of dimensionality, we propose to use RL to enable agents to learn such graph traversal and teammate supporting behaviors in a data-driven manner. Specifically, through a new formulation of the team coordination on graphs with risky edges problem into Markov Decision Processes (MDPs) with a novel state and action space, we investigate how RL can solve it in two paradigms. First, we use RL for a team of agents to learn how to coordinate and reach the goal with minimal cost on a single EG. We show that RL efficiently solves problems with up to 20/4 or 25/3 nodes/agents, using a fraction of the time needed for JSG to solve such complex problems. Second, we learn a general RL policy for any N-node EG to produce efficient supporting behaviors. We present extensive experiments and compare our RL approaches against their classical counterparts.

    Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording

    Relating speech to EEG holds considerable importance but is challenging. In this study, a deep convolutional network was employed to extract spatiotemporal features from EEG data. Self-supervised speech representation and contextual text embedding were used as speech features. Contrastive learning was used to relate EEG features to speech features. The experimental results demonstrate the benefits of using self-supervised speech representation and contextual text embedding. Through feature fusion and model ensemble, an accuracy of 60.29% was achieved, and the performance was ranked No. 2 in Task 1 of the Auditory EEG Challenge (ICASSP 2024). The code to implement our work is available on GitHub: https://github.com/bobwangPKU/EEG-Stimulus-Match-Mismatch. Comment: 2 pages, 2 figures, accepted by ICASSP 202

    Bulk Density Adjustment of Resin-Based Equivalent Material for Geomechanical Model Test

    An equivalent material is significant for simulating prototype rock in geomechanical model tests. Researchers attempt to ensure that the bulk density of the equivalent material is equal to that of the prototype rock. In this work, barite sand was used to increase the bulk density of a resin-based equivalent material. The variation law of the bulk density was revealed in the simulation of prototype rocks of different bulk densities. Over 300 specimens were made for uniaxial compression tests. Test results indicated that the substitution of quartz sand by barite sand had no apparent influence on the uniaxial compressive strength and elastic modulus of the specimens but can increase the bulk density, according to the proportional coarse aggregate content. An ideal linearity was found in the relationship between the barite sand substitution ratio and the bulk density. The relationship between the bulk density and the usage of coarse aggregate and barite sand was also presented. The test results provide insight into the bulk density adjustment of resin-based equivalent materials.

    OnionNet-2: A Convolutional Neural Network Model for Predicting Protein-Ligand Binding Affinity based on Residue-Atom Contacting Shells

    One key task in virtual screening is to accurately predict the binding affinity ($\Delta G$) of protein-ligand complexes. Recently, deep learning (DL) has significantly increased the predictive accuracy of scoring functions due to the extraordinary ability of DL to extract useful features from raw data. Nevertheless, more effort is still needed to increase prediction accuracy and decrease computational cost. In this study, we propose a simple scoring function (called OnionNet-2) based on a convolutional neural network to predict $\Delta G$. The protein-ligand interactions are characterized by the number of contacts between protein residues and ligand atoms in multiple distance shells. Compared to published models, the efficacy of OnionNet-2 is demonstrated to be the best on two widely used benchmarks, CASF-2016 and CASF-2013. The OnionNet-2 model was further verified on non-experimental decoy structures from a docking program and on the CSAR NRC-HiQ data set (a high-quality data set provided by CSAR), where it showed great success. Thus, our study provides a simple but efficient scoring function for predicting protein-ligand binding free energy. Comment: 7 pages, 4 figures, 1 tabl
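
The idea of counting residue-ligand contacts in distance shells can be sketched as follows. This is a simplified illustration under stated assumptions: the abstract does not give the shell boundaries, the grouping by residue or atom type, or the coordinate handling used by OnionNet-2, so the function name, the half-open shell convention, and the toy coordinates below are all hypothetical.

```python
import math


def shell_contact_counts(residue_atoms, ligand_atoms, shell_edges):
    """Count residue-ligand atom pairs falling in each distance shell.

    residue_atoms: dict mapping a residue label to a list of (x, y, z) tuples.
    ligand_atoms:  list of (x, y, z) tuples.
    shell_edges:   increasing distances; shell i covers [edges[i], edges[i+1]).
    Returns a dict mapping each residue label to a list of per-shell counts.
    """
    n_shells = len(shell_edges) - 1
    counts = {name: [0] * n_shells for name in residue_atoms}
    for name, atoms in residue_atoms.items():
        for protein_atom in atoms:
            for ligand_atom in ligand_atoms:
                distance = math.dist(protein_atom, ligand_atom)
                for i in range(n_shells):
                    if shell_edges[i] <= distance < shell_edges[i + 1]:
                        counts[name][i] += 1
                        break  # each pair belongs to at most one shell
    return counts


# Toy coordinates in arbitrary units (hypothetical values).
residues = {
    "ALA1": [(0.0, 0.0, 0.0)],
    "GLY2": [(5.0, 0.0, 0.0), (0.0, 6.0, 0.0)],
}
ligand = [(1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
features = shell_contact_counts(residues, ligand, shell_edges=[0.0, 2.0, 4.0, 8.0])
```

The resulting per-residue, per-shell counts form a fixed-size grid of numbers, which is the kind of input a convolutional network can consume; pairs farther apart than the last edge are simply ignored.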

    ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

    Graphical User Interface (GUI) automation holds significant promise for assisting users with complex tasks, thereby boosting human productivity. Existing works leveraging Large Language Models (LLMs) or LLM-based AI agents have shown capabilities in automating tasks on Android and Web platforms. However, these tasks are primarily aimed at simple device usage and entertainment operations. This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks. We carefully collected a set of 100 tasks from nine widely used software applications, such as After Effects and MS Word, each accompanied by the necessary project files for better evaluation. Moreover, we propose an advanced Actor-Critic Embodied Agent framework, which incorporates a sophisticated GUI parser driven by an LLM-agent and an enhanced reasoning mechanism adept at handling lengthy procedural tasks. Our experimental results reveal that our GUI Parser and Reasoning mechanism outshine existing methods in performance. Nevertheless, the potential remains substantial, with the best model attaining only a 46% success rate on our benchmark. We conclude with a thorough analysis of the current methods' limitations, setting the stage for future breakthroughs in this domain. Comment: Project Page: https://showlab.github.io/assistgui

    Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task

    With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, we propose a unified robustness evaluation framework based on the slot-filling task to systematically evaluate the dialogue understanding capability of LLMs in diverse input perturbation scenarios. Specifically, we construct an input perturbation evaluation dataset, Noise-LLM, which contains five types of single perturbation and four types of mixed perturbation data. Furthermore, we utilize a multi-level data augmentation method (character, word, and sentence levels) to construct a candidate data pool, and carefully design two automatic task demonstration construction strategies (instance-level and entity-level) with various prompt templates. Our aim is to assess how well various robustness methods of LLMs perform in real-world noisy scenarios. The experiments demonstrate that current open-source LLMs generally achieve limited perturbation robustness performance. Based on these experimental observations, we make some forward-looking suggestions to fuel research in this direction. Comment: Accepted at NLPCC 2023 (Oral Presentation

    Research on the Application of Cross-Specialty Education and Situational Simulation Teaching in Operation Nursing Practice Teaching

    Objective: To examine the practical effect of inter-professional education and situational simulation teaching implemented in surgical nursing practice teaching. Methods: In total, 100 undergraduate nursing students in the operating room of the authors' hospital from May 2019 to August 2020 were selected. These students were divided into two groups using the random number table method. The control group received regular teaching, and the research group was given inter-professional education and situational simulation teaching. The two groups were compared on theoretical knowledge, skill level, and various abilities, as well as on the satisfaction of the operating room doctors with the interns' nursing cooperation. Results: The research group achieved higher theoretical knowledge and a higher skill level than the control group (p < 0.05); the various abilities of the research group were higher than those of the control group (p < 0.05); and the operating room doctors were more satisfied with the nursing cooperation of interns in the research group than with that of interns in the control group (p < 0.05). Conclusion: In surgical nursing practice teaching, inter-professional education and situational simulation teaching have significant effects and are worth applying clinically.