Show, Recall, and Tell: Image Captioning with Recall Mechanism
Generating natural and accurate descriptions in image captioning has always
been a challenge. In this paper, we propose a novel recall mechanism to
imitate the way humans conduct captioning. Our recall mechanism has three
parts: the recall unit, the semantic guide (SG) and the recalled-word slot (RWS).
The recall unit is a text-retrieval module designed to retrieve recalled words for
images. SG and RWS are designed to make the best use of the recalled words. The SG branch
generates a recalled context, which guides the process of generating
the caption. The RWS branch is responsible for copying recalled words into the caption.
Inspired by the pointing mechanism in text summarization, we adopt a soft switch
to balance the generated-word probabilities between SG and RWS. In the CIDEr
optimization step, we also introduce an individual recalled-word reward (WR) to
boost training. Our proposed methods (SG+RWS+WR) achieve BLEU-4 / CIDEr /
SPICE scores of 36.6 / 116.9 / 21.3 with cross-entropy loss and 38.7 / 129.1 /
22.4 with CIDEr optimization on the MSCOCO Karpathy test split, which surpass the
results of other state-of-the-art methods.
Comment: Published in AAAI 202
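The soft switch described above follows the pointer-generator idea from text summarization. A minimal NumPy sketch, assuming the SG branch yields a full vocabulary distribution, the RWS branch yields attention weights over the vocabulary indices of the recalled words, and the switch is a sigmoid of the decoder state that weights the SG branch. The names `soft_switch`, `soft_switch_mix`, `w` and `b` are hypothetical, not the authors' code:

```python
import numpy as np

def soft_switch(decoder_state, w, b):
    """Hypothetical switch p in [0, 1]: sigmoid of a linear map of the decoder state."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, decoder_state) + b)))

def soft_switch_mix(p_vocab, recalled_ids, copy_attention, p_gen):
    """Blend an SG-style vocabulary distribution with an RWS-style copy distribution.

    p_vocab:        (V,) probabilities over the vocabulary (assumed SG output)
    recalled_ids:   (K,) vocabulary indices of the recalled words
    copy_attention: (K,) attention weights over the recalled words, summing to 1
    p_gen:          scalar switch; p_gen weights SG, (1 - p_gen) weights RWS
    """
    p_copy = np.zeros_like(p_vocab, dtype=float)
    # np.add.at accumulates correctly if the same word is recalled more than once
    np.add.at(p_copy, recalled_ids, copy_attention)
    # A convex combination of two distributions is still a distribution
    return p_gen * p_vocab + (1.0 - p_gen) * p_copy
```

For example, with a uniform 5-word `p_vocab`, recalled indices `[1, 3]`, attention `[0.75, 0.25]` and `p_gen = 0.6`, the recalled words receive extra probability mass (0.42 and 0.22) while every other word keeps 0.12.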
Scaling Team Coordination on Graphs with Reinforcement Learning
This paper studies Reinforcement Learning (RL) techniques to enable team
coordination behaviors in graph environments with support actions among
teammates to reduce the costs of traversing certain risky edges in a
centralized manner. While classical approaches can solve this non-standard
multi-agent path planning problem by converting the original Environment Graph
(EG) into a Joint State Graph (JSG) to implicitly incorporate the support
actions, those methods do not scale well to large graphs and teams. To address
this curse of dimensionality, we propose to use RL to enable agents to learn
such graph traversal and teammate supporting behaviors in a data-driven manner.
Specifically, by formulating the problem of team coordination on graphs with
risky edges as Markov Decision Processes (MDPs) with a novel state
and action space, we investigate how RL can solve it in two paradigms. First,
we use RL for a team of agents to learn how to coordinate and reach the goal
with minimal cost on a single EG. We show that RL efficiently solves problems
with up to 20/4 or 25/3 nodes/agents, using a fraction of the time needed for
JSG to solve such complex problems. Second, we learn a general RL policy for
any N-node EG to produce efficient supporting behaviors. We present
extensive experiments and compare our RL approaches against their classical
counterparts.
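Two ideas above can be made concrete: why the JSG approach scales poorly, and what a support action does to the cost of a risky edge. The sketch below assumes that a joint state records every agent's node (so it grows as |V|^n) and, hypothetically, that a risky edge becomes cheaper only when a teammate who is not moving in the same step stands on one of that edge's support nodes. The cost model and all names are illustrative, not the paper's formulation:

```python
def joint_state_count(num_nodes, num_agents):
    """Each agent can occupy any node, so joint positions grow as |V| ** n."""
    return num_nodes ** num_agents

def team_step_cost(moves, positions, edge_cost, support, supported_cost):
    """Cost of one joint step under a hypothetical support model.

    moves:          dict agent -> (u, v) edge traversed this step
    positions:      dict agent -> current node (agents absent from moves stay put)
    edge_cost:      dict (u, v) -> nominal traversal cost
    support:        dict (u, v) -> set of nodes from which a teammate can support that edge
    supported_cost: dict (u, v) -> reduced cost when the edge is supported
    """
    total = 0.0
    for agent, edge in moves.items():
        # A supporter must be a different agent that holds its position this step
        helpers = [
            other for other, node in positions.items()
            if other != agent and other not in moves and node in support.get(edge, set())
        ]
        total += supported_cost[edge] if helpers else edge_cost[edge]
    return total
```

With 20 nodes and 4 agents, `joint_state_count` already gives 160,000 joint positions, before any support bookkeeping is added.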
Self-supervised speech representation and contextual text embedding for match-mismatch classification with EEG recording
Relating speech to EEG holds considerable importance but is challenging. In
this study, a deep convolutional network was employed to extract spatiotemporal
features from EEG data. Self-supervised speech representation and contextual
text embedding were used as speech features. Contrastive learning was used to
relate EEG features to speech features. The experimental results demonstrate
the benefits of using self-supervised speech representation and contextual text
embedding. Through feature fusion and model ensembling, an accuracy of 60.29% was
achieved, and the performance ranked No. 2 in Task 1 of the Auditory EEG
Challenge (ICASSP 2024). The code to implement our work is available on GitHub:
https://github.com/bobwangPKU/EEG-Stimulus-Match-Mismatch.
Comment: 2 pages, 2 figures, accepted by ICASSP 202
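The contrastive step and the match-mismatch decision can be sketched as follows, assuming EEG and speech features have already been projected to embeddings of the same size. This uses a generic InfoNCE loss over a batch and picks the candidate with the higher cosine similarity. The exact loss, temperature and decision rule used by the authors are not given here, so treat every name and value as hypothetical:

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def info_nce_loss(eeg, speech, temperature=0.1):
    """One-directional InfoNCE: row i of `eeg` should match row i of `speech`.

    eeg, speech: (B, D) embeddings; all other rows in the batch act as negatives.
    """
    logits = l2_normalize(eeg) @ l2_normalize(speech).T / temperature  # (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def match_mismatch(eeg, cand_a, cand_b):
    """Return 0 if candidate A looks like the matched speech segment, else 1."""
    z = l2_normalize(eeg)
    sim_a = float(z @ l2_normalize(cand_a))
    sim_b = float(z @ l2_normalize(cand_b))
    return 0 if sim_a >= sim_b else 1
```

Training pulls each EEG embedding toward its own speech segment and away from the others in the batch; at test time only the similarity comparison in `match_mismatch` is needed.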
Bulk Density Adjustment of Resin-Based Equivalent Material for Geomechanical Model Test
Equivalent materials are important for simulating prototype rock in geomechanical model tests, and researchers try to ensure that the bulk density of the equivalent material equals that of the prototype rock. In this work, barite sand was used to increase the bulk density of a resin-based equivalent material, and the variation law of the bulk density was investigated for simulating prototype rocks of different bulk densities. Over 300 specimens were made for uniaxial compression tests. The test results indicated that substituting barite sand for quartz sand had no apparent influence on the uniaxial compressive strength and elastic modulus of the specimens but increased the bulk density, depending on the proportion of coarse aggregate. An ideal linear relationship was found between the barite sand substitution ratio and the bulk density. The relationship between the bulk density and the amounts of coarse aggregate and barite sand was also presented. The test results provide insight into the bulk density adjustment of resin-based equivalent materials.
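Because the abstract reports a linear relationship between the barite sand substitution ratio and the bulk density, a mix can be tuned by fitting a line to one's own measurements and inverting it for a target density. A minimal least-squares sketch; the function names are hypothetical, no values from the paper are reproduced, and the inversion is only meaningful inside the range of ratios that was actually tested:

```python
import numpy as np

def fit_bulk_density(substitution_ratio, bulk_density):
    """Fit rho = a * r + b by least squares and report R^2 as a linearity check."""
    r = np.asarray(substitution_ratio, dtype=float)
    rho = np.asarray(bulk_density, dtype=float)
    a, b = np.polyfit(r, rho, deg=1)  # coefficients come highest degree first
    residuals = rho - (a * r + b)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((rho - rho.mean()) ** 2)
    return a, b, 1.0 - ss_res / ss_tot

def ratio_for_target(target_density, a, b):
    """Invert the fitted line: substitution ratio expected to give target_density."""
    return (target_density - b) / a
```

An R^2 close to 1 is what "ideal linearity" would look like in practice; a separate line per coarse aggregate content would reflect the dependence on aggregate proportion noted above.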
OnionNet-2: A Convolutional Neural Network Model for Predicting Protein-Ligand Binding Affinity based on Residue-Atom Contacting Shells
One key task in virtual screening is to accurately predict the binding
affinity of protein-ligand complexes. Recently, deep learning
(DL) has significantly increased the prediction accuracy of scoring functions
due to the extraordinary ability of DL to extract useful features from raw
data. Nevertheless, more effort is still needed in many aspects to
increase prediction accuracy and decrease computational cost. In
this study, we proposed a simple scoring function (called OnionNet-2) based on a
convolutional neural network to predict the binding affinity. The protein-ligand
interactions are characterized by the number of contacts between protein
residues and ligand atoms in multiple distance shells. Compared to published
models, OnionNet-2 is shown to perform best on two widely used
benchmarks, CASF-2016 and CASF-2013. The OnionNet-2 model
was further verified on non-experimental decoy structures from a docking program
and on the CSAR NRC-HiQ data set (a high-quality data set provided by CSAR), where it
showed great success. Thus, our study provides a simple but efficient scoring
function for predicting protein-ligand binding free energy.
Comment: 7 pages, 4 figures, 1 table
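The residue-atom contacting shells can be illustrated by counting, for each distance shell, how many (residue type, ligand element) pairs fall inside it; the resulting 3-D array is the kind of grid a CNN can consume. A simplified sketch, assuming one representative coordinate per residue and user-chosen shell boundaries; the actual OnionNet-2 atom selection, shell widths and type vocabularies may differ, and all names here are hypothetical:

```python
import numpy as np

def shell_contact_features(res_coords, res_types, lig_coords, lig_types,
                           res_vocab, lig_vocab, shell_edges):
    """Count residue-ligand contacts per (shell, residue type, ligand element).

    res_coords:  (R, 3) one representative coordinate per residue (simplification)
    lig_coords:  (L, 3) ligand atom coordinates
    shell_edges: increasing boundaries; shell k covers [edges[k], edges[k + 1])
    Returns an int array of shape (n_shells, len(res_vocab), len(lig_vocab)).
    """
    res_coords = np.asarray(res_coords, dtype=float)
    lig_coords = np.asarray(lig_coords, dtype=float)
    n_shells = len(shell_edges) - 1
    feats = np.zeros((n_shells, len(res_vocab), len(lig_vocab)), dtype=np.int64)
    # Pairwise residue-atom distances, shape (R, L)
    d = np.linalg.norm(res_coords[:, None, :] - lig_coords[None, :, :], axis=-1)
    # digitize returns 0 below the first edge and len(edges) at or beyond the last
    shell_idx = np.digitize(d, shell_edges) - 1
    for i, rt in enumerate(res_types):
        for j, lt in enumerate(lig_types):
            k = shell_idx[i, j]
            if 0 <= k < n_shells:  # ignore pairs outside the outermost shell
                feats[k, res_vocab.index(rt), lig_vocab.index(lt)] += 1
    return feats
```

Flattening or stacking these shell-wise counts keeps the "onion" structure: near shells capture direct contacts, while outer shells summarize the wider environment.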
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
Graphical User Interface (GUI) automation holds significant promise for
assisting users with complex tasks, thereby boosting human productivity.
Existing works leveraging Large Language Models (LLMs) or LLM-based AI agents
have shown capabilities in automating tasks on Android and Web platforms.
However, these tasks are primarily aimed at simple device usage and
entertainment operations. This paper presents a novel benchmark, AssistGUI, to
evaluate whether models are capable of manipulating the mouse and keyboard on
the Windows platform in response to user-requested tasks. We carefully
collected a set of 100 tasks from nine widely used software applications, such
as After Effects and MS Word, each accompanied by the necessary project files
for better evaluation. Moreover, we propose an advanced Actor-Critic Embodied
Agent framework, which incorporates a sophisticated GUI parser driven by an
LLM-agent and an enhanced reasoning mechanism adept at handling lengthy
procedural tasks. Our experimental results reveal that our GUI Parser and
Reasoning mechanism outperform existing methods. Nevertheless, there is
substantial room for improvement: the best model attains only a 46% success
rate on our benchmark. We conclude with a thorough analysis of the current
methods' limitations, setting the stage for future breakthroughs in this
domain.
Comment: Project Page: https://showlab.github.io/assistgui
Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task
With the increasing capabilities of large language models (LLMs), these
high-performance models have achieved state-of-the-art results on a wide range
of natural language processing (NLP) tasks. However, the models' performance on
commonly-used benchmark datasets often fails to accurately reflect their
reliability and robustness when applied to real-world noisy data. To address
these challenges, we propose a unified robustness evaluation framework based on
the slot-filling task to systematically evaluate the dialogue understanding
capability of LLMs in diverse input perturbation scenarios. Specifically, we
construct an input perturbation evaluation dataset, Noise-LLM, which contains
five types of single perturbation and four types of mixed perturbation data.
Furthermore, we utilize a multi-level data augmentation method (character,
word, and sentence levels) to construct a candidate data pool, and carefully
design two automatic task demonstration construction strategies
(instance-level and entity-level) with various prompt templates. Our aim is to
assess how well various robustness methods for LLMs perform in real-world noisy
scenarios. The experiments demonstrate that current open-source LLMs
generally achieve only limited robustness to perturbations. Based on these
experimental observations, we make some forward-looking suggestions to fuel
research in this direction.
Comment: Accepted at NLPCC 2023 (Oral Presentation)
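The multi-level augmentation can be pictured as three small text transforms that can also be chained into a mixed perturbation. The sketch below is illustrative only: adjacent-letter swaps stand in for character-level noise, inserted filler words for word-level noise, and a colloquial prefix for sentence-level noise. These transforms, their rates and the function names are assumptions, not the Noise-LLM construction. Note that in slot filling, character-level noise can also corrupt the slot values themselves:

```python
import random

def char_perturb(text, rng, rate=0.1):
    """Character level: swap adjacent letters inside words with probability `rate`."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip ahead so a letter is not swapped twice
        else:
            i += 1
    return "".join(chars)

def word_perturb(text, rng, rate=0.1, fillers=("um", "uh", "like")):
    """Word level: insert filler words before some tokens (crude disfluency)."""
    out = []
    for word in text.split():
        if rng.random() < rate:
            out.append(rng.choice(fillers))
        out.append(word)
    return " ".join(out)

def sentence_perturb(text, rng, prefixes=("Hey, ", "So basically, ", "Could you ")):
    """Sentence level: prepend a colloquial prefix (a stand-in for paraphrasing)."""
    return rng.choice(prefixes) + text

def mixed_perturb(text, seed=0):
    """Mixed perturbation: chain all three levels with one seeded RNG for reproducibility."""
    rng = random.Random(seed)
    return sentence_perturb(word_perturb(char_perturb(text, rng), rng), rng)
```

Seeding a private `random.Random` instance keeps each perturbed example reproducible without touching the global random state.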
Research on the Application of Cross-Specialty Education and Situational Simulation Teaching in Operation Nursing Practice Teaching
Objective To examine the practical effect of inter-professional education and situational simulation teaching implemented in surgical nursing practice teaching. Methods In total, 100 undergraduate nursing students interning in the operating room of the authors' hospital from May 2019 to August 2020 were selected and divided into two groups using the random number table method. The control group received regular teaching, and the research group received inter-professional education and situational simulation teaching. The theoretical knowledge, skill level, and various abilities of the two groups of students were compared, as well as the satisfaction of the operating room doctors with the interns' nursing cooperation. Results The research group achieved higher theoretical knowledge and a higher skill level than the control group (p < 0.05); the various abilities of the research group were higher than those of the control group (p < 0.05); and the operating room doctors were more satisfied with the nursing cooperation of the research group's interns than with that of the control group's interns (p < 0.05). Conclusion In surgical nursing practice teaching, inter-professional education and situational simulation teaching have significant effects and are worth applying in practice.
- …