An Open Source Testing Tool for Evaluating Handwriting Input Methods
This paper presents an open source tool for testing the recognition accuracy
of Chinese handwriting input methods. The tool consists of two modules, namely
the PC and Android mobile client. The PC client reads handwritten samples in
the computer, and transfers them individually to the Android client in
accordance with the socket communication protocol. After the Android client
receives the data, it simulates the handwriting on the screen of the client device, and
triggers the corresponding handwriting recognition method. The recognition
accuracy is recorded by the Android client. We present the design principles
and describe the implementation of the test platform. We construct several test
datasets for evaluating different handwriting recognition systems, and conduct
an objective and comprehensive test using six Chinese handwriting input methods
with five datasets. The test results for the recognition accuracy are then
compared and analyzed.
Comment: 5 pages, 3 figures, 11 tables. Accepted to appear at ICDAR 201
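The PC-to-Android transfer described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the wire format, so the length-prefixed JSON frame, the field names `label` and `strokes`, and the helper names below are all assumptions made for illustration, not the tool's actual protocol.

```python
import json
import socket
import struct


def encode_sample(label, strokes):
    """Serialize one handwritten sample as a 4-byte length header plus JSON.

    Hypothetical format: `strokes` is a list of strokes, each a list of [x, y] points.
    """
    payload = json.dumps({"label": label, "strokes": strokes}).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def recv_exact(sock, n):
    """Read exactly n bytes, since a single recv() may return fewer."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf += chunk
    return buf


def recv_sample(sock):
    """Receive one framed sample and return (label, strokes)."""
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    msg = json.loads(recv_exact(sock, length).decode("utf-8"))
    return msg["label"], msg["strokes"]


def demo():
    """Send one sample through a local socket pair standing in for PC and device."""
    sender, receiver = socket.socketpair()
    try:
        # "\u6c38" is a sample Chinese character used as the ground-truth label.
        sender.sendall(encode_sample("\u6c38", [[[10, 12], [14, 30]], [[20, 8], [22, 40]]]))
        return recv_sample(receiver)
    finally:
        sender.close()
        receiver.close()
```

Sending one sample per frame lets the receiver replay the strokes and score a recognition result before the next sample arrives, which matches the "individually" transfer the abstract describes.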
Empowering CAM-Based Methods with Capability to Generate Fine-Grained and High-Faithfulness Explanations
Recently, the explanation of neural network models has garnered considerable
research attention. In computer vision, CAM (Class Activation Map)-based
methods and the LRP (Layer-wise Relevance Propagation) method are two common
explanation methods. However, since most CAM-based methods can only generate
global weights, they can only generate coarse-grained explanations at a deep
layer. LRP and its variants, on the other hand, can generate fine-grained
explanations, but the faithfulness of those explanations is too low. To address
these challenges, in this paper, we propose FG-CAM (Fine-Grained CAM), which
extends CAM-based methods to enable generating fine-grained and
high-faithfulness explanations. FG-CAM uses the relationship between two
adjacent layers of feature maps with resolution differences to gradually
increase the explanation resolution, while finding the contributing pixels and
filtering out the pixels that do not contribute. Our method not only overcomes the
shortcoming of CAM-based methods without changing their characteristics, but
also generates fine-grained explanations that have higher faithfulness than LRP
and its variants. We also present FG-CAM with denoising, which is a variant of
FG-CAM and is able to generate less noisy explanations with almost no change in
explanation faithfulness. Experimental results show that the performance of
FG-CAM is almost unaffected by the explanation resolution. FG-CAM outperforms
existing CAM-based methods significantly in both shallow and intermediate
layers, and outperforms LRP and its variants significantly in the input layer.
Our code is available at https://github.com/dongmo-qcq/FG-CAM.
Comment: This paper has been accepted by AAAI202
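To see why global weights yield coarse explanations, here is a minimal sketch of a plain class activation map, not of FG-CAM itself. It assumes the standard CAM formulation (one scalar weight per channel applied to the last convolutional feature maps); the function names and the toy values are illustrative only.

```python
def class_activation_map(feature_maps, weights):
    """Weighted sum of K feature maps (each H x W) using one global weight per channel."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, w_k in zip(feature_maps, weights):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w_k * fmap[i][j]
    return cam


def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling: each coarse cell becomes a factor x factor block."""
    return [
        [cam[i // factor][j // factor] for j in range(len(cam[0]) * factor)]
        for i in range(len(cam) * factor)
    ]


# Two 2x2 feature maps from a deep layer, with hypothetical class weights.
maps = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
coarse = class_activation_map(maps, [0.5, 1.0])
full = upsample_nearest(coarse, 2)  # 4x4 map with only 2x2 distinct regions
```

Because each weight applies to a whole channel, the map has the resolution of the deep layer; upsampling enlarges it without adding detail, which is the limitation the abstract attributes to CAM-based methods.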
Sub-optimal Policy Aided Multi-Agent Reinforcement Learning for Flocking Control
Flocking control is a challenging problem, where multiple agents, such as
drones or vehicles, need to reach a target position while maintaining the flock
and avoiding collisions with obstacles and collisions among agents in the
environment. Multi-agent reinforcement learning has achieved promising
performance in flocking control. However, methods based on traditional
reinforcement learning require a considerable number of interactions between
agents and the environment. This paper proposes a sub-optimal policy aided
multi-agent reinforcement learning algorithm (SPA-MARL) to boost sample
efficiency. SPA-MARL directly leverages a prior policy that can be manually
designed or solved with a non-learning method to aid agents in learning, where
the performance of the policy can be sub-optimal. SPA-MARL recognizes the
difference in performance between the sub-optimal policy and itself, and then
imitates the sub-optimal policy if the sub-optimal policy is better. We
leverage SPA-MARL to solve the flocking control problem. A traditional control
method based on artificial potential fields is used to generate a sub-optimal
policy. Experiments demonstrate that SPA-MARL can speed up the training process
and outperform both the MARL baseline and the sub-optimal policy used.
Comment: Accepted by IEEE International Conference on Systems, Man, and
Cybernetics (SMC) 202
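The "imitate only when the prior is better" idea can be sketched as a gated imitation term added to the reinforcement learning loss. The abstract does not give the actual criterion or loss, so the return comparison, the squared-error imitation loss, and the weight `beta` below are assumptions for illustration.

```python
def imitation_gate(agent_return, prior_return):
    """Return 1.0 when the sub-optimal prior policy currently outperforms the agent."""
    return 1.0 if prior_return > agent_return else 0.0


def imitation_loss(agent_action, prior_action):
    """Squared distance between the agent's action and the prior policy's action."""
    return sum((a - p) ** 2 for a, p in zip(agent_action, prior_action))


def combined_loss(rl_loss, agent_action, prior_action,
                  agent_return, prior_return, beta=0.5):
    """RL loss plus an imitation term that is active only while the prior is better."""
    gate = imitation_gate(agent_return, prior_return)
    return rl_loss + beta * gate * imitation_loss(agent_action, prior_action)


# Early in training the prior is better, so the agent is pulled toward it.
early = combined_loss(1.0, [0.0, 0.0], [1.0, 1.0], agent_return=-5.0, prior_return=-2.0)
# Once the agent surpasses the prior, the imitation term switches off.
late = combined_loss(1.0, [0.0, 0.0], [1.0, 1.0], agent_return=3.0, prior_return=-2.0)
```

Gating the imitation term is one way to let the agent eventually exceed the prior rather than being anchored to it, consistent with the reported result that the learned policy outperforms the sub-optimal one.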
Neuromorphic Online Learning for Spatiotemporal Patterns with a Forward-only Timeline
Spiking neural networks (SNNs) are bio-plausible computing models with high
energy efficiency. The temporal dynamics of neurons and synapses enable them to
detect temporal patterns and generate sequences. While Backpropagation Through
Time (BPTT) is traditionally used to train SNNs, it is not suitable for online
learning of embedded applications due to its high computation and memory cost
as well as extended latency. Previous works have proposed online learning
algorithms, but they often utilize highly simplified spiking neuron models
without synaptic dynamics and reset feedback, resulting in subpar performance.
In this work, we present Spatiotemporal Online Learning for Synaptic Adaptation
(SOLSA), specifically designed for online learning of SNNs composed of Leaky
Integrate and Fire (LIF) neurons with exponentially decayed synapses and soft
reset. The algorithm not only learns the synaptic weight but also adapts the
temporal filters associated with the synapses. Compared to the BPTT algorithm,
SOLSA has a much lower memory requirement and achieves a more balanced temporal
workload distribution. Moreover, SOLSA incorporates enhancement techniques such
as scheduled weight update, early stop training and adaptive synapse filter,
which speed up the convergence and enhance the learning performance. When
compared to other non-BPTT-based SNN learning methods, SOLSA demonstrates an average
learning accuracy improvement of 14.2%. Furthermore, compared to BPTT, SOLSA
achieves a 5% higher average learning accuracy with a 72% reduction in memory
cost.
Comment: 9 pages, 8 figure
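The neuron model named in this abstract (an LIF neuron with an exponentially decaying synapse and soft reset) can be sketched in discrete time as follows. This shows only the forward dynamics, not the SOLSA learning rule, and the decay constants, weight, and threshold are illustrative assumptions.

```python
def simulate_lif(input_spikes, weight=1.0, syn_decay=0.5, mem_decay=0.9, threshold=1.0):
    """Simulate one LIF neuron driven by a single input spike train.

    The synapse acts as a temporal filter: each input spike adds `weight` to a
    synaptic current that decays by `syn_decay` per step. The membrane potential
    leaks by `mem_decay` per step and integrates the synaptic current.
    """
    syn_current = 0.0
    membrane = 0.0
    output_spikes = []
    for spike in input_spikes:
        syn_current = syn_decay * syn_current + weight * spike
        membrane = mem_decay * membrane + syn_current
        if membrane >= threshold:
            output_spikes.append(1)
            membrane -= threshold  # soft reset: subtract the threshold, keep the residual
        else:
            output_spikes.append(0)
    return output_spikes


# One input spike at t=0; the decaying synaptic current pushes the neuron over
# threshold one step later.
out = simulate_lif([1, 0, 0, 0], weight=0.8)
```

With a soft reset the membrane keeps whatever charge exceeds the threshold, whereas a hard reset would set it to zero. The synaptic decay constant is the kind of temporal filter parameter the abstract says is adapted alongside the weights.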
Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering
Compositional spatio-temporal reasoning poses a significant challenge in the
field of video question answering (VideoQA). Existing approaches struggle to
establish effective symbolic reasoning structures, which are crucial for
answering compositional spatio-temporal questions. To address this challenge,
we propose a neural-symbolic framework called Neural-Symbolic VideoQA
(NS-VideoQA), specifically designed for real-world VideoQA tasks. The
uniqueness and superiority of NS-VideoQA are two-fold: 1) It proposes a Scene
Parser Network (SPN) to transform static-dynamic video scenes into Symbolic
Representation (SR), structuralizing persons, objects, relations, and action
chronologies. 2) A Symbolic Reasoning Machine (SRM) is designed for top-down
question decompositions and bottom-up compositional reasonings. Specifically, a
polymorphic program executor is constructed for internally consistent reasoning
from SR to the final answer. As a result, NS-VideoQA not only improves
compositional spatio-temporal reasoning in real-world VideoQA tasks, but also
enables step-by-step error analysis by tracing the intermediate results.
Experimental evaluations on the AGQA Decomp benchmark demonstrate the
effectiveness of the proposed NS-VideoQA framework. Empirical studies further
confirm that NS-VideoQA exhibits internal consistency in answering
compositional questions and significantly improves the capability of
spatio-temporal and logical inference for VideoQA tasks.
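The idea of executing a decomposed question over a symbolic scene representation, while keeping intermediate results for error analysis, can be sketched as below. The fact schema, the operator names (`filter`, `before`, `query_obj`), and the program format are hypothetical; the abstract does not describe the actual SR or the SRM's instruction set.

```python
# Hypothetical symbolic representation: timed (subject, relation, object) facts.
SCENE = [
    {"subj": "person", "rel": "hold", "obj": "cup", "start": 0, "end": 4},
    {"subj": "person", "rel": "open", "obj": "door", "start": 5, "end": 6},
    {"subj": "person", "rel": "hold", "obj": "phone", "start": 7, "end": 9},
]


def op_filter(facts, rel=None, obj=None):
    """Keep facts matching the given relation and/or object."""
    return [f for f in facts
            if (rel is None or f["rel"] == rel) and (obj is None or f["obj"] == obj)]


def op_before(facts, anchor):
    """Keep facts that end no later than the earliest anchor fact starts."""
    if not anchor:
        return []
    t = min(a["start"] for a in anchor)
    return [f for f in facts if f["end"] <= t]


def op_query_obj(facts):
    """Return the sorted set of objects mentioned in the facts."""
    return sorted({f["obj"] for f in facts})


OPS = {"filter": op_filter, "before": op_before, "query_obj": op_query_obj}


def execute(program, scene):
    """Run each step bottom-up, recording every intermediate result in a trace."""
    env = {"scene": scene}
    trace = []
    result = None
    for step in program:
        inputs = [env[name] for name in step["in"]]
        result = OPS[step["op"]](*inputs, **step.get("kwargs", {}))
        env[step["out"]] = result
        trace.append((step["out"], result))
    return result, trace


# "What was the person holding before opening the door?"
PROGRAM = [
    {"out": "opening", "op": "filter", "in": ["scene"], "kwargs": {"rel": "open", "obj": "door"}},
    {"out": "holding", "op": "filter", "in": ["scene"], "kwargs": {"rel": "hold"}},
    {"out": "earlier", "op": "before", "in": ["holding", "opening"]},
    {"out": "answer", "op": "query_obj", "in": ["earlier"]},
]
answer, trace = execute(PROGRAM, SCENE)
```

Inspecting `trace` shows which sub-question produced a wrong intermediate result, which is the kind of step-by-step error analysis the abstract describes.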