Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages
Chain-of-thought (CoT) prompting leads models to explicitly generate
reasoning paths, which improves reasoning accuracy and has attracted increasing
attention. Specifically, zero-shot CoT achieves remarkable improvements in a
wide range of reasoning tasks by simply instructing the LLM with the prompt
"Let's think step by step!". Despite the success of zero-shot CoT, the existing
zero-shot prompting techniques remain limited to a single language, making it
challenging to generalize to other languages and hindering global development.
In this work, we introduce cross-lingual prompting (CLP), aiming to improve
zero-shot CoT reasoning across languages. Specifically, CLP consists of two
main components: (1) cross-lingual alignment prompting and (2) task-specific
solver prompting. The cross-lingual alignment prompting is responsible for
aligning representations across different languages, whereas the task-specific
solver prompting is used to generate the final chain of thoughts and results
for the reasoning task. In addition, we further introduce cross-lingual
self-consistent prompting (CLSP) to ensemble different reasoning paths across
languages. Our experimental evaluations on several benchmarks demonstrate that
CLP and CLSP significantly outperform the existing prompting methods and
achieve state-of-the-art performance. We hope this work will inspire further
breakthroughs in cross-lingual CoT.
Comment: Accepted at EMNLP2023 Main Conference
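The ensembling idea behind CLSP can be illustrated with a small sketch. It assumes that each language's reasoning path has already been reduced to a final answer string and that the ensemble is a simple majority vote; the function name `vote_across_languages` and the example answers are hypothetical and not taken from the paper.

```python
from collections import Counter


def vote_across_languages(answers_by_language):
    """Return the most frequent final answer among per-language reasoning paths.

    `answers_by_language` maps a language code to the final answer extracted
    from the reasoning path produced in that language (hypothetical input shape).
    """
    if not answers_by_language:
        raise ValueError("need at least one answer to vote on")
    # Normalise surrounding whitespace so that "18" and " 18 " count as one answer.
    counts = Counter(answer.strip() for answer in answers_by_language.values())
    # On a tie, Counter.most_common keeps the answer that was encountered first.
    answer, _ = counts.most_common(1)[0]
    return answer


# Made-up answers from four reasoning paths, one per language.
answers = {"en": "18", "de": "18", "zh": "20", "es": "18"}
print(vote_across_languages(answers))  # prints 18
```

A real system would also need an answer-extraction step per language, which is left out here.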
Improving Few-shot and Zero-shot Entity Linking with Coarse-to-Fine Lexicon-based Retriever
Few-shot and zero-shot entity linking focus on the tail and emerging
entities, which are more challenging but closer to real-world scenarios. The
mainstream method is the "retrieve and rerank" two-stage framework. In this
paper, we propose a coarse-to-fine lexicon-based retriever to retrieve entity
candidates in an effective manner, which operates in two layers. The first
layer retrieves coarse-grained candidates by leveraging entity names, while the
second layer narrows down the search to fine-grained candidates within the
coarse-grained ones. In addition, this second layer utilizes entity
descriptions to effectively disambiguate tail or new entities that share names
with existing popular entities. Experimental results indicate that our approach
can obtain superior performance without requiring extensive finetuning in the
retrieval stage. Notably, our approach ranks 1st in NLPCC 2023 Shared Task
6 on Chinese Few-shot and Zero-shot Entity Linking.
Comment: Accepted to NLPCC202
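The two-layer retrieval described in this abstract can be sketched roughly as follows. This is a toy illustration under strong assumptions: both layers use plain token overlap as the lexical signal, and the entity records, field names, and function names are hypothetical. The paper's actual retriever is not specified here.

```python
def tokenize(text):
    """Lower-case whitespace tokenisation (a deliberately simple stand-in)."""
    return set(text.lower().split())


def coarse_candidates(mention, entities):
    """Layer 1: keep entities whose name shares at least one token with the mention."""
    mention_tokens = tokenize(mention)
    return [e for e in entities if mention_tokens & tokenize(e["name"])]


def fine_rank(context, candidates, top_k=1):
    """Layer 2: rank the coarse candidates by description overlap with the context."""
    context_tokens = tokenize(context)
    ranked = sorted(
        candidates,
        key=lambda e: len(context_tokens & tokenize(e["description"])),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical entity inventory in which two entities share the same name.
entities = [
    {"id": "E1", "name": "Jordan", "description": "a country in western asia"},
    {"id": "E2", "name": "Michael Jordan", "description": "american basketball player"},
    {"id": "E3", "name": "Michael Jordan", "description": "american professor of machine learning"},
    {"id": "E4", "name": "Paris", "description": "capital city of france"},
]

coarse = coarse_candidates("Michael Jordan", entities)
best = fine_rank("the professor published a machine learning paper", coarse)
print([e["id"] for e in coarse], best[0]["id"])  # prints ['E1', 'E2', 'E3'] E3
```

The name-only layer cannot separate E2 from E3, which is why the second layer falls back on descriptions.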
E2Net: Resource-Efficient Continual Learning with Elastic Expansion Network
Continual Learning methods are designed to learn new tasks without erasing
previous knowledge. However, Continual Learning often requires massive
computational power and storage capacity for satisfactory performance. In this
paper, we propose a resource-efficient continual learning method called the
Elastic Expansion Network (E2Net). Leveraging core subnet distillation and
precise replay sample selection, E2Net achieves superior average accuracy and
diminished forgetting within the same computational and storage constraints,
all while minimizing processing time. In E2Net, we propose Representative
Network Distillation to identify the representative core subnet by assessing
parameter quantity and output similarity with the working network, distilling
analogous subnets within the working network to mitigate reliance on rehearsal
buffers and facilitating knowledge transfer across previous tasks. To enhance
storage resource utilization, we then propose Subnet Constraint Experience
Replay to optimize rehearsal efficiency through a sample storage strategy based
on the structures of representative networks. Extensive experiments conducted
predominantly on cloud environments with diverse datasets and also spanning the
edge environment demonstrate that E2Net consistently outperforms
state-of-the-art methods. In addition, our method outperforms competitors in
terms of both storage and computational requirements.
SDIF-DA: A Shallow-to-Deep Interaction Framework with Data Augmentation for Multi-modal Intent Detection
Multi-modal intent detection aims to utilize various modalities to understand
the user's intentions, which is essential for the deployment of dialogue
systems in real-world scenarios. The two core challenges for multi-modal intent
detection are (1) how to effectively align and fuse the features of different
modalities and (2) the scarcity of labeled multi-modal intent training data. In
this work, we introduce a shallow-to-deep interaction framework with data
augmentation (SDIF-DA) to address the above challenges. Firstly, SDIF-DA
leverages a shallow-to-deep interaction module to progressively and effectively
align and fuse features across text, video, and audio modalities. Secondly, we
propose a ChatGPT-based data augmentation approach to automatically generate
sufficient training data. Experimental results demonstrate that SDIF-DA can
effectively align and fuse multi-modal features by achieving state-of-the-art
performance. In addition, extensive analyses show that the introduced data
augmentation approach can successfully distill knowledge from the large
language model.
Comment: Accepted by ICASSP 202
A Survey on Causal Reinforcement Learning
While Reinforcement Learning (RL) achieves tremendous success in sequential
decision-making problems of many domains, it still faces key challenges of data
inefficiency and the lack of interpretability. Many researchers have recently
leveraged insights from the causality literature, producing a growing body of
work that uses the merits of causality to address these challenges in RL. It is
therefore valuable to
collate these Causal Reinforcement Learning (CRL) works, offer a review of CRL
methods, and investigate the potential functionality from causality toward RL.
In particular, we divide existing CRL approaches into two categories according
to whether their causality-based information is given in advance or not. We
further analyze each category in terms of the formalization of different
models, including the Markov Decision Process (MDP), Partially Observed
Markov Decision Process (POMDP), Multi-Arm Bandits (MAB), and Dynamic Treatment
Regime (DTR). Moreover, we summarize the evaluation metrics and open-source
resources, and we discuss emerging applications along with promising prospects
for the future development of CRL.
Comment: 29 pages, 20 figures
Multivariate regression models in estimating the behavior of FRP tube encased recycled aggregate concrete
This study applied newly developed multivariate statistical models to estimate the mechanical properties of recycled aggregate concrete cylinders encased in fiber reinforced polymer (FRP). Two types of FRP were used, namely flax FRP and polyester FRP. Ten independent variables were predefined, including the FRP type and cylinder size. Several mixed models outperformed the traditional linear regression approach in terms of accuracy and residual value distribution. Individual factor analysis indicated that fiber thickness and layer number had more significant impacts on the strength and strain of FRP-encased concrete at the transitional point than at the ultimate state.
Mechanical Properties of Recycled Aggregate Concrete Modified by Nano-particles
In this study, different nano-particles were used to modify recycled aggregate concrete (RAC) containing recycled clay brick aggregates (RCBAs) in order to improve its properties. The experimental work was performed in two stages. In the first stage, nano-particle mixtures produced by different mixing methods, i.e. the use of surfactant and ultrasonication, were examined under an optical microscope to evaluate the dispersion of the nano-particles in water. Nano-particle modified cement mortar specimens were then evaluated by flexural tensile tests to check how these mixing methods affect the properties of the mortar. In the second stage, the effects of the following factors on the workability and the compressive and split tensile properties of the nano-particle modified RAC were investigated: four replacement ratios of recycled aggregates, three types of nano-particles, two mixing methods of RAC, the additional surfactant and ultrasonication process used in mixing the nano-particle liquid, and the dosages of the nano-particles.
Adaptive Policy with Wait-k Model for Simultaneous Translation
Simultaneous machine translation (SiMT) requires a robust read/write policy
in conjunction with a high-quality translation model. Traditional methods rely
on either a fixed wait-k policy coupled with a standalone wait-k
translation model, or an adaptive policy jointly trained with the translation
model. In this study, we propose a more flexible approach by decoupling the
adaptive policy model from the translation model. Our motivation stems from the
observation that a standalone multi-path wait-k model performs competitively
with adaptive policies utilized in state-of-the-art SiMT approaches.
Specifically, we introduce DaP, a divergence-based adaptive policy, that makes
read/write decisions for any translation model based on the potential
divergence in translation distributions resulting from future information. DaP
extends a frozen wait-k model with lightweight parameters, and is both memory
and computation efficient. Experimental results across various benchmarks
demonstrate that our approach offers an improved trade-off between translation
accuracy and latency, outperforming strong baselines.
Comment: Accepted to EMNLP 2023 main conference. 17 pages, 12 figures, 5 tables
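For readers unfamiliar with the fixed policy that this abstract contrasts with adaptive ones, the sketch below generates the read/write schedule of a standard wait-k policy: read k source tokens first, then alternate between writing one target token and reading one source token until the source is exhausted. It does not implement DaP; the function name and the "R"/"W" action encoding are assumptions made for illustration.

```python
def wait_k_actions(src_len, tgt_len, k):
    """Return the read ("R") / write ("W") action sequence of a fixed wait-k policy.

    Before writing target token t (0-indexed), the policy must have read
    min(k + t, src_len) source tokens.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    actions = []
    tokens_read = 0
    for t in range(tgt_len):
        required = min(k + t, src_len)
        # Read until the lag of k tokens is satisfied, or the source runs out.
        while tokens_read < required:
            actions.append("R")
            tokens_read += 1
        actions.append("W")
    return actions


# Four source tokens, four target tokens, k = 2.
print("".join(wait_k_actions(src_len=4, tgt_len=4, k=2)))  # prints RRWRWRWW
```

A larger k raises latency because more source context is read before each write, which is the trade-off an adaptive policy tries to tune per step rather than fix in advance.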
Scale Invariant Fully Convolutional Network: Detecting Hands Efficiently
Existing hand detection methods usually follow the pipeline of multiple
stages with high computation cost, i.e., feature extraction, region proposal,
bounding box regression, and additional layers for rotated region detection. In
this paper, we propose a new Scale Invariant Fully Convolutional Network
(SIFCN) trained in an end-to-end fashion to detect hands efficiently.
Specifically, we merge the feature maps from high to low layers in an iterative
way, which handles different scales of hands better and with less time overhead
than simply concatenating them. Moreover, we develop the Complementary
Weighted Fusion (CWF) block to make full use of the distinctive features among
multiple layers to achieve scale invariance. To deal with rotated hand
detection, we present the rotation map to get rid of complex rotation and
derotation layers. Besides, we design the multi-scale loss scheme to accelerate
the training process significantly by adding supervision to the intermediate
layers of the network. Compared with the state-of-the-art methods, our
algorithm shows comparable accuracy while running 4.23 times faster on the
VIVA dataset, and achieves better average precision on the Oxford hand
detection dataset at a speed of 62.5 fps.
Comment: Accepted to AAAI201