Text Region Multiple Information Perception Network for Scene Text Detection
Segmentation-based scene text detection algorithms can handle arbitrary shape
scene texts and have strong robustness and adaptability, so they have attracted
wide attention. Existing segmentation-based scene text detection algorithms
usually only segment the pixels in the center region of the text, while
ignoring other information of the text region, such as edge information,
distance information, etc., thus limiting the detection accuracy of the
algorithm for scene text. This paper proposes a plug-and-play module called the
Region Multiple Information Perception Module (RMIPM) to enhance the detection
performance of segmentation-based algorithms. Specifically, we design an
improved module that can perceive various types of information about scene text
regions, such as text foreground classification maps, distance maps, direction
maps, etc. Experiments on MSRA-TD500 and TotalText datasets show that our
method achieves comparable performance with current state-of-the-art
algorithms.
Comment: Accepted to ICASSP 202
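The abstract above names distance maps and direction maps but does not define them. As a rough illustration only, the sketch below assumes one common convention: the distance map stores each text pixel's Euclidean distance to its nearest background pixel, and the direction map stores the unit vector pointing away from that pixel. The function name `region_maps` and the brute-force search are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]  # 1 = text pixel, 0 = background


def region_maps(mask: Mask) -> Tuple[List[List[float]], List[List[Tuple[float, float]]]]:
    """Return (distance_map, direction_map) for a small binary text mask.

    Assumed definitions (not from the paper): distance is the Euclidean
    distance to the nearest background pixel; direction is the (dy, dx)
    unit vector pointing from that background pixel to the text pixel.
    """
    h, w = len(mask), len(mask[0])
    background = [(y, x) for y in range(h) for x in range(w) if mask[y][x] == 0]
    dist = [[0.0] * w for _ in range(h)]
    direc = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Background pixels keep zero distance and a zero direction vector.
            if mask[y][x] == 0 or not background:
                continue
            # Brute-force nearest background pixel; fine for tiny masks only.
            by, bx = min(background, key=lambda p: (p[0] - y) ** 2 + (p[1] - x) ** 2)
            d = math.hypot(y - by, x - bx)
            dist[y][x] = d
            direc[y][x] = ((y - by) / d, (x - bx) / d)
    return dist, direc
```

For real images a distance transform from an image-processing library would replace the quadratic search; the brute-force version is kept here only so the sketch stays dependency-free.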
BPDO: Boundary Points Dynamic Optimization for Arbitrary Shape Scene Text Detection
Arbitrary shape scene text detection is of great importance in scene
understanding tasks. Due to the complexity and diversity of text in natural
scenes, existing scene text algorithms have limited accuracy for detecting
arbitrary shape text. In this paper, we propose a novel arbitrary shape scene
text detector through boundary points dynamic optimization (BPDO). The proposed
model is designed with a text aware module (TAM) and a boundary point dynamic
optimization module (DOM). Specifically, the model designs a text aware module
based on segmentation to obtain boundary points describing the central region
of the text by extracting a priori information about the text region. Then,
based on the idea of deformable attention, it proposes a dynamic optimization
model for boundary points, which gradually optimizes the exact position of the
boundary points based on the information of the adjacent region of each
boundary point. Experiments on CTW-1500, Total-Text, and MSRA-TD500 datasets
show that the model proposed in this paper achieves a performance that is
better than or comparable to state-of-the-art algorithms, demonstrating the
effectiveness of the model.
Comment: Accepted to ICASSP 202
Application of Fireproof Coating for New Energy Vehicle Battery Pack
In the development process of new energy vehicles, the battery pack is one of the key parts, and the safety of the battery pack has always been an important factor affecting the application range and market sales of new energy vehicles. In order to improve the safety of battery packs, fireproof coatings are widely used on the surface of battery packs. This paper introduces the application of fireproof coatings in new energy vehicles by analyzing the composition and function of fireproof coatings
Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages
Chain-of-thought (CoT) is capable of eliciting models to explicitly generate
reasoning paths, thus promoting reasoning accuracy and attracting increasing
attention. Specifically, zero-shot CoT achieves remarkable improvements in a
wide range of reasoning tasks by simply instructing the LLM with the prompt
"Let's think step by step!". Despite the success of zero-shot CoT, the existing
zero-shot prompting techniques remain limited to a single language, making it
challenging to generalize to other languages and hindering global development.
In this work, we introduce cross-lingual prompting (CLP), aiming to improve
zero-shot CoT reasoning across languages. Specifically, CLP consists of two
main components: (1) cross-lingual alignment prompting and (2) task-specific
solver prompting. The cross-lingual alignment prompting is responsible for
aligning representations across different languages, whereas the task-specific
solver prompting is used to generate the final chain of thoughts and results
for the reasoning task. In addition, we further introduce cross-lingual
self-consistent prompting (CLSP) to ensemble different reasoning paths across
languages. Our experimental evaluations on several benchmarks demonstrate that
CLP and CLSP significantly outperform the existing prompting methods and
achieve state-of-the-art performance. We hope this work will inspire further
breakthroughs in cross-lingual CoT.
Comment: Accepted at EMNLP 2023 Main Conference
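The two-stage structure described in the abstract above (alignment prompting, then solver prompting, with CLSP ensembling answers across languages) can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the prompt wording, the `LLM` callable type, the function names `clp_answer` and `clsp_answer`, and the use of a simple majority vote are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def clp_answer(question: str, target_lang: str, llm: LLM) -> str:
    """Two-stage cross-lingual prompting for one target language (sketch)."""
    # Stage 1: cross-lingual alignment prompting (wording is illustrative).
    alignment = llm(
        f"Please act as an expert in multilingual understanding.\n"
        f"Request: {question}\n"
        f"Let's understand the task in {target_lang} step by step!"
    )
    # Stage 2: task-specific solver prompting, conditioned on the stage 1 output.
    answer = llm(
        f"{alignment}\n"
        f"Now solve the task in {target_lang} step by step "
        f"and give only the final answer on the last line."
    )
    # Treat the last non-empty line of the completion as the final answer.
    lines = answer.strip().splitlines()
    return lines[-1].strip() if lines else ""


def clsp_answer(question: str, target_langs: Sequence[str], llm: LLM) -> str:
    """Ensemble CLP answers across languages by majority vote (assumed rule)."""
    votes = Counter(clp_answer(question, lang, llm) for lang in target_langs)
    return votes.most_common(1)[0][0]
```

In use, `llm` would wrap whatever model client is available; passing a stub function makes the voting logic easy to check without a model.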
A Preliminary Evaluation of ChatGPT for Zero-shot Dialogue Understanding
Zero-shot dialogue understanding aims to enable dialogue systems to track the user's
needs without any training data, which has gained increasing attention. In this
work, we investigate the understanding ability of ChatGPT for zero-shot
dialogue understanding tasks including spoken language understanding (SLU) and
dialogue state tracking (DST). Experimental results on four popular benchmarks
reveal the great potential of ChatGPT for zero-shot dialogue understanding. In
addition, extensive analysis shows that ChatGPT benefits from the multi-turn
interactive prompt in the DST task but struggles to perform slot filling for
SLU. Finally, we summarize several unexpected behaviors of ChatGPT in dialogue
understanding tasks, hoping to provide some insights for future research on
building zero-shot dialogue understanding systems with Large Language Models
(LLMs).
Comment: Technical Report
CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition
Scene text recognition, as a cross-modal task involving vision and text, is
an important research topic in computer vision. Most existing methods use
language models to extract semantic information for optimizing visual
recognition. However, the guidance of visual cues is ignored in the process of
semantic mining, which limits the performance of the algorithm in recognizing
irregular scene text. To tackle this issue, we propose a novel cross-modal
fusion network (CMFN) for irregular scene text recognition, which incorporates
visual cues into the semantic mining process. Specifically, CMFN consists of a
position self-enhanced encoder, a visual recognition branch and an iterative
semantic recognition branch. The position self-enhanced encoder provides
character sequence position encoding for both the visual recognition branch and
the iterative semantic recognition branch. The visual recognition branch
carries out visual recognition based on the visual features extracted by CNN
and the position encoding information provided by the position self-enhanced
encoder. The iterative semantic recognition branch, which consists of a
language recognition module and a cross-modal fusion gate, simulates the way
that humans recognize scene text and integrates cross-modal visual cues for
text recognition. The experiments demonstrate that the proposed CMFN algorithm
achieves comparable performance to state-of-the-art algorithms, indicating its
effectiveness.
Comment: Accepted to ICONIP 202