Leveraging Graph-based Cross-modal Information Fusion for Neural Sign Language Translation
Sign Language (SL), as the mother tongue of the deaf community, is a special
visual language that most hearing people cannot understand. In recent years,
neural Sign Language Translation (SLT), as a possible way of bridging the
communication gap between the deaf and the hearing, has attracted
widespread academic attention. We found that the current mainstream end-to-end
neural SLT models, which try to learn language knowledge in a weakly
supervised manner, cannot mine enough semantic information under
low-resource data conditions. Therefore, we propose to introduce additional
word-level semantic knowledge of sign language linguistics to assist in
improving current end-to-end neural SLT models. Concretely, we propose a novel
neural SLT model with multi-modal feature fusion based on a dynamic graph, in
which the cross-modal information, i.e., text and video, is first assembled into a
dynamic graph according to their correlation, and then the graph is processed
by a multi-modal graph encoder to generate multi-modal embeddings for
further use in the subsequent neural translation model. To the best of our
knowledge, we are the first to introduce graph neural networks for fusing
multi-modal information into neural sign language translation models.
Moreover, we conducted experiments on the popular, publicly available SLT
dataset RWTH-PHOENIX-Weather-2014T, and the quantitative experiments show that
our method can improve the model
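To make the fusion idea above concrete, here is a minimal, hypothetical sketch (not the paper's actual architecture) of how text and video features could be assembled into a similarity-weighted graph and passed through one round of message passing to produce multi-modal embeddings; the module names, dimensions, and the cosine-similarity adjacency are assumptions for illustration.

```python
# Hedged sketch of cross-modal graph fusion for SLT. Text and video features are
# joined into one graph whose soft edges come from pairwise cosine similarity,
# then one round of message passing with a gated update yields fused embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalGraphEncoder(nn.Module):
    def __init__(self, text_dim: int, video_dim: int, hidden_dim: int):
        super().__init__()
        # Project both modalities into a shared space before building the graph.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.video_proj = nn.Linear(video_dim, hidden_dim)
        self.message = nn.Linear(hidden_dim, hidden_dim)
        self.update = nn.GRUCell(hidden_dim, hidden_dim)

    def forward(self, text_feats: torch.Tensor, video_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (num_tokens, text_dim), video_feats: (num_frames, video_dim)
        nodes = torch.cat([self.text_proj(text_feats), self.video_proj(video_feats)], dim=0)
        # Dynamic adjacency: soft edges from pairwise cosine similarity.
        normed = F.normalize(nodes, dim=-1)
        adj = F.softmax(normed @ normed.t(), dim=-1)   # (N, N)
        # One round of message passing followed by a gated node update.
        messages = adj @ self.message(nodes)           # aggregate neighbours
        fused = self.update(messages, nodes)           # (N, hidden_dim)
        return fused                                   # multi-modal embeddings


# Toy usage: 7 text/gloss tokens and 20 video frames.
encoder = CrossModalGraphEncoder(text_dim=300, video_dim=512, hidden_dim=256)
embeddings = encoder(torch.randn(7, 300), torch.randn(20, 512))
print(embeddings.shape)  # torch.Size([27, 256])
```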
FluentSigners-50: A signer independent benchmark dataset for sign language processing
This paper presents a new large-scale signer-independent dataset for Kazakh-Russian Sign Language (KRSL) for the purposes of Sign Language Processing. We envision it serving as a new benchmark dataset for performance evaluation of Continuous Sign Language Recognition (CSLR) and Translation (CSLT) tasks. The proposed FluentSigners-50 dataset consists of 173 sentences performed by 50 KRSL signers, resulting in 43,250 video samples. Dataset contributors recorded videos in real-life settings against a wide variety of backgrounds using various devices such as smartphones and web cameras. Therefore, the distance to the camera, camera angles and aspect ratios, video quality, and frame rates varied for each dataset contributor. Additionally, the proposed dataset contains a high degree of linguistic and inter-signer variability and is thus a better training set for recognizing real-life sign language. The FluentSigners-50 baseline is established using two state-of-the-art methods, Stochastic CSLR and TSPNet. To this end, we carefully prepared three benchmark train-test splits for model evaluation in terms of signer independence, age independence, and unseen sentences. FluentSigners-50 is publicly available at https://krslproject.github.io/FluentSigners-50/
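As an illustration of the signer-independence criterion behind these benchmark splits, the sketch below partitions samples so that train and test signers are disjoint; the record fields (signer_id, path) and the split fraction are assumptions, and the dataset ships its own official splits.

```python
# Illustrative sketch of a signer-independent split in the spirit of the
# FluentSigners-50 benchmark: no signer appears in both train and test.
import random
from collections import defaultdict


def signer_independent_split(samples, test_fraction=0.2, seed=0):
    """samples: list of dicts with 'signer_id'; returns (train, test) with disjoint signers."""
    by_signer = defaultdict(list)
    for s in samples:
        by_signer[s["signer_id"]].append(s)
    signers = sorted(by_signer)
    random.Random(seed).shuffle(signers)
    n_test = max(1, int(len(signers) * test_fraction))
    test_signers = signers[:n_test]
    train = [s for sid in signers[n_test:] for s in by_signer[sid]]
    test = [s for sid in test_signers for s in by_signer[sid]]
    return train, test


# Toy usage with fake records standing in for video samples.
records = [{"signer_id": i % 50, "path": f"clip_{i}.mp4"} for i in range(500)]
train, test = signer_independent_split(records)
assert {r["signer_id"] for r in train}.isdisjoint({r["signer_id"] for r in test})
```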
Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction
Most existing event extraction (EE) methods merely extract event arguments
within the sentence scope. However, such sentence-level EE methods struggle to
handle the soaring number of documents from emerging applications, such as
finance, legislation, health, etc., where event arguments always scatter across
different sentences, and even multiple such event mentions frequently co-exist
in the same document. To address these challenges, we propose a novel
end-to-end model, Doc2EDAG, which can generate an entity-based directed acyclic
graph to fulfill document-level EE (DEE) effectively. Moreover, we
reformalize the DEE task with a no-trigger-words design to ease
document-level event labeling. To demonstrate the effectiveness of Doc2EDAG, we
build a large-scale real-world dataset consisting of Chinese financial
announcements with the challenges mentioned above. Extensive experiments with
comprehensive analyses illustrate the superiority of Doc2EDAG over
state-of-the-art methods. Data and code can be found at
https://github.com/dolphin-zs/Doc2EDAG.
Comment: Accepted by EMNLP 2019
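To illustrate the entity-based directed acyclic graph idea, the sketch below builds a toy EDAG whose root-to-leaf paths enumerate candidate event records role by role; the role names, entity candidates, and the exhaustive branching (standing in for Doc2EDAG's learned path-expansion classifier) are placeholders for illustration, not the paper's actual model.

```python
# Minimal sketch of an entity-based directed acyclic graph (EDAG): event records
# are decoded role by role, and root-to-leaf paths enumerate argument combinations.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EDAGNode:
    role: Optional[str]              # event role this node fills (None for the root)
    entity: Optional[str]            # entity chosen for that role ("" means no argument)
    children: List["EDAGNode"] = field(default_factory=list)


def expand(node: EDAGNode, roles: List[str], candidates: List[str], depth: int = 0) -> None:
    """Grow the EDAG: at each role, branch over plausible entity fillers."""
    if depth == len(roles):
        return
    # In Doc2EDAG this choice is made by a learned classifier over entity
    # representations; here we simply branch over every candidate as a stand-in.
    for ent in candidates + [""]:
        child = EDAGNode(role=roles[depth], entity=ent)
        node.children.append(child)
        expand(child, roles, candidates, depth + 1)


def paths(node: EDAGNode, prefix=()):
    """Enumerate root-to-leaf paths, i.e. complete candidate event records."""
    if not node.children:
        yield prefix
        return
    for child in node.children:
        yield from paths(child, prefix + ((child.role, child.entity),))


roles = ["EquityHolder", "PledgedShares"]   # example roles for a financial event type
root = EDAGNode(role=None, entity=None)
expand(root, roles, candidates=["CompanyA", "5,000,000 shares"])
print(sum(1 for _ in paths(root)))          # number of candidate event records
```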
Improving Continuous Sign Language Recognition with Consistency Constraints and Signer Removal
Most deep-learning-based continuous sign language recognition (CSLR) models
share a similar backbone consisting of a visual module, a sequential module,
and an alignment module. However, due to limited training samples, a
connectionist temporal classification loss may not train such CSLR backbones
sufficiently. In this work, we propose three auxiliary tasks to enhance the
CSLR backbones. The first task enhances the visual module, which is sensitive
to the insufficient training problem, from the perspective of consistency.
Specifically, since the information of sign languages is mainly included in
signers' facial expressions and hand movements, a keypoint-guided spatial
attention module is developed to guide the visual module to focus on
informative regions, i.e., spatial attention consistency. Second, since the
output features of the visual and sequential modules both represent the same
sentence, a sentence embedding consistency constraint is imposed between the
two modules to better exploit the backbone's power and enhance the
representation power of both features. We refer to the CSLR model
trained with the above auxiliary tasks as consistency-enhanced CSLR, which
performs well on signer-dependent datasets in which all signers appear during
both training and testing. To make it more robust for the signer-independent
setting, a signer removal module based on feature disentanglement is further
proposed to remove signer information from the backbone. Extensive ablation
studies are conducted to validate the effectiveness of these auxiliary tasks.
More remarkably, with a transformer-based backbone, our model achieves
state-of-the-art or competitive performance on five benchmarks, PHOENIX-2014,
PHOENIX-2014-T, PHOENIX-2014-SI, CSL, and CSL-Daily.
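As a rough illustration of the sentence embedding consistency constraint described above, the following sketch mean-pools the visual and sequential features of one video into sentence-level embeddings and penalizes their cosine distance; the pooling and distance choices are assumptions, not the paper's exact formulation.

```python
# Hedged sketch of a sentence-embedding consistency term between the visual and
# sequential modules of a CSLR backbone, added alongside the CTC objective.
import torch
import torch.nn.functional as F


def sentence_embedding_consistency(visual_feats: torch.Tensor,
                                   sequential_feats: torch.Tensor) -> torch.Tensor:
    """visual_feats, sequential_feats: (T, D) frame/step features for one video."""
    # Mean-pool each stream into a single sentence embedding.
    v = F.normalize(visual_feats.mean(dim=0), dim=-1)
    s = F.normalize(sequential_feats.mean(dim=0), dim=-1)
    # Cosine-distance consistency term: small when both streams agree.
    return 1.0 - torch.dot(v, s)


# Toy usage: 100 frames, 512-d features from both modules.
loss = sentence_embedding_consistency(torch.randn(100, 512), torch.randn(100, 512))
print(loss.item())
```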
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
This work is dedicated to continuous sign language recognition (CSLR), which is
a weakly supervised task dealing with the recognition of continuous signs from
videos, without any prior knowledge about the temporal boundaries between
consecutive signs. Data scarcity heavily impedes the progress of CSLR. Existing
approaches typically train CSLR models on a monolingual corpus, which is orders
of magnitude smaller than the corpora used for speech recognition. In this work,
we explore
the feasibility of utilizing multilingual sign language corpora to facilitate
monolingual CSLR. Our work is built upon the observation of cross-lingual
signs, which originate from different sign languages but have similar visual
signals (e.g., hand shape and motion). The underlying idea of our approach is
to identify the cross-lingual signs in one sign language and properly leverage
them as auxiliary training data to improve the recognition capability of
another. To achieve this goal, we first build two sign language dictionaries
containing the isolated signs that appear in the two datasets. Then we identify
the sign-to-sign mappings between the two sign languages via a well-optimized
isolated sign language recognition model. Finally, we train a CSLR model on the
combination of the target data with original labels and the auxiliary data with
mapped labels. Experimentally, our approach achieves state-of-the-art
performance on two widely-used CSLR datasets: Phoenix-2014 and Phoenix-2014T.
Comment: Accepted by ICCV 2023
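A minimal sketch of the sign-to-sign mapping step described above: given embeddings of isolated signs from an auxiliary and a target sign language (produced, in the paper, by a well-optimized isolated sign language recognition model), each auxiliary sign is mapped to its most similar target sign above a confidence threshold. The embedding source, the similarity measure, and the threshold here are assumptions for illustration.

```python
# Sketch of cross-lingual sign mapping via nearest neighbours in embedding space.
# Random tensors stand in for embeddings from a pretrained isolated-sign model.
import torch
import torch.nn.functional as F


def map_cross_lingual_signs(aux_embeds: torch.Tensor,
                            tgt_embeds: torch.Tensor,
                            threshold: float = 0.8):
    """aux_embeds: (A, D), tgt_embeds: (T, D); returns {aux_idx: tgt_idx} for confident matches."""
    sim = F.normalize(aux_embeds, dim=-1) @ F.normalize(tgt_embeds, dim=-1).t()  # (A, T)
    best_sim, best_idx = sim.max(dim=1)
    # Keep only mappings whose similarity clears the threshold; auxiliary videos
    # with mapped labels can then be mixed into the target-language CSLR training set.
    return {a: int(best_idx[a]) for a in range(aux_embeds.size(0)) if best_sim[a] >= threshold}


# Toy usage: 30 auxiliary signs, 50 target signs, 256-d embeddings.
# Threshold 0.0 here only so the random demo maps every sign.
mapping = map_cross_lingual_signs(torch.randn(30, 256), torch.randn(50, 256), threshold=0.0)
print(len(mapping), "signs mapped")
```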