JWSign: A Highly Multilingual Corpus of Bible Translations for more Diversity in Sign Language Processing
Advancements in sign language processing have been hindered by a lack of
sufficient data, impeding progress in recognition, translation, and production
tasks. The absence of comprehensive sign language datasets across the world's
sign languages has widened the gap in this field, resulting in a few sign
languages being studied more than others, making this research area extremely
skewed mostly towards sign languages from high-income countries. In this work
we introduce a new large and highly multilingual dataset for sign language
translation: JWSign. The dataset consists of 2,530 hours of Bible translations
in 98 sign languages, featuring more than 1,500 individual signers. On this
dataset, we report neural machine translation experiments. Apart from bilingual
baseline systems, we also train multilingual systems, including some that take
into account the typological relatedness of signed or spoken languages. Our
experiments highlight that multilingual systems are superior to bilingual
baselines, and that in higher-resource scenarios, clustering language pairs
that are related improves translation quality. Comment: EMNLP 2023 (Findings)
NeuralREG: An end-to-end approach to referring expression generation
Traditionally, Referring Expression Generation (REG) models first decide on
the form and then on the content of references to discourse entities in text,
typically relying on features such as salience and grammatical function. In
this paper, we present a new approach (NeuralREG), relying on deep neural
networks, which makes decisions about form and content in one go without
explicit feature extraction. Using a delexicalized version of the WebNLG
corpus, we show that the neural model substantially improves over two strong
baselines. Data and models are publicly available. Comment: Accepted for presentation at ACL 201
Machine Reading the Primeros Libros
Early modern printed books pose particular challenges for automatic transcription: uneven inking, irregular orthographies, radically multilingual texts. As a result, modern efforts to transcribe these documents tend to produce the textual gibberish commonly known as "dirty OCR" (Optical Character Recognition). This noisy output is most frequently seen as a barrier to access for scholars interested in the computational analysis or digital display of transcribed documents. This article, however, proposes that a closer analysis of dirty OCR can reveal both historical and cultural factors at play in the practice of automatic transcription. To make this argument, it focuses on tools developed for the automatic transcription of the Primeros Libros collection of sixteenth-century Mexican printed books. By bringing together the history of the collection with that of the OCR tool, it illustrates how the colonial history of these documents is embedded in, and transformed by, the statistical models used for automatic transcription. It argues that automatic transcription, itself a mechanical and practical tool, also has an interpretive effect on transcribed texts that can have practical consequences for scholarly work.
An AI-Based Framework for Translating American Sign Language to English and Vice Versa
In this paper, we propose a framework to convert American Sign Language (ASL) to English and English to ASL. Within this framework, we use a deep learning model, together with rolling average prediction, that captures image frames from videos and classifies the signs in those frames. The classified frames are then used to construct ASL words and sentences to support people with hearing impairments. We also use the same deep learning model to capture signs from people with deaf symptoms and convert them into ASL words and English sentences. Based on this framework, we developed a web-based tool for use in real-life applications, and we present the tool as a proof of concept. In our evaluation, we found that the deep learning model converts the image signs into ASL words and sentences with high accuracy. The tool was also found to be very useful for people with hearing impairment and deaf symptoms. The main contribution of this work is the design of a system to convert ASL to English and vice versa.
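The abstract mentions rolling average prediction over classified video frames but gives no details. As a rough illustration only, the sketch below shows one common way such smoothing is done: per-frame class probabilities are averaged over a sliding window before the most likely label is picked, so that a single noisy frame does not flip the prediction. The function name, the window size, and the sample probabilities are all hypothetical and are not taken from the paper.

```python
from collections import deque
from typing import Iterable, List, Sequence


def rolling_average_labels(
    frame_probs: Iterable[Sequence[float]], window: int = 3
) -> List[int]:
    """Return one smoothed class index per frame.

    Assumption: each item in frame_probs is a probability vector produced
    by some frame-level classifier (not specified in the abstract).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # keeps only the last `window` vectors
    labels = []
    for probs in frame_probs:
        recent.append(probs)
        n_classes = len(probs)
        # Average each class probability over the frames currently in the window.
        averaged = [
            sum(p[c] for p in recent) / len(recent) for c in range(n_classes)
        ]
        # Pick the class with the highest averaged probability.
        labels.append(max(range(n_classes), key=averaged.__getitem__))
    return labels


# Hypothetical probabilities for two classes; the third frame is a noisy outlier.
frames = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.9, 0.1]]
smoothed = rolling_average_labels(frames, window=3)
unsmoothed = rolling_average_labels(frames, window=1)
```

With a window of 3 the outlier frame is absorbed by its neighbours, whereas a window of 1 reduces to plain per-frame classification. Larger windows give steadier output at the cost of slower reaction to a genuine change of sign.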