Search CORE

13,535 research outputs found

Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop

Author: Alishahi Afra
Chrupała Grzegorz
Linzen Tal
Publication venue
Publication date: 05/04/2019
Field of study

The EMNLP 2018 workshop BlackboxNLP was dedicated to resources and techniques specifically developed for analyzing and understanding the inner-workings and representations acquired by neural models of language. Approaches included: systematic manipulation of input to neural networks and investigating the impact on their performance, testing whether interpretable knowledge can be decoded from intermediate representations acquired by neural networks, proposing modifications to neural network architectures to make their knowledge state or generated output more explainable, and examining the performance of networks on simplified or formal languages. Here we review a number of representative studies in each category

arXiv.org e-Print Archive

Tilburg University Repository

Neural Chinese Word Segmentation with Lexicon and Unlabeled Data via Posterior Regularization

Author: Cai Deng
Chen Xinchi
Chen Xinchi
Dauphin Yann
Ganchev Kuzman
Lafferty John D.
Levow Gina-Anne
Liu Junxin
Liu Yijia
Low Jin Kiat
Luo Wencan
Pei Wenzhe
Sun Weiwei
Xu Jingjing
Xue Nianwen
Yang Jie
Zhang Jiacheng
Zhang Meishan
Zhang Qi
Zhang Yanna
Zhao Hai
Zhao Hai
Zhao Lujun
Zheng Xiaoqing
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/04/2019
Field of study

Existing methods for CWS usually rely on a large number of labeled sentences to train word segmentation models, which are expensive and time-consuming to annotate. Luckily, the unlabeled data is usually easy to collect and many high-quality Chinese lexicons are off-the-shelf, both of which can provide useful information for CWS. In this paper, we propose a neural approach for Chinese word segmentation which can exploit both lexicon and unlabeled data. Our approach is based on a variant of posterior regularization algorithm, and the unlabeled data and lexicon are incorporated into model training as indirect supervision by regularizing the prediction space of CWS models. Extensive experiments on multiple benchmark datasets in both in-domain and cross-domain scenarios validate the effectiveness of our approach.Comment: 7 pages, 11 figures, accepted by the 2019 World Wide Web Conference (WWW '19

arXiv.org e-Print Archive

Crossref

Improving the translation environment for professional translators

Author: Augustinus Liesbeth
Bulté Bram
Buysschaert Joost
Coppers Sven
Daems Joke
Heyman Geert
Hoste Veronique
Lefever Els
Luyten Kris
Macken Lieve
Moens Marie-Francine
Pelemans Joris
Rigouts Terryn Ayla
Steurs Frieda
Tezcan Arda
Van den Bergh Jan
van der Lek-Ciudin Iulianna
Van Eynde Frank
Vanallemeersch Tom
Vandeghinste Vincent
Verwimp Lyan
Wambacq Patrick
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography