24 research outputs found
Extended Parallel Corpus for Amharic-English Machine Translation
This paper describes the acquisition, preprocessing, segmentation, and
alignment of an Amharic-English parallel corpus. It will be useful for machine
translation of an under-resourced language, Amharic. The corpus is larger than
previously compiled corpora; it is released for research purposes. We trained
neural machine translation and phrase-based statistical machine translation
models using the corpus. In the automatic evaluation, neural machine
translation models outperform phrase-based statistical machine translation
models.Comment: Accepted to 2nd AfricanNLP workshop at EACL 202
Semisupervised Autoencoder for Sentiment Analysis
In this paper, we investigate the usage of autoencoders in modeling textual
data. Traditional autoencoders suffer from at least two aspects: scalability
with the high dimensionality of vocabulary size and dealing with
task-irrelevant words. We address this problem by introducing supervision via
the loss function of autoencoders. In particular, we first train a linear
classifier on the labeled data, then define a loss for the autoencoder with the
weights learned from the linear classifier. To reduce the bias brought by one
single classifier, we define a posterior probability distribution on the
weights of the classifier, and derive the marginalized loss of the autoencoder
with Laplace approximation. We show that our choice of loss function can be
rationalized from the perspective of Bregman Divergence, which justifies the
soundness of our model. We evaluate the effectiveness of our model on six
sentiment analysis datasets, and show that our model significantly outperforms
all the competing methods with respect to classification accuracy. We also show
that our model is able to take advantage of unlabeled dataset and get improved
performance. We further show that our model successfully learns highly
discriminative feature maps, which explains its superior performance.Comment: To appear in AAAI 201
Combining NLP Approaches for Rule Extraction from Legal Documents
International audienceLegal texts express conditions in natural language describing what is permitted, forbidden or mandatory in the context they regulate. Despite the numerous approaches tackling the problem of moving from a natural language legal text to the respective set of machine-readable conditions, results are still unsatisfiable and it remains a major open challenge. In this paper, we propose a preliminary approach which combines different Natural Language Processing techniques towards the extraction of rules from legal documents. More precisely, we combine the linguistic information provided by WordNet together with a syntax-based extraction of rules from legal texts, and a logic-based extraction of dependencies between chunks of such texts. Such a combined approach leads to a powerful solution towards the extraction of machine-readable rules from legal documents. We evaluate the proposed approach over the Australian " Telecommunications consumer protections code "
Multitask Learning with Deep Neural Networks for Community Question Answering
In this paper, we developed a deep neural network (DNN) that learns to solve simultaneously the three tasks of the cQA challenge proposed by the SemEval-2016 Task 3, i.e., question-comment similarity, question-question similarity and new question-comment similarity. The latter is the main task, which can exploit the previous two for achieving better results. Our DNN is trained jointly on all the three cQA tasks and learns to encode questions and comments into a single vector representation shared across the multiple tasks. The results on the official challenge test set show that our approach produces higher accuracy and faster convergence rates than the individual neural networks. Additionally, our method, which does not use any manual feature engineering, approaches the state of the art established with methods that make heavy use of it