24 research outputs found

    Extended Parallel Corpus for Amharic-English Machine Translation

    This paper describes the acquisition, preprocessing, segmentation, and alignment of an Amharic-English parallel corpus. It will be useful for machine translation of an under-resourced language, Amharic. The corpus is larger than previously compiled corpora; it is released for research purposes. We trained neural machine translation and phrase-based statistical machine translation models using the corpus. In the automatic evaluation, neural machine translation models outperform phrase-based statistical machine translation models. Comment: Accepted to 2nd AfricanNLP workshop at EACL 202

    Semisupervised Autoencoder for Sentiment Analysis

    In this paper, we investigate the use of autoencoders in modeling textual data. Traditional autoencoders struggle in at least two respects: scaling to the high dimensionality of the vocabulary, and handling task-irrelevant words. We address these problems by introducing supervision via the loss function of the autoencoder. In particular, we first train a linear classifier on the labeled data, then define a loss for the autoencoder using the weights learned by the linear classifier. To reduce the bias introduced by a single classifier, we define a posterior probability distribution over the weights of the classifier, and derive the marginalized loss of the autoencoder with a Laplace approximation. We show that our choice of loss function can be rationalized from the perspective of Bregman divergence, which justifies the soundness of our model. We evaluate the effectiveness of our model on six sentiment analysis datasets, and show that it significantly outperforms all the competing methods with respect to classification accuracy. We also show that our model is able to take advantage of unlabeled data to improve performance. We further show that our model successfully learns highly discriminative feature maps, which explains its superior performance. Comment: To appear in AAAI 201

    Combining NLP Approaches for Rule Extraction from Legal Documents

    Legal texts express conditions in natural language describing what is permitted, forbidden or mandatory in the context they regulate. Despite the numerous approaches tackling the problem of moving from a natural language legal text to the respective set of machine-readable conditions, results are still unsatisfactory and the problem remains a major open challenge. In this paper, we propose a preliminary approach which combines different Natural Language Processing techniques for the extraction of rules from legal documents. More precisely, we combine the linguistic information provided by WordNet with a syntax-based extraction of rules from legal texts and a logic-based extraction of dependencies between chunks of such texts. Such a combined approach leads to a powerful solution for the extraction of machine-readable rules from legal documents. We evaluate the proposed approach over the Australian "Telecommunications consumer protections code".

    Multitask Learning with Deep Neural Networks for Community Question Answering

    In this paper, we develop a deep neural network (DNN) that learns to solve simultaneously the three tasks of the cQA challenge proposed in SemEval-2016 Task 3, i.e., question-comment similarity, question-question similarity and new question-comment similarity. The last of these is the main task, which can exploit the other two to achieve better results. Our DNN is trained jointly on all three cQA tasks and learns to encode questions and comments into a single vector representation shared across the tasks. The results on the official challenge test set show that our approach produces higher accuracy and faster convergence rates than the individual neural networks. Additionally, our method, which does not use any manual feature engineering, approaches the state of the art established by methods that make heavy use of it.