An open source rule induction tool for transfer-based SMT
In this paper we describe an open source tool for the automatic induction of transfer rules. Transfer rule induction is carried out on pairs of dependency structures and their node alignment, producing all rules consistent with that alignment. We describe an efficient algorithm for rule induction and give a detailed description of how to use the tool.
System combination with extra alignment information
This paper provides the system description of the IHMM team of Dublin City University for our participation in the system combination task in the Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12). Our work is based on a confusion network-based approach to system combination. We propose a new method for building the confusion network: (1) incorporate extra alignment information, extracted from the given metadata and treated as sure alignments, into the results from IHMM, and (2) decode together with this information. We also heuristically set one of the system outputs as the default backbone. Our results show that this backbone, which is the RBMT system output, achieves a 0.11% improvement in BLEU over the backbone chosen by TER, while the extra information we added in the decoding part does not improve the results.
Integrating Semantic Knowledge to Tackle Zero-shot Text Classification
Insufficient or even unavailable training data for emerging classes is a major
challenge in many classification tasks, including text classification.
Recognising text documents of classes that have never been seen in the learning
stage, so-called zero-shot text classification, is therefore difficult, and only
a few previous works have tackled this problem. In this paper, we propose a
two-phase framework together with data augmentation and feature augmentation to
solve this problem. Four kinds of semantic knowledge (word embeddings, class
descriptions, class hierarchy, and a general knowledge graph) are incorporated
into the proposed framework to deal with instances of unseen classes
effectively. Experimental results show that each of the two phases, as well as
their combination, achieves the best overall accuracy compared with baselines and
recent approaches in classifying real-world texts under the zero-shot scenario.
Comment: Accepted NAACL-HLT 201