5,000 research outputs found

    Universal Dependencies Parsing for Colloquial Singaporean English

    Full text link
    Singlish can be interesting to the ACL community both linguistically as a major creole based on English, and computationally for information extraction and sentiment analysis of regional social media. We investigate dependency parsing of Singlish by constructing a dependency treebank under the Universal Dependencies scheme, and then training a neural network model by integrating English syntactic knowledge into a state-of-the-art parser trained on the Singlish treebank. Results show that English knowledge can lead to 25% relative error reduction, resulting in a parser of 84.47% accuracies. To the best of our knowledge, we are the first to use neural stacking to improve cross-lingual dependency parsing on low-resource languages. We make both our annotation and parser available for further research.Comment: Accepted by ACL 201

    Irish treebanking and parsing: a preliminary evaluation

    Get PDF
    Language resources are essential for linguistic research and the development of NLP applications. Low- density languages, such as Irish, therefore lack significant research in this area. This paper describes the early stages in the development of new language resources for Irish – namely the first Irish dependency treebank and the first Irish statistical dependency parser. We present the methodology behind building our new treebank and the steps we take to leverage upon the few existing resources. We discuss language specific choices made when defining our dependency labelling scheme, and describe interesting Irish language characteristics such as prepositional attachment, copula and clefting. We manually develop a small treebank of 300 sentences based on an existing POS-tagged corpus and report an inter-annotator agreement of 0.7902. We train MaltParser to achieve preliminary parsing results for Irish and describe a bootstrapping approach for further stages of development

    Deep Multitask Learning for Semantic Dependency Parsing

    Full text link
    We present a deep neural architecture that parses sentences into three semantic dependency graph formalisms. By using efficient, nearly arc-factored inference and a bidirectional-LSTM composed with a multi-layer perceptron, our base system is able to significantly improve the state of the art for semantic dependency parsing, without using hand-engineered features or syntax. We then explore two multitask learning approaches---one that shares parameters across formalisms, and one that uses higher-order structures to predict the graphs jointly. We find that both approaches improve performance across formalisms on average, achieving a new state of the art. Our code is open-source and available at https://github.com/Noahs-ARK/NeurboParser.Comment: Proceedings of ACL 201

    An analysis of question processing of English and Chinese for the NTCIR 5 cross-language question answering task

    Get PDF
    An important element in question answering systems is the analysis and interpretation of questions. Using the NTCIR 5 Cross-Language Question Answering (CLQA) question test set we demonstrate that the accuracy of deep question analysis is dependent on the quantity and suitability of the available linguistic resources. We further demonstrate that applying question analysis tools developed on monolingual training materials to questions translated Chinese-English and English-Chinese using machine translation produces much reduced effectiveness in interpretation of the question. This latter result indicates that question analysis for CLQA should primarily be conducted in the question language prior to translation

    Strong domain variation and treebank-induced LFG resources

    Get PDF
    In this paper we present a number of experiments to test the portability of existing treebank induced LFG resources. We test the LFG parsing resources of Cahill et al. (2004) on the ATIS corpus which represents a considerably different domain to the Penn-II Treebank Wall Street Journal sections, from which the resources were induced. This testing shows an under-performance at both c- and f-structure level as a result of the domain variation. We show that in order to adapt the LFG resources of Cahill et al. (2004) to this new domain, all that is necessary is to retrain the c-structure parser on data from the new domain
    corecore