
    Introduction to the special issue on cross-language algorithms and applications

    With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies, including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval, and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.
    Postprint (published version)

    Hypothesis Testing based Intrinsic Evaluation of Word Embeddings

    We introduce the cross-match test, an exact, distribution-free, high-dimensional hypothesis test, as an intrinsic evaluation metric for word embeddings. We show that cross-match is an effective means of measuring distributional similarity between different vector representations and of evaluating the statistical significance of differences between vector embedding models. Additionally, we find that cross-match can be used to provide a quantitative measure of linguistic similarity for selecting bridge languages for machine translation. We demonstrate that the results of the hypothesis test align with our expectations and note that the framework of two-sample hypothesis testing is not limited to word embeddings and can be extended to all vector representations.
    Comment: Accepted to RepEval 2017: The Second Workshop on Evaluating Vector Space Representations for NLP
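The abstract does not spell out the procedure, so the following is only a minimal sketch of a cross-match style two-sample test as it is commonly described: pool both samples, pair the points by a minimum-distance perfect matching, count the pairs that join one point from each sample, and compare that count with its exact null distribution. The function names, the Euclidean distance, and the brute-force matching (practical only for roughly a dozen points) are assumptions made for illustration, not the authors' implementation.

```python
from functools import lru_cache
from math import comb, dist, factorial


def min_weight_perfect_matching(points):
    """Optimal pairing of an even number of points (brute force, small N only)."""
    n = len(points)
    assert n % 2 == 0, "the pooled sample must have an even number of points"

    @lru_cache(maxsize=None)
    def best(mask):
        # mask has bit k set when point k is still unmatched
        if mask == 0:
            return 0.0, ()
        i = (mask & -mask).bit_length() - 1  # lowest unmatched index
        rest = mask & ~(1 << i)
        result = None
        for j in range(i + 1, n):
            if (rest >> j) & 1:
                cost, pairs = best(rest & ~(1 << j))
                cost += dist(points[i], points[j])  # Euclidean distance (assumed)
                if result is None or cost < result[0]:
                    result = (cost, ((i, j),) + pairs)
        return result

    return best((1 << n) - 1)[1]


def cross_match_pmf(a1, n, m):
    """Exact null probability of seeing a1 cross-sample pairs."""
    if a1 < 0 or a1 > min(n, m) or (n - a1) % 2 or (m - a1) % 2:
        return 0.0
    a0 = (n - a1) // 2  # pairs with both points from the first sample
    a2 = (m - a1) // 2  # pairs with both points from the second sample
    total_pairs = (n + m) // 2
    return (2 ** a1 * factorial(total_pairs)) / (
        comb(n + m, n) * factorial(a0) * factorial(a1) * factorial(a2)
    )


def cross_match_test(x, y):
    """Return (number of cross-sample pairs, one-sided p-value)."""
    n, m = len(x), len(y)
    points = [tuple(p) for p in x] + [tuple(p) for p in y]
    pairs = min_weight_perfect_matching(points)
    # a pair is a cross-match when exactly one index belongs to the first sample
    a1 = sum((i < n) != (j < n) for i, j in pairs)
    # few cross-matches are evidence that the two samples differ
    p_value = sum(cross_match_pmf(k, n, m) for k in range(a1 + 1))
    return a1, p_value


if __name__ == "__main__":
    x = [(0, 0), (0, 1), (1, 0), (1, 1)]          # toy "embedding" sample 1
    y = [(10, 10), (10, 11), (11, 10), (11, 11)]  # toy "embedding" sample 2
    print(cross_match_test(x, y))
```

With the two well-separated toy clusters above, no pair crosses samples, so the statistic is 0 and the p-value is 3/35 (about 0.086), the smallest value attainable with four points per sample. For real embedding spaces a polynomial-time matching routine would replace the brute-force search.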

    ELF and Translation as Language Contact

    This paper explores multilingual language contact in seemingly unrelated settings: translation and English as a lingua franca (ELF), also touching on learner language. By delving into similar processes in these settings at three levels – the macro level of a language as a whole, the intermediate level of social interaction, and the micro level of cognition – it argues that translation and ELF are sites of multilingual contact resulting in a degree of hybridization in the languages involved, and thereby important drivers of language change. It is suggested that macro-level similarities in translation and ELF, such as the relative overrepresentation of high-frequency items and structures and untypical multi-word combinations, ensue from interactional and cognitive processes where one fundamental mechanism is priming. Translations engage in cross-linguistic textual priming, while users of ELF interact with other ‘similects’ in complex second-order language contact. Both can contribute crucially to understanding processes of change and contact-induced variation.
    Peer reviewed