2,268 research outputs found

    Human Associations Help to Detect Conventionalized Multiword Expressions

    In this paper we show that human evidence for the conventionalization of a phrase can be obtained by asking native speakers for their associations to the phrase and to its component words. If the component words of a phrase have each other as frequent associations, the phrase can be considered conventionalized. Another type of conventionalized phrase can be identified using two factors: low entropy of the phrase's associations and low intersection between the associations of the component words and those of the phrase. The association experiments were performed for Russian.
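    The two factors named in this abstract can be illustrated with a minimal sketch. It assumes that associations are collected as plain lists of response words; the function names and the toy responses are hypothetical and are not taken from the paper, which may define both measures differently.

```python
import math
from collections import Counter


def association_entropy(responses):
    """Shannon entropy (in bits) of the distribution of association responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def association_intersection(phrase_responses, word_responses):
    """Share of distinct phrase associations that also occur for a component word."""
    phrase_set = set(phrase_responses)
    if not phrase_set:
        return 0.0
    return len(phrase_set & set(word_responses)) / len(phrase_set)


# Hypothetical toy responses, not data from the paper.
phrase_responses = ["train", "train", "train", "station"]
word_responses = ["iron", "metal", "train"]

entropy = association_entropy(phrase_responses)
overlap = association_intersection(phrase_responses, word_responses)
```

    Under this reading, a phrase whose respondents mostly agree on one association gets a low entropy, and a phrase whose associations differ from those of its component words gets a low intersection.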

    Decreasing lexical data sparsity in statistical syntactic parsing - experiments with named entities

    In this paper we present preliminary experiments that aim to reduce lexical data sparsity in statistical parsing by exploiting information about named entities. Words in the WSJ corpus are mapped to named entity clusters, and a latent variable constituency parser is trained and tested on the transformed corpus. We explore two different methods for mapping words to entities and look at the effect of mapping various subsets of named entity types. Thus far, results show no improvement in parsing accuracy over the best baseline score; we identify possible problems and outline suggestions for future directions.
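    The corpus transformation described in this abstract can be pictured with a minimal sketch. It assumes entity annotations are available as (start, end, label) token spans; the function name, the span format, and the example sentence are hypothetical, and the sketch does not reproduce either of the two mapping methods the paper explores.

```python
def map_entities(tokens, entity_spans, mapped_types=None):
    """Replace tokens inside entity spans with the entity-type label.

    entity_spans: list of (start, end, label) with end exclusive (assumed format).
    mapped_types: optional set of labels to map; other entity types are left as words.
    """
    mapped = list(tokens)
    for start, end, label in entity_spans:
        if mapped_types is not None and label not in mapped_types:
            continue
        for i in range(start, end):
            mapped[i] = label
    return mapped


# Hypothetical example sentence and annotations.
tokens = ["Pierre", "Vinken", "joined", "Elsevier", "in", "November"]
spans = [(0, 2, "PERSON"), (3, 4, "ORG"), (5, 6, "DATE")]
transformed = map_entities(tokens, spans, mapped_types={"PERSON", "ORG"})
```

    Restricting `mapped_types` corresponds to mapping only a subset of named entity types, so that rare names share one symbol while other words stay intact.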

    ALANZ handbook 2018

    Co-edited handbook for participants at the December ALANZ Symposium.

    ALANZ 2018

    1st December 2018, Waikato Institute of Technology (Wintec), Hamilton
    We are pleased to announce that the Call for Papers for the ALANZ SYMPOSIUM 2018 is now open. We invite proposals for paper presentations, interactive sessions and posters.
    The landscape of English language teaching is constantly changing, and as teachers contemplate new cohorts of learners they face this question: Is business as usual enough? In today’s settings there are new technologies to incorporate into learning and teaching, different teaching spaces becoming available, a need to balance fostering learner autonomy with the pastoral care of students, and a need to ensure that our teaching is relevant to the world our students face. We would like to adopt a collegial approach to this question and so invite abstracts from members and non-members of ALANZ, and in particular from new and emerging researchers.
    Presentation types:
    * Oral presentations: These will be allocated 20 minutes plus 5 minutes for questions (25 minutes total), usually supported with visual aids.
    * Interactive sessions: These could be workshops or informal discussions around points of interest in Applied Linguistics (45 minutes) and could be supported by visual aids or activities.
    * Posters: Some research projects are best presented visually in the form of a poster.
    Abstracts (250 words max.) can be submitted to one of two committee members:
    * Anthea Fester, email: [email protected]
    * Celine Kearney, email: [email protected]
    Deadline for abstract submission: 7th September 2018
    Notification of acceptance: 28th September 2018

    ALANZ handbook

    Co-edited handbook for participants at the December ALANZ Symposium.

    Marrying Universal Dependencies and Universal Morphology

    The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages - UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. With compatibility of tags, each project's annotations could be used to validate the other's. Additionally, the availability of both type- and token-level resources would be a boon to tasks such as parsing and homograph disambiguation. To ease this interoperability, we present a deterministic mapping from Universal Dependencies v2 features into the UniMorph schema. We validate our approach by lookup in the UniMorph corpora and find a macro-average of 64.13% recall. We also note incompatibilities due to paucity of data on either side. Finally, we present a critical evaluation of the foundations, strengths, and weaknesses of the two annotation projects. Comment: UDW1
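    A deterministic feature mapping and a macro-averaged recall, as mentioned in this abstract, can be sketched as follows. The four-entry table is a hypothetical, partial illustration and is not the paper's mapping; the function names and the (found, total) input format are likewise assumptions.

```python
# Hypothetical, partial table for illustration only; not the paper's mapping.
UD_TO_UNIMORPH = {
    ("Number", "Sing"): "SG",
    ("Number", "Plur"): "PL",
    ("Tense", "Past"): "PST",
    ("Case", "Nom"): "NOM",
}


def convert_feats(ud_feats):
    """Convert a UD FEATS string such as 'Number=Plur|Tense=Past' into a set of
    UniMorph-style tags, skipping features that are absent from the table."""
    if ud_feats in ("", "_"):
        return set()
    tags = set()
    for pair in ud_feats.split("|"):
        name, _, value = pair.partition("=")
        tag = UD_TO_UNIMORPH.get((name, value))
        if tag is not None:
            tags.add(tag)
    return tags


def macro_average_recall(per_language):
    """Average the per-language recalls; per_language maps a language
    to a (found, total) pair of counts (assumed format)."""
    recalls = [found / total for found, total in per_language.values() if total]
    return sum(recalls) / len(recalls) if recalls else 0.0


tags = convert_feats("Number=Plur|Tense=Past")
score = macro_average_recall({"lang_a": (1, 2), "lang_b": (3, 4)})
```

    Macro-averaging gives every language the same weight regardless of corpus size, which is why a figure reported this way can differ from a pooled recall over all tokens.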