    Flexibility of multiword expressions and Corpus Pattern Analysis

    This chapter is set in the context of Corpus Pattern Analysis (CPA), a technique developed by Patrick Hanks to map meaning onto word patterns found in corpora. The main output of CPA is the Pattern Dictionary of English Verbs (PDEV), which currently describes patterns for over 1,600 verbs, many of which are acknowledged to be multiword expressions (MWEs) such as phrasal verbs or idioms. PDEV entries are produced manually by lexicographers from the analysis of a substantial sample of concordance lines from the corpus, so building the resource is very time-consuming. The motivation for the work presented in this chapter is to speed up the discovery of these word patterns using methods that can be transferred to other languages. The chapter explores the benefits of a detailed contrastive analysis of MWEs found in English and French corpora, with a view to English-French translation. The comparative analysis is conducted through a case study of the pair (bite, mordre), which illustrates both CPA and the application of statistical measures to the automatic extraction of MWEs. The approach takes as its point of departure the statistics initially developed by Church & Hanks (1989). Here we look at statistical measures that have not yet been tested for their ability to discover new collocates but are useful for characterizing verbal MWEs that have already been found. In particular, we propose measures to characterize the mean span, rigidity, diversity, and idiomaticity of a given MWE.
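    The abstract does not define its measures, so the sketch below is illustrative only and is not the chapter's method. It shows two ideas under stated assumptions. The first is the pointwise mutual information formula, log2(P(x,y) / (P(x) P(y))), on which the Church & Hanks (1989) association ratio is based. The second is one hypothetical way to profile where a collocate appears relative to a verb: "mean span" is taken here as the mean absolute token distance, and the standard deviation of signed offsets stands in as an assumed proxy for rigidity, where a lower value means a more fixed position. The function names `association_ratio`, `collocate_offsets` and `span_profile` and the window size are all hypothetical. Diversity and idiomaticity are not modelled.

```python
import math
from statistics import mean, pstdev


def association_ratio(pair_count, x_count, y_count, n_tokens):
    """PMI-style score log2(P(x,y) / (P(x) P(y))) from raw corpus counts.

    Hypothetical helper: probabilities are estimated as count / n_tokens.
    Returns -inf when the pair is never observed.
    """
    if pair_count == 0:
        return float("-inf")
    p_xy = pair_count / n_tokens
    p_x = x_count / n_tokens
    p_y = y_count / n_tokens
    return math.log2(p_xy / (p_x * p_y))


def collocate_offsets(tokens, node, collocate, window=5):
    """Signed distances from each occurrence of `node` to `collocate`
    within +/- `window` tokens (window size is an assumption)."""
    offsets = []
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] == collocate:
                offsets.append(j - i)
    return offsets


def span_profile(offsets):
    """Hypothetical (mean span, positional spread) for a list of offsets.

    Mean span = mean absolute distance; spread = population standard
    deviation of signed offsets (assumed proxy for rigidity).
    Returns None if the collocate was never found.
    """
    if not offsets:
        return None
    return mean(abs(d) for d in offsets), pstdev(offsets)
```

As a usage example, take the toy token list `"they bite the dust and we bite the dust".split()` with a window of 2. Here `collocate_offsets(tokens, "bite", "dust", window=2)` returns `[2, 2]`, which gives a mean span of 2 and a spread of 0.0, consistent with a fixed expression. Widening the window to 5 also picks up a spurious `-3` from the neighbouring clause, so the window choice matters.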