Flexibility of multiword expressions and Corpus Pattern Analysis
<p>This chapter is set in the context of Corpus Pattern Analysis (CPA), a technique<br>
developed by Patrick Hanks to map meaning onto word patterns found in corpora.<br>
The main output of CPA is the Pattern Dictionary of English Verbs (PDEV), currently<br>
describing patterns for over 1,600 verbs, many of which are acknowledged to<br>
be multiword expressions (MWEs) such as phrasal verbs or idioms. PDEV entries<br>
are manually produced by lexicographers, based on the analysis of a substantial<br>
sample of concordance lines from the corpus, so the construction of the resource<br>
is very time-consuming. The motivation for the work presented in this chapter is<br>
to speed up the discovery of these word patterns, using methods which can be<br>
transferred to other languages. This chapter explores the benefits of a detailed contrastive<br>
analysis of MWEs found in English and French corpora with a view to<br>
English-French translation. The comparative analysis is conducted through a case<br>
study of the pair (bite, mordre), to illustrate both CPA and the application of statistical<br>
measures for the automatic extraction of MWEs. The approach taken in<br>
this chapter builds on the use of statistics initially developed<br>
by Church & Hanks (1989). Here we look at statistical measures which have<br>
not yet been tested for their ability to discover new collocates, but are useful for<br>
characterizing verbal MWEs already found. In particular we propose measures to<br>
characterize the mean span, rigidity, diversity, and idiomaticity of a given MWE.</p>