    Constructing a poor man’s wordnet in a resource-rich world

    In this paper we present a language-independent, fully modular and automatic approach to bootstrap a wordnet for a new language by recycling different types of already existing language resources, such as machine-readable dictionaries, parallel corpora, and Wikipedia. The approach, which we apply here to Slovene, takes into account monosemous and polysemous words, general and specialised vocabulary, as well as simple and multi-word lexemes. The extracted words are then assigned one or several synset ids, based on a classifier that relies on several features, including distributional similarity. Finally, we identify and remove highly dubious (literal, synset) pairs, based on simple distributional information extracted from a large corpus in an unsupervised way. Automatic, manual and task-based evaluations show that the resulting resource, the latest version of the Slovene wordnet, is already a valuable source of lexico-semantic information.
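    As an illustration of the final filtering step, the sketch below scores each (literal, synset) pair by the distributional similarity between the literal and the synset's other members, and flags pairs below a threshold as dubious. This is a minimal reconstruction under our own assumptions, not the authors' implementation: the wordnet layout, the bag-of-context vectors, the cosine measure, and the threshold value are all hypothetical.

        # Minimal sketch of unsupervised (literal, synset) filtering via
        # distributional similarity. The wordnet layout ({synset_id: [literals]})
        # and the context vectors (Counter of co-occurring words) are hypothetical.
        import math
        from collections import Counter

        def cosine(u: Counter, v: Counter) -> float:
            """Cosine similarity between two sparse bag-of-context vectors."""
            dot = sum(u[w] * v[w] for w in set(u) & set(v))
            norm = (math.sqrt(sum(x * x for x in u.values()))
                    * math.sqrt(sum(x * x for x in v.values())))
            return dot / norm if norm else 0.0

        def dubious_pairs(wordnet, vectors, threshold=0.05):
            """Yield (literal, synset_id) pairs whose literal is distributionally
            unlike every other member of its synset -- candidates for removal."""
            for synset_id, literals in wordnet.items():
                for literal in literals:
                    others = [l for l in literals if l != literal and l in vectors]
                    if literal not in vectors or not others:
                        continue
                    if max(cosine(vectors[literal], vectors[o]) for o in others) < threshold:
                        yield literal, synset_id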

    Making use of iron sensors for environmental applications

    New sensors based on iron(II) coordination complexes are discussed in detail.

    Mössbauer study on the gamma-radiolysis of tetralithium iron(III) trioxalate chloride nonahydrate

    The final product of the gamma-radiolysis of tetralithium iron(III) trioxalate chloride nonahydrate has been identified by Mössbauer spectroscopy as FeC2O4·2H2O. The radiolytic decomposition proceeds as a first-order process, owing to the depletion of the original compound and to the radiolytic stability of the ferrous product. Chemical calibration of the relative peak areas of the two iron species indicates that the ratio of the corresponding f-factors is unaffected by the radiolysis.
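    The kinetics and the calibration referred to here can be stated compactly; the following is a hedged sketch in our own notation (c_0, k, D, A, f and n are not the paper's symbols). A first-order decomposition of the parent Fe(III) complex with absorbed dose D means

        c(D) = c_0 \, e^{-kD}

    and the calibration rests on the standard relation between relative Mössbauer absorption areas A and mole fractions n of the two iron species via their recoil-free fractions f:

        A_{Fe^{2+}} / A_{Fe^{3+}} = (f_{Fe^{2+}} / f_{Fe^{3+}}) \cdot (n_{Fe^{2+}} / n_{Fe^{3+}})

    A dose-independent area ratio at known composition thus implies that the f-factor ratio is unchanged by irradiation.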

    Taking corpus variability into account in keyword analysis

    Most studies that make use of keyword analysis rely on the log-likelihood or the chi-square statistic to extract words that are particularly characteristic of a corpus (e.g. Scott & Tribble 2006). These measures are computed on the basis of absolute frequencies and cannot account for the fact that "corpora are inherently variable internally" (Gries 2007). To overcome this limitation, measures of dispersion are sometimes used in combination with keyness values (e.g. Rayson 2003; Oakes & Farrow 2007). Some scholars have also suggested using other statistical measures (e.g. the t-test, the Wilcoxon rank-sum test), but these techniques have not gained corpus linguists' favour (yet?). One possible explanation for this lack of enthusiasm is that their statistical added value has rarely been discussed in terms of 'linguistic' added value. To the authors' knowledge, there is not a single study comparing keywords extracted by means of different measures. In our presentation, we will report on a follow-up study to Paquot (2007), which made use of the log-likelihood together with measures of range and dispersion to extract academic words and design a productively-oriented academic word list. We use the log-likelihood, the t-test and the Wilcoxon rank-sum test in turn to compare the academic and fiction sub-corpora of the British National Corpus and extract words that are typical of academic discourse. We compare the three lists of academic keywords on a number of criteria (e.g. the number of keywords extracted by each measure, the percentage of keywords shared across the three lists, the frequency and distribution of academic keywords in the two corpora) and explore the specificities of the three statistical measures. We also assess the advantages and disadvantages of these measures for the design of an academic word list.
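    For reference, the baseline measure the abstract compares against can be stated concretely. The sketch below computes log-likelihood keyness (Dunning 1993) from raw frequencies; representing each corpus as a plain word-frequency Counter is our simplification, not the authors' setup, and the top_n cutoff is arbitrary.

        # Minimal sketch of log-likelihood keyword extraction between a target
        # corpus (e.g. academic) and a reference corpus (e.g. fiction).
        import math
        from collections import Counter

        def log_likelihood(a: int, b: int, c: int, d: int) -> float:
            """Keyness of a word with frequency a in corpus 1 (c tokens in total)
            and frequency b in corpus 2 (d tokens in total)."""
            e1 = c * (a + b) / (c + d)   # expected frequency in corpus 1
            e2 = d * (a + b) / (c + d)   # expected frequency in corpus 2
            ll = 0.0
            if a:
                ll += a * math.log(a / e1)
            if b:
                ll += b * math.log(b / e2)
            return 2 * ll

        def keywords(target: Counter, reference: Counter, top_n=50):
            """Rank the words most characteristic of the target corpus."""
            c, d = sum(target.values()), sum(reference.values())
            scored = [(log_likelihood(target[w], reference.get(w, 0), c, d), w)
                      for w in target]
            return sorted(scored, reverse=True)[:top_n]

    Note that this absolute-frequency formulation is exactly what the abstract criticises: it is blind to how evenly a word is dispersed across the texts of each corpus, which is why dispersion measures and rank-based tests such as the Wilcoxon rank-sum test are considered as alternatives.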