2 research outputs found

The effect of part-of-speech tagging on IR performance for Turkish

Author: Karaoglan B.
Taner Dinçer B.
Publication date: 01/01/2004

In this paper, we experimentally evaluate the effect of Part-of-Speech (POS) tagging on Information Retrieval performance for Turkish. We used four term-weighting schemas to index the SABANCI-METU Turkish Treebank corpus. The term-weighting schemas are "tf", "tf × idf", "Ltu.ltu", and "Okapi". Each weighting scheme is factored over three POS tagging cases, namely "No POS tagging", "POS tag with no history (i.e. 1-gram)", and "POS tag with one step history (i.e. 2-gram)". The Meta-scoring function is used to analyze the effect of these nine factors on IR performance. Results show that the weighting schemas are significantly different from each other with a p-value of 0.04 (Friedman non-parametric test), but there is not enough evidence in the corpus to reject the null hypothesis that the three weighting schemas, on average, show equal performance over the three cases of POS tagging, with a p-value of 0.36. © Springer-Verlag 2004

Ege University Institutional Repository

The effect of part-of-speech tagging on IR performance for Turkish

Author: Dincer BT
Karaoglan B
Publication venue: Springer-Verlag Berlin
Publication date: 01/01/2004

19th International Symposium on Computer and Information Sciences (ISCIS 2004), October 27-29, 2004, Kemer, Antalya, Turkey

WOS: 000225096700077

In this paper, we experimentally evaluate the effect of Part-of-Speech (POS) tagging on Information Retrieval performance for Turkish. We used four term-weighting schemas to index the SABANCI-METU Turkish Treebank corpus. The term-weighting schemas are "tf", "tf × idf", "Ltu.ltu", and "Okapi". Each weighting scheme is factored over three POS tagging cases, namely "No POS tagging", "POS tag with no history (i.e. 1-gram)", and "POS tag with one step history (i.e. 2-gram)". The Meta-scoring function is used to analyze the effect of these nine factors on IR performance. Results show that the weighting schemas are significantly different from each other with a p-value of 0.04 (Friedman non-parametric test), but there is not enough evidence in the corpus to reject the null hypothesis that the three weighting schemas, on average, show equal performance over the three cases of POS tagging, with a p-value of 0.36.

Bilkent Univ, Dept Comp Engn, Inst Elect & Elect Engineers Turkey Sect, Working Grp, Int Federat Informat Proc, Sci & Tech Res Council Turke

Ege University Institutional Repository