This paper introduces an approach to named entity recognition in textual corpora.
A Named Entity (NE) can be a named location, person, organization, date,
time, etc., and is characterized by its instances. An NE appears in texts accompanied by
contexts: the words to the left or right of the NE. The work mainly aims at
identifying the contexts that indicate the NE's nature. For example, the occurrence of the
word "President" in a text suggests that this word, or context, may be followed by
the name of a president, as in "President Obama". Likewise, a word preceded by the
string "footballer" is likely to be the name of a footballer. NE
recognition may be viewed as a classification task in which every word is
assigned to an NE class according to its context. The aim of this study is then to
identify and classify the contexts that are most relevant for recognizing an NE,
that is, those which are frequently found with the NE. A learning approach is
then proposed, using a training corpus of web documents constructed from
learning examples. Frequency representations and modified tf-idf representations are
used to calculate the context weights associated with context frequency, learning
example frequency, and document frequency in the corpus.

Comment: 11 pages, 4 figures, 2 tables
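
The context weighting summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the modified tf-idf formula, so the code below substitutes a textbook smoothed tf-idf as a stand-in; it should not be read as the paper's method. The function names `left_contexts` and `context_weights`, the whitespace tokenization, and the restriction to the single word immediately to the left of the NE are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def left_contexts(tokens, examples):
    """Yield (context, example) pairs for each known example in tokens.

    Assumption: the context is only the single word immediately to the
    left of the named entity, lowercased.
    """
    for i in range(1, len(tokens)):
        if tokens[i] in examples:
            yield tokens[i - 1].lower(), tokens[i]


def context_weights(documents, examples):
    """Collect the three frequencies named in the abstract and a weight.

    cf: number of times the context precedes any learning example
    ef: number of distinct learning examples seen with the context
    df: number of documents in which the context precedes an example
    weight: cf * smoothed idf (a stand-in, NOT the paper's modified tf-idf)
    """
    cf = Counter()
    ef = defaultdict(set)
    df = defaultdict(set)
    for doc_id, text in enumerate(documents):
        tokens = text.split()  # naive whitespace tokenization
        for ctx, example in left_contexts(tokens, examples):
            cf[ctx] += 1
            ef[ctx].add(example)
            df[ctx].add(doc_id)

    n_docs = len(documents)
    result = {}
    for ctx in cf:
        idf = math.log((1 + n_docs) / (1 + len(df[ctx]))) + 1
        result[ctx] = {
            "cf": cf[ctx],
            "ef": len(ef[ctx]),
            "df": len(df[ctx]),
            "weight": cf[ctx] * idf,
        }
    return result


if __name__ == "__main__":
    docs = [
        "President Obama spoke today",
        "President Lincoln was elected",
        "the footballer Messi scored",
    ]
    names = {"Obama", "Lincoln", "Messi"}
    for ctx, stats in sorted(context_weights(docs, names).items()):
        print(ctx, stats)
```

In this toy corpus, "president" precedes two distinct learning examples in two documents, while "footballer" precedes one, so the former receives the larger weight under the stand-in formula. A real system would also need right-hand contexts, multi-word entities, and proper tokenization.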