2 research outputs found
Producing NLP-based On-line Contentware
For its internal needs as well as for commercial purposes, CDC Group has
produced several NLP-based on-line contentware applications for years. The
development process of such applications is subject to numerous constraints
such as quality of service, integration of new advances in NLP, direct
reactions from users, continuous versioning, short delivery deadlines and cost
control. Following this industrial and commercial experience, malleability of
the applications, their openness towards foreign components, efficiency of
applications and their ease of exploitation have appeared to be key points. In
this paper, we describe TalLab, a powerful architecture for on-line contentware
which fulfils these requirements.Comment: 7 pages, 5 figure
Using Learning-based Filters to Detect Rule-based Filtering Obsolescence
For years, Caisse des Depots et Consignations has produced information
filtering applications. To be operational, these applications require high
filtering performances which are achieved by using rule-based filters. With
this technique, an administrator has to tune a set of rules for each topic.
However, filters become obsolescent over time. The decrease of their
performances is due to diachronic polysemy of terms that involves a loss of
precision and to diachronic polymorphism of concepts that involves a loss of
recall.
To help the administrator to maintain his filters, we have developed a method
which automatically detects filtering obsolescence. It consists in making a
learning-based control filter using a set of documents which have already been
categorised as relevant or not relevant by the rule-based filter. The idea is
to supervise this filter by processing a differential comparison of its
outcomes with those of the control one.
This method has many advantages. It is simple to implement since the training
set used by the learning is supplied by the rule-based filter. Thus, both the
making and the use of the control filter are fully automatic. With automatic
detection of obsolescence, learning-based filtering finds a rich application
which offers interesting prospects.Comment: 13 pages, 12 figures, Content-based Multimedia Information Access,
RIAO 200