From Wrapping to Knowledge

Abstract

One the most challenging problems for Enterprise Information Integration is to deal with heterogeneous information sources on the Web. The reason is that they usually provide information that is in human-readable form only, which makes it difficult for a software agent to understand it. Current solutions build on the idea of annotating the information with semantics. If the information is unstructured, proposals such as S-CREAM, MnM, or Armadillo may be effective enough since they rely on using natural language processing techniques; furthermore, their accuracy can be improved by using redundant information on the Web, as C-PANKOW has proved recently. If the information is structured and closely related to a back-end database, Deep Annotation ranges among the most effective proposals, but it requires the information providers to modify their applications; if Deep Annotation is not applicable, the easiest solution consists of using a wrapper and transforming its output into annotations. In this paper, we prove that this transformation can be automated by means of an efficient, domain-independent algorithm. To the best of our knowledge, this is the first attempt to devise and formalize such a systematic, general solution.Comisión Interministerial de Ciencia y Tecnología TIC2003-02737-C02-0

    Similar works