Automatic Wrapper Adaptation by Tree Edit Distance Matching

By Emilio Ferrara and Robert Baumgartner

Abstract

Information distributed through the Web keeps growing faster day by day, and for this reason several techniques for extracting Web data have been suggested in recent years. Extraction tasks are often performed through so-called wrappers, procedures that extract information from Web pages, for example by implementing logic-based techniques. Many fields of application today require wrappers to be highly robust, so as not to compromise information assets or the reliability of the extracted data.

Unfortunately, wrappers may fail to extract data from a Web page if its structure changes, sometimes even slightly. This calls for new techniques that, in case of failure, automatically adapt the wrapper to the new structure of the page. In this work we present a novel approach to automatic wrapper adaptation based on measuring the similarity of trees through improved tree edit distance matching techniques.
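To make "similarity of trees" concrete, the sketch below implements the classic Simple Tree Matching algorithm (Yang, 1991), a well-known restricted, top-down form of tree matching related to tree edit distance. It is a baseline for illustration only, not the improved technique this paper proposes. The `Node` class, the normalization in `similarity`, and the `best_match` adaptation step are hypothetical helpers introduced here: the idea assumed is that a broken wrapper could relocate its target by picking the subtree in the changed page most similar to the one it used to extract.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal ordered, labeled tree node (e.g. an HTML tag)."""
    label: str
    children: list[Node] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Simple Tree Matching: size of the maximum top-down matching.

    Roots with different labels match nothing. Otherwise the two child
    sequences are aligned with an LCS-style dynamic program whose cell
    score is the recursive matching of the paired subtrees.
    """
    if a.label != b.label:
        return 0
    m, n = len(a.children), len(b.children)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i][j - 1],
                M[i - 1][j],
                M[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return M[m][n] + 1  # +1 for the matched roots


def similarity(a: Node, b: Node) -> float:
    """Score in [0, 1]; this normalization is an assumed choice."""
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


def best_match(target: Node, candidates: list[Node]) -> Node:
    """Hypothetical adaptation step: choose the candidate subtree most
    similar to the subtree the old wrapper used to extract."""
    return max(candidates, key=lambda c: similarity(target, c))


if __name__ == "__main__":
    # Subtree the wrapper extracted before the page changed.
    target = Node("ul", [Node("li"), Node("li")])
    # Candidate subtrees found in the changed page.
    candidates = [
        Node("ul", [Node("li"), Node("li"), Node("li")]),
        Node("ol", [Node("li")]),
        Node("p"),
    ]
    chosen = best_match(target, candidates)
    print(chosen.label, round(similarity(target, chosen), 3))  # → ul 0.857
```

Because matching is top-down, a relabeled root (here `ol` instead of `ul`) scores zero even if its children agree; this strictness is one limitation that motivates more refined matching techniques.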

Topics: Artificial Intelligence
Year: 2010
OAI identifier: oai:cogprints.org:7642
The full text may be found at the following location(s):
  • http://cogprints.org/7642/1/co... (external link)
  • http://cogprints.org/7642/ (external link)

