Intelligent Self-Repairable Web Wrappers

By Emilio Ferrara and Robert Baumgartner

Abstract

The amount of information available on the Web is growing at an incredibly high rate. Systems and procedures for extracting these data from Web sources already exist, and different approaches and techniques have been investigated in recent years. On the one hand, reliable solutions should provide robust Web data mining algorithms that can automatically cope with malfunctions or failures. On the other hand, the literature lacks solutions for the maintenance of these systems. Procedures that extract Web data may be tightly coupled to the structure of the data source itself; thus, malfunctions or the acquisition of corrupted data can be caused, for example, by structural modifications that owners make to their data sources. Currently, data integrity verification and maintenance are mostly performed manually to ensure that these systems work correctly and reliably. In this paper we propose a novel approach to creating procedures that extract data from Web sources (so-called Web wrappers), which can cope with malfunctions caused by modifications to the structure of the data source and can automatically repair themselves.

Topics: Artificial Intelligence
Year: 2011
OAI identifier: oai:cogprints.org:7666
Full text available at:
  • http://cogprints.org/7666/1/pa...
  • http://cogprints.org/7666/