Much useful e-commerce information is available on web pages, especially those generated in response to queries to web servers. The problem for programs that want to use this information is how to ‘screen-scrape’ the data off the web page into machine-usable data structures. Wrappers for web data sources rely on knowledge of the page layout to extract data accurately, so they fail if the page format
changes. This paper describes a fast method for wrapper production and a method for automatically detecting page format changes before they cause data access to fail. The method works for pages that contain collections of items, such as lists, tables and hierarchical structures. It uses a representation of HTML documents that makes repetitive features apparent. This provides fully automatic wrapper production for one class of web pages, and rapid interactive
production for others.
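
As an illustration of how a document representation can make repetitive features apparent, the following is a minimal sketch and not the method of this paper. It assumes a simple stand-in representation: each element is described by its root-to-element tag path, and paths that occur more than once are taken as evidence of a collection of items. The names TagPathCollector, repeated_paths, layout_signature and format_changed are hypothetical, as is the idea of comparing sets of repeated paths to flag a format change.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagPathCollector(HTMLParser):
    """Records the root-to-element tag path of every start tag (assumed representation)."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements, outermost first
        self.paths = []   # one path string per start tag, in document order

    def handle_starttag(self, tag, attrs):
        self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def repeated_paths(html, min_count=2):
    """Return the tag paths that occur at least min_count times, with their counts."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    counts = Counter(collector.paths)
    return {path: n for path, n in counts.items() if n >= min_count}


def layout_signature(html):
    """Set of repeated tag paths; independent of how many items the page lists."""
    return frozenset(repeated_paths(html))


def format_changed(reference_html, current_html):
    """Hypothetical check: flag a change when the repeated structure differs."""
    return layout_signature(reference_html) != layout_signature(current_html)
```

Under these assumptions, a table with two rows of two cells yields the repeated paths html/body/table/tr and html/body/table/tr/td. A later page with more rows keeps the same signature, while a page that switches to a ul/li list does not, which is the kind of difference a format-change detector would need to notice.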