Mining Web Pages Using Features of Rendering HTML Elements in the Web Browser
The Web is the largest repository of information useful to human users, but Web pages usually do not provide an API for accessing their information automatically. Information extractors are developed to solve this problem. We present a new methodology for inducing information extractors from the Web, based on how HTML elements are rendered in the Web browser. The methodology uses a KDD (knowledge discovery in databases) process to mine a dataset of features of the elements in a Web page. Experiments on 10 Web sites show the effectiveness of the methodology.
Funding: Ministerio de Ciencia y Tecnología TIN2007-64119; Junta de Andalucía P07-TIC-02602; Junta de Andalucía P08-TIC-410
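The abstract does not say which rendering features are collected or which mining algorithm is used, so the following is only a minimal sketch of the general idea under stated assumptions. Each rendered element becomes one row of a dataset with hypothetical features (`left`, `top`, `width`, `height`, `font_size`) and a label saying whether it holds the data to extract. A one-level decision rule (a decision stump), chosen here purely as a simple stand-in for the KDD mining step and not as the authors' method, is induced from a labelled page and then applied to a new page.

```python
from dataclasses import dataclass

@dataclass
class RenderedElement:
    # Hypothetical rendering features a browser could report for one HTML element
    tag: str
    left: float        # x position in pixels
    top: float         # y position in pixels
    width: float
    height: float
    font_size: float
    is_target: bool    # label: does this element hold the data to extract?

# Hypothetical feature set; the real methodology may use different features
FEATURES = ["left", "top", "width", "height", "font_size"]

def induce_stump(rows):
    """Return (feature, threshold, accuracy) for the best single rule of the
    form 'feature >= threshold -> target' on the labelled rows."""
    best = None
    for feat in FEATURES:
        for t in sorted({getattr(r, feat) for r in rows}):
            # Count rows where the rule's prediction matches the label
            correct = sum((getattr(r, feat) >= t) == r.is_target for r in rows)
            acc = correct / len(rows)
            if best is None or acc > best[2]:
                best = (feat, t, acc)
    return best

def extract(rows, rule):
    """Apply an induced rule to (unlabelled) elements of another page."""
    feat, t, _ = rule
    return [r for r in rows if getattr(r, feat) >= t]

# Hypothetical labelled page: two large-font spans hold the target data
training = [
    RenderedElement("span", 600, 200, 80, 24, 18, True),
    RenderedElement("span", 610, 400, 80, 24, 18, True),
    RenderedElement("a", 20, 200, 150, 16, 12, False),
    RenderedElement("p", 620, 220, 350, 60, 12, False),
    RenderedElement("div", 0, 0, 1000, 80, 14, False),
]

rule = induce_stump(training)
print(rule)  # ('font_size', 18, 1.0)
```

On this toy data the stump separates the targets by font size alone; a realistic extractor would likely need richer features and a more expressive learner, which is why the mining step is left open here.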