The World Wide Web provides a huge, distributed web database. However, information in this database is free-formatted and unorganized, so traditional keyword-based retrieval approaches are no longer appropriate. In this paper, we consider a framework for constructing agents that simulate human browsing behavior on the Internet. Given a specific target, such an agent uses existing search engines to navigate the web, locates the sites containing the target information, and extracts that information into a database. We refer to these agents as Personal Navigating Agents (PNAs). Since the information service is domain specific, this paper focuses first on PNAs that retrieve information about people from the web. In this particular experiment, given the name of a university, we extract the following information about each of its faculty members: name, telephone number, fax number, email address, and URL. We exploit web page knowledge in two ways. First, we develop a tagging system that annotates each web page to facilitate information extraction. The tagging system combines an HTML parser with a natural-language semantic tagger; its semantic tags are more general than the part-of-speech tags used in linguistics. Second, we equip each PNA with a navigation map, which guides the agent through related pages until it reaches pages containing the target information. In our experiments, our prototype agents successfully explored a university web site and extracted the target information with very high accuracy.
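The abstract does not specify the tagger's rules or tag set. As a rough illustration of the general idea of labeling page text with semantic (rather than part-of-speech) tags and then assembling faculty records, the following minimal Python sketch uses the standard-library `HTMLParser` together with hand-written regular-expression rules. The labels, the rules, the "a name starts a new record" grouping heuristic, and the sample page are all hypothetical assumptions, not the authors' tagger, and the navigation map is not modeled.

```python
import re
from html.parser import HTMLParser


class TextSegments(HTMLParser):
    """Collects the non-empty text chunks that appear between HTML tags."""

    def __init__(self):
        super().__init__()
        self.segments = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.segments.append(text)


# Hypothetical semantic-tag rules (an assumption, not the paper's tagger),
# tried in order; the first match wins. A capturing group, when present,
# marks the part of the segment to keep as the field value.
RULES = [
    ("email", re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$")),
    ("url", re.compile(r"^https?://\S+$")),
    ("fax", re.compile(r"^fax\s*:?\s*([+(\d][\d\s().+-]{5,})$", re.I)),
    ("phone", re.compile(r"^(?:tel|phone)\s*:?\s*([+(\d][\d\s().+-]{5,})$", re.I)),
    # Crude name heuristic: optional title, then two or more capitalized words.
    ("name", re.compile(r"^(?:(?:Dr|Prof)\.\s+)?[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+$")),
]


def tag(segment):
    """Return (label, value) for a text segment, or ("other", segment)."""
    for label, pattern in RULES:
        m = pattern.match(segment)
        if m:
            return label, (m.group(1) if m.groups() else m.group(0))
    return "other", segment


def extract_records(html):
    """Group tagged segments into one record per person (a name starts a record)."""
    parser = TextSegments()
    parser.feed(html)
    parser.close()
    records = []
    for segment in parser.segments:
        label, value = tag(segment)
        if label == "name":
            records.append({"name": value})
        elif label != "other" and records:
            # Keep the first value seen for each field of the current person.
            records[-1].setdefault(label, value)
    return records


# Hypothetical faculty-directory fragment used for illustration only.
SAMPLE_PAGE = """
<table>
<tr><td>Prof. Alice Chan</td><td>Tel: (555) 010-2000</td><td>Fax: (555) 010-2001</td>
<td><a href="mailto:achan@cs.example.edu">achan@cs.example.edu</a></td>
<td>http://www.cs.example.edu/~achan</td></tr>
<tr><td>Dr. Bob Lee</td><td>bob@cs.example.edu</td></tr>
</table>
"""

if __name__ == "__main__":
    for record in extract_records(SAMPLE_PAGE):
        print(record)
```

Running the module prints one dictionary per person found on the sample page. Rules this shallow are brittle (for instance, any pair of capitalized words is tagged as a name), so a practical tagger would need richer context than these patterns provide.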