1,011 research outputs found

    Intelligent spider for Internet searching

    As World Wide Web (WWW) based Internet services become more popular, information overload becomes a pressing research problem. Searching the Internet gets harder as the amount of available information increases. A scalable approach to supporting Internet search is critical to the success of Internet services and of other current or future national information infrastructure (NII) applications. A new approach to building an intelligent personal spider (agent), based on automatic textual analysis of Internet documents, is proposed. Best-first search and a genetic algorithm were both tested as the basis for the intelligent spider. These personal spiders dynamically analyze the contents of the user's selected homepages, which serve as the starting point, and search for the most relevant homepages based on links and indexing. To qualify as an intelligent agent, a spider must be able to adjust its behavior as the search progresses. Current search engines, however, provide no communication between users and robots. The spider presented in the paper uses a Java user interface that lets users adjust the control parameters as the search progresses and observe intermediate results. The performance of the genetic algorithm based and best-first search based spiders is also reported.

    A Survey on Important Aspects of Information Retrieval

    Information retrieval has become an important field of study and research within computer science due to the explosive growth of information available as full text, hypertext, administrative text, directory, numeric, or bibliographic text. Research is ongoing into various aspects of information retrieval systems to improve their efficiency and reliability. This paper presents a comprehensive survey that discusses not only the emergence and evolution of information retrieval but also different information retrieval models and some important aspects such as document representation, similarity measures, and query expansion.

    A smart itsy bitsy spider for the Web

    Artificial Intelligence Lab, Department of MIS, University of Arizona. As part of the ongoing Illinois Digital Library Initiative project, this research proposes an intelligent agent approach to Web searching. In this experiment, we developed two Web personal spiders based on best-first search and genetic algorithm techniques, respectively. These personal spiders can dynamically take a user's selected starting homepages and search for the most closely related homepages on the Web, based on links and keyword indexing. A graphical, dynamic, Java-based interface was developed and is available for Web access. A system architecture for implementing such an agent-based spider is presented, followed by detailed discussions of benchmark testing and user evaluation results. In benchmark testing, although the genetic algorithm spider did not outperform the best-first search spider, we found both results to be comparable and complementary. In user evaluation, the genetic algorithm spider obtained a significantly higher recall value than the best-first search spider. However, their precision values were not statistically different. The mutation process introduced in the genetic algorithm allows users to find other potentially relevant homepages that cannot be explored via a conventional local search process. In addition, we found the Java-based interface to be a necessary component in the design of a truly interactive and dynamic Web agent.
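
The best-first search idea described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the in-memory `links` and `keywords` dictionaries, the function names, and the use of Jaccard similarity as the relatedness score are all assumptions made for the example, since the abstract does not specify the scoring function.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity between two keyword sets (assumed scoring function)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_first_spider(start_pages, links, keywords, max_results):
    """Return up to max_results (page, score) pairs, always expanding the
    highest-scoring page on the frontier next.

    links:    hypothetical dict mapping a page to the pages it links to
    keywords: hypothetical dict mapping a page to its set of index keywords
    """
    # The target profile is the union of the starting pages' keywords.
    profile = set().union(*(keywords[p] for p in start_pages))
    seen = set(start_pages)
    frontier = []  # min-heap of (-score, page), so the best score pops first

    def push_neighbours(page):
        for nxt in links.get(page, ()):
            if nxt not in seen:
                seen.add(nxt)
                score = jaccard(profile, keywords.get(nxt, set()))
                heapq.heappush(frontier, (-score, nxt))

    for page in start_pages:
        push_neighbours(page)

    results = []
    while frontier and len(results) < max_results:
        neg_score, page = heapq.heappop(frontier)
        results.append((page, -neg_score))
        push_neighbours(page)
    return results


# Toy link graph, invented for illustration.
links = {"start": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
keywords = {
    "start": {"web", "spider", "search"},
    "a": {"web", "spider"},
    "b": {"cooking"},
    "c": {"web", "search", "spider", "agent"},
    "d": {"recipes"},
}
ranked = best_first_spider(["start"], links, keywords, max_results=3)
print([page for page, _ in ranked])
```

A real spider would fetch and index pages over HTTP instead of reading a dictionary. The genetic algorithm variant mentioned in the abstract would additionally mutate the frontier with pages outside the local link neighbourhood, which this sketch does not attempt.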

    Variation In Greedy Approach To Set Covering Problem

    The weighted set covering problem is to choose a collection of subsets that covers all the elements of a universal set at the lowest total cost. It is a well-studied classical problem with applications in fields such as machine learning, planning, information retrieval, and facility allocation. Deep web crawling refers to the process of gathering documents that have been structured into a data source and can be retrieved through a search interface. Its query selection process calls for an efficient solution to the set covering problem.
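
For readers unfamiliar with the problem, the standard greedy heuristic for weighted set cover can be sketched as below. This shows the textbook baseline (repeatedly pick the subset with the lowest cost per newly covered element), not the specific variation the paper proposes; the function name and the example data are hypothetical.

```python
def greedy_weighted_set_cover(universe, subsets, costs):
    """Textbook greedy heuristic for weighted set cover.

    universe: iterable of elements that must be covered
    subsets:  dict mapping a subset name to a set of elements
    costs:    dict mapping a subset name to a positive cost
    Returns the list of chosen subset names in selection order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, None
        for name, members in subsets.items():
            gain = len(uncovered & members)
            if gain == 0:
                continue  # this subset covers nothing new
            ratio = costs[name] / gain  # cost per newly covered element
            if best_ratio is None or ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Invented example: in deep web crawling, each subset could stand for the
# documents returned by one query, and its cost for the price of issuing it.
subsets = {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 4, 5}, "D": {1, 5}}
costs = {"A": 3, "B": 1, "C": 2, "D": 4}
cover = greedy_weighted_set_cover({1, 2, 3, 4, 5}, subsets, costs)
print(cover, sum(costs[name] for name in cover))
```

The greedy choice is not guaranteed to be optimal, but it is simple and has a well-known logarithmic approximation bound, which is why it is a common starting point for variations.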

    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals and involved 910 different citing organizations from 54 different countries, demonstrating the widespread impact of the Group's work.

    Automatic Scientific Literature Classification Using Multiple Information Sources for Data Mining Purposes
