
    Information Retrieval Models

    Many applications that handle information on the internet would be completely inadequate without the support of information retrieval technology. How would we find information on the world wide web if there were no web search engines? How would we manage our email without spam filtering? Much of the development of information retrieval technology, such as web search engines and spam filters, requires a combination of experimentation and theory. Experimentation and rigorous empirical testing are needed to keep up with increasing volumes of web pages and emails. Furthermore, experimentation and constant adaptation of technology are needed in practice to counteract the effects of people who deliberately try to manipulate the technology, such as email spammers. However, if experimentation is not guided by theory, engineering becomes trial and error. New problems and challenges for information retrieval come up constantly, and they cannot possibly be solved by trial and error alone. So, what is the theory of information retrieval? There is not one convincing answer to this question. There are many theories, here called formal models, and each model is helpful for the development of some information retrieval tools, but not so helpful for the development of others. In order to understand information retrieval, it is essential to learn about these retrieval models. In this chapter, some of the most important retrieval models are gathered and explained in a tutorial style.
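    To make the notion of a formal retrieval model concrete, below is a minimal sketch of one classic example, the vector space model with TF-IDF weighting and cosine similarity. The tokenized documents and query weights are illustrative assumptions; real engines add stemming, length normalization, and inverted indexes.

```python
# Minimal sketch of the vector space retrieval model (one classic formal model).
# All documents, queries, and weights here are illustrative.
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Build a term -> TF-IDF weight mapping for each tokenized document."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()} for d in docs]

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [["web", "search", "engines"], ["email", "spam", "filtering"], ["web", "spam"]]
vectors = tf_idf_vectors(docs)
query = {"web": 1.0, "search": 1.0}  # simple binary query weights
ranking = sorted(range(len(docs)), key=lambda i: cosine(query, vectors[i]), reverse=True)
print(ranking)  # [0, 2, 1]: the document about web search engines ranks first
```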

    Performance Evaluation of Selected Search Engines

    Search engines have become an integral part of daily internet usage. The search engine is the first stop for web users when they are looking for a product. Information retrieval may be viewed as a problem of classifying items into one of two classes, corresponding to interesting and uninteresting items respectively. A natural performance metric in this context is classification accuracy, defined as the fraction of the system's interesting/uninteresting predictions that agree with the user's assessments. On the other hand, the field of information retrieval has two classical performance evaluation metrics: precision, the fraction of the items retrieved by the system that are interesting to the user, and recall, the fraction of the items of interest to the user that are retrieved by the system. Measuring the information retrieval effectiveness of World Wide Web search engines is costly because of the human relevance judgments involved. However, for both business enterprises and individuals it is important to know the most effective Web search engines, since such search engines help their users find a higher number of relevant Web pages with less effort. Furthermore, this information can be used for several practical purposes. This study evaluates the performance of three Web search engines, and a set of measurements is proposed for evaluating Web search engine performance.
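    The accuracy, precision, and recall definitions above translate directly into set arithmetic. A minimal sketch, with illustrative item identifiers and relevance judgments:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved items that are interesting (relevant) to the user."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of the items of interest that the system retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def accuracy(retrieved, relevant, collection):
    """Fraction of interesting/uninteresting predictions agreeing with the user."""
    true_pos = len(retrieved & relevant)
    true_neg = len(collection - retrieved - relevant)
    return (true_pos + true_neg) / len(collection)

collection = set(range(10))  # illustrative collection of 10 items
relevant = {0, 1, 2, 3}      # items the user judges interesting
retrieved = {0, 1, 4, 5}     # items the system returned
print(precision(retrieved, relevant))             # 0.5
print(recall(retrieved, relevant))                # 0.5
print(accuracy(retrieved, relevant, collection))  # 0.6
```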

    A Relation-Based Page Rank Algorithm for Semantic Web Search Engines

    With the tremendous growth of information available to end users through the Web, search engines come to play an ever more critical role. Nevertheless, because of their general-purpose approach, it is increasingly common for result sets to contain a burden of useless pages. The next-generation Web architecture, represented by the Semantic Web, provides a layered architecture that may make it possible to overcome this limitation. Several search engines have been proposed that increase information retrieval accuracy by exploiting a key element of Semantic Web resources, namely relations. However, in order to rank results, most of the existing solutions need to work on the whole annotated knowledge base. In this paper, we propose a relation-based page rank algorithm to be used in conjunction with Semantic Web search engines that relies simply on information extracted from user queries and on annotated resources. Relevance is measured as the probability that a retrieved resource actually contains those relations whose existence was assumed by the user at the time of query definition.
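    As a rough illustration of that relevance measure, the sketch below approximates the probability that a resource contains the relations the query assumes as the fraction of query relations found among the resource's annotations. The triples and the scoring are illustrative simplifications, not the paper's actual algorithm.

```python
def relation_relevance(query_relations, resource_relations):
    """Estimate P(resource contains the relations the query assumes) as the
    fraction of query relations present among the resource's annotations."""
    if not query_relations:
        return 0.0
    hits = sum(1 for r in query_relations if r in resource_relations)
    return hits / len(query_relations)

# Relations as (subject type, predicate, object type) triples from annotations.
query = {("Hotel", "locatedIn", "City"), ("Hotel", "hasRating", "Stars")}
resources = {
    "page_a": {("Hotel", "locatedIn", "City"), ("Hotel", "hasRating", "Stars")},
    "page_b": {("Hotel", "locatedIn", "City")},
    "page_c": {("Museum", "locatedIn", "City")},
}
ranked = sorted(resources, key=lambda p: relation_relevance(query, resources[p]),
                reverse=True)
print(ranked)  # ['page_a', 'page_b', 'page_c']
```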

    Special requirements for comparative evaluation of web search engines

    ABSTRACT: Performance evaluation of classical information retrieval systems usually aims to assess the ability of these systems to find documents considered relevant to a certain search query, based on specific evaluation criteria. This approach, however, is not suitable for adequately evaluating some information retrieval applications such as web search engines. The web's special characteristics mean that information retrieval tasks, and the evaluation of search engines on the web, face multiple challenges. Different web-specific, user-specific, and language-specific requirements should be considered when designing and performing evaluation tests on operational web search engines. This paper discusses the special requirements for comprehensive comparative evaluation of different web search engines and highlights some language-specific considerations for evaluation in the Arabic language.

    The State-of-the-arts in Focused Search

    The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for results relevant to a user's topic of interest has gone beyond search for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes with a highlight of open problems.
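    A minimal sketch of what retrieval at different granularities means in practice: score and rank individual XML elements rather than whole documents, so an answer can be a section or fragment. The term-count scoring and sample document are illustrative; real focused-retrieval systems also penalize overlapping and oversized elements.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<article>
  <title>XML retrieval</title>
  <sec>Passage retrieval returns fragments relevant to a query.</sec>
  <sec>Entity ranking orders entities such as people or places.</sec>
</article>
""")

def score_element(elem, query_terms):
    """Naive score: how often the query terms occur in the element's text."""
    text = " ".join(elem.itertext()).lower()
    return sum(text.count(t) for t in query_terms)

query = ["retrieval", "passage"]
# Every element in the tree is a candidate answer, not just the whole document.
for elem in sorted(doc.iter(), key=lambda e: score_element(e, query), reverse=True):
    print(elem.tag, score_element(elem, query))
```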

    An Integrated Information Retrieval Framework for Managing the Digital Web Ecosystem

    The information explosion makes exploration of the digital Web ecosystem, as a valid web search tool, challenging for retrieving relevant information and knowledge. The existing tools are not integrated, and search results are not well managed. In this article, we describe effective information retrieval services for users and agents in various digital ecosystem scenarios. A novel integrated information retrieval framework (IIRF) is proposed, which employs Web search technologies and traditional database searching techniques to provide comprehensive, dynamic, personalized, and organization-oriented information retrieval services, ranging from the Internet and intranets to the personal desktop. Experiments are carried out demonstrating improvements in the search process, with the average precision of Web search results over the standard 11 recall levels improving from 41.7% for a comparable system to 65.2% for the proposed search; a 23.5% precision improvement is thus achieved with the framework. A comparison among search engines presents a similar trend with satisfactory search results.
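    The measure cited above, average precision interpolated at the 11 standard recall levels (0.0, 0.1, ..., 1.0), can be sketched as follows; the ranked result list and relevance judgments are illustrative.

```python
def eleven_point_average_precision(ranking, relevant):
    """Average of precision interpolated at recall levels 0.0, 0.1, ..., 1.0."""
    if not relevant:
        return 0.0
    points, hits = [], 0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))  # (recall, precision)
    levels = [l / 10 for l in range(11)]
    # Interpolated precision at a level: max precision at any recall >= that level.
    interpolated = [max((p for r, p in points if r >= level), default=0.0)
                    for level in levels]
    return sum(interpolated) / len(levels)

ranking = ["d3", "d1", "d7", "d2", "d9"]  # system output, best first
relevant = {"d1", "d2"}                   # human relevance judgments
print(eleven_point_average_precision(ranking, relevant))  # 0.5
```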

    Enhanced Trustworthy and High-Quality Information Retrieval System for Web Search Engines

    The WWW is the most important source of information, but there is no guarantee of information correctness: a lot of conflicting information is retrieved by search engines, and the quality of the provided information varies from low to high. We provide enhanced trustworthiness for both specific (entity) and broad (content) queries in web searching. The filtering for trustworthiness is based on five factors: Provenance, Authority, Age, Popularity, and Related Links. Trustworthiness is calculated from these five factors and stored, thereby increasing the performance of retrieving trustworthy websites. The calculated trustworthiness is stored only for static websites. Quality is provided based on policies selected by the user, and quality-based ranking of the retrieved trusted information is provided using the WIQA (Web Information Quality Assessment) Framework.
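    A hedged sketch of how the five named factors might be combined into a single trust score and cached for static websites; the weights, factor values, and helper names below are assumptions for illustration, not the paper's formula.

```python
WEIGHTS = {"provenance": 0.3, "authority": 0.25, "age": 0.15,
           "popularity": 0.2, "related_links": 0.1}  # assumed weights, sum to 1.0

def trust_score(factors):
    """Weighted sum of the five factor scores, each normalized to [0, 1]."""
    return sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())

trust_cache = {}  # scores are stored only for static websites, as described above

def trusted_rank(pages):
    """pages: url -> (is_static, factor_scores); returns urls by trust, best first."""
    scores = {}
    for url, (is_static, factors) in pages.items():
        if is_static and url in trust_cache:
            scores[url] = trust_cache[url]      # reuse the stored score
        else:
            scores[url] = trust_score(factors)
            if is_static:
                trust_cache[url] = scores[url]  # cache static sites only
    return sorted(scores, key=scores.get, reverse=True)

pages = {
    "https://a.example": (True,  {"provenance": 0.9, "authority": 0.8, "age": 0.7,
                                  "popularity": 0.6, "related_links": 0.5}),
    "https://b.example": (False, {"provenance": 0.4, "authority": 0.3, "age": 0.9,
                                  "popularity": 0.2, "related_links": 0.1}),
}
print(trusted_rank(pages))  # ['https://a.example', 'https://b.example']
```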