
    Smart Search: A Firefox Add-On to Compute a Web Traffic Ranking

    Search engine results are typically ordered according to some notion of the importance of a web page as well as the relevance of the page's content to a query. Web page importance is usually calculated from graph-theoretic properties of the web. Another common technique for measuring page importance is to use the traffic that goes to a particular web page, as measured by a browser toolbar. Currently there are traffic ranking tools, such as www.alexa.com, www.ranking.com, and www.compete.com, that give analytics such as the number of users who visit a web site. Alexa provides the traffic rank for a website based on two factors: the number of users that view the website and the number of pages viewed. The Alexa toolbar, however, is not open source. The main goal of our project was to create a Smart Search Firefox add-on for the Yioop search engine, an open-source search engine developed by my project advisor, Dr. Chris Pollett. This add-on would provide similar analytic data to the Yioop search engine, but in a transparent and open-source way. With the results received from the Smart Search toolbar extension, the Yioop search engine refines its search results and provides user-centric search results. Eventually, users would benefit from these better search results.
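
    A minimal sketch of how a toolbar-style traffic rank could be combined from the two factors named above (number of distinct users and number of pages viewed). The geometric-mean scoring and the field names are illustrative assumptions, not the formula used by Alexa or by the Yioop Smart Search add-on.

        # Illustrative only: rank sites by combining per-site reach and page views.
        from math import sqrt

        def traffic_ranks(site_stats):
            # site_stats maps hostname -> (distinct_users, page_views); hypothetical fields.
            scores = {site: sqrt(users * views)
                      for site, (users, views) in site_stats.items()}
            ordered = sorted(scores, key=scores.get, reverse=True)
            return {site: position + 1 for position, site in enumerate(ordered)}

        print(traffic_ranks({"a.example": (1200, 9000),
                             "b.example": (300, 15000),
                             "c.example": (50, 400)}))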

    Recent development in XML-IR

    The Web is characterized by a huge amount of heterogeneous data sources, which have different media support and format representation. Because XML can represent files of different formats, it can play an important role in IR, since it is becoming a standard form for data representation and exchange over the Web. Under this assumption, the problem of querying heterogeneous sources can be reduced to the problem of querying XML data sources. This paper shows the influence of XML on IR techniques and methodologies over the last five years by surveying over 400 papers published in different conferences and journals.
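
    As a concrete illustration of reducing a query over heterogeneous sources to a query over XML, the sketch below runs a simple path query against one XML representation using Python's standard library; the record structure and tag names are invented for the example and are not taken from the paper.

        # Illustrative only: a made-up XML representation of records drawn from
        # different source formats, queried uniformly with ElementTree paths.
        import xml.etree.ElementTree as ET

        xml_data = """
        <records>
          <record source="database"><title>XML retrieval</title><year>2004</year></record>
          <record source="web page"><title>Structured IR</title><year>2006</year></record>
        </records>
        """

        root = ET.fromstring(xml_data)
        # Find the title of every record, regardless of the original source format.
        for rec in root.findall("record"):
            print(rec.get("source"), "->", rec.findtext("title"))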

    A Brief History of Web Crawlers

    Web crawlers visit internet applications, collect data, and learn about new web pages from the pages they visit. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the complexity added to web applications have made crawling a very challenging process. Throughout the history of web crawling, many researchers and industrial groups have addressed the different issues and challenges that web crawlers face, and different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem, and capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling up to the present. We introduce criteria to evaluate the relative performance of web crawlers, and based on these criteria we plot the evolution of web crawlers and compare their performance.
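
    A minimal sketch of the breadth-first crawl loop described in the opening sentence (visit a page, collect its links, queue newly discovered pages). This is a generic standard-library sketch, not any particular crawler surveyed in the paper; real crawlers add politeness delays, robots.txt handling, URL canonicalisation, and persistent storage.

        # Illustrative only: a tiny breadth-first crawler.
        from collections import deque
        from html.parser import HTMLParser
        from urllib.parse import urljoin
        from urllib.request import urlopen

        class LinkCollector(HTMLParser):
            # Collects href targets of anchor tags encountered while parsing.
            def __init__(self):
                super().__init__()
                self.links = []
            def handle_starttag(self, tag, attrs):
                if tag == "a":
                    for name, value in attrs:
                        if name == "href" and value:
                            self.links.append(value)

        def crawl(seed, max_pages=10):
            frontier, seen, fetched = deque([seed]), {seed}, 0
            while frontier and fetched < max_pages:
                url = frontier.popleft()
                fetched += 1
                try:
                    html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
                except OSError:
                    continue
                parser = LinkCollector()
                parser.feed(html)
                for link in parser.links:
                    absolute = urljoin(url, link)
                    if absolute.startswith("http") and absolute not in seen:
                        seen.add(absolute)   # learn about a new page from a visited page
                        frontier.append(absolute)
            return seen

        print(crawl("https://example.com"))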

    Improving the evaluation of web search systems

    Linkage analysis as an aid to web search has been assumed to be of significant benefit, and we know that it is implemented by many major search engines. Why, then, have few TREC participants been able to scientifically prove the benefits of linkage analysis over the past three years? In this paper we put forward reasons why disappointing results have been found, and we identify the linkage density a dataset requires to faithfully support experiments into linkage analysis. We also report a series of linkage-based retrieval experiments on a more densely linked dataset culled from the TREC web documents.
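
    One simple way to quantify the linkage density of a test collection is to count off-site (cross-host) links per document, as sketched below. This is an assumed measure for illustration only, not the specific criterion defined in the paper.

        # Illustrative only: off-site links per document in a small collection.
        from urllib.parse import urlparse

        def offsite_link_density(collection):
            # collection maps document URL -> list of outgoing link URLs.
            offsite = 0
            for doc_url, links in collection.items():
                doc_host = urlparse(doc_url).netloc
                offsite += sum(1 for link in links if urlparse(link).netloc != doc_host)
            return offsite / len(collection) if collection else 0.0

        sample = {
            "http://site-a.test/page1": ["http://site-a.test/page2", "http://site-b.test/"],
            "http://site-b.test/":      ["http://site-a.test/page1"],
        }
        print(offsite_link_density(sample))  # average off-site links per document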

    Collaborative assessment of information provider's reliability and expertise using subjective logic

    Q&A social media have gained a lot of attention during the recent years. People rely on these sites to obtain information due to a number of advantages they offer as compared to conventional sources of knowledge (e.g., asynchronous and convenient access). However, for the same question one may find highly contradicting answers, causing an ambiguity with respect to the correct information. This can be attributed to the presence of unreliable and/or non-expert users. These two attributes (reliability and expertise) significantly affect the quality of the answer/information provided. We present a novel approach for estimating these user's characteristics relying on human cognitive traits. In brief, we propose each user to monitor the activity of her peers (on the basis of responses to questions asked by her) and observe their compliance with predefined cognitive models. These observations lead to local assessments that can be further fused to obtain a reliability and expertise consensus for every other user in the social network (SN). For the aggregation part we use subjective logic. To the best of our knowledge this is the first study of this kind in the context of Q&A SN. Our proposed approach is highly distributed; each user can individually estimate the expertise and the reliability of her peers using her direct interactions with them and our framework. The online SN (OSN), which can be considered as a distributed database, performs continuous data aggregation for users expertise and reliability assessment in order to reach a consensus. We emulate a Q&A SN to examine various performance aspects of our algorithm (e.g., convergence time, responsiveness etc.). Our evaluations indicate that it can accurately assess the reliability and the expertise of a user with a small number of samples and can successfully react to the latter's behavior change, provided that the cognitive traits hold in practice. © 2011 ICST

    HELIN Consortium OPAC Report

    Report of the HELIN OPAC Task Force, a group appointed by the HELIN Directors to review the Innovative Interfaces online public access catalog under the 2006 release driven by WebPAC Pro. The task force met during the fall of 2006.