843 research outputs found

    Replicating web structure in small-scale test collections

    Linkage analysis as an aid to web search has been assumed to be of significant benefit, and we know that it is being implemented by many major search engines. Why then have few TREC participants been able to scientifically prove the benefits of linkage analysis in recent years? In this paper we put forward reasons why many disappointing results have been found in TREC experiments, and we identify the linkage density requirements of a dataset to faithfully support experiments into linkage-based retrieval by examining the linkage structure of the WWW. Based on these requirements we report on methodologies for synthesising such a test collection.

    Measures to Evaluate the Superiority of a Search Engine

    The main objective of a search engine is to return results relevant to the user's query in as little time as possible. Evaluation metrics are used to measure the superiority of a search engine in terms of quality. This review paper summarizes the different metrics used to evaluate a search engine in terms of effectiveness, efficiency and relevancy.

    Internet multimedia information retrieval based on link analysis.

    Chan Ka Yan. Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. Includes bibliographical references (leaves i-iv (3rd gp.)). Abstracts in English and Chinese.
    Contents:
      Acknowledgement
      Abstract
      Abstract in Chinese (摘要)
      Table of Content
      List of Figure
      List of Table
      Chapter 1. Introduction
        1.1 Background
        1.2 Importance of hyperlink analysis
      Chapter 2. Related Work
        2.1 Crawling
          2.1.1 Crawling method for HITS Algorithm
          2.1.2 Crawling method for Page Rank Algorithm
        2.2 Ranking
          2.2.1 Page Rank Algorithm
          2.2.2 HITS Algorithm
          2.2.3 PageRank-HITS Algorithm
          2.2.4 SALSA Algorithm
          2.2.5 Average and Sim
          2.2.6 Netscape Approach
          2.2.7 Cocitation Approach
        2.3 Multimedia Information Retrieval
          2.3.1 Octopus
      Chapter 3. Research Methodology
        3.1 Research Objective
        3.2 Proposed Crawling Methodology
          3.2.1 Collecting Media Objects
          3.2.2 Filtering the collection of links
        3.3 Proposed Ranking Methodology
          3.3.1 Identifying the factors affect ranking
          3.3.2 Modified Ranking Algorithms
      Chapter 4. Experimental Results and Discussions
        4.1 Experimental Setup
          4.1.1 Assumptions for the Experiment
        4.2 Some Observations from Experiment
          4.2.1 Dangling links
          4.2.2 "Good Hub = bad Authority, Good Authority = bad Hub?"
          4.2.3 Setting of weights
        4.3 Discussion on Experimental Results
          4.3.1 Relevance
          4.3.2 Precision and recall
          4.3.3 Significance testing
          4.3.4 Ranking
        4.4 Limitations and Difficulties
          4.4.1 Small size of the base set
          4.4.2 Parameter settings
          4.4.3 Unable to remove all the meaningless links from base set
          4.4.4 Resources and time-consuming
          4.4.5 TKC Effect
          4.4.6 Continuously updated format of HTML codes and file types
          4.4.7 The object citation habit of authors
      Chapter 5. Conclusion
        5.1 Contribution of our Methodology
        5.2 Possible Improvement
        5.3 Conclusion
      Bibliography
      Appendix
        A.1 One-tailed paired t-test results
        A.2 Anova results

    VAS (Visual Analysis System): An information visualization engine to interpret World Wide Web structure

    People increasingly encounter problems of interpreting and filtering mass quantities of information. The enormous growth of information systems on the World Wide Web has demonstrated that we need systems to filter, interpret, organize and present information in ways that allow users to use these large quantities of information. People need to be able to extract knowledge from this sometimes meaningful but sometimes useless mass of data in order to make informed decisions. Web users need to have some kind of information about the sort of page they might visit, such as, is it a rarely referenced or often-referenced page? This master's thesis presents a method to address these problems using data mining and information visualization techniques.

    On the evolution of hyperlinking

    Across time, the hyperlink object has supported different applications and studies. This is one perspective on the evolution of the hyperlinking concept, its context and related behaviors. Through a spectrum of hyperlinking applications and practices, the article contrasts the status quo with its related, broader, conceptual roots; it also bridges to some theorized and prototyped hyperlink variations, namely "stigmergic hyperlinks", to make the case that the ubiquitousness of some objects and certain usage patterns can obfuscate opportunities to (re)think them. In trying to contribute an answer to "what has the common hyperlink (such an apparently simple object) done to society, and what has society done to it?", the article identifies situations that have become so embedded in the daily routine that it is now hard to think of hyperlinking alternatives.