
    Application of ARIMA(1,1,0) Model for Predicting Time Delay of Search Engine Crawlers

    The World Wide Web is growing at a tremendous rate in terms of both the number of visitors and the number of web pages. Search engine crawlers are highly automated programs that periodically visit the web and index web pages. The behavior of search engine crawlers can be used to analyze server load, search engine quality, crawler dynamics, search engine ethics, etc. The more often a crawler visits a web site, the more it contributes to the site's workload. The time delay between two consecutive visits of a crawler determines how dynamic that crawler is. The ARIMA(1,1,0) time-series model works well for forecasting the time delay between a search crawler's visits to a web site. We considered 5 search engine crawlers, all of which could be modeled using ARIMA(1,1,0). The results of this study are useful in analyzing server load.
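    To illustrate what ARIMA(1,1,0) means for a series of crawler delays y_t: the series is differenced once, and the differences follow an AR(1) process, y_t - y_{t-1} = phi * (y_{t-1} - y_{t-2}) + e_t. The abstract does not say how the model was fitted, so the following is only a minimal sketch under stated assumptions: no constant term, and phi estimated by ordinary least squares on the differences (libraries such as statsmodels, via `ARIMA(..., order=(1, 1, 0))`, typically use maximum likelihood instead). The function names and the sample delays are hypothetical.

```python
from typing import List, Sequence


def fit_arima_110(series: Sequence[float]) -> float:
    """Estimate phi for an ARIMA(1,1,0) model WITHOUT a constant (an assumption)
    by least squares on first differences: d_t = phi * d_{t-1} + e_t."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    # First differences d_t = y_t - y_{t-1}
    diffs = [b - a for a, b in zip(series, series[1:])]
    # OLS slope through the origin of d_t on d_{t-1}
    num = sum(d_prev * d_cur for d_prev, d_cur in zip(diffs, diffs[1:]))
    den = sum(d_prev * d_prev for d_prev in diffs[:-1])
    return num / den if den != 0 else 0.0


def forecast_arima_110(series: Sequence[float], phi: float, steps: int = 1) -> List[float]:
    """Forecast `steps` levels ahead: each predicted difference is phi times the
    previous difference, and each level is the previous level plus that difference."""
    level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff
        level += last_diff
        forecasts.append(level)
    return forecasts


# Hypothetical delays (hours) between consecutive visits of one crawler
delays = [10, 12, 11, 13, 12, 14]
phi = fit_arima_110(delays)
print(phi, forecast_arima_110(delays, phi, steps=2))
```

    With these hypothetical delays the differences alternate in sign (2, -1, 2, -1, 2), so phi is estimated as -0.8 and the next delay is forecast at 12.4 hours, reflecting the expected swing back after a long gap.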

    The Measurement of Intellectual Influence

    We examine the problem of measuring influence based on the information contained in data on the communications between scholarly publications, judicial decisions, patents, web pages, and other entities. The measurement of influence is useful for addressing several empirical questions, such as reputation, prestige, aspects of the diffusion of knowledge, the markets for scientists and scientific publications, the dynamics of innovation, ranking algorithms of search engines on the World Wide Web, and others. In this paper we apply the axiomatic method to ask why any given methodology is reasonable and informative. We find that a unique ranking method can be characterized by means of five axioms: anonymity, invariance to citation intensity, weak homogeneity, weak consistency, and invariance to splitting of journals. This method is easily implementable and turns out to be different from those regularly used in the social and natural sciences, arts and humanities, and computer science. Keywords: Intellectual Influence, Citations, Ranking Methods, Consistency.

    Time-Sensitive User Profile for Optimizing Search Personalization

    Thanks to social Web services, Web search engines have the opportunity to provide personalized search results that better fit the user's information needs and interests. To achieve this goal, many personalized search approaches exploit the user's social Web interactions to extract his preferences and interests, and use them to model his profile. In our approach, the user profile is implicitly represented as a vector of weighted terms corresponding to the user's interests, extracted from his online social activities. As user interests may change over time, we propose to weight profile terms not only according to the content of these activities but also according to their freshness. More precisely, the weights are adjusted with a temporal feature. To evaluate our approach, we model the user profile using data collected from Twitter and then rerank the initial search results according to the user profile. Moreover, we demonstrated the significance of adding a temporal feature by comparing our method with baseline models that do not consider the dynamics of the user profile.
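    The abstract does not specify the temporal feature, so the following is only a sketch of the general idea under an explicit assumption: each term occurrence in a social activity (for example, a tweet) is decayed exponentially with the activity's age, using a hypothetical half-life parameter, and search results are then reranked by cosine similarity between their term vectors and the profile vector. The function names, the half-life, and the sample data are all hypothetical, not the authors' method.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def build_profile(activities: Sequence[Tuple[List[str], float]],
                  half_life_days: float = 30.0) -> Dict[str, float]:
    """activities: (tokens, age_in_days) pairs. ASSUMPTION: each term occurrence
    is weighted by an exponential decay 0.5 ** (age / half_life_days)."""
    profile: Dict[str, float] = {}
    for tokens, age_days in activities:
        decay = 0.5 ** (age_days / half_life_days)
        for term, tf in Counter(tokens).items():
            profile[term] = profile.get(term, 0.0) + tf * decay
    return profile


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank(results: Sequence[Tuple[str, List[str]]],
           profile: Dict[str, float]) -> List[str]:
    """Sort (doc_id, tokens) results by similarity to the profile, highest first.
    sorted() is stable, so ties keep the initial search order."""
    scored = []
    for doc_id, tokens in results:
        doc_vec = {t: float(c) for t, c in Counter(tokens).items()}
        scored.append((doc_id, cosine(doc_vec, profile)))
    return [doc_id for doc_id, _ in sorted(scored, key=lambda x: x[1], reverse=True)]


# Hypothetical activities: a tweet from today and one from 60 days ago
profile = build_profile([(["python", "ml"], 0.0), (["football"], 60.0)])
print(rerank([("d1", ["football", "news"]), ("d2", ["python", "tutorial"])], profile))
```

    With a 30-day half-life, a term seen 60 days ago counts a quarter as much as one seen today, so the result about the user's recent interest ("d2") moves ahead of the one about an older interest ("d1").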

    Mining Web Dynamics for Search

    Billions of web users collectively contribute to a dynamic web that preserves how information sources and descriptions change over time. This dynamic process sheds light on the quality of web content and even indicates the temporal properties of information needs expressed via queries. However, existing commercial search engines typically use a single crawl of web content (the latest) without considering the complementary information concealed in web dynamics. As a result, the generated rankings may be biased by the lack of knowledge about page or hyperlink evolution, and time-sensitive facets of search quality, such as freshness, have to be neglected. While previous research has explored the temporal dimension of the retrieval process, few studies have shown consistent improvements on a large-scale, real-world archival web corpus with a broad time span. We investigate how to use changes in web pages and hyperlinks to improve search quality in terms of the freshness and relevance of search results. The three applications I have focused on are: (1) document representation, in which the importance of anchor text (short descriptive text associated with hyperlinks) is estimated by considering its historical status; (2) web authority estimation, in which web freshness is quantified and used to control authority propagation; and (3) learning to rank, in which freshness and relevance are optimized simultaneously in an adaptive way depending on the query type. The contributions of this thesis are: (1) incorporating web dynamics information into critical components of the search infrastructure in a principled way; and (2) empirically verifying the proposed methods through experiments on a large-scale, real-world archival web corpus, demonstrating their superiority over existing state-of-the-art approaches.

    A three-year study on the freshness of Web search engine databases

    This paper deals with one aspect of the index quality of search engines: index freshness. The purpose is to analyse the update strategies of the major Web search engines Google, Yahoo, and MSN/Live.com. We tested how the engines handled updates to 40 daily updated pages and 30 irregularly updated pages. We used data from a time span of six weeks in the years 2005, 2006, and 2007. We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness. Frequency distributions for the pages' ages are skewed, which means that search engines do differentiate between often- and seldom-updated pages. This is confirmed by the difference between the average ages of the daily updated pages and those of our control group. Indexing patterns are often irregular, and there seems to be no clear policy regarding when to revisit Web pages. A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another.

    The egalitarian effect of search engines

    Search engines have become key media for our scientific, economic, and social activities by enabling people to access information on the Web in spite of its size and complexity. On the downside, search engines bias the traffic of users according to their page-ranking strategies, and some have argued that they create a vicious cycle that amplifies the dominance of established and already popular sites. We show that, contrary to these prior claims and our own intuition, the use of search engines actually has an egalitarian effect. We reconcile theoretical arguments with empirical evidence showing that the combination of retrieval by search engines and search behavior by users mitigates the attraction of popular pages, directing more traffic toward less popular sites, even in comparison to what would be expected from users randomly surfing the Web. Comment: 9 pages, 8 figures, 2 appendices. The final version of this e-print has been published in Proc. Natl. Acad. Sci. USA 103(34), 12684-12689 (2006), http://www.pnas.org/cgi/content/abstract/103/34/1268