47,841 research outputs found

    The egalitarian effect of search engines

    Full text link
    Search engines have become key media for our scientific, economic, and social activities by enabling people to access information on the Web in spite of its size and complexity. On the down side, search engines bias the traffic of users according to their page-ranking strategies, and some have argued that they create a vicious cycle that amplifies the dominance of established and already popular sites. We show that, contrary to these prior claims and our own intuition, the use of search engines actually has an egalitarian effect. We reconcile theoretical arguments with empirical evidence showing that the combination of retrieval by search engines and search behavior by users mitigates the attraction of popular pages, directing more traffic toward less popular sites, even in comparison to what would be expected from users randomly surfing the Web.

    Comment: 9 pages, 8 figures, 2 appendices. The final version of this e-print has been published in Proc. Natl. Acad. Sci. USA 103(34), 12684-12689 (2006), http://www.pnas.org/cgi/content/abstract/103/34/1268
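    The claim is mechanistic, so a toy simulation helps make it concrete. The sketch below is a minimal illustration, not the paper's model: it contrasts popularity-driven surfing with a search model in which queries partition pages into topics, so globally unpopular pages can still rank highly for niche queries. N_PAGES, N_TOPICS, and the rank-click exponent GAMMA are invented parameters.

```python
# Minimal sketch of the egalitarian mechanism; all parameters are assumptions.
import random
from itertools import accumulate
from collections import Counter

random.seed(42)
N_PAGES, N_TOPICS, N_VISITS = 10_000, 200, 50_000
GAMMA = 1.0  # assumed decay of click probability with result rank

popularity = [1.0 / (i + 1) for i in range(N_PAGES)]  # Zipf-like popularity
cum_pop = list(accumulate(popularity))
by_topic = {}
for p in range(N_PAGES):
    by_topic.setdefault(random.randrange(N_TOPICS), []).append(p)
ranked = {t: sorted(ps) for t, ps in by_topic.items()}  # lower index = more popular
cum_rank = {t: list(accumulate((r + 1) ** -GAMMA for r in range(len(ps))))
            for t, ps in ranked.items()}
topics = list(ranked)

def surf_visit():
    # Popularity-driven surfing: visit probability tracks global popularity.
    return random.choices(range(N_PAGES), cum_weights=cum_pop)[0]

def search_visit():
    # Search: a query hits one topic; clicks decay with rank inside that topic.
    t = random.choice(topics)
    return random.choices(ranked[t], cum_weights=cum_rank[t])[0]

for name, visit in (("surfing", surf_visit), ("search", search_visit)):
    traffic = Counter(visit() for _ in range(N_VISITS))
    top = sum(c for p, c in traffic.items() if p < N_PAGES // 10) / N_VISITS
    bottom = sum(c for p, c in traffic.items() if p >= N_PAGES // 2) / N_VISITS
    print(f"{name}: top-10% share {top:.2f}, bottom-50% share {bottom:.2f}")
```

    In this toy setting the search condition shifts traffic toward the long tail, which is the qualitative effect the paper reports.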

    Agents, Bookmarks and Clicks: A topical model of Web traffic

    Full text link
    Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by visiting novel pages of topical interest, with adjacent pages being more topically related to each other than distant ones. This modulates the probability that an agent continues to browse or starts a new session, allowing us to recreate heterogeneous session lengths. The resulting model is capable of reproducing the collective and individual behaviors we observe in the empirical data, reconciling the narrowly focused browsing patterns of individual users with the extreme heterogeneity of aggregate traffic measurements. This result allows us to identify a few salient features that are necessary and sufficient to interpret the browsing patterns observed in our data. In addition to the descriptive and explanatory power of such a model, our results may lead the way to more sophisticated, realistic, and effective ranking and crawling algorithms.

    Comment: 10 pages, 16 figures, 1 table - Long version of paper to appear in Proceedings of the 21st ACM conference on Hypertext and Hypermedia
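    A stripped-down sketch of the three mechanisms the abstract names (bookmark teleportation, backtracking, interest-driven session length) is given below, under illustrative parameter values; the paper's full model also includes topical locality between adjacent pages, which is omitted here.

```python
# Simplified agent-based browsing sketch; parameters are assumptions.
import random

random.seed(0)
N_PAGES, N_SESSIONS = 5_000, 2_000
P_BOOKMARK, P_BACK = 0.15, 0.25        # assumed teleport / back-button rates
interest = [random.random() for _ in range(N_PAGES)]  # stand-in for topicality

def session(bookmarks):
    """Run one browsing session; return its length in page views."""
    if bookmarks and random.random() < P_BOOKMARK:
        pages, weights = zip(*bookmarks.items())
        page = random.choices(pages, weights=weights)[0]  # frequency-weighted
    else:
        page = random.randrange(N_PAGES)
    stack, length, energy = [], 0, 1.0
    while energy > 0 and length < 10_000:              # cap only for safety
        length += 1
        bookmarks[page] = bookmarks.get(page, 0) + 1   # non-Markovian memory
        energy += interest[page] - 0.5                 # novelty sustains browsing
        if stack and random.random() < P_BACK:
            page = stack.pop()                         # back button / tab switch
        else:
            stack.append(page)
            page = random.randrange(N_PAGES)           # stand-in for a clicked link
    return length

bookmarks = {}
lengths = sorted(session(bookmarks) for _ in range(N_SESSIONS))
print("median session length:", lengths[N_SESSIONS // 2])
print("longest session:", lengths[-1])  # heavy tail: far above the median
```

    Because the interest "energy" performs a zero-drift random walk, session lengths come out heavy-tailed, mirroring the heterogeneous session lengths the abstract describes.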

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Full text link
    Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability to achieve deeper understanding and exploitation of multimodal data, and continued incorporation of knowledge in learning techniques.

    Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770
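    As a small illustration of the kind of knowledge exploitation the abstract advocates, the sketch below augments surface text features with entity and type features drawn from a tiny hand-made knowledge base; the KB entries, example sentence, and featurize helper are all invented for demonstration.

```python
# Illustrative sketch: knowledge features for implicit-entity mentions.
KB = {  # surface form -> (canonical entity, semantic type); invented entries
    "big apple": ("New_York_City", "CITY"),
    "windy city": ("Chicago", "CITY"),
    "fed": ("Federal_Reserve", "ORGANIZATION"),
}

def featurize(text):
    """Bag-of-words plus knowledge features from the toy KB."""
    lowered = text.lower()
    feats = {f"word={t}": 1 for t in lowered.split()}
    for surface, (entity, etype) in KB.items():
        if surface in lowered:  # naive matching; real systems link in context
            feats[f"entity={entity}"] = 1
            feats[f"type={etype}"] = 1
    return feats

print(featurize("Flights to the Big Apple are cheap"))
# Word features plus entity=New_York_City and type=CITY, letting a downstream
# model generalize across "NYC", "New York", and "Big Apple".
```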

    The Dynamics of Internet Traffic: Self-Similarity, Self-Organization, and Complex Phenomena

    Full text link
    The Internet is the most complex system ever created in human history. Unsurprisingly, its dynamics and traffic exhibit a rich variety of complex behavior, self-organization, and other phenomena that have been researched for years. This paper is a review of the complex dynamics of Internet traffic. Departing from conventional treatments, we take a view from both the network engineering and physics perspectives, showing the strengths, weaknesses, and insights of both. In addition, we cover many less-discussed phenomena such as traffic oscillations, large-scale effects of worm traffic, and comparisons between the Internet and biological models.

    Comment: 63 pages, 7 figures, 7 tables, submitted to Advances in Complex Systems
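    Self-similarity in traffic traces is commonly quantified with the Hurst exponent H. The sketch below is a minimal rescaled-range (R/S) estimator, demonstrated on synthetic uncorrelated noise (H near 0.5); long-range-dependent traffic, as reported in this literature, would yield H > 0.5. The window sizes and the synthetic trace are illustrative choices, not taken from the paper.

```python
# Minimal R/S (rescaled-range) estimate of the Hurst exponent.
import math, random

def rs(chunk):
    """Rescaled range R/S of one window."""
    m = sum(chunk) / len(chunk)
    cum, lo, hi, dev = 0.0, 0.0, 0.0, 0.0
    for x in chunk:
        cum += x - m
        lo, hi = min(lo, cum), max(hi, cum)
        dev += (x - m) ** 2
    s = math.sqrt(dev / len(chunk))
    return (hi - lo) / s if s > 0 else None

def hurst(series, sizes=(16, 32, 64, 128, 256)):
    """Least-squares slope of log mean(R/S) against log window size."""
    pts = []
    for n in sizes:
        vals = [rs(series[i:i + n]) for i in range(0, len(series) - n + 1, n)]
        vals = [v for v in vals if v is not None]
        pts.append((math.log(n), math.log(sum(vals) / len(vals))))
    k = len(pts)
    sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
    return (k * sxy - sx * sy) / (k * sxx - sx * sx)

random.seed(1)
noise = [random.gauss(0, 1) for _ in range(4096)]
print(f"H for uncorrelated noise: {hurst(noise):.2f}")  # expect roughly 0.5
```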

    Traffic on complex networks: Towards understanding global statistical properties from microscopic density fluctuations

    Get PDF
    We study the microscopic time fluctuations of traffic load and the global statistical properties of dense particle traffic on scale-free cyclic graphs. For a wide range of driving rates R the traffic is stationary, and the load time series exhibits antipersistence due to the regulatory role of the superstructure associated with two hub nodes in the network. We discuss how the superstructure affects the functioning of the network at high traffic density and at the jamming threshold. The degree of correlations systematically decreases with increasing traffic density and eventually disappears when approaching a jamming density Rc. Even before jamming, we observe qualitative changes in the global network-load distributions and the particle queuing times. These changes are related to the occurrence of temporary crises in which the network load increases dramatically and then slowly falls back to a value characterizing free flow.
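    A toy version of such a transport process, assuming random-walk routing and illustrative sizes rather than the paper's exact setup, already shows how network load behaves differently at low and high driving rates:

```python
# Toy packet transport on a preferential-attachment graph; all sizes, rates,
# and the random-walk routing rule are illustrative assumptions.
import random
from collections import deque

random.seed(7)

def pref_attach_graph(n, m=2):
    """Barabasi-Albert-style graph as an adjacency dict of sets."""
    adj = {v: set() for v in range(n)}
    targets, repeated = list(range(m)), []
    for v in range(m, n):
        for t in targets:
            adj[v].add(t); adj[t].add(v)
            repeated += [v, t]
        chosen = set()
        while len(chosen) < m:          # degree-weighted distinct endpoints
            chosen.add(random.choice(repeated))
        targets = list(chosen)
    return adj

def simulate(adj, R, steps=2000):
    nodes = list(adj)
    neigh = {v: list(ns) for v, ns in adj.items()}
    queues = {v: deque() for v in nodes}
    load = []
    for _ in range(steps):
        if random.random() < R:         # post a packet at average rate R
            queues[random.choice(nodes)].append(random.choice(nodes))
        hops = []
        for v in nodes:                 # each node forwards one packet per step
            if queues[v]:
                dst = queues[v].popleft()
                nxt = random.choice(neigh[v])
                if nxt != dst:          # delivered when it reaches its target
                    hops.append((nxt, dst))
        for nxt, dst in hops:
            queues[nxt].append(dst)
        load.append(sum(len(q) for q in queues))
    return load

adj = pref_attach_graph(400)
for R in (0.2, 0.9):                    # low vs. high driving rate
    load = simulate(adj, R)
    print(f"R={R}: mean load {sum(load)/len(load):.1f}, final load {load[-1]}")
```

    At low R the load stays near a stationary value, while at high R queues at the hubs build up, the toy analogue of the approach to jamming studied in the paper.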

    Quantifying Biases in Online Information Exposure

    Full text link
    Our consumption of online information is mediated by filtering, ranking, and recommendation algorithms that introduce unintentional biases as they attempt to deliver relevant and engaging content. It has been suggested that our reliance on online technologies such as search engines and social media may limit exposure to diverse points of view and make us vulnerable to manipulation by disinformation. In this paper, we mine a massive dataset of Web traffic to quantify two kinds of bias: (i) homogeneity bias, which is the tendency to consume content from a narrow set of information sources, and (ii) popularity bias, which is the selective exposure to content from top sites. Our analysis reveals different bias levels across several widely used Web platforms. Search exposes users to a diverse set of sources, while social media traffic tends to exhibit high popularity and homogeneity bias. When we focus our analysis on traffic to news sites, we find higher levels of popularity bias, with smaller differences across applications. Overall, our results quantify the extent to which our choices of online systems confine us inside "social bubbles."

    Comment: 25 pages, 10 figures, to appear in the Journal of the Association for Information Science and Technology (JASIST)
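    The abstract names the two biases without giving formulas, so the sketch below is one plausible rendering rather than the paper's exact definitions: homogeneity bias as one minus the normalized entropy of a user's source distribution, and popularity bias as the share of clicks on globally top-ranked sources. The toy click logs are invented.

```python
# Assumed (not the paper's) formulas for homogeneity and popularity bias.
import math
from collections import Counter

def homogeneity_bias(user_clicks):
    """1 - normalized Shannon entropy of the user's source distribution."""
    counts = Counter(user_clicks)
    total = sum(counts.values())
    if len(counts) < 2:
        return 1.0
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 1.0 - h / math.log2(len(counts))

def popularity_bias(user_clicks, global_clicks, top_k=10):
    """Fraction of the user's clicks landing on the global top-k sources."""
    top = {s for s, _ in Counter(global_clicks).most_common(top_k)}
    return sum(1 for s in user_clicks if s in top) / len(user_clicks)

# Toy example: a search-like user vs. a social-media-like user.
global_log = ["news.example"] * 50 + ["blog.example"] * 5 + ["niche.example"] * 2
search_user = ["news.example", "blog.example", "niche.example", "blog.example"]
social_user = ["news.example"] * 9 + ["blog.example"]
for name, clicks in (("search", search_user), ("social", social_user)):
    print(name, round(homogeneity_bias(clicks), 2),
          round(popularity_bias(clicks, global_log, top_k=1), 2))
```

    On these toy logs the social-media-like user scores higher on both measures, matching the qualitative pattern the abstract reports.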