Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results
In-degree, PageRank, number of visits and other measures of Web page
popularity significantly influence the ranking of search results by modern
search engines. The assumption is that popularity is closely correlated with
quality, a more elusive concept that is difficult to measure directly.
Unfortunately, the correlation between popularity and quality is very weak for
newly-created pages that have yet to receive many visits and/or in-links.
Worse, since discovery of new content is largely done by querying search
engines, and because users usually focus their attention on the top few
results, newly-created but high-quality pages are effectively "shut out," and
it can take a very long time before they become popular.
We propose a simple and elegant solution to this problem: the introduction of
a controlled amount of randomness into search result ranking methods. Doing so
offers new pages a chance to prove their worth, although clearly using too much
randomness will degrade result quality and annul any benefits achieved. Hence
there is a tradeoff between exploration to estimate the quality of new pages
and exploitation of pages already known to be of high quality. We study this
tradeoff both analytically and via simulation, in the context of an economic
objective function based on aggregate result quality amortized over time. We
show that a modest amount of randomness leads to improved search results.
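The exploration/exploitation tradeoff described in this abstract can be illustrated with a minimal sketch. It is not the authors' exact scheme: the function name `randomized_ranking`, the parameter `r` (probability of promoting an unexplored page at each position), and `protect_top` (number of leading results left untouched) are assumptions made for illustration only.

```python
import random


def randomized_ranking(ranked, unexplored, r, protect_top=1, rng=None):
    """Merge a popularity-based ranking with a shuffled pool of new pages.

    ranked      -- established pages, best first (hypothetical input)
    unexplored  -- new pages whose quality is not yet known (hypothetical input)
    r           -- probability, per position, of promoting an unexplored page
    protect_top -- number of leading results that are never displaced
    """
    rng = rng or random.Random()
    pool = set(unexplored)
    promoted = list(unexplored)
    rng.shuffle(promoted)  # new pages appear in random order

    established = [page for page in ranked if page not in pool]
    result = established[:protect_top]
    rest = established[protect_top:]

    i = j = 0
    while i < len(rest) or j < len(promoted):
        # Take a promoted page with probability r, or when no established page is left.
        take_promoted = j < len(promoted) and (i >= len(rest) or rng.random() < r)
        if take_promoted:
            result.append(promoted[j])
            j += 1
        else:
            result.append(rest[i])
            i += 1
    return result
```

With `r = 0` the deterministic ranking is preserved and new pages fall to the end; with `r = 1` every new page is placed directly after the protected results. Intermediate values trade result quality against exploration.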
Evolution of the Media Web
We present a detailed study of the part of the Web related to media content,
i.e., the Media Web. Using publicly available data, we analyze the evolution of
incoming and outgoing links from and to media pages. Based on our observations,
we propose a new class of models for the appearance of new media content on the
Web where different attractiveness functions of nodes are possible
including ones taken from well-known preferential attachment and fitness
models. We analyze these models theoretically and empirically and show which
ones realistically predict both the incoming degree distribution and the
so-called recency property of the Media Web, something that existing
models did not do well. Finally we compare these models by estimating the
likelihood of the real-world link graph from our data set given each model and
find that the models we introduce are significantly more likely than previously
proposed ones. One of the most surprising results is that in the Media Web the
probability for a post to be cited is determined, most likely, by its quality
rather than by its current popularity.
A stochastic model for the evolution of the web allowing link deletion
Recently several authors have proposed stochastic evolutionary models for the growth of the web graph and other networks that give rise to power-law distributions. These models are based on the notion of preferential attachment leading to the "rich get richer" phenomenon. We present a generalisation of the basic model by allowing deletion of individual links and show that it also gives rise to a power-law distribution. We derive the mean-field equations for this stochastic model and show that by examining a snapshot of the distribution at the steady state of the model, we are able to tell whether any link deletion has taken place and estimate the link deletion probability. Our model enables us to gain some insight into the distribution of inlinks in the web graph; in particular, it suggests a power-law exponent of approximately 2.15 rather than the widely published exponent of 2.1.
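The idea of preferential attachment combined with link deletion can be sketched as a small simulation. This is an illustrative toy process, not the model analysed in the paper: the function name `simulate_with_deletion`, the "in-degree + 1" attachment weight, and the single parameter `delete_prob` are assumptions.

```python
import random


def simulate_with_deletion(steps, delete_prob, seed=0):
    """Grow a link graph by preferential attachment, deleting links at random.

    Each step adds one page that links to an existing page chosen with
    probability proportional to (in-degree + 1). Afterwards, with probability
    delete_prob, one uniformly chosen existing link is removed.
    Returns the list of in-degrees, indexed by page.
    """
    rng = random.Random(seed)
    indegree = [0]  # page 0 starts with no in-links
    links = []      # target page of every live link

    for _ in range(steps):
        n = len(indegree)
        # Total weight is len(links) + n: pick via a link endpoint
        # (proportional to in-degree) or uniformly (the "+ 1" term).
        if rng.random() < len(links) / (len(links) + n):
            target = rng.choice(links)
        else:
            target = rng.randrange(n)
        indegree.append(0)  # the new page itself
        indegree[target] += 1
        links.append(target)

        if links and rng.random() < delete_prob:
            idx = rng.randrange(len(links))
            links[idx], links[-1] = links[-1], links[idx]
            removed_target = links.pop()
            indegree[removed_target] -= 1
    return indegree
```

Running the simulation for several values of `delete_prob` and comparing the resulting in-degree histograms gives a rough feel for how deletion changes the tail of the distribution.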
Experience versus Talent Shapes the Structure of the Web
We use sequential large-scale crawl data to empirically investigate and
validate the dynamics that underlie the evolution of the structure of the web.
We find that the overall structure of the web is defined by an intricate
interplay between experience or entitlement of the pages (as measured by the
number of inbound hyperlinks a page already has), inherent talent or fitness of
the pages (as measured by the likelihood that someone visiting the page would
give a hyperlink to it), and the continual high rates of birth and death of
pages on the web. We find that the web is conservative in judging talent and
the overall fitness distribution is exponential, showing low variability. The
small variance in talent, however, is enough to lead to experience
distributions with high variance: The preferential attachment mechanism
amplifies these small biases and leads to heavy-tailed power-law (PL) inbound
degree distributions over all pages, as well as over pages that are of the same
age. The balancing act between experience and talent on the web allows newly
introduced pages with novel and interesting content to grow quickly and surpass
older pages. In this regard, it is much like what we observe in high-mobility
and meritocratic societies: People with entitlement continue to have access to
the best resources, but there is just enough screening for fitness that allows
for talented winners to emerge and join the ranks of the leaders. Finally, we
show that the fitness estimates have potential practical applications in
ranking query results.
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities has intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.
Comment: 96 pages, 14 figures, 333 references
First to Market is not Everything: an Analysis of Preferential Attachment with Fitness
In this paper, we provide a rigorous analysis of preferential attachment with
fitness, a random graph model introduced by Bianconi and Barabási. Depending on
the shape of the fitness distribution, we observe three distinct phases: a
first-mover-advantage phase, a fit-get-richer phase and an innovation-pays-off
phase.
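Preferential attachment with fitness can be sketched as follows, assuming the usual formulation in which a new node attaches to an existing node with probability proportional to fitness times degree. The function name `attach_with_fitness`, the two-node starting graph, and the `fitness_sampler` callback are assumptions made for illustration; the sampler must return strictly positive values.

```python
import random


def attach_with_fitness(n_nodes, fitness_sampler, seed=0):
    """Grow a tree in which each new node links to one existing node.

    The target is chosen with probability proportional to fitness * degree.
    fitness_sampler(rng) draws a positive fitness for each node.
    Returns (fitness, degree) lists indexed by node.
    """
    rng = random.Random(seed)
    # Start from two nodes joined by a single edge.
    fitness = [fitness_sampler(rng), fitness_sampler(rng)]
    degree = [1, 1]

    for _ in range(n_nodes - 2):
        weights = [f * d for f, d in zip(fitness, degree)]
        target = rng.choices(range(len(degree)), weights=weights)[0]
        degree[target] += 1
        fitness.append(fitness_sampler(rng))
        degree.append(1)  # the new node has one edge, to its target
    return fitness, degree
```

For example, `attach_with_fitness(200, lambda rng: rng.uniform(0.1, 1.0))` draws fitness values from a bounded uniform distribution; swapping in a different sampler is the way to explore how the shape of the fitness distribution affects which nodes accumulate links.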