Local Aspects of the Global Ranking of Web Pages
Launched in 1998, the search engine Google ranks pages using several parameters, one of which is PageRank. More precisely, PageRank is a probability distribution over web pages that depends on the web graph. Our purpose is to show that PageRank can be split into two terms, an internal and an external PageRank. These two terms give a better understanding of what PageRank means inside and outside a site. A first application is a local algorithm that estimates the PageRank of a given site's pages. We also present quantitative results on the extent to which a site can boost its own PageRank.
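To make the "probability distribution on the web graph" concrete, here is a minimal sketch of the standard power-iteration PageRank. The damping factor 0.85, the uniform handling of dangling pages, and the four-page toy graph are illustrative assumptions, and the sketch does not implement the internal/external decomposition the paper proposes.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Score held by pages without out-links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for q in targets:
                # Each page shares its score equally among its out-links.
                new_rank[q] += damping * rank[p] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy graph: page "d" has no in-links, page "c" has three.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_web)
```

In this toy graph the page with the most in-links ("c") ends up with the highest score, and the page with none ("d") keeps only the uniform teleportation share.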
Toward Entity-Aware Search
As the Web has evolved into a data-rich repository, current search engines, with their standard "page view," are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results we have obtained so far show the clear promise of entity-aware search in its usefulness, effectiveness, efficiency, and scalability.
Local Ranking Problem on the BrowseGraph
The "Local Ranking Problem" (LRP) is related to the computation of a
centrality-like rank on a local graph, where the scores of the nodes could
significantly differ from the ones computed on the global graph. Previous work
has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a
graph where nodes are webpages and edges are browsing transitions. Recently,
this graph has received more and more attention in many different tasks such as
ranking, prediction and recommendation. However, a web-server has only the
browsing traffic performed on its pages (local BrowseGraph) and, as a
consequence, the local computation can lead to estimation errors, which hinders
the increasing number of applications in the state of the art. Also, although
the divergence between the local and global ranks has been measured, the
possibility of estimating such divergence using only local knowledge has been
mainly overlooked. These aspects are of great interest for online service
providers who want to: (i) gauge their ability to correctly assess the
importance of their resources only based on their local knowledge, and (ii)
take into account real user browsing fluxes that better capture the actual user
interest than the static hyperlink network. We study the LRP on a
BrowseGraph from a large news provider, considering as subgraphs the
aggregations of browsing traces of users coming from different domains. We show
that the distance between rankings can be accurately predicted based only on
structural information of the local graph, being able to achieve an average
rank correlation as high as 0.8.
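The abstract compares local and global rankings through a rank correlation but does not say which coefficient is used. As one possible reading, the sketch below computes Kendall's tau (the tau-a variant, without tie correction) between two score assignments over the same nodes. The function name and the example scores are hypothetical.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score dicts over their shared nodes."""
    nodes = sorted(set(scores_a) & set(scores_b))
    concordant = discordant = 0
    for u, v in combinations(nodes, 2):
        # Positive product: both rankings order the pair the same way.
        sign = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(nodes) * (len(nodes) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0


# Hypothetical scores: "global" from the full graph, "local" from a subgraph.
global_scores = {"p1": 0.40, "p2": 0.30, "p3": 0.20, "p4": 0.10}
local_scores = {"p1": 0.35, "p2": 0.15, "p3": 0.30, "p4": 0.20}
tau = kendall_tau(global_scores, local_scores)
```

A value of 1 means the local ranking orders every pair of nodes as the global ranking does, and lower values indicate a larger divergence between the two.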
Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
We present two novel models of document coherence and their application to
information retrieval (IR). Both models approximate document coherence using
discourse entities, e.g. the subject or object of a sentence. Our first model
views text as a Markov process generating sequences of discourse entities
(entity n-grams); we use the entropy of these entity n-grams to approximate the
rate at which new information appears in text, reasoning that as more new words
appear, the topic increasingly drifts and text coherence decreases. Our second
model extends the work of Guinaudeau & Strube [28] that represents text as a
graph of discourse entities, linked by different relations, such as their
distance or adjacency in text. We use several graph topology metrics to
approximate different aspects of the discourse flow that can indicate
coherence, such as the average clustering or betweenness of discourse entities
in text. Experiments with several instantiations of these models show that: (i)
our models perform on a par with two other well-known models of text coherence
even without any parameter tuning, and (ii) reranking retrieval results
according to their coherence scores gives notable performance gains, confirming
a relation between document coherence and relevance. This work contributes two
novel models of document coherence, the application of which to IR complements
recent work in the integration of document cohesiveness or comprehensibility to
ranking [5, 56].
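The first model's idea, using the entropy of entity n-grams as a proxy for how quickly new information appears, can be illustrated with a small sketch. It assumes the discourse entities have already been extracted into a sequence and uses the plain Shannon entropy of the empirical n-gram distribution. The paper's exact formulation may differ, and the example sequences are invented.

```python
import math
from collections import Counter


def ngram_entropy(entities, n=2):
    """Shannon entropy (in bits) of the empirical distribution of entity n-grams."""
    ngrams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical entity sequences: one keeps returning to the same entities,
# the other introduces a new entity at every step.
focused = ["cat", "mat", "cat", "mat", "cat", "mat", "cat"]
drifting = ["cat", "mat", "dog", "park", "rain", "bus", "tax"]

focused_entropy = ngram_entropy(focused)
drifting_entropy = ngram_entropy(drifting)
```

Under this reading, the drifting sequence yields the higher entropy, which matches the abstract's reasoning that more new entities signal topic drift and lower coherence.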
Follow Whom? Chinese Users Have Different Choice
Sina Weibo, which was launched in 2009, is the most popular Chinese
micro-blogging service. It has been reported that Sina Weibo had more than 400
million registered users by the end of the third quarter of 2012. Sina Weibo
and Twitter have a lot in common; however, in terms of following
preference, Sina Weibo users, most of whom are Chinese, behave differently
from Twitter users.
This work is based on a data set of Sina Weibo which contains 80.8 million
users' profiles and 7.2 billion relations and a large data set of Twitter.
First, some basic features of Sina Weibo and Twitter are analyzed, such as
degree and activeness distributions, the correlation between degree and activeness,
and the degree of separation. Then the following preference is investigated by
studying assortative mixing, friend similarities, following distribution,
edge balance ratio, and ranking correlation, where the edge balance ratio is newly
proposed to measure the balance property of graphs. It is found that Sina Weibo has
a lower reciprocity rate, more positive balanced relations and is more
disassortative. Coinciding with Asian traditional culture, the following
preference of Sina Weibo users is more concentrated and hierarchical: they are
more likely to follow people at higher or the same social levels and less
likely to follow people lower than themselves. In contrast, the same kind of
following preference is weaker in Twitter. Twitter users are more open, as they
follow people from all levels, which accords with its global character and the
prevalence of western civilization. The message forwarding behavior is studied
by displaying the propagation levels, delays, and critical users. The following
preference derives not only from usage habits but also from underlying factors
such as personality and social morality, which are worthy of future research.
Comment: 9 pages, 13 figures
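One of the measures mentioned above, the reciprocity rate, can be sketched as follows. The sketch assumes one common definition, the fraction of directed follow edges whose reverse edge also exists, which may not be the exact definition used in the paper. The user names are invented.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) is also present."""
    edge_set = {(u, v) for u, v in edges if u != v}  # drop self-loops and duplicates
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


# Hypothetical follow graph: only ann and bob follow each other.
follows = [("ann", "bob"), ("bob", "ann"), ("cat", "ann"), ("dan", "ann")]
rate = reciprocity(follows)
```

A low rate indicates that most follow relations are one-directional, as in a network where many users follow a few prominent accounts that do not follow back.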
A Trio Neural Model for Dynamic Entity Relatedness Ranking
Measuring entity relatedness is a fundamental task for many natural language
processing and information retrieval applications. Prior work often studies
entity relatedness in static settings and in an unsupervised manner. However,
entities in the real world are often involved in many different relationships;
consequently, entity relations are highly dynamic over time. In this work, we
propose a neural network-based approach for dynamic entity relatedness,
leveraging the collective attention as supervision. Our model is capable of
learning rich and different entity representations in a joint framework.
Through extensive experiments on large-scale datasets, we demonstrate that our
method achieves better results than competitive baselines.
Comment: In Proceedings of CoNLL 201
Combining link and content-based information in a Bayesian inference model for entity search
An architectural model of a Bayesian inference network to support entity search in semantic knowledge bases is presented. The model supports the explicit combination of primitive data type and object-level semantics under a single computational framework. A flexible query model is supported that is capable of reasoning with the availability of simple semantics in queries