Entity Ranking on Graphs: Studies on Expert Finding
Today's web search engines try to offer services for finding various kinds of information in addition to simple web pages, such as showing locations or answering simple fact queries. Understanding the association between named entities and documents is one of the key steps towards such semantic search tasks. This paper addresses the ranking of entities and models it in a graph-based relevance propagation framework. In particular, we study the problem of expert finding as an example of an entity ranking task. Entity containment graphs are introduced that represent the relationship between text fragments on the one hand and the entities they contain on the other. The paper shows how these graphs can be used to propagate relevance information from the pre-ranked text fragments to their entities. We use this propagation framework to model existing approaches to expert finding based on the entity's indegree, and extend them by recursive relevance propagation based on a probabilistic random walk over the entity containment graphs. Experiments on the TREC expert search task compare the retrieval performance of the different graph and propagation models.
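The indegree-style propagation described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact model: the data, the even score split, and the `steps` parameter are assumptions for the sake of the example.

```python
def propagate_relevance(doc_scores, contains, steps=1):
    """One-step 'indegree' propagation: each pre-ranked text fragment
    passes its retrieval score, split evenly, to the entities it contains.
    Further steps bounce mass back through the containment graph,
    mimicking a recursive random-walk refinement."""
    entity_scores = {}
    for doc, score in doc_scores.items():
        ents = contains.get(doc, [])
        for e in ents:
            entity_scores[e] = entity_scores.get(e, 0.0) + score / len(ents)
    for _ in range(steps - 1):
        # entities push mass back to their documents, then forward again
        doc_mass = {d: sum(entity_scores.get(e, 0.0) for e in ents)
                    for d, ents in contains.items()}
        entity_scores = {}
        for doc, score in doc_mass.items():
            ents = contains[doc]
            for e in ents:
                entity_scores[e] = entity_scores.get(e, 0.0) + score / len(ents)
    return entity_scores

# Hypothetical toy input: two retrieved fragments and their entities.
docs = {"d1": 0.9, "d2": 0.5}
contains = {"d1": ["alice", "bob"], "d2": ["alice"]}
scores = propagate_relevance(docs, contains, steps=1)
```

Here "alice" accumulates mass from both fragments, so she outranks "bob"; adding steps would refine the ranking recursively, as in the random-walk extension.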
Data Mining in Electronic Commerce
Modern business is rushing toward e-commerce. If the transition is done properly, it enables better management, new services, lower transaction costs, and better customer relations. Success depends on skilled information technologists, among whom are statisticians. This paper focuses on some of the contributions that statisticians are making to help change the business world, especially through the development and application of data mining methods. This is a very large area, and the topics we cover are chosen to avoid overlap with other papers in this special issue, as well as to respect the limitations of our expertise. Inevitably, electronic commerce has raised and is raising fresh research problems in a very wide range of statistical areas, and we try to emphasize those challenges.
Comment: Published at http://dx.doi.org/10.1214/088342306000000204 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
Improved Distortion and Spam Resistance for PageRank
For a directed graph $G$, a ranking function, such as PageRank, provides a way of mapping the nodes of $G$ to non-negative real numbers so that the nodes can be ordered. Brin and Page argued that the stationary distribution $\pi$ of a random walk on $G$ is an effective ranking function for queries on an idealized web graph. However, $\pi$ is not defined for all $G$, and in particular, it is not defined for the real web graph. Thus, they introduced PageRank to approximate $\pi$ for graphs with ergodic random walks while being defined on all graphs.
PageRank is defined as a random walk on a graph, where with probability $1-\epsilon$ a random out-edge is traversed, and with \emph{reset probability} $\epsilon$ the random walk instead restarts at a node selected using a \emph{reset vector}. Originally, the reset vector was taken to be uniform on the nodes, and we call this version UPR.
In this paper, we introduce graph-theoretic notions of quality for ranking functions, specifically \emph{distortion} and \emph{spam resistance}. We show that UPR has high distortion and low spam resistance, and we show how to select a reset vector that yields low distortion and high spam resistance.
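The PageRank variant described above, with a reset probability and a general (not necessarily uniform) reset vector, can be sketched by power iteration. This is a standard textbook formulation, not the paper's own code; the dangling-node handling (restarting via the reset vector) is one common convention.

```python
def pagerank(out_edges, reset_vector, eps=0.15, iters=100):
    """Power iteration for PageRank: with probability 1-eps follow a
    random out-edge; with reset probability eps restart at a node drawn
    from reset_vector. Dangling nodes restart via the reset vector too."""
    nodes = list(reset_vector)
    pr = dict(reset_vector)  # start from the reset distribution
    for _ in range(iters):
        nxt = {u: eps * reset_vector[u] for u in nodes}
        for u in nodes:
            outs = out_edges.get(u, [])
            if outs:
                share = (1 - eps) * pr[u] / len(outs)
                for v in outs:
                    nxt[v] += share
            else:
                # dangling node: redistribute its mass via the reset vector
                for v in nodes:
                    nxt[v] += (1 - eps) * pr[u] * reset_vector[v]
        pr = nxt
    return pr

# Uniform reset vector gives the UPR version discussed in the abstract.
edges = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
pr = pagerank(edges, {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3})
```

Choosing a non-uniform `reset_vector` is exactly the lever the paper analyzes: the same iteration, with reset mass concentrated on trusted nodes, changes distortion and spam resistance.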
BlogForever D2.4: Weblog spider prototype and associated methodology
The purpose of this document is to present the evaluation of different solutions for capturing blogs and the established methodology, and to describe the developed blog spider prototype.
PolaritySpam: Propagating Content-based Information Through a Web-Graph to Detect Web Spam
Spam web pages have become a problem for Information Retrieval systems due to the negative effects that this phenomenon can cause in their results. In this work we tackle the problem of detecting these pages with a propagation algorithm that, taking a web graph as input, chooses a set of spam and non-spam web pages in order to spread their spam likelihood over the rest of the network. Thus we take advantage of the links between pages to obtain a ranking of pages according to their relevance and their spam likelihood. Our intuition is to give a high reputation to pages related to relevant ones, and a high spam likelihood to pages linked to spam web pages. We introduce the novelty of including the content of the web pages in the computation of an a priori estimation of their spam likelihood, and propagate this information. Our graph-based algorithm computes two scores for each node in the graph. Intuitively, these values represent how bad or good (spam-like or not) a web page is, according to its textual content and its relations in the graph. The experimental results show that our method outperforms other techniques for spam detection.
Ministerio de Educación y Ciencia HUM2007-66607-C04-0
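The two-score propagation can be illustrated with a small sketch. This is a hypothetical variant of the idea, not the PolaritySpam algorithm itself: the update rule, the damping factor `alpha`, and the toy graph are assumptions. Reputation flows forward along links from good seeds, while spam likelihood flows backward onto pages that link to spam.

```python
def polarity_propagate(links, good_prior, spam_prior, alpha=0.85, iters=20):
    """Compute two scores per node over a web graph: a reputation score
    propagated along out-links from good pages, and a spam score assigned
    to pages that link to spam-like pages. Priors play the role of the
    content-based a priori estimations described in the abstract."""
    nodes = set(good_prior) | set(spam_prior) | set(links)
    for outs in links.values():
        nodes |= set(outs)
    good = {v: good_prior.get(v, 0.0) for v in nodes}
    spam = {v: spam_prior.get(v, 0.0) for v in nodes}
    rev = {}  # reverse adjacency: who links to v
    for u, outs in links.items():
        for v in outs:
            rev.setdefault(v, []).append(u)
    for _ in range(iters):
        new_good, new_spam = {}, {}
        for v in nodes:
            # reputation: prior plus shares from good pages linking to v
            inc = sum(good[u] / len(links[u]) for u in rev.get(v, []))
            new_good[v] = (1 - alpha) * good_prior.get(v, 0.0) + alpha * inc
            # spam likelihood: prior plus average spam of the pages v links to
            outs = links.get(v, [])
            out_inc = sum(spam[w] for w in outs) / len(outs) if outs else 0.0
            new_spam[v] = (1 - alpha) * spam_prior.get(v, 0.0) + alpha * out_inc
        good, spam = new_good, new_spam
    return good, spam

# Toy graph: g is a trusted seed linking to h; p links to spam seed s.
links = {"g": ["h"], "p": ["s"]}
good, spam = polarity_propagate(links, {"g": 1.0}, {"s": 1.0})
```

After propagation, `h` inherits reputation from `g`, while `p` inherits spam likelihood for linking to `s`, matching the intuition stated in the abstract.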