50,488 research outputs found

Distributed Information Retrieval using Keyword Auctions

Author: Hiemstra D.
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2008

This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions.

CiteSeerX

Radboud Repository

University of Twente Research Information

PageRank optimization applied to spam detection

Author: Fercoq Olivier
Publication date: 07/03/2012

We give a new link spam detection and PageRank demotion algorithm called MaxRank. Like TrustRank and AntiTrustRank, it starts with a seed of hand-picked trusted and spam pages. We define the MaxRank of a page as the frequency with which this page is visited by a random surfer minimizing an average cost per time unit. On a given page, the random surfer selects a set of hyperlinks and clicks with uniform probability on any of these hyperlinks. The cost function penalizes spam pages and hyperlink removals. The goal is to determine a hyperlink deletion policy that minimizes this score. The MaxRank is interpreted as a modified PageRank vector, used to sort web pages instead of the usual PageRank vector. The bias vector of this ergodic control problem, which is unique up to an additive constant, is a measure of the "spamicity" of each page, used to detect spam pages. We give a scalable algorithm for MaxRank computation that allowed us to run experiments on the WEBSPAM-UK2007 dataset. We show that our algorithm outperforms both TrustRank and AntiTrustRank for spam and nonspam page detection.

Comment: 8 pages, 6 figures
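The random-surfer idea in the abstract above can be illustrated with a minimal sketch of ordinary PageRank (not MaxRank itself, whose cost function and control formulation are not given here). The toy graph, the page names, the damping value of 0.85, and the function name `pagerank` are all assumptions made for illustration. The sketch only shows how deleting one hyperlink changes the visit frequency of the page it pointed to.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate the random surfer's visit frequency for each page.

    `links` maps each page to the list of pages it links to.
    This is plain PageRank by power iteration, not the MaxRank algorithm.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation: with probability (1 - damping) jump to a uniform page.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Click uniformly at random on one of the outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its mass uniformly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: the page "spam" is reachable only through "b".
with_link = {"a": ["b", "c"], "b": ["a", "spam"], "c": ["a"], "spam": ["a"]}
# Same graph after deleting the hyperlink b -> spam.
without_link = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "spam": ["a"]}

rank_before = pagerank(with_link)
rank_after = pagerank(without_link)
```

In this toy graph, removing the only inbound link leaves "spam" with nothing but its teleportation share, so its rank drops. MaxRank, as described in the abstract, goes further by choosing which links to delete so as to minimize a cost that penalizes both spam pages and link removals.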

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Polytechnique

Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval

Authors: Alameda-Pineda Xavier, Ricci Elisa, Sebe Nicu, Song Jingkuan, Xu Dan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2018

In this paper we address the problem of learning robust cross-domain representations for sketch-based image retrieval (SBIR). While most SBIR approaches focus on extracting low- and mid-level descriptors for direct feature matching, recent works have shown the benefit of learning coupled feature representations to describe data from two related sources. However, cross-domain representation learning methods are typically cast into non-convex minimization problems that are difficult to optimize, leading to unsatisfactory performance. Inspired by self-paced learning, a learning methodology designed to overcome convergence issues related to local optima by exploiting the samples in a meaningful order (i.e. easy to hard), we introduce the cross-paced partial curriculum learning (CPPCL) framework. Compared with existing self-paced learning methods, which only consider a single modality and cannot deal with prior knowledge, CPPCL is specifically designed to assess the learning pace by jointly handling data from dual sources and modality-specific prior information provided in the form of partial curricula. Additionally, thanks to the learned dictionaries, we demonstrate that the proposed CPPCL embeds robust coupled representations for SBIR. Our approach is extensively evaluated on four publicly available datasets (i.e. CUFS, Flickr15K, QueenMary SBIR and TU-Berlin Extension datasets), showing superior performance over competing SBIR methods.
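The easy-to-hard ordering mentioned in the abstract above can be sketched with the hard-threshold selection rule commonly used to describe single-modality self-paced learning. This is not the CPPCL framework: there are no dual sources or partial curricula here. The loss values, the pace schedule, and the function name `self_paced_weights` are hypothetical, and the losses are held fixed for simplicity, whereas in practice they would be recomputed after each model update.

```python
def self_paced_weights(losses, pace):
    """Hard self-paced selection: weight 1 for samples whose current loss
    is below the pace threshold ("easy" samples), weight 0 otherwise."""
    return [1 if loss < pace else 0 for loss in losses]


# Hypothetical per-sample losses, from easy (small) to hard (large).
losses = [0.1, 0.4, 0.9, 2.5]
# Hypothetical pace schedule: the threshold grows, admitting harder samples.
schedule = [0.5, 1.0, 3.0]

selected_per_round = [self_paced_weights(losses, pace) for pace in schedule]
```

Each round admits more samples as the pace threshold grows, which is the sense in which training proceeds from easy to hard.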

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Archivio della ricerca - Fondazione Bruno Kessler

INRIA a CCSD electronic archive server

Cost-aware caching: optimizing cache provisioning and object placement in ICN

Authors: Araldo Andrea, Mangili Michele, Martignon Fabio, Rossi Dario
Publication date: 26/08/2014

Caching is frequently used by Internet Service Providers as a viable technique to reduce the latency perceived by end users, while jointly offloading network traffic. While the cache hit-ratio is generally considered in the literature as the dominant performance metric for this type of system, we argue that a critical missing piece has so far been neglected. Adopting a radically different perspective, in this paper we explicitly account for the cost of content retrieval, i.e. the cost associated with the external bandwidth needed by an ISP to retrieve the contents requested by its customers. Interestingly, we discover that classical cache provisioning techniques that maximize cache efficiency (i.e., the hit-ratio) lead to suboptimal solutions with higher overall cost. To show this mismatch, we propose two optimization models that either minimize the overall costs or maximize the hit-ratio, jointly providing cache sizing, object placement and path selection. We formulate a polynomial-time greedy algorithm to solve the two problems and analytically prove its optimality. We provide numerical results and show that significant cost savings are attainable via a cost-aware design.
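The mismatch between maximizing hit-ratio and minimizing cost, described in the abstract above, can be seen in a toy sketch. This is a deliberately simplified illustration and not the paper's optimization models or greedy algorithm: it assumes unit-size objects, a single cache, and no path selection. The catalog, request counts, per-request costs, and all function names are hypothetical.

```python
def place(objects, capacity, key):
    """Greedily cache the `capacity` objects with the highest `key` value."""
    ranked = sorted(objects, key=key, reverse=True)
    return {obj["name"] for obj in ranked[:capacity]}


def hit_ratio(objects, cached):
    """Fraction of all requests served from the cache."""
    total = sum(obj["requests"] for obj in objects)
    hits = sum(obj["requests"] for obj in objects if obj["name"] in cached)
    return hits / total


def external_cost(objects, cached):
    """Total cost of fetching every uncached request over external links."""
    return sum(obj["requests"] * obj["cost"]
               for obj in objects if obj["name"] not in cached)


# Hypothetical catalog: "z" is less popular but expensive to retrieve.
catalog = [
    {"name": "x", "requests": 50, "cost": 1.0},
    {"name": "y", "requests": 30, "cost": 1.0},
    {"name": "z", "requests": 20, "cost": 5.0},
]

# Hit-ratio-driven placement ranks by popularity alone.
by_hits = place(catalog, 1, key=lambda obj: obj["requests"])
# Cost-aware placement ranks by the retrieval cost a cached copy would save.
by_cost = place(catalog, 1, key=lambda obj: obj["requests"] * obj["cost"])
```

With a cache of one object, the popularity-driven placement stores "x" and reaches a hit-ratio of 0.5 at an external cost of 130, while the cost-aware placement stores "z" and accepts a hit-ratio of 0.2 in exchange for an external cost of 80.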

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1