A Brief History of Web Crawlers
Web crawlers visit internet applications, collect data, and learn about new
web pages from visited pages. Web crawlers have a long and interesting history.
Early web crawlers collected statistics about the web. In addition to
collecting statistics about the web and indexing the applications for search
engines, modern crawlers can be used to perform accessibility and vulnerability
checks on the application. The rapid expansion of the web and the growing
complexity of web applications have made crawling a very challenging process.
Throughout the history of web crawling, researchers and industrial groups have
addressed the different issues and challenges that web crawlers face, and
different solutions have been proposed to reduce the time and cost of crawling.
Performing an exhaustive crawl remains a challenging task; automatically
capturing the model of a modern web application and extracting data from it is
another open problem. What follows is a brief history of the different
techniques and algorithms used from the early days of crawling up to the
present. We introduce criteria to evaluate the relative performance of web
crawlers, and based on these criteria we plot the evolution of web crawlers and
compare their performance.
A comparison study for two fuzzy-based systems: improving reliability and security of JXTA-overlay P2P platform
The reliability of peers is very important for safe communication in peer-to-peer (P2P) systems. The reliability of a peer can be evaluated based on its reputation and its interactions with other peers when providing different services. However, deciding peer reliability requires many parameters, which makes the problem NP-hard. In this paper, we present two fuzzy-based systems (called FBRS1 and FBRS2) to improve the reliability of the JXTA-overlay P2P platform. In FBRS1, we considered three input parameters: number of interactions (NI), security (S), and packet loss (PL) to decide the peer reliability (PR). In FBRS2, we considered four input parameters: NI, S, PL, and local score to decide the PR. We compare the proposed systems by computer simulations. FBRS2 is more complex than FBRS1; however, it also considers the local score, which makes it more reliable.
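As a rough illustration of the idea, the following is a minimal sketch of a Mamdani-style fuzzy estimator in the spirit of FBRS1, mapping NI, S, and PL to PR. The membership functions, partitions, and rules are illustrative assumptions; the paper's actual fuzzy rule base is not given in the abstract.

```python
# Minimal sketch of an FBRS1-style fuzzy reliability estimator
# (three inputs: NI, S, PL -> output PR). All membership functions
# and rules below are illustrative assumptions, not the paper's.

def tri(x, a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def peer_reliability(ni, s, pl):
    """Estimate peer reliability PR in [0, 1] from number of
    interactions (ni), security (s), and packet loss (pl),
    each assumed normalized to [0, 1]."""
    # Fuzzification with assumed partitions.
    ni_high = tri(ni, 0.0, 1.0, 2.0)   # ramps up: more interactions
    s_high  = tri(s,  0.0, 1.0, 2.0)   # ramps up: better security
    pl_low  = tri(pl, -1.0, 0.0, 1.0)  # ramps down: less packet loss
    # Two illustrative rules, combined by a weighted average.
    r_high = min(ni_high, s_high, pl_low)        # all favorable -> PR high
    r_low = 1.0 - max(ni_high, s_high, pl_low)   # all unfavorable -> PR low
    total = r_high + r_low
    return r_high / total if total > 0 else 0.5

print(peer_reliability(0.8, 0.9, 0.1))  # a well-behaved peer -> PR near 1
```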
Fast Distributed PageRank Computation
Over the last decade, PageRank has gained importance in a wide range of
applications and domains, ever since it first proved to be effective in
determining node importance in large graphs (and was a pioneering idea behind
Google's search engine). In distributed computing alone, PageRank vector, or
more generally random walk based quantities have been used for several
different applications ranging from determining important nodes, load
balancing, search, and identifying connectivity structures. Surprisingly,
however, there has been little work towards designing provably efficient
fully-distributed algorithms for computing PageRank. The difficulty is that
traditional matrix-vector multiplication style iterative methods may not always
adapt well to the distributed setting owing to communication bandwidth
restrictions and convergence rates.
In this paper, we present fast random-walk-based distributed algorithms for
computing PageRank in general graphs and prove strong bounds on the round
complexity. We first present a distributed algorithm that takes
$O(\log n/\epsilon)$ rounds with high probability on any graph (directed or
undirected), where $n$ is the network size and $\epsilon$ is the reset
probability used in the PageRank computation (typically $\epsilon$ is a fixed
constant). We then present a faster algorithm that takes
$O(\sqrt{\log n}/\epsilon)$ rounds in undirected graphs. Both of the above
algorithms are scalable, as each node sends only a small
($\mathrm{polylog}\, n$) number of bits over each edge per round. To the best
of our knowledge, these are the first fully distributed algorithms for
computing the PageRank vector with provably efficient running time.
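To illustrate the random-walk primitive underlying such algorithms, here is a minimal sequential Monte Carlo sketch that estimates PageRank by running walks that reset with probability $\epsilon$ at each step. The graph, walk counts, and estimator are illustrative assumptions; the paper's actual contribution (the distributed round complexity) is not captured by this sequential version.

```python
# Hedged sketch: Monte Carlo PageRank via random walks with reset
# probability eps. Normalized visit counts over many short walks
# approximate the PageRank vector.
import random

def pagerank_mc(adj, eps=0.15, walks_per_node=1000):
    """adj: dict node -> list of out-neighbors.
    Starts walks_per_node walks from every node; each walk terminates
    with probability eps per step (or at a dangling node). Returns
    normalized visit frequencies as the PageRank estimate."""
    visits = {v: 0 for v in adj}
    total = 0
    for src in adj:
        for _ in range(walks_per_node):
            v = src
            while True:
                visits[v] += 1
                total += 1
                if random.random() < eps or not adj[v]:
                    break  # reset: this walk ends, the next starts fresh
                v = random.choice(adj[v])
    return {v: c / total for v, c in visits.items()}

# Tiny directed example graph (assumed for illustration).
g = {0: [1], 1: [2], 2: [0, 1], 3: [0]}
print(pagerank_mc(g))
```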
Polaritytrust: Measuring trust and reputation in social networks
In this work we tackle the problem of determining the trustworthiness of the users in a social network.
Our approach introduces the novelty of taking into account the negative opinions in a social network to
obtain the ranking of trust according to the opinions of all the users in the network. We briefly discuss
some common attacks that malicious users can perform against a system in order to gain good reputation
in the network. The experiments are performed with synthetic graphs, randomly generated to model real
social networks according to some common features, and to simulate the attacks previously mentioned.
The results show that our approach can deal with these threats, demoting malicious users and minimizing
their effects in the final ranking of trust.
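The abstract does not spell out PolarityTrust's update rule, so the following is only a generic sketch of PageRank-style trust propagation over a signed opinion graph, in which negative edges demote their targets; all names, parameters, and the clipping rule are illustrative assumptions rather than the authors' formulation.

```python
# Illustrative sketch: trust propagation on a signed opinion graph.
# Positive edges transfer trust to their target; negative edges
# subtract it, demoting peers that accumulate bad opinions.
def signed_trust(edges, nodes, damping=0.85, iters=50):
    """edges: list of (src, dst, sign) with sign in {+1, -1}.
    Returns a trust score per node. Scores are clipped at 0 so a
    fully distrusted node cannot itself propagate trust."""
    out = {v: [] for v in nodes}
    for s, d, sign in edges:
        out[s].append((d, sign))
    n = len(nodes)
    trust = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - damping) / n for v in nodes}
        for s in nodes:
            if not out[s]:
                continue
            share = damping * trust[s] / len(out[s])
            for d, sign in out[s]:
                nxt[d] += sign * share  # negative opinions demote d
        trust = {v: max(0.0, x) for v, x in nxt.items()}
    return trust

nodes = ["a", "b", "mal"]
edges = [("a", "b", +1), ("b", "a", +1), ("a", "mal", -1), ("mal", "a", +1)]
print(signed_trust(edges, nodes))  # "mal" is demoted by the negative edge
```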