A Brief History of Web Crawlers
Web crawlers visit internet applications, collect data, and learn about new
web pages from visited pages. Web crawlers have a long and interesting history.
Early web crawlers collected statistics about the web. In addition to
collecting statistics about the web and indexing the applications for search
engines, modern crawlers can be used to perform accessibility and vulnerability
checks on the application. The rapid expansion of the web and the growing
complexity of web applications have made crawling a very challenging process.
Throughout the history of web crawling, researchers and industrial groups have
addressed the different issues and challenges that web crawlers face, and
different solutions have been proposed to reduce the time and cost of crawling.
Performing an exhaustive crawl remains a challenging task; automatically
capturing the model of a modern web application and extracting data from it is
another open problem. What follows is a brief history of the different
techniques and algorithms used from the early days of crawling up to the
present. We introduce criteria to evaluate the relative performance of web
crawlers, and based on these criteria we plot the evolution of web crawlers and
compare their performance.
A comparison study for two fuzzy-based systems: improving reliability and security of JXTA-overlay P2P platform
The reliability of peers is very important for safe communication in peer-to-peer (P2P) systems. The reliability of a peer can be evaluated based on its reputation and its interactions with other peers when providing different services. However, deciding peer reliability requires many parameters, which makes the problem NP-hard. In this paper, we present two fuzzy-based systems (called FBRS1 and FBRS2) to improve the reliability of the JXTA-overlay P2P platform. In FBRS1, we considered three input parameters: number of interactions (NI), security (S), and packet loss (PL) to decide the peer reliability (PR). In FBRS2, we considered four input parameters: NI, S, PL, and local score to decide the PR. We compare the proposed systems by computer simulations. FBRS2 is more complex than FBRS1; however, it also considers the local score, which makes it more reliable.
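As a rough illustration of the idea, the following is a minimal sketch of a Mamdani-style fuzzy estimator in the spirit of FBRS1, mapping NI, S, and PL to PR. The membership functions, partitions, and rules are illustrative assumptions; the paper's actual fuzzy rule base is not given in the abstract.

```python
# Minimal sketch of an FBRS1-style fuzzy reliability estimator
# (three inputs: NI, S, PL -> output PR). All membership functions
# and rules below are illustrative assumptions, not the paper's.

def tri(x, a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def peer_reliability(ni, s, pl):
    """Estimate peer reliability PR in [0, 1] from number of
    interactions (ni), security (s), and packet loss (pl),
    each assumed normalized to [0, 1]."""
    # Fuzzification with assumed partitions.
    ni_high = tri(ni, 0.0, 1.0, 2.0)   # ramps up: more interactions
    s_high  = tri(s,  0.0, 1.0, 2.0)   # ramps up: better security
    pl_low  = tri(pl, -1.0, 0.0, 1.0)  # ramps down: less packet loss
    # Two illustrative rules, combined by a weighted average.
    r_high = min(ni_high, s_high, pl_low)        # all favorable -> PR high
    r_low = 1.0 - max(ni_high, s_high, pl_low)   # all unfavorable -> PR low
    total = r_high + r_low
    return r_high / total if total > 0 else 0.5

print(peer_reliability(0.8, 0.9, 0.1))  # a well-behaved peer -> PR near 1
```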
Fast Distributed PageRank Computation
Over the last decade, PageRank has gained importance in a wide range of
applications and domains, ever since it first proved to be effective in
determining node importance in large graphs (and was a pioneering idea behind
Google's search engine). In distributed computing alone, PageRank vector, or
more generally random walk based quantities have been used for several
different applications ranging from determining important nodes, load
balancing, search, and identifying connectivity structures. Surprisingly,
however, there has been little work towards designing provably efficient
fully-distributed algorithms for computing PageRank. The difficulty is that
traditional matrix-vector multiplication style iterative methods may not always
adapt well to the distributed setting owing to communication bandwidth
restrictions and convergence rates.
In this paper, we present fast random-walk-based distributed algorithms for
computing PageRank in general graphs and prove strong bounds on the round
complexity. We first present a distributed algorithm that takes
$O(\log n/\epsilon)$ rounds with high probability on any graph (directed or
undirected), where $n$ is the network size and $\epsilon$ is the reset
probability used in the PageRank computation (typically $\epsilon$ is a fixed
constant). We then present a faster algorithm that takes
$O(\sqrt{\log n}/\epsilon)$ rounds in undirected graphs. Both of the above
algorithms are scalable, as each node sends only a small
($\mathrm{polylog}\, n$) number of bits over each edge per round. To the best
of our knowledge, these are the first fully distributed algorithms for
computing the PageRank vector with provably efficient running time.
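To illustrate the random-walk primitive underlying such algorithms, here is a minimal sequential Monte Carlo sketch that estimates PageRank by running walks that reset with probability $\epsilon$ at each step. The graph, walk counts, and estimator are illustrative assumptions; the paper's actual contribution (the distributed round complexity) is not captured by this sequential version.

```python
# Hedged sketch: Monte Carlo PageRank via random walks with reset
# probability eps. Normalized visit counts over many short walks
# approximate the PageRank vector.
import random

def pagerank_mc(adj, eps=0.15, walks_per_node=1000):
    """adj: dict node -> list of out-neighbors.
    Starts walks_per_node walks from every node; each walk terminates
    with probability eps per step (or at a dangling node). Returns
    normalized visit frequencies as the PageRank estimate."""
    visits = {v: 0 for v in adj}
    total = 0
    for src in adj:
        for _ in range(walks_per_node):
            v = src
            while True:
                visits[v] += 1
                total += 1
                if random.random() < eps or not adj[v]:
                    break  # reset: this walk ends, the next starts fresh
                v = random.choice(adj[v])
    return {v: c / total for v, c in visits.items()}

# Tiny directed example graph (assumed for illustration).
g = {0: [1], 1: [2], 2: [0, 1], 3: [0]}
print(pagerank_mc(g))
```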
Polaritytrust: Measuring trust and reputation in social networks
In this work we tackle the problem of determining the trustworthiness of the users in a social network.
Our approach introduces the novelty of taking into account the negative opinions in a social network to
obtain the ranking of trust according to the opinions of all the users in the network. We briefly discuss
some common attacks that malicious users can perform against a system in order to gain good reputation
in the network. The experiments are performed with synthetic graphs, randomly generated to model real
social networks according to some common features, and to simulate the attacks previously mentioned.
The results show that our approach can deal with these threats, demoting malicious users and minimizing
their effects in the final ranking of trust.
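The abstract does not spell out PolarityTrust's update rule, so the following is only a generic sketch of PageRank-style trust propagation over a signed opinion graph, in which negative edges demote their targets; all names, parameters, and the clipping rule are illustrative assumptions rather than the authors' formulation.

```python
# Illustrative sketch: trust propagation on a signed opinion graph.
# Positive edges transfer trust to their target; negative edges
# subtract it, demoting peers that accumulate bad opinions.
def signed_trust(edges, nodes, damping=0.85, iters=50):
    """edges: list of (src, dst, sign) with sign in {+1, -1}.
    Returns a trust score per node. Scores are clipped at 0 so a
    fully distrusted node cannot itself propagate trust."""
    out = {v: [] for v in nodes}
    for s, d, sign in edges:
        out[s].append((d, sign))
    n = len(nodes)
    trust = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1 - damping) / n for v in nodes}
        for s in nodes:
            if not out[s]:
                continue
            share = damping * trust[s] / len(out[s])
            for d, sign in out[s]:
                nxt[d] += sign * share  # negative opinions demote d
        trust = {v: max(0.0, x) for v, x in nxt.items()}
    return trust

nodes = ["a", "b", "mal"]
edges = [("a", "b", +1), ("b", "a", +1), ("a", "mal", -1), ("mal", "a", +1)]
print(signed_trust(edges, nodes))  # "mal" is demoted by the negative edge
```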