
    A Brief History of Web Crawlers

    Web crawlers visit web applications, collect data, and discover new pages from the pages they visit. Web crawlers have a long and interesting history. Early web crawlers collected statistics about the web. In addition to collecting statistics about the web and indexing applications for search engines, modern crawlers can be used to perform accessibility and vulnerability checks on an application. The rapid expansion of the web and the complexity added to web applications have made crawling a very challenging process. Throughout the history of web crawling, many researchers and industrial groups have addressed the different issues and challenges that web crawlers face, and different solutions have been proposed to reduce the time and cost of crawling. Performing an exhaustive crawl remains a challenging problem, and capturing the model of a modern web application and extracting data from it automatically is another open question. What follows is a brief history of the different techniques and algorithms used from the early days of crawling up to the present. We introduce criteria to evaluate the relative performance of web crawlers; based on these criteria, we plot the evolution of web crawlers and compare their performance.
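
    To make the crawl cycle the abstract describes concrete (visit a page, collect its content, discover new pages from its links), here is a minimal Python sketch. The breadth-first strategy, the regex-based link extraction, and all names are illustrative assumptions, not taken from the survey.

    # Minimal sketch of the basic crawl loop: visit, extract links, enqueue.
    from collections import deque
    from urllib.parse import urljoin
    from urllib.request import urlopen
    import re

    def crawl(seed_url, max_pages=100):
        """Breadth-first crawl starting from seed_url; returns visited URLs."""
        frontier = deque([seed_url])
        visited = set()
        while frontier and len(visited) < max_pages:
            url = frontier.popleft()
            if url in visited:
                continue
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
            except Exception:
                continue  # skip unreachable or non-text resources
            visited.add(url)
            # Naive link extraction; a real crawler would use an HTML parser.
            for href in re.findall(r'href="([^"#]+)"', html):
                link = urljoin(url, href)
                if link.startswith("http") and link not in visited:
                    frontier.append(link)
        return visited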

    A comparison study for two fuzzy-based systems: improving reliability and security of JXTA-overlay P2P platform

    The reliability of peers is very important for safe communication in peer-to-peer (P2P) systems. The reliability of a peer can be evaluated based on its reputation and its interactions with other peers when providing different services. However, deciding peer reliability requires many parameters, which makes the problem NP-hard. In this paper, we present two fuzzy-based systems (called FBRS1 and FBRS2) to improve the reliability of the JXTA-overlay P2P platform. In FBRS1, we consider three input parameters: number of interactions (NI), security (S), and packet loss (PL) to decide the peer reliability (PR). In FBRS2, we consider four input parameters: NI, S, PL, and local score to decide the PR. We compare the proposed systems by computer simulations. FBRS2 is more complex than FBRS1; however, it also considers the local score, which makes it more reliable than FBRS1.
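
    As an illustration of how a fuzzy-based system can map such input parameters to a reliability score, here is a minimal Python sketch in the spirit of FBRS1, combining NI, S, and PL into PR. The membership functions, rule base, and output values are illustrative assumptions; the paper's actual fuzzy sets and rules are not reproduced here.

    def tri(x, a, b, c):
        """Triangular membership function peaking at b on the interval [a, c]."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def peer_reliability(ni, s, pl):
        """Combine NI, S, PL (each normalized to [0, 1]) into a PR estimate."""
        # Fuzzify each input into a degree of membership.
        ni_hi = tri(ni, 0.0, 1.0, 2.0)   # high number of interactions
        s_hi  = tri(s, 0.0, 1.0, 2.0)    # high security
        pl_lo = tri(pl, -1.0, 0.0, 1.0)  # low packet loss
        # Two illustrative rules (min acts as fuzzy AND), defuzzified
        # as a weighted average of the rule outputs:
        # R1: if NI high and S high and PL low, then PR high (output 1.0)
        # R2: otherwise PR low (output 0.2)
        r1 = min(ni_hi, s_hi, pl_lo)
        r2 = 1.0 - r1
        return (r1 * 1.0 + r2 * 0.2) / (r1 + r2)

    print(peer_reliability(ni=0.9, s=0.8, pl=0.1))  # high PR, close to 1.0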

    Fast Distributed PageRank Computation

    Over the last decade, PageRank has gained importance in a wide range of applications and domains, ever since it first proved to be effective in determining node importance in large graphs (and was a pioneering idea behind Google's search engine). In distributed computing alone, the PageRank vector, or more generally random-walk-based quantities, has been used for several different applications, ranging from determining important nodes and load balancing to search and identifying connectivity structures. Surprisingly, however, there has been little work towards designing provably efficient fully-distributed algorithms for computing PageRank. The difficulty is that traditional matrix-vector-multiplication-style iterative methods may not adapt well to the distributed setting, owing to communication bandwidth restrictions and convergence rates. In this paper, we present fast random-walk-based distributed algorithms for computing PageRank in general graphs and prove strong bounds on the round complexity. We first present a distributed algorithm that takes $O(\log n/\epsilon)$ rounds with high probability on any graph (directed or undirected), where $n$ is the network size and $\epsilon$ is the reset probability used in the PageRank computation (typically $\epsilon$ is a fixed constant). We then present a faster algorithm that takes $O(\sqrt{\log n}/\epsilon)$ rounds in undirected graphs. Both of the above algorithms are scalable, as each node sends only a small ($\mathrm{polylog}(n)$) number of bits over each edge per round. To the best of our knowledge, these are the first fully distributed algorithms for computing the PageRank vector with a provably efficient running time.
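
    To illustrate the random-walk view of PageRank that such algorithms build on, here is a minimal single-machine Monte Carlo sketch in Python: each walk moves to a random out-neighbor and resets to a random node with probability $\epsilon$, and visit frequencies approximate the PageRank vector. This is an illustrative estimator, not the paper's distributed algorithm, and all names are assumptions.

    import random
    from collections import Counter

    def pagerank_mc(graph, eps=0.15, walks_per_node=100, walk_len=50):
        """graph: dict mapping each node to a list of its out-neighbors."""
        nodes = list(graph)
        visits = Counter()
        for start in nodes:
            for _ in range(walks_per_node):
                v = start
                for _ in range(walk_len):
                    visits[v] += 1
                    if random.random() < eps or not graph[v]:
                        v = random.choice(nodes)   # reset (or escape a dead end)
                    else:
                        v = random.choice(graph[v])
        total = sum(visits.values())
        return {v: visits[v] / total for v in nodes}

    g = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
    print(pagerank_mc(g))  # approximate PageRank scores summing to 1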

    PolarityTrust: Measuring trust and reputation in social networks

    In this work we tackle the problem of determining the trustworthiness of the users in a social network. Our approach introduces the novelty of taking negative opinions into account to obtain a ranking of trust according to the opinions of all the users in the network. We briefly discuss some common attacks that malicious users can perform against a system in order to gain a good reputation in the network. The experiments are performed on synthetic graphs, randomly generated to model real social networks according to some common features and to simulate the attacks previously mentioned. The results show that our approach can deal with these threats, demoting malicious users and minimizing their effects on the final ranking of trust.
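
    As a rough illustration of trust propagation over signed opinions, here is a minimal Python sketch in which positive opinions raise a target's score in proportion to the opiner's trust and negative opinions lower it, so malicious users are demoted. The update rule and damping value are assumptions for illustration, not the PolarityTrust formulation itself.

    def signed_trust(opinions, n_users, damping=0.85, iters=50):
        """opinions: list of (src, dst, sign) tuples with sign in {+1, -1}.
        Returns one trust score per user; negative opinions demote targets."""
        trust = [1.0 / n_users] * n_users
        out_deg = [0] * n_users
        for src, _, _ in opinions:
            out_deg[src] += 1
        for _ in range(iters):
            new = [(1.0 - damping) / n_users] * n_users
            for src, dst, sign in opinions:
                # Only users with positive trust have weight to give.
                share = damping * max(trust[src], 0.0) / out_deg[src]
                new[dst] += sign * share
            trust = new
        return trust

    # Three users: 0 and 1 endorse each other; both distrust user 2.
    ops = [(0, 1, +1), (1, 0, +1), (0, 2, -1), (1, 2, -1)]
    print(signed_trust(ops, n_users=3))  # user 2 ends up with the lowest score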