11 research outputs found

    Asynchronous iterative computations with Web information retrieval structures: The PageRank case

    Get PDF
    There are several ideas in use today for Web information retrieval, and specifically in Web search engines. The PageRank algorithm is one of those that introduce a content-neutral ranking function over Web pages. This ranking is applied to the set of pages returned by the Google search engine in response to a search query. PageRank is based in part on two simple common-sense concepts: (i) a page is important if many important pages include links to it; (ii) a page containing many links has reduced impact on the importance of the pages it links to. In this paper we focus on asynchronous iterative schemes to compute PageRank over large sets of Web pages. The elimination of the synchronizing phases is expected to be advantageous on heterogeneous platforms. The motivation for a possible move to such large-scale distributed platforms lies in the size of the matrices representing the Web structure. In orders of magnitude: 10^10 pages with 10^11 nonzero elements and 10^12 bytes just to store a small percentage of the Web (the part already crawled); distributed-memory machines are necessary for such computations. The present research is part of our general objective: to explore the potential of asynchronous computational models as an underlying framework for very large scale computations over the Grid. The area of "internet algorithmics" appears to offer many occasions for computations of unprecedented dimensionality that would be good candidates for this framework. Comment: 8 pages, to appear in the ParCo2005 Conference Proceedings
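    The abstract's contrast between synchronized and asynchronous iteration can be made concrete with a small sketch. Below is a minimal Python illustration in which each page's rank is recomputed from the freshest values already written, with no barrier between components; this in-place (Gauss-Seidel-style) sweep is a sequential stand-in for the distributed asynchronous scheme, not the paper's implementation, and the toy 4-page graph is invented for the example.

```python
import numpy as np

def async_style_pagerank(links, d=0.85, tol=1e-10, max_sweeps=100):
    """PageRank via in-place updates: each page reads whatever rank
    values are currently available, mimicking the absence of a
    synchronization barrier. Assumes every page has an out-link."""
    n = len(links)
    out_deg = [len(outs) for outs in links]
    incoming = [[] for _ in range(n)]          # incoming[j]: pages linking to j
    for i, outs in enumerate(links):
        for j in outs:
            incoming[j].append(i)
    r = np.full(n, 1.0 / n)
    for _ in range(max_sweeps):
        delta = 0.0
        for j in range(n):
            new = (1 - d) / n + d * sum(r[i] / out_deg[i] for i in incoming[j])
            delta = max(delta, abs(new - r[j]))
            r[j] = new                         # written immediately, no barrier
        if delta < tol:
            break
    return r

# toy web: 0 -> {1,2}, 1 -> {2}, 2 -> {0}, 3 -> {2}
print(async_style_pagerank([[1, 2], [2], [0], [2]]))
```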

    A Web Aggregation Approach for Distributed Randomized PageRank Algorithms

    Full text link
    The PageRank algorithm employed at Google assigns a measure of importance to each web page for rankings in search results. In our recent papers, we have proposed a distributed randomized approach for this algorithm, where web pages are treated as agents computing their own PageRank by communicating with linked pages. This paper builds upon that approach to reduce the computation and communication loads of the algorithms. In particular, we develop a method to systematically aggregate the web pages into groups by exploiting the sparsity inherent in the web. For each group, an aggregated PageRank value is computed, which can then be distributed among the group members. We provide a distributed update scheme for the aggregated PageRank along with an analysis of its convergence properties. The method is especially motivated by results on singular perturbation techniques for large-scale Markov chains and multi-agent consensus. Comment: To appear in the IEEE Transactions on Automatic Control, 201
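    The two-level structure described above (group-level PageRank, then distribution to members) can be sketched directly. The Python below is an illustrative approximation only: it aggregates the link matrix by group, runs PageRank on the small group graph, and splits each group's value uniformly among its members; the paper's distributed randomized update law and its convergence analysis are not reproduced here.

```python
import numpy as np

def aggregated_pagerank(A, groups, d=0.85, iters=100):
    """Two-level PageRank sketch. A is an n x n column-stochastic link
    matrix (no dangling pages assumed); groups is a list of index lists
    partitioning the pages."""
    n, g = A.shape[0], len(groups)
    member = np.zeros(n, dtype=int)
    for k, idx in enumerate(groups):
        member[idx] = k
    Ag = np.zeros((g, g))                      # mass flowing between groups
    rows, cols = A.nonzero()
    for i, j in zip(rows, cols):
        Ag[member[i], member[j]] += A[i, j]
    Ag /= Ag.sum(axis=0, keepdims=True)        # renormalize columns
    x = np.full(g, 1.0 / g)
    for _ in range(iters):                     # PageRank on the group graph
        x = (1 - d) / g + d * Ag @ x
    r = np.zeros(n)
    for k, idx in enumerate(groups):           # distribute to group members
        r[idx] = x[k] / len(idx)
    return r
```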

    Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation

    Full text link
    A myriad of graph-based algorithms in machine learning and data mining require parsing relational data iteratively. These algorithms are implemented in a large-scale distributed environment in order to scale to massive data sets. To accelerate these large-scale graph-based iterative computations, we propose delta-based accumulative iterative computation (DAIC). Different from traditional iterative computations, which update the result in each iteration based on the result from the previous iteration, DAIC updates the result by accumulating the "changes" between iterations. With DAIC, we can process only the "changes" and avoid negligible updates. Furthermore, we can perform DAIC asynchronously to bypass the high-cost synchronization barriers in heterogeneous distributed environments. Based on the DAIC model, we design and implement an asynchronous graph processing framework, Maiter. We evaluate Maiter on a local cluster as well as on the Amazon EC2 cloud. The results show that Maiter achieves as much as 60x speedup over Hadoop and outperforms other state-of-the-art frameworks. Comment: ScienceCloud 2012, TKDE 201
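    PageRank is a standard example of an accumulative computation, so a small delta-based sketch can show the idea of propagating only "changes". The Python below is a minimal single-process illustration of the delta-propagation pattern, not Maiter's distributed implementation; it assumes every page has at least one out-link, and the tolerance cutoff stands in for avoiding negligible updates.

```python
def daic_pagerank(links, d=0.85, tol=1e-12):
    """Delta-based accumulative PageRank: ranks start at zero and every
    pending change (delta) is accumulated into the rank, then spread to
    out-neighbors; pages whose delta is negligible are simply skipped."""
    n = len(links)
    rank = [0.0] * n
    delta = [(1 - d) / n] * n          # the initial change at every page
    active = set(range(n))             # pages with a non-negligible delta
    while active:
        j = active.pop()               # any order works: addition commutes
        rank[j] += delta[j]            # accumulate the pending change
        spread = d * delta[j] / len(links[j])
        delta[j] = 0.0
        for k in links[j]:
            delta[k] += spread
            if delta[k] > tol:
                active.add(k)
    return rank

# same toy web as above: 0 -> {1,2}, 1 -> {2}, 2 -> {0}, 3 -> {2}
print(daic_pagerank([[1, 2], [2], [0], [2]]))
```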

    STRATEGIC PATH WITH OBSTRUCTION INFERENCE IN WIRELESS AD HOC NETWORKS

    Get PDF
    In comparison, the better-known alternative, backpressure routing, ensures a bounded expected backlog for all stabilizable arrival rates. D-ORCD with a single destination is validated to sustain a bounded expected delay in these systems under any admissible traffic, provided that the speed of the computations is sufficiently fast relative to the traffic statistics. Opportunistic routing mitigates the effect of poor wireless links by exploiting the broadcast nature of wireless transmissions and the path diversity. E-DIVBAR is suggested: when appointing the next relay among the set of potential forwarders, E-DIVBAR considers the sum of the differential backlog and the expected hop count to the destination. Ignoring the expected cost toward the destination, however, becomes the weakness of this approach, leading to poor delay performance in low to moderate traffic. The main contribution of this work is a distributed opportunistic routing policy with congestion diversity in which, instead of the plain addition used in E-DIVBAR, the congestion information is combined using distributed shortest-path computations. We note that a similar analytical guarantee can be obtained concerning the throughput optimality of D-ORCD. In particular, we establish the throughput optimality of D-ORCD by studying the convergence of D-ORCD to a centralized version of the algorithm.

    PROXY BASED COLLISION SYSTEM TO MINIMIZE CONTENT DOWNLOADS TIME AND ENERGY UTILIZATION

    Get PDF
    In comparison, the better-known alternative, backpressure routing, guarantees a bounded expected total backlog for all stabilizable arrival rates. It has been shown that D-ORCD with one destination ensures a bounded expected delay in these systems under any permitted traffic, provided that the speed of the calculations is fast enough relative to the traffic statistics. Opportunistic routing reduces the effect of weak wireless links by exploiting the broadcast nature of wireless transmissions and the diversity of paths. E-DIVBAR: when choosing the next relay from among the potential forwarders, E-DIVBAR considers the sum of the differential backlog and the expected number of hops to the destination. Ignoring the cost toward the destination, however, makes the approach myopic, resulting in poor performance in low to moderate traffic. The main contribution of this document is to provide an opportunistic distributed routing policy with congestion diversity in which congestion details are integrated using distributed shortest-path calculations rather than the simple addition used in E-DIVBAR. We show that a similar analytical assurance can be obtained with regard to the throughput optimality of D-ORCD. In particular, we establish the throughput optimality of D-ORCD by examining the convergence of D-ORCD to a centralized form of the algorithm.
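    The forwarder-selection rule contrasted in these two abstracts (differential backlog plus expected hop count) can be illustrated with a small sketch. The Python below is a hypothetical simplification of the E-DIVBAR-style metric, not the protocol itself: every name and number here is invented for illustration, and real implementations operate on live queue and link-state information.

```python
def choose_forwarder(own_backlog, candidates):
    """Score each candidate forwarder by differential backlog plus its
    expected hop count to the destination; a lower score is better.
    candidates: list of (node_id, backlog, expected_hops) tuples."""
    best, best_score = None, float("inf")
    for node, backlog, hops in candidates:
        score = (backlog - own_backlog) + hops
        if score < best_score:
            best, best_score = node, score
    return best

# a node with backlog 7 choosing among three receivers of its broadcast
print(choose_forwarder(7, [("a", 5, 4), ("b", 2, 6), ("c", 6, 2)]))  # -> "b"
```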

    Enabling Computational Steering with an Asynchronous-Iterative Computation Framework

    Full text link

    On Two Web IR Boosting Tools: Clustering and Ranking

    Get PDF
    This thesis investigates several research problems which arise in modern Web Information Retrieval (WebIR). The Holy Grail of modern WebIR is to find a way to organize and to rank results so that the most "relevant" come first. The first breakthrough technique was the exploitation of the link structure of the Web graph in order to rank the result pages, using the well-known HITS and PageRank algorithms. These link-analysis approaches have been improved and extended, yet they seem to be insufficient in providing a satisfying search experience. In a number of situations a flat list of search results is not enough, and the users might desire to have search results grouped on-the-fly in folders of similar topics. In addition, the folders should be annotated with meaningful labels for rapid identification of the desired group of results. In other situations, users may have different search goals even when they express them with the same query. In this case the search results should be personalized according to the users' on-line activities. In order to address this need, we will discuss the algorithmic ideas behind SnakeT, a hierarchical clustering meta-search engine which personalizes searches according to the clusters selected by users on-the-fly. There are also situations where users might desire to access fresh information. In these cases, traditional link analysis may not be suitable. In fact, it is possible that there is not enough time for many links to point to a recently produced piece of information. In order to address this need, we will discuss the algorithmic and numerical ideas behind a new ranking algorithm suitable for ranking fresh types of information, such as news articles or blogs. When link analysis suffices to produce good-quality search results, the huge amount of Web information calls for fast ranking methodologies. We will discuss numerical methodologies for accelerating the eigenvector-like computation commonly used by link analysis. An important result of this thesis is that we show how to address the above predominant issues of Web Information Retrieval by using clustering and ranking methodologies. We will demonstrate that clustering and ranking have a mutual reinforcement property which has not yet been studied intensively. This property can be exploited to boost the precision of both methodologies
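    The thesis mentions accelerating the eigenvector-like computation at the heart of link analysis. One classical acceleration (a generic technique, not necessarily the one developed in the thesis) is to interleave the power method with Aitken delta-squared extrapolation over the last three iterates, as in the following Python sketch; P is assumed to be a column-stochastic link matrix with no dangling pages.

```python
import numpy as np

def accelerated_pagerank(P, d=0.85, tol=1e-9, extrapolate_every=10):
    """Power method for the PageRank vector with periodic Aitken
    delta-squared extrapolation, an illustrative acceleration sketch."""
    n = P.shape[0]
    x = np.full(n, 1.0 / n)
    prev, prev2 = x.copy(), x.copy()
    for k in range(1, 1000):
        x_new = (1 - d) / n + d * P @ x
        if np.abs(x_new - x).sum() < tol:
            return x_new
        prev2, prev, x = prev, x, x_new
        if k % extrapolate_every == 0:
            denom = x - 2 * prev + prev2          # componentwise Aitken step
            safe = np.abs(denom) > 1e-12
            x = x.copy()
            x[safe] -= (x[safe] - prev[safe]) ** 2 / denom[safe]
            x = np.maximum(x, 0)
            x /= x.sum()                          # back to a probability vector
    return x
```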