Asynchronous iterative computations with Web information retrieval structures: The PageRank case
There are several ideas in use today for Web information retrieval, and
specifically in Web search engines. The PageRank algorithm is one of those that
introduce a content-neutral ranking function over Web pages. This ranking is
applied to the set of pages returned by the Google search engine in response to
a search query. PageRank is based in part on two simple common-sense
concepts: (i) a page is important if many important pages link to it;
(ii) a page containing many links has reduced impact on the importance of the
pages it links to. In this paper we focus on asynchronous iterative schemes to
compute PageRank over large sets of Web pages. The elimination of the
synchronizing phases is expected to be advantageous on heterogeneous platforms.
The motivation for a possible move to such large scale distributed platforms
lies in the size of the matrices representing Web structure. The numbers of
pages, of nonzero matrix elements, and of bytes needed just to store a small
(already crawled) percentage of the Web are large enough that distributed
memory machines are necessary for such computations. The present research is
part of our general objective, to explore the potential of asynchronous
computational models as an underlying framework for very large scale
computations over the Grid. The area of ``internet algorithmics'' appears to
offer many occasions for computations of unprecedented dimensionality that would
be good candidates for this framework. Comment: 8 pages, to appear in the ParCo2005 Conference Proceedings
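The asynchronous scheme this abstract describes can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: a shuffled in-place sweep stands in for unsynchronized processors reading whatever scores are latest, and the function names, toy graph, and damping factor d = 0.85 are assumptions.

```python
import random

def async_pagerank(links, d=0.85, sweeps=100):
    """Toy asynchronous PageRank: pages are updated one at a time in an
    arbitrary order, always reading the *latest* scores of their
    in-neighbors, so no synchronization barrier separates iterations."""
    n = len(links)
    # in_links[p] = pages that link to p
    in_links = {p: [q for q in links if p in links[q]] for p in links}
    rank = {p: 1.0 / n for p in links}
    pages = list(links)
    for _ in range(sweeps):
        random.shuffle(pages)  # arbitrary order stands in for unsynchronized workers
        for p in pages:
            rank[p] = (1 - d) / n + d * sum(
                rank[q] / len(links[q]) for q in in_links[p]
            )
    return rank

# 3-page toy web (no dangling pages): A -> B, C;  B -> C;  C -> A
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
r = async_pagerank(links)
```

Because the fixed point of the update is unique, the chaotic update order still converges to the same ranking a synchronous sweep would produce.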
A Web Aggregation Approach for Distributed Randomized PageRank Algorithms
The PageRank algorithm employed at Google assigns a measure of importance to
each web page for rankings in search results. In our recent papers, we have
proposed a distributed randomized approach for this algorithm, where web pages
are treated as agents computing their own PageRank by communicating with linked
pages. This paper builds upon this approach to reduce the computation and
communication loads for the algorithms. In particular, we develop a method to
systematically aggregate the web pages into groups by exploiting the sparsity
inherent in the web. For each group, an aggregated PageRank value is computed,
which can then be distributed among the group members. We provide a distributed
update scheme for the aggregated PageRank along with an analysis on its
convergence properties. The method is especially motivated by results on
singular perturbation techniques for large-scale Markov chains and multi-agent
consensus. Comment: To appear in the IEEE Transactions on Automatic Control, 201
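The grouping idea can be illustrated with a simple two-level computation: collapse the page-level transition matrix into a group-level one, rank the groups, then split each group's rank among its members. This is only a sketch of the aggregation concept, not the paper's scheme; the membership matrix, the in-link-weight split, and all names are assumptions.

```python
import numpy as np

def aggregated_pagerank(A, groups, d=0.85):
    """Illustrative two-level PageRank: A is column-stochastic with
    A[i, j] = Pr(surfer moves from page j to page i); `groups` is a
    partition of page indices."""
    n, k = A.shape[0], len(groups)
    # membership matrix: M[p, g] = 1 if page p belongs to group g
    M = np.zeros((n, k))
    for g, members in enumerate(groups):
        M[members, g] = 1.0
    # group-level transition matrix, columns renormalized to be stochastic
    G = M.T @ A @ M
    G /= G.sum(axis=0, keepdims=True)
    # power iteration at the (much smaller) group level
    r = np.full(k, 1.0 / k)
    for _ in range(100):
        r = (1 - d) / k + d * (G @ r)
    # distribute each group's rank among members by in-link weight
    page_rank = np.zeros(n)
    w = A.sum(axis=1)  # row sums = in-link weight of each page
    for g, members in enumerate(groups):
        wg = w[members]
        share = wg / wg.sum() if wg.sum() > 0 else np.full(len(members), 1 / len(members))
        page_rank[members] = r[g] * share
    return page_rank

# 4-page toy web: 0 -> 1;  1 -> 0, 2;  2 -> 3;  3 -> 0, 2
A = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
])
pr = aggregated_pagerank(A, groups=[[0, 1], [2, 3]])
```

The group-level iteration touches a k-by-k matrix instead of n-by-n, which is where the load reduction comes from when the web's sparsity makes the groups much coarser than the pages.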
Maiter: An Asynchronous Graph Processing Framework for Delta-based Accumulative Iterative Computation
A myriad of graph-based algorithms in machine learning and data mining require
parsing relational data iteratively. These algorithms are implemented in a
large-scale distributed environment in order to scale to massive data sets. To
accelerate these large-scale graph-based iterative computations, we propose
delta-based accumulative iterative computation (DAIC). Different from
traditional iterative computations, which iteratively update the result based
on the result from the previous iteration, DAIC updates the result by
accumulating the "changes" between iterations. With DAIC, we can process only
these "changes", avoiding negligible updates. Furthermore, we can perform DAIC
asynchronously to bypass the high-cost synchronous barriers in heterogeneous
distributed environments. Based on the DAIC model, we design and implement an
asynchronous graph processing framework, Maiter. We evaluate Maiter on a local
cluster as well as on the Amazon EC2 cloud. The results show that Maiter achieves
as much as 60x speedup over Hadoop and outperforms other state-of-the-art
frameworks. Comment: ScienceCloud 2012, TKDE 201
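The delta-based accumulative idea can be sketched for PageRank, a canonical example for this style of computation: each page keeps a value and a pending delta, only nonzero deltas are propagated, and updates may be applied in any order. This is a hypothetical single-process sketch (the actual framework is distributed); it assumes no dangling pages, and all names are invented.

```python
from collections import deque

def daic_pagerank(out_links, d=0.85, tol=1e-12):
    """Delta-based accumulative PageRank: accumulate "changes" instead of
    recomputing every score each round; deltas below tol are dropped."""
    n = len(out_links)
    value = {p: 0.0 for p in out_links}
    delta = {p: (1 - d) / n for p in out_links}  # the initial "change"
    queue = deque(out_links)
    queued = set(out_links)
    while queue:
        p = queue.popleft()
        queued.discard(p)
        dp, delta[p] = delta[p], 0.0
        if dp <= tol:          # negligible update: skip it entirely
            continue
        value[p] += dp         # accumulate the change locally...
        spread = d * dp / len(out_links[p])
        for q in out_links[p]:  # ...and push it downstream
            delta[q] += spread
            if q not in queued:
                queue.append(q)
                queued.add(q)
    return value

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
v = daic_pagerank(links)
```

Because each update is a commutative accumulation, the work queue can be drained in any order, which is exactly what lets the distributed version drop its synchronous barriers.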
STRATEGIC PATH WITH OBSTRUCTION INFERENCE IN WIRELESS AD HOC NETWORKS
In comparison to its state-of-the-art variants, backpressure routing ensures a bounded expected total backlog for all stabilizable arrival rates. D-ORCD with a single destination is shown to ensure a bounded expected delay for all networks and under any admissible traffic, so long as the rate of computations is sufficiently fast relative to the traffic statistics. Opportunistic routing mitigates the effect of poor wireless links by exploiting the broadcast nature of wireless transmissions and the path diversity. E-DIVBAR is proposed: when choosing the next relay among the set of potential forwarders, E-DIVBAR considers the sum of the differential backlog and the expected hop count to the destination. Ignoring the cost toward the destination, however, is the drawback of such approaches, resulting in poor delay performance in low to moderate traffic. The main contribution of this work is to provide a distributed opportunistic routing policy with congestion diversity in which the congestion information is integrated using distributed shortest-path computations rather than the simple addition used in E-DIVBAR. We note that a similar analytical guarantee can be obtained regarding the throughput optimality of D-ORCD. In particular, we establish the throughput optimality of D-ORCD by examining the convergence of D-ORCD to a centralized version of the policy.
PROXY BASED COLLISION SYSTEM TO MINIMIZE CONTENT DOWNLOADS TIME AND ENERGY UTILIZATION
In comparison to the back-pressure alternatives, backpressure routing guarantees a bounded overall expected backlog for all stabilizable arrival rates. It has been shown that D-ORCD with a single destination ensures a bounded expected delay under any admissible traffic, provided that the speed of the calculations is fast enough relative to the traffic statistics. Opportunistic routing reduces the effect of weak wireless connections by exploiting the broadcast nature of wireless transmission and the diversity of paths. E-DIVBAR: when choosing the next hop among the set of potential forwarders, E-DIVBAR considers the sum of the differential backlog and the expected number of hops to the destination. Ignoring the cost toward the destination, however, is the drawback of this approach, resulting in poor performance in low to moderate traffic. The main contribution of this document is to provide an opportunistic distributed routing policy with congestion diversity, in which congestion details are integrated using distributed shortest-path calculations rather than the simple addition used in E-DIVBAR. We show that a similar analytical assurance can be obtained with regard to the throughput optimality of D-ORCD. In particular, we establish the throughput optimality of D-ORCD by examining the convergence of D-ORCD to a centralized form of the policy.
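The idea of folding congestion into shortest-path calculations can be illustrated with a toy, centralized stand-in for the distributed computation these abstracts describe: Dijkstra's algorithm over a cost that adds the next hop's queue backlog to a unit transmission cost, so routes detour around congested nodes. The graph, the unit hop cost, and all names are assumptions for illustration; a reachable destination is assumed.

```python
import heapq

def congestion_aware_route(adj, backlog, src, dst):
    """Dijkstra where stepping to node v costs one transmission plus v's
    current queue backlog, a congestion-diversity style metric."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        c, u = heapq.heappop(heap)
        if u == dst:
            break
        if c > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj[u]:
            nc = c + 1.0 + backlog[v]  # hop cost + congestion at v
            if nc < dist.get(v, float("inf")):
                dist[v] = nc
                prev[v] = u
                heapq.heappush(heap, (nc, v))
    # reconstruct the chosen path back from the destination
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

adj = {"s": ["a", "b"], "a": ["t"], "b": ["t"], "t": []}
# node "a" is heavily backlogged, so the route should prefer "b"
backlog = {"s": 0.0, "a": 5.0, "b": 0.0, "t": 0.0}
path = congestion_aware_route(adj, backlog, "s", "t")
```

Plain backpressure looks only at backlog differentials; adding the hop count (or, as here, a full path cost) is what recovers good delay at low traffic, where backlogs alone give no sense of direction.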
On Two Web IR Boosting Tools: Clustering and Ranking
This thesis investigates several research problems which arise in modern Web Information Retrieval (WebIR). The Holy Grail of modern WebIR is to find a way to organize and to rank results so that the most ``relevant'' come first. The first break-through technique was the exploitation of the link structure of the Web graph in order to rank the result pages, using the well-known HITS and PageRank algorithms. These link-analysis approaches have been improved and extended, and yet they seem to be insufficient in providing a satisfying search experience.
In a number of situations a flat list of search results is not enough, and the users might desire to have search results grouped on-the-fly in folders of similar topics. In addition, the folders should be annotated with meaningful labels for rapid identification of the desired group of results. In other situations, users may have different search goals even when they express them with the same query. In this case the search results should be personalized according to the users' on-line activities. In order to address this need, we will discuss the algorithmic ideas behind SnakeT, a hierarchical clustering meta-search engine which personalizes searches according to the clusters selected by users on-the-fly.
There are also situations where users might desire to access fresh information. In these cases, traditional link analysis may not be suitable. In fact, it is possible that there has not been enough time for many links to point to a recently produced piece of information. In order to address this need, we will discuss the algorithmic and numerical ideas behind a new ranking algorithm suitable for ranking fresh types of information, such as news articles or blogs.
When link analysis suffices to produce good quality search results, the huge amount of Web information asks for fast ranking methodologies. We will discuss numerical methodologies for accelerating the eigenvector-like computations commonly used by link analysis.
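One standard family of numerical accelerations for such eigenvector-like computations is periodic extrapolation applied to power iteration; the sketch below uses Aitken's delta-squared formula on a toy Google matrix. It is a generic illustration under assumed names and parameters, not necessarily the method studied in the thesis.

```python
import numpy as np

def power_iteration(A, tol=1e-10, extrapolate=True):
    """Power iteration for the dominant eigenvector of a column-stochastic
    matrix, with optional Aitken delta-squared extrapolation every ten
    steps to accelerate the linear convergence."""
    n = A.shape[0]
    x0, x1 = None, None
    x2 = np.full(n, 1.0 / n)
    for it in range(1, 10_000):
        x0, x1, x2 = x1, x2, A @ x2
        if extrapolate and x0 is not None and it % 10 == 0:
            denom = x2 - 2 * x1 + x0
            mask = np.abs(denom) > 1e-14   # skip already-converged components
            acc = x2.copy()
            acc[mask] -= (x2 - x1)[mask] ** 2 / denom[mask]
            x2 = acc / acc.sum()           # renormalize onto the simplex
        if np.linalg.norm(x2 - x1, 1) < tol:
            break
    return x2, it

# Google matrix for the toy web A -> B, C;  B -> C;  C -> A  (damping 0.85)
P = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
M = 0.85 * P + 0.15 / 3
x, iters = power_iteration(M)
```

The extrapolation step costs a few vector operations but can cut the iteration count noticeably when the subdominant eigenvalue is close to the damping factor.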
An important result of this thesis is that we show how to address the above predominant issues of Web Information Retrieval by using clustering and ranking methodologies. We will demonstrate that clustering and ranking have a mutual reinforcement property which has not yet been studied intensively. This property can be exploited to boost the precision of both methodologies