509 research outputs found
An Optimal Trade-off between Content Freshness and Refresh Cost
Caching is an effective mechanism for reducing bandwidth usage and
alleviating server load. However, the use of caching entails a compromise
between content freshness and refresh cost. An excessive refresh allows a high
degree of content freshness at a greater cost of system resource. Conversely, a
deficient refresh inhibits content freshness but saves the cost of resource
usages. To address the freshness-cost problem, we formulate the refresh
scheduling problem with a generic cost model and use this cost model to
determine an optimal refresh frequency that gives the best tradeoff between
refresh cost and content freshness. We prove the existence and uniqueness of an
optimal refresh frequency under the assumptions that the arrival of content
update is Poisson and the age-related cost monotonically increases with
decreasing freshness. In addition, we provide an analytic comparison of system
performance under fixed refresh scheduling and random refresh scheduling,
showing that with the same average refresh frequency two refresh schedulings
are mathematically equivalent in terms of the long-run average cost
A Novel Cooperation and Competition Strategy Among Multi-Agent Crawlers
Multi-Agent theory which is used for communication and collaboration among focused crawlers has been proved that it can improve the precision of returned result significantly. In this paper, we proposed a new organizational structure of multi-agent for focused crawlers, in which the agents were divided into three categories, namely F-Agent (Facilitator-Agent), As-Agent (Assistance-Agent) and C-Agent (Crawler-Agent). They worked on their own responsibilities and cooperated mutually to complete a common task of web crawling. In our proposed architecture of focused crawlers based on multi-agent system, we emphasized discussing the collaborative process among multiple agents. To control the cooperation among agents, we proposed a negotiation protocol based on the contract net protocol and achieved the collaboration model of focused crawlers based on multi-agent by JADE. At last, the comparative experiment results showed that our focused crawlers had higher precision and efficiency than other crawlers using the algorithms with breadth-first, best-first, etc
Ubicrawler: a scalable fully distributed web crawler
We present the design and implementation of UbiCrawler, a scalable distributed web crawler, and we analyze its performance. The main features of UbiCrawler are platform independence, fault tolerance, a very effective assignment function for partitioning the domain to crawl, and more in general the complete decentralization of every task
Inter Process Communication and Prioritization to Enable Desktop Advertisement Mechanism
This research paper tries to bring in a new concept of desktop advertising mechanism by synchronization it with the running processes and the data on users’ side. The proposed approach shall be based on inter process communication interaction, scheduling, prioritization, desktop crawling and system calls. The running process status and data will be fetched by the proposed process, which will then seek relevant information with the remote ad server and display the advertisements fetched based on keywords on user side
- …